Python Enhancement Proposals

PEP 766 – Explicit Priority Choices Among Multiple Indexes

Author:
Michael Sarahan <msarahan at gmail.com>
Sponsor:
Barry Warsaw <barry at python.org>
PEP-Delegate:
Paul Moore <p.f.moore at gmail.com>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Informational
Topic:
Packaging
Created:
18-Nov-2024
Post-History:
18-Nov-2024

Abstract

Package resolution is a key part of the Python user experience as the means of extending Python’s core functionality. The experience of package resolution is mostly taken for granted until someone encounters a situation where the package installer does something they don’t expect. The installer behavior with multiple indexes has been a common source of unexpected behavior. Through its ubiquity, pip has long defined the standard expected behavior across other tools in the ecosystem, but Python installers are diverging with respect to how they handle multiple indexes. At the core of this divergence is whether index contents are combined before resolving distributions, or each index is handled individually in order. pip merges all indexes before matching distributions, while uv matches distributions on one index before moving on to the next. Each approach has advantages and disadvantages. This PEP aims to describe each of these behaviors, which are referred to as “version priority” and “index priority” respectively, so that community discussions and troubleshooting can share a common vocabulary, and so that tools can implement predictable behavior based on these descriptions.

Motivation

Python package users frequently find themselves in need of specifying an index or package source other than PyPI. There are many reasons for external indexes to exist:

In most of these cases, it is not desirable to completely forgo PyPI. Instead, users generally want PyPI to still be a source of packages, but a lower priority source. Unfortunately, pip’s current design precludes this concept of priority. Some Python installer tools have developed alternative ways to handle multiple indexes that incorporate mechanisms to express index priority, such as uv and PDM.

The innovation and the potential for customization are exciting, but they come at the risk of further fragmenting the Python packaging ecosystem, which is already perceived as one of Python’s weak points. The motivation of this PEP is to encourage installers to provide more insight into how they handle multiple indexes, and to provide a vocabulary that can be common to the broader community.

Specification

“Version priority”

This behavior is characterized by the installer always getting the “best” version of a package, regardless of the index that it comes from. “Best” is defined by the installer’s algorithm for optimizing the various traits of a package, also factoring in user input (such as preferring only binaries, or no binaries). While installers may differ in their optimization criteria and user options, the general trait that all version priority installers share is that the index contents are collated prior to candidate selection.
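The collation step described above can be sketched roughly as follows. This is only an illustration, not pip's implementation: the `Candidate` type, the index labels, and the use of integer tuples in place of real version parsing are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    version: tuple[int, ...]  # assumption: simplified stand-in for a parsed version
    index: str  # hypothetical label for the index that served the file


def pick_version_priority(
    indexes: dict[str, list[Candidate]], name: str
) -> Optional[Candidate]:
    # Collate candidates from every configured index into a single pool.
    pool = [c for found in indexes.values() for c in found if c.name == name]
    # Select the highest version, ignoring which index it came from.
    return max(pool, key=lambda c: c.version, default=None)


# Hypothetical index contents, for illustration only.
INDEXES = {
    "internal": [Candidate("something", (1, 2, 0), "internal")],
    "pypi": [Candidate("something", (1, 3, 0), "pypi")],
}

chosen = pick_version_priority(INDEXES, "something")
print(chosen)
```

In this sketch the newer release on the second index is selected even though the first index also offers the package, which is the defining trait of version priority.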

Version priority is most useful when all configured indexes are equally trusted and well-behaved regarding the distribution interchangeability assumption. Mirrors are especially well-behaved in this regard. That interchangeability assumption is what makes comparing distributions of a given package meaningful. Without it, the installer is no longer comparing “apples to apples.” In practice, it is common for different indexes to have files that have different contents than other indexes, such as builds for special hardware, or differing metadata for the same package. Version priority behavior can lead to undesirable, unexpected outcomes in these cases, and this is where users generally look for some kind of index priority. Additionally, when there is a difference in trust among indexes, version priority does not provide a way to prefer more trusted indexes over less trusted indexes. This has been exploited by dependency confusion attacks, and PEP 708 was proposed as a way of hard-coding a notion of trusted external indexes into the index.

The “version priority” name is new, and introduction of new terms should always be minimized. This PEP looks toward the uv project, which refers to its implementation of the version priority behavior as “unsafe-best-match.” Naming is really hard here. On one hand, it isn’t accurate to call pip’s default behavior intrinsically “unsafe.” The addition of possibly malicious indexes is what introduces concern with this behavior. PEP 708 added a way to restrict installers from drawing packages from unexpected, potentially insecure indexes. On the other hand, the term “best-match” is technically correct, but also misleading. The “best match” varies by user and by application. “Best” is technically correct in the sense that it is a global optimum according to the match criteria specified above, but that is not necessarily what is “best” in a user’s eyes. “Version priority” is a proposed term that avoids the concerns with the uv terminology, while approximating the behavior in the most user-identifiable way that packages are compared.

“Index priority”

In index priority, the resolver finds candidates for each index, one at a time. The resolver proceeds to subsequent indexes only if the current package request has no viable candidates. Index priority does not combine indexes into one global, flat namespace. Because indexes are searched in order, the package from an earlier index will be preferred over a package from a later index, regardless of whether the later index had a better match with the installer’s optimization criteria. For a given installer, the optimization criteria and selection algorithm should be the same for both index priority and version priority. It is only the treatment of multiple indexes that differs: all together for version priority, and individually for index priority.

The order of specification of indexes determines their priority in the finding process. As a result, the way that installers load the index configuration must be predictable and reproducible. This PEP does not prescribe any particular mechanism, other than to say that installers should provide a way of ordering their collection of sources. Installers should also ideally provide optional debugging output that provides insight into which index is being considered.

Each package’s finder should start at the beginning of the list of indexes, so each package starts over with the index list. In other words, if one package has no valid candidates on the first index, but finds a hit on the second index, subsequent packages should still start their search on the first index, rather than starting on the second.
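A minimal sketch of this search order, under stated assumptions, is shown below. The `Candidate` type, the index labels, the `is_viable` callback, and the integer-tuple versions are hypothetical names introduced for illustration; this is not how uv or any other installer is implemented.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    version: tuple[int, ...]  # assumption: simplified stand-in for a parsed version
    index: str  # hypothetical label for the index that served the file


def pick_index_priority(
    ordered_indexes: Sequence[tuple[str, list[Candidate]]],
    name: str,
    is_viable: Callable[[Candidate], bool] = lambda c: True,
) -> Optional[Candidate]:
    # Every package request walks the index list from the beginning.
    for _index_name, candidates in ordered_indexes:
        viable = [c for c in candidates if c.name == name and is_viable(c)]
        if viable:
            # Stop at the first index with any viable candidate, then apply
            # the usual selection criteria within that index only.
            return max(viable, key=lambda c: c.version)
    return None


# Hypothetical index contents, listed in priority order.
ORDERED = [
    ("internal", [Candidate("something", (1, 2, 0), "internal")]),
    (
        "pypi",
        [
            Candidate("something", (1, 3, 0), "pypi"),
            Candidate("other", (2, 0, 0), "pypi"),
        ],
    ),
]

print(pick_index_priority(ORDERED, "something"))
print(pick_index_priority(ORDERED, "other"))
```

Here the request for "other" falls through to the second index, while requests for "something" are still answered by the first index, including requests made after the fall-through.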

One desirable behavior that the index priority strategy implies is that there are no “surprise” updates, where a version bump on a lower-priority index wins out over a curated, approved higher-priority index. This is related to the security improvement of PEP 708, where packages can restrict the external indexes that distributions can come from, but index priority is more configurable by end users. The package installs are only expected to change when either the higher-priority index or the index priority configuration changes. This stability and predictability makes it more viable to configure indexes as a more persistent property of an environment, rather than a one-off argument for one install command.

Cache keys

Because index priority acknowledges the possibility that different indexes may have different content for a given package, caching and lockfiles should now include the index from which distributions were downloaded. Without this aspect, it is possible that after changing the list of configured indexes, the cache or lockfile could provide a similarly-named distribution from a lower-priority index. If every index follows the recommended behavior of providing identical files across indexes for a given filename, this is not an issue. However, that recommendation is not readily enforceable, and augmenting the cache key with origin index would be a wise defensive change.
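One possible shape for such a key is sketched below. The hashing scheme, the normalization of the trailing slash, and the example URLs are assumptions made for illustration; no existing installer is claimed to build its cache keys this way.

```python
import hashlib


def cache_key(index_url: str, filename: str) -> str:
    # Normalize the trailing slash so one index does not produce two keys.
    origin = index_url.rstrip("/")
    # Fold the origin index into the key, so that identically named files
    # from different indexes never share a cache entry.
    return hashlib.sha256(f"{origin}\n{filename}".encode("utf-8")).hexdigest()


FILENAME = "something-1.0-py3-none-any.whl"  # hypothetical distribution file
internal_key = cache_key("https://internal.example/simple/", FILENAME)
pypi_key = cache_key("https://pypi.org/simple", FILENAME)
print(internal_key != pypi_key)
```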

Ways that a request falls through to a lower priority index

  • Package name is not present at all in higher priority index
  • All distributions from higher priority index filtered out due to version specifier, compatible Python version, platform tag, yanking or otherwise
  • A denylist configuration for the installer specifies that a particular package name should be ignored on a given index
  • A higher priority index is unreachable (e.g. blocked by firewall rules, temporarily unavailable due to maintenance, other miscellaneous and temporary networking issues). This is a less clear-cut detail that should be controllable by users. On one hand, this behavior would lead to less predictable, likely unreproducible results by unexpectedly falling through to lower priority indexes. On the other hand, graceful fallback may be more valuable to some users, especially if they can safely assume that all of their indexes are equally trusted. pip’s behavior today is graceful fallback: you see warnings if an index is having connection issues, but the installation will proceed with any other available indexes. Because index priority can convey different trust levels between indexes, installers that implement index priority should default to raising errors and aborting on network issues. Installers may choose to provide a flag to allow fall-through to lower-priority indexes in case of network error.
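The fall-through rules in the list above, including the recommended default of aborting on network errors, could be sketched as follows. The `IndexUnreachable` exception, the `fetch` callback, the `allow_network_fallthrough` parameter, and the URLs are hypothetical names chosen for this example.

```python
from typing import Callable, Optional, Sequence


class IndexUnreachable(Exception):
    """Hypothetical error raised when an index cannot be contacted."""


def find_candidates(
    ordered_indexes: Sequence[str],
    name: str,
    fetch: Callable[[str, str], list[str]],
    allow_network_fallthrough: bool = False,
) -> Optional[tuple[str, list[str]]]:
    for url in ordered_indexes:
        try:
            files = fetch(url, name)
        except IndexUnreachable:
            if not allow_network_fallthrough:
                raise  # default: abort rather than silently use a lower index
            continue  # opt-in: treat the unreachable index as empty
        if files:
            return url, files
        # No files: the name is absent or everything was filtered out,
        # so the request falls through to the next index.
    return None


def fake_fetch(url: str, name: str) -> list[str]:
    # Stand-in for a network call; the first index is simulated as down.
    if url == "https://down.example/simple":
        raise IndexUnreachable(url)
    if url == "https://pypi.example/simple" and name == "pkg":
        return ["pkg-1.0-py3-none-any.whl"]
    return []


INDEX_URLS = ["https://down.example/simple", "https://pypi.example/simple"]
print(find_candidates(INDEX_URLS, "pkg", fake_fetch, allow_network_fallthrough=True))
```

With the default setting, the same call raises `IndexUnreachable` instead of returning a result from the lower-priority index.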

Treatment within a given index follows existing behavior, but stops at the bounds of one index and moves on to the next index only after all priority preferences within the one index are exhausted. This means that existing priorities among the unified collection of packages apply to each index individually before falling through to a lower priority index.

There are tradeoffs to make at every level of the optimization criteria:

  • version: index priority will use an older version from a higher-priority index even if a newer version is available on another index.
  • wheel vs sdist: Should the installer use an sdist from a higher-priority index before trying a wheel from a lower-priority index?
  • more platform-specific wheels before less specific ones: Should the installer use less specific wheels from higher-priority indexes before using more specific wheels from lower priority indexes?
  • flags such as pip’s --prefer-binary: Should the installer use an sdist from a higher priority index before considering wheels on a lower priority index?

Installers are free to implement these priorities in different ways for themselves, but they should document their optimization criteria and how they handle fall-through to lower-priority indexes. For example, an installer could say that --prefer-binary should not install an sdist unless it had iterated through all configured indexes and found no installable binary candidates.
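The --prefer-binary example above can be contrasted with strict index priority in a small sketch. The function name, the `prefer_binary_across_indexes` parameter, and the file names are hypothetical, and the rule of preferring a wheel over an sdist within a single index is an assumption made to keep the example short.

```python
from typing import Optional, Sequence


def select_file(
    ordered_indexes: Sequence[tuple[str, list[str]]],
    prefer_binary_across_indexes: bool,
) -> Optional[tuple[str, str]]:
    if prefer_binary_across_indexes:
        # First pass: look for a wheel on every index, in priority order,
        # before any sdist is considered.
        for index, files in ordered_indexes:
            wheels = [f for f in files if f.endswith(".whl")]
            if wheels:
                return index, wheels[0]
    # Strict index priority: the first index with any file wins. Within
    # that index, wheels sort ahead of sdists.
    for index, files in ordered_indexes:
        if files:
            ranked = sorted(files, key=lambda f: not f.endswith(".whl"))
            return index, ranked[0]
    return None


# Hypothetical contents: only an sdist on the higher-priority index.
ORDERED_FILES = [
    ("internal", ["pkg-1.0.tar.gz"]),
    ("pypi", ["pkg-1.0-py3-none-any.whl"]),
]

print(select_file(ORDERED_FILES, prefer_binary_across_indexes=False))
print(select_file(ORDERED_FILES, prefer_binary_across_indexes=True))
```

Either behavior is defensible; the point of the passage is that an installer should document which one it implements.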

Mirroring

As described thus far, the index priority scheme breaks the use case of more than one index URL serving the same content. Such mirrors may be used with the intent of ameliorating network issues or otherwise improving reliability. One approach that installers could take to preserve mirroring functionality while adding index priority would be to add a notion of user-definable index groups, where each index in the group is assumed to be equivalent. This is related to Poetry’s notion of package sources, except that this would allow arbitrary numbers of prioritizable groups, and that this would assume members of a group to be mirrors. Within each group, content could be combined, or each member could be fetched concurrently. The fastest responding index would then represent the group.
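A rough sketch of index groups follows. For brevity it tries the mirrors in a group one after another and lets the first reachable one represent the group, instead of fetching concurrently as suggested above. The group layout, the `fetch` callback, and the URLs are hypothetical.

```python
from typing import Callable, Optional, Sequence


def resolve_with_groups(
    groups: Sequence[Sequence[str]],
    name: str,
    fetch: Callable[[str, str], list[str]],
) -> Optional[tuple[str, list[str]]]:
    # Groups are searched in priority order, as single indexes would be.
    for group in groups:
        for url in group:
            try:
                files = fetch(url, name)
            except ConnectionError:
                continue  # mirrors are assumed equivalent, so try the next
            break  # the first reachable mirror represents the whole group
        else:
            raise ConnectionError(f"no mirror reachable in group {list(group)}")
        if files:
            return url, files
        # The group answered but has no candidates: fall through.
    return None


def fake_fetch(url: str, name: str) -> list[str]:
    # Stand-in for a network call; the first mirror is simulated as down.
    if url == "https://mirror-a.example/simple":
        raise ConnectionError(url)
    if url == "https://pypi.example/simple" and name == "pkg":
        return ["pkg-1.0-py3-none-any.whl"]
    return []


GROUPS = [
    ["https://mirror-a.example/simple", "https://mirror-b.example/simple"],
    ["https://pypi.example/simple"],
]

print(resolve_with_groups(GROUPS, "pkg", fake_fetch))
```

In this example the first group is answered by its second mirror, which has no candidates, so the request falls through to the second group.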

Backwards Compatibility

This PEP does not prescribe any changes as mandatory for any installer, so it only introduces compatibility concerns if tools choose to adopt an index behavior other than the behavior(s) they currently implement.

This PEP’s language does not quite align with existing tools, including pip and uv. Either this PEP’s language can change during review of this PEP, or if this PEP’s language is preferred, other projects could conform to it. The only goal of proposing these terms is to create a central, common vocabulary that makes it easier for users to learn about other installers.

As some tools rely on one or the other behavior, there are some possible issues that may emerge, where tailoring available resources/packages for a particular behavior may detract from the user experience for people who rely on the other behavior.

  • Different indexes may have different metadata. For example, one cannot assume that the metadata for package “something” on index “A” has the same dependencies as “something” on index “B”. This breaks fundamental assumptions of version priority, but index priority can handle this. When an installer falls through to a lower-priority index in the search order, it implies refreshing the package metadata from the new index. This is both an improvement and a complication. It is a complication in the sense that a cached metadata entry must be keyed by both package name and index URL, instead of just package name. It is a potential improvement in that different implementation variants of a package can differ in dependencies as long as their distributions are separated into different indexes.
  • Users may not get updates as they expect when using index priority, because some higher priority index has not updated/synchronized with PyPI to get the latest packages. If the higher priority index has a valid candidate, newer packages will not be found. This will need to be communicated verbosely, because it is counter to pip’s well-established behavior.
  • By adding index priority, an installer will improve the predictability of which index will be selected, and index hosts may abuse this as a way of having similarly named files that have different contents. With version priority, this violates the key package interchangeability assumption, and insanity will ensue. Index priority would be more workable, but the situation still has great potential for confusion. It would be helpful to develop tools that support installers in identifying these confusing issues. These tools could operate independently of the installer process, as a means of validating the sanity of a set of indexes. Depending on the time cost of these tools, the installers could run them as part of their process. Users could, of course, ignore the recommendations at their own risk.

Security Implications

Index priority creates a mechanism for users to explicitly specify a trust hierarchy among their indexes. As such, it limits the potential for dependency confusion attacks. Index priority was rejected by PEP 708 as a solution for dependency confusion attacks. This PEP requests that the rejection be reconsidered, with index priority serving a different purpose. This PEP is primarily motivated by the desire to support implementation variants, which is the subject of another discussion that hopefully leads to a PEP. It is not mutually exclusive with PEP 708, nor does it suggest reverting or withdrawing PEP 708. It is an answer to how we could allow users to choose which index to use at a more fine grained level than “per install”.

For a more thorough discussion of the PEP 708 rejection of index priority, please see the discuss.python.org thread for this PEP.

How to Teach This

At the outset, the goal is not to convert pip or any other tool to change its default priority behavior. The best way to teach is perhaps to watch message boards, GitHub issue trackers and chat channels, keeping an eye out for problems that index priority could help solve. There are several long-standing discussions that would be good places to start advertising the concepts. The topics of the two officially supported behaviors need documentation, and we, the authors of this PEP, would develop these as part of the review period of this PEP. These docs would likely consist of additions across several indexes, cross-linking the concepts between installers. At a minimum, we expect to add to the PyPUG and to pip’s documentation.

It will be important for installers to advertise the active behavior, especially in error messaging, which also gives them a place to point users to resources about these behaviors.

uv users are already experiencing index priority. uv documents this behavior well, but it is always possible to improve the discoverability of that documentation from the command line, where users will actually encounter the unexpected behavior.

Reference Implementation

The uv project demonstrates index priority with its default behavior. uv is implemented in Rust, though, so if a reference implementation in a Python-based tool is necessary, we, the authors of this PEP, will provide one. For pip in particular, we see the implementation plan as something like:

  • For users who don’t use --extra-index-url or --find-links, there will be no change, and no migration is necessary.
  • pip users would be able to opt in to the index priority behavior with a new config setting in the CLI and in pip.conf. This proposal does not recommend any strategy as the default for any installer. It only recommends documenting the strategies that a tool provides.
  • Enable extra info-level output for any pip operation where more than one index is used. In this output, state the current strategy setting, and a terse summary of implied behavior, as well as a link to docs that describe the different options.
  • Add debugging output that verbosely identifies the index being used at each step, including where the file is in the configuration hierarchy, and where it is being included (via config file, env var, or CLI flag).
  • Plumb tracking of which index gets used for which package/distribution through the entire pip install process. Store this information so that it is available to tools like pip freeze.
  • Supplement PEP 751 (lockfiles) with capture of index where a package/distribution came from.

Rejected Ideas

  • Tell users to set up a proxy/mirror, such as devpi or Artifactory, that serves local files if present, and forwards to another server (PyPI) if no local files match

    This matches the behavior of this proposal very closely, except that this method requires hosting some server, and may be inaccessible or not configurable to users in some environments. It is also important to consider that for an organization that operates its own index (for overcoming PyPI size restrictions, for example), this does not solve the need for --extra-index-url or proxy/mirror for end users. That is, organizations get no improvement from this approach unless they proxy/mirror PyPI as a whole, and get users to configure their proxy/mirror as their sole index.

  • Are build tags and/or local version specifiers enough?

    Build tags and local version specifiers will take precedence over packages without those tags and/or local version specifiers. In a pool of packages, builds that have these additions hosted on a server other than PyPI will take priority over packages on PyPI, which rarely use build tags and are not allowed to use local version specifiers. This approach is viable when package providers want to provide their own local override, such as HPC maintainers who provide optimized builds for their users. It is less viable in some ways, such as build tags not showing up in pip freeze metadata, and local version specifiers not being allowed on PyPI. There is also significant work entailed in building and maintaining package collections with local build tag variants.

    https://discuss.python.org/t/dependency-notation-including-the-index-url/5659/21

  • What about PEP 708? Isn’t that enough?

    PEP 708 is aimed specifically at addressing dependency confusion attacks, and doesn’t address the potential for implementation variants among indexes. It is a way of filtering external URLs and encoding an allow-list for external indexes in index metadata. It does not change the lack of priority or preference among indexes that currently exists.

  • Namespacing

    Namespacing is a means of specifying a package such that the Python usage of the package does not change, but the package installation restricts where the package comes from. PEP 752 recently proposed a way to multiplex a package’s owners in a flat package namespace (e.g. PyPI) by reserving prefixes as grouping elements. NPM’s concept of “scopes” has been raised as another good example of how this might look. This PEP differs in that it is targeted at multiple indexes, not a flat package namespace. The net effect is roughly the same in terms of predictably choosing a particular package source, except that the namespacing approach relies more on naming packages with these namespace prefixes, whereas this PEP would be less granular, pulling in packages on whatever higher-priority index the user specifies. The namespacing approach relies on all configured indexes treating a given namespace similarly, which leaves the usual concern that not all configured indexes are trusted equally. The namespace idea is not incompatible with this PEP, but it also does not improve expression of trust of indexes in the way that this PEP does.

Open Issues

[Any points that are still being decided/discussed.]

Acknowledgements

This work was supported financially by NVIDIA through employment of the author. NVIDIA teammates dramatically improved this PEP with their input. Astral Software pioneered the behaviors of index priority and thus laid the foundation of this document. The pip authors deserve great praise for their consistent direction and patient communication of the version priority behavior, especially in the face of contentious security concerns.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0766.rst

Last modified: 2024-11-21 20:00:24 GMT

