Package resolution is a key part of the Python user experience as the means of extending Python’s core functionality. The experience of package resolution is mostly taken for granted until someone encounters a situation where the package installer does something they don’t expect. The installer behavior with multiple indexes has been a common source of unexpected behavior. Through its ubiquity, pip has long defined the standard expected behavior across other tools in the ecosystem, but Python installers are diverging with respect to how they handle multiple indexes. At the core of this divergence is whether index contents are combined before resolving distributions, or each index is handled individually in order. pip merges all indexes before matching distributions, while uv matches distributions on one index before moving on to the next. Each approach has advantages and disadvantages. This PEP aims to describe each of these behaviors, which are referred to as “version priority” and “index priority” respectively, so that community discussions and troubleshooting can share a common vocabulary, and so that tools can implement predictable behavior based on these descriptions.
Python package users frequently find themselves in need of specifying an index or package source other than PyPI. There are many reasons for external indexes to exist:
In most of these cases, it is not desirable to completely forgo PyPI. Instead, users generally want PyPI to still be a source of packages, but a lower priority source. Unfortunately, pip’s current design precludes this concept of priority. Some Python installer tools have developed alternative ways to handle multiple indexes that incorporate mechanisms to express index priority, such as uv and PDM.
The innovation and the potential for customization are exciting, but they come at the risk of further fragmenting the Python packaging ecosystem, which is already perceived as one of Python’s weak points. The motivation of this PEP is to encourage installers to provide more insight into how they handle multiple indexes, and to provide a vocabulary that can be common to the broader community.
This behavior is characterized by the installer always getting the “best” version of a package, regardless of the index that it comes from. “Best” is defined by the installer’s algorithm for optimizing the various traits of a package, also factoring in user input (such as preferring only binaries, or no binaries). While installers may differ in their optimization criteria and user options, the general trait that all version priority installers share is that the index contents are collated prior to candidate selection.
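The collate-then-select idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Candidate` type, the index labels, and the use of "highest version wins" as the sole optimization criterion are all hypothetical, and real installers such as pip weigh many more traits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # simplified stand-in for a parsed version, e.g. (2, 0)
    index: str      # hypothetical label for the index the file came from


def select_version_priority(indexes, name):
    """Collate candidates from every index, then pick the highest version."""
    pool = [
        candidate
        for candidates in indexes.values()
        for candidate in candidates
        if candidate.name == name
    ]
    # The origin index plays no role in the comparison.
    return max(pool, key=lambda c: c.version, default=None)


indexes = {
    "internal": [Candidate("demo", (1, 0), "internal")],
    "pypi": [Candidate("demo", (1, 0), "pypi"), Candidate("demo", (2, 0), "pypi")],
}
chosen = select_version_priority(indexes, "demo")
print(chosen)
```

In this sketch the newer release on the "pypi" entry is selected even though an "internal" index was also configured, which is the outcome that surprises users who expected their own index to take precedence.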
Version priority is most useful when all configured indexes are equally trusted and well-behaved regarding the distribution interchangeability assumption. Mirrors are especially well-behaved in this regard. That interchangeability assumption is what makes comparing distributions of a given package meaningful. Without it, the installer is no longer comparing “apples to apples.” In practice, it is common for different indexes to have files that have different contents than other indexes, such as builds for special hardware, or differing metadata for the same package. Version priority behavior can lead to undesirable, unexpected outcomes in these cases, and this is where users generally look for some kind of index priority. Additionally, when there is a difference in trust among indexes, version priority does not provide a way to prefer more trusted indexes over less trusted indexes. This has been exploited by dependency confusion attacks, and PEP 708 was proposed as a way of hard-coding a notion of trusted external indexes into the index.
The “version priority” name is new, and introduction of new terms should always be minimized. This PEP looks toward the uv project, which refers to its implementation of the version priority behavior as “unsafe-best-match.” Naming is really hard here. On one hand, it isn’t accurate to call pip’s default behavior intrinsically “unsafe.” The addition of possibly malicious indexes is what introduces concern with this behavior. PEP 708 added a way to restrict installers from drawing packages from unexpected, potentially insecure indexes. On the other hand, the term “best-match” is technically correct, but also misleading. The “best match” varies by user and by application. “Best” is technically correct in the sense that it is a global optimum according to the match criteria specified above, but that is not necessarily what is “best” in a user’s eyes. “Version priority” is a proposed term that avoids the concerns with the uv terminology, while approximating the behavior in the most user-identifiable way that packages are compared.
In index priority, the resolver finds candidates for each index, one at a time. The resolver proceeds to subsequent indexes only if the current package request has no viable candidates. Index priority does not combine indexes into one global, flat namespace. Because indexes are searched in order, the package from an earlier index will be preferred over a package from a later index, regardless of whether the later index had a better match with the installer’s optimization criteria. For a given installer, the optimization criteria and selection algorithm should be the same for both index priority and version priority. It is only the treatment of multiple indexes that differs: all together for version priority, and individually for index priority.
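A minimal sketch of the same selection under index priority follows. As before, the `Candidate` type, the index labels, and "highest version wins" as the only criterion are assumptions made for illustration; the point is that the per-index selection logic is unchanged and only the handling of multiple indexes differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # simplified stand-in for a parsed version
    index: str      # hypothetical label for the origin index


def select_index_priority(ordered_indexes, name):
    """Search indexes in order; stop at the first one with a viable candidate."""
    for _index_name, candidates in ordered_indexes:
        viable = [c for c in candidates if c.name == name]
        if viable:
            # Same criterion as version priority, applied to one index only.
            return max(viable, key=lambda c: c.version)
    return None


ordered_indexes = [
    ("internal", [Candidate("demo", (1, 0), "internal")]),
    ("pypi", [
        Candidate("demo", (2, 0), "pypi"),
        Candidate("other", (0, 5), "pypi"),
    ]),
]
chosen_demo = select_index_priority(ordered_indexes, "demo")
chosen_other = select_index_priority(ordered_indexes, "other")
print(chosen_demo, chosen_other)
```

Here "demo" resolves to the older release on the first index, while "other" falls through to the second index because the first index has no candidates for it.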
The order of specification of indexes determines their priority in the finding process. As a result, the way that installers load the index configuration must be predictable and reproducible. This PEP does not prescribe any particular mechanism, other than to say that installers should provide a way of ordering their collection of sources. Installers should also ideally provide optional debugging output that provides insight into which index is being considered.
Each package’s finder should start at the beginning of the list of indexes, so each package starts over with the index list. In other words, if one package has no valid candidates on the first index, but finds a hit on the second index, subsequent packages should still start their search on the first index, rather than starting on the second.
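The restart rule can be made concrete with a small sketch. The function name, the index labels, and the representation of an index as a set of package names are hypothetical; the returned trace is one possible shape for the kind of debugging output an installer might offer.

```python
def resolve_all(ordered_indexes, requested):
    """Resolve each requested name, restarting at the first index every time."""
    result = {}
    trace = []  # (package, index) pairs in the order they were considered
    for name in requested:
        result[name] = None
        # The inner loop always begins at the first configured index,
        # no matter where the previous package was found.
        for index_name, available in ordered_indexes:
            trace.append((name, index_name))
            if name in available:
                result[name] = index_name
                break
    return result, trace


ordered_indexes = [("internal", {"b"}), ("pypi", {"a", "b"})]
result, trace = resolve_all(ordered_indexes, ["a", "b"])
print(result)
print(trace)
```

Package "a" is only found on the second index, yet the search for "b" still starts on the first index and is satisfied there.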
One desirable behavior that the index priority strategy implies is that there are no “surprise” updates, where a version bump on a lower-priority index wins out over a curated, approved higher-priority index. This is related to the security improvement of PEP 708, where packages can restrict the external indexes that distributions can come from, but index priority is more configurable by end users. The package installs are only expected to change when either the higher-priority index or the index priority configuration changes. This stability and predictability makes it more viable to configure indexes as a more persistent property of an environment, rather than a one-off argument for one install command.
Because index priority is acknowledging the possibility that different indexes may have different content for a given package, caching and lockfiles should now include the index from which distributions were downloaded. Without this aspect, it is possible that after changing the list of configured indexes, the cache or lockfile could provide a similarly-named distribution from a lower-priority index. If every index follows the recommended behavior of providing identical files across indexes for a given filename, this is not an issue. However, that recommendation is not readily enforceable, and augmenting the cache key with origin index would be a wise defensive change.
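One way to picture the defensive change is a cache key that hashes the origin index together with the filename. This is a sketch only: the `cache_key` helper, the key layout, and the example URLs are assumptions for illustration, and no existing installer is claimed to compute its keys this way.

```python
import hashlib


def cache_key(filename, index_url=None):
    """Build a cache key; when index_url is given, the origin is part of the key."""
    if index_url is None:
        parts = [filename]  # filename-only key, ambiguous across indexes
    else:
        # Normalize a trailing slash so one index always maps to one key.
        parts = [index_url.rstrip("/"), filename]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


filename = "demo-1.0-py3-none-any.whl"
internal_key = cache_key(filename, "https://internal.example/simple/")
pypi_key = cache_key(filename, "https://pypi.org/simple")
print(internal_key != pypi_key)
```

With the origin included, a file cached from a lower-priority index cannot be mistaken for the same-named file from a higher-priority index after the configuration changes.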
Treatment within a given index follows existing behavior, but stops at the bounds of one index and moves on to the next index only after all priority preferences within the one index are exhausted. This means that existing priorities among the unified collection of packages apply to each index individually before falling through to a lower priority index.
There are tradeoffs to make at every level of the optimization criteria:
--prefer-binary: Should the installer use an sdist from a higher priority index before considering wheels on a lower priority index?

Installers are free to implement these priorities in different ways for themselves, but they should document their optimization criteria and how they handle fall-through to lower-priority indexes. For example, an installer could say that --prefer-binary should not install an sdist unless it had iterated through all configured indexes and found no installable binary candidates.
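The two possible readings of --prefer-binary under index priority can be contrasted in a short sketch. The `File` type, the function names, and the index labels are hypothetical, and neither function is meant to describe what pip or uv actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    index: str  # hypothetical label for the origin index
    kind: str   # "wheel" or "sdist"


def first_of_kind(ordered, kind):
    """Return the first file of the given kind, searching indexes in order."""
    for _index_name, files in ordered:
        for file in files:
            if file.kind == kind:
                return file
    return None


def prefer_binary_across_indexes(ordered):
    """Exhaust wheels on every index before accepting any sdist."""
    return first_of_kind(ordered, "wheel") or first_of_kind(ordered, "sdist")


def prefer_binary_within_index(ordered):
    """Prefer wheels inside one index, but never skip past an index with files."""
    for entry in ordered:
        choice = first_of_kind([entry], "wheel") or first_of_kind([entry], "sdist")
        if choice is not None:
            return choice
    return None


ordered = [
    ("internal", [File("internal", "sdist")]),
    ("pypi", [File("pypi", "wheel")]),
]
across = prefer_binary_across_indexes(ordered)
within = prefer_binary_within_index(ordered)
print(across, within)
```

The first policy installs the wheel from the lower-priority index, while the second builds from the sdist on the higher-priority index. Either can be defensible, which is why documenting the chosen fall-through behavior matters.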
As described thus far, the index priority scheme breaks the use case of more than one index URL serving the same content. Such mirrors may be used with the intent of ameliorating network issues or otherwise improving reliability. One approach that installers could take to preserve mirroring functionality while adding index priority would be to add a notion of user-definable index groups, where each index in the group is assumed to be equivalent. This is related to Poetry’s notion of package sources, except that this would allow arbitrary numbers of prioritizable groups, and that this would assume members of a group to be mirrors. Within each group, content could be combined, or each member could be fetched concurrently. The fastest responding index would then represent the group.
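The index group idea might look like the following sketch, which takes the "content could be combined" option within each group and applies index priority between groups. The data layout, names, and index labels are all hypothetical, and the concurrent-fetch variant is not shown.

```python
def select_with_groups(groups, contents, name):
    """Combine versions inside each group; fall through between groups in order."""
    for group in groups:
        versions = set()
        for index_name in group:
            # Members of a group are assumed to be mirrors, so merging is safe.
            versions.update(contents.get(index_name, {}).get(name, ()))
        if versions:
            return group, max(versions)
    return None


groups = [("internal-a", "internal-b"), ("pypi",)]
contents = {
    "internal-a": {"demo": [(1, 0)]},            # a mirror that is lagging behind
    "internal-b": {"demo": [(1, 0), (1, 1)]},
    "pypi": {"demo": [(2, 0)], "other": [(0, 5)]},
}
demo_choice = select_with_groups(groups, contents, "demo")
other_choice = select_with_groups(groups, contents, "other")
print(demo_choice, other_choice)
```

The two internal mirrors behave as one logical index, so "demo" resolves to the newest version either mirror offers, and the lower-priority group is consulted only for "other".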
This PEP does not prescribe any changes as mandatory for any installer, so it only introduces compatibility concerns if tools choose to adopt an index behavior other than the behavior(s) they currently implement.
This PEP’s language does not quite align with existing tools, including pip and uv. Either this PEP’s language can change during review of this PEP, or if this PEP’s language is preferred, other projects could conform to it. The only goal of proposing these terms is to create a central, common vocabulary that makes it easier for users to learn about other installers.
As some tools rely on one or the other behavior, there are some possible issues that may emerge, where tailoring available resources/packages for a particular behavior may detract from the user experience for people who rely on the other behavior.
Index priority creates a mechanism for users to explicitly specify a trust hierarchy among their indexes. As such, it limits the potential for dependency confusion attacks. Index priority was rejected by PEP 708 as a solution for dependency confusion attacks. This PEP requests that the rejection be reconsidered, with index priority serving a different purpose. This PEP is primarily motivated by the desire to support implementation variants, which is the subject of another discussion that hopefully leads to a PEP. It is not mutually exclusive with PEP 708, nor does it suggest reverting or withdrawing PEP 708. It is an answer to how we could allow users to choose which index to use at a more fine grained level than “per install”.
For a more thorough discussion of the PEP 708 rejection of index priority, please see the discuss.python.org thread for this PEP.
At the outset, the goal is not to convert pip or any other tool to change its default priority behavior. The best way to teach is perhaps to watch message boards, GitHub issue trackers and chat channels, keeping an eye out for problems that index priority could help solve. There are several long-standing discussions that would be good places to start advertising the concepts. The topics of the two officially supported behaviors need documentation, and we, the authors of this PEP, would develop these as part of the review period of this PEP. These docs would likely consist of additions across several indexes, cross-linking the concepts between installers. At a minimum, we expect to add to the PyPUG and to pip’s documentation.
It will be important for installers to advertise the active behavior, especially in error messaging, which also gives them a place to point users to resources about these behaviors.
uv users are already experiencing index priority. uv documents this behavior well, but it is always possible to improve the discoverability of that documentation from the command line, where users will actually encounter the unexpected behavior.
The uv project demonstrates index priority with its default behavior. uv is implemented in Rust, though, so if a reference implementation for a Python-based tool is necessary, we, the authors of this PEP, will provide one. For pip in particular, we see the implementation plan as something like:
--extra-index-url or --find-links, there will be no change, and no migration is necessary.

pip.conf. This proposal does not recommend any strategy as the default for any installer. It only recommends documenting the strategies that a tool provides.

pip freeze

This matches the behavior of this proposal very closely, except that this method requires hosting some server, and may be inaccessible or not configurable to users in some environments. It is also important to consider that for an organization that operates its own index (for overcoming PyPI size restrictions, for example), this does not solve the need for --extra-index-url or proxy/mirror for end users. That is, organizations get no improvement from this approach unless they proxy/mirror PyPI as a whole, and get users to configure their proxy/mirror as their sole index.
Build tags and local version specifiers will take precedence over packages without those tags and/or local version specifiers. In a pool of packages, builds that have these additions hosted on a server other than PyPI will take priority over packages on PyPI, which rarely uses build tags and forbids local version specifiers. This approach is viable when package providers want to provide their own local override, such as HPC maintainers who provide optimized builds for their users. It is less viable in some ways, such as build tags not showing up in pip freeze metadata, and local version specifiers not being allowed on PyPI. There is also significant work entailed in building and maintaining package collections with local build tag variants.
https://discuss.python.org/t/dependency-notation-including-the-index-url/5659/21
PEP 708 is aimed specifically at addressing dependency confusion attacks, and doesn’t address the potential for implementation variants among indexes. It is a way of filtering external URLs and encoding an allow-list for external indexes in index metadata. It does not change the lack of priority or preference among channels that currently exists.
Namespacing is a means of specifying a package such that the Python usage of the package does not change, but the package installation restricts where the package comes from. PEP 752 recently proposed a way to multiplex a package’s owners in a flat package namespace (e.g. PyPI) by reserving prefixes as grouping elements. NPM’s concept of “scopes” has been raised as another good example of how this might look. This PEP differs in that it is targeted to multiple indexes, not a flat package namespace. The net effect is roughly the same in terms of predictably choosing a particular package source, except that the namespacing approach relies more on naming packages with these namespace prefixes, whereas this PEP would be less granular, pulling in packages on whatever higher-priority index the user specifies. The namespacing approach relies on all configured indexes treating a given namespace similarly, which leaves the usual concern that not all configured indexes are trusted equally. The namespace idea is not incompatible with this PEP, but it also does not improve expression of trust of indexes in the way that this PEP does.
[Any points that are still being decided/discussed.]
This work was supported financially by NVIDIA through employment of the author. NVIDIA teammates dramatically improved this PEP with their input. Astral Software pioneered the behaviors of index priority and thus laid the foundation of this document. The pip authors deserve great praise for their consistent direction and patient communication of the version priority behavior, especially in the face of contentious security concerns.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0766.rst
Last modified: 2024-11-21 20:00:24 GMT