Python Enhancement Proposals

PEP 752 – Implicit namespaces for package repositories

Author:
Ofek Lev <ofekmeister at gmail.com>, Jarek Potiuk <potiuk at apache.org>
Sponsor:
Barry Warsaw <barry at python.org>
PEP-Delegate:
Dustin Ingram <di at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Topic:
Packaging
Created:
13-Aug-2024
Post-History:
18-Aug-2024, 07-Sep-2024

Abstract

This PEP specifies a way for organizations to reserve package name prefixes for future uploads.

“Namespaces are one honking great idea – let’s do more of those!” - PEP 20

Motivation

The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Such projects fall into two categories.

The first category is projects [1] that want complete control over their namespace. A few examples:

  • Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature’s corresponding package [3]. For example, most of Google’s packages are prefixed by google-cloud- e.g. google-cloud-compute for using virtual machines.
  • OpenTelemetry is an open standard for observability with official packages for the core APIs and SDK with contrib packages to collect data from various sources. All packages are prefixed by opentelemetry- with child prefixes in the form opentelemetry-<component>-<name>-. The contrib packages live in a central repository and they are the only ones with the ability to publish.
  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows. It has providers, where each provider package is prefixed by apache-airflow-providers-.

The second category is projects [2] that want to share their namespace such that some packages are officially maintained and third-party developers are encouraged to participate by publishing their own. Some examples:

  • Project Jupyter is devoted to the development of tooling for sharing interactive documents. They support extensions which in most cases (and in all cases for officially maintained extensions) are prefixed by jupyter-.
  • Django is one of the most widely used web frameworks in existence. They have the concept of reusable apps, which are commonly installed via third-party packages that implement a subset of functionality to extend Django-based websites. These packages are by convention prefixed by django- or dj-.

Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.

For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package so in the interim an attacker may create a package that appears legitimate which would execute malicious code at runtime. Not only are users more likely to install such packages but doing so taints the perception of the entire project.

Although PEP 708 attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases.

Namespacing also would drastically reduce the incidence of typosquatting because typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-. In recent years, typosquatting has become a popular attack vector [4].

The current protection against typosquatting used by PyPI is to normalize similar characters but that is insufficient for these use cases.

Another problem that namespacing would solve is the issue of choosing new names for packages following the agreed patterns of naming. Often (this is the case for Apache Airflow for example), there are public discussions that precede the decision to create a new package. The decision is based on the agreed name and follows the pattern of the existing packages. If more package names are considered during the discussion, all the names have to be reserved via a PyPI interface before the discussion is public, otherwise the names can be taken by other users. This has happened in the past as explained in the associated discussion.

Rationale

Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility.

  • NPM has the concept of scoped packages which were introduced primarily to combat there being a dearth of available good package names (whether a real or perceived phenomenon). When a user or organization signs up they are given a scope that matches their name. For example, the package for using Google Cloud Storage is @google-cloud/storage where @google-cloud/ is the scope. Regular user accounts (non-organization) may publish unscoped packages for public use. This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes.
  • NuGet has the concept of package ID prefix reservation which was introduced primarily to satisfy users wishing to know where a package came from. A package name prefix may be reserved for use by one or more owners. Every reserved package has a special indication on its page to communicate this. After reservation, any upload with a reserved prefix will fail if the user is not an owner of the prefix. Existing packages that have a prefix that is owned may continue to release as usual. This approach has the highest amount of backwards compatibility because only modifications to indices like PyPI are required and installers do not need to change.

This PEP specifies the NuGet approach of authorized reservation across a flat namespace. Any solution that requires new package syntax must be built atop the existing flat namespace and therefore implicit namespaces acquired via a reservation mechanism would be a prerequisite to such explicit namespaces.

Although existing packages matching a reserved namespace would be untouched, preventing future unauthorized uploads and strategically applying PEP 541 takedown requests for malicious cases would reduce risks to users to a negligible level.

Terminology

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Organization
Organizations are entities that own projects and have various users associated with them.
Grant
A grant is a reservation of a namespace for a package repository.
Open Namespace
An open namespace allows for uploads from any project owner.
Restricted Namespace
A restricted namespace only allows uploads from an owner of the namespace.
Parent Namespace
A namespace’s parent refers to the namespace without the trailing hyphenated component e.g. the parent of foo-bar is foo.
Child Namespace
A namespace’s child refers to the namespace with additional trailing hyphenated components e.g. foo-bar is a valid child of foo as is foo-bar-baz.

Specification

Organizations

Any package repository that allows for the creation of projects (e.g. non-mirrors) MAY offer the concept of organizations [6]. Organizations are entities that own projects and have various users associated with them.

Organizations MAY reserve one or more namespaces. Such reservations neither confer ownership nor grant special privileges to existing projects.

Naming

A namespace MUST be a valid project name and normalized internally e.g. foo.bar would become foo-bar.
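
As a minimal sketch of the normalization step, the following applies the standard project name normalization rule from the Python packaging specifications (runs of "-", "_" and "." collapse to a single hyphen, then lowercase). The function name normalize is illustrative and not part of this PEP.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project or namespace name.

    Runs of '-', '_' and '.' become a single '-', and the result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("foo.bar"))   # foo-bar
print(normalize("Foo__Bar"))  # foo-bar
```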

Semantics

A namespace grant bestows ownership over the following:

  1. A project matching the namespace itself such as the placeholder package microsoft.
  2. Projects that start with the namespace followed by a hyphen. For example, the namespace foo would match the normalized project name foo-bar but not the project name foobar.

Package name matching acts upon the normalized namespace.
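
The two matching rules can be sketched as a single predicate. This is an illustration under the assumption that both the project name and the namespace are normalized before comparison; the helper names normalize and matches_namespace are hypothetical and not defined by this PEP.

```python
import re


def normalize(name: str) -> str:
    # Standard project name normalization: collapse '-', '_', '.' runs, lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def matches_namespace(project_name: str, namespace: str) -> bool:
    """Return True if the project falls under the namespace.

    Rule 1: the project name equals the namespace itself.
    Rule 2: the project name starts with the namespace followed by a hyphen.
    """
    name = normalize(project_name)
    prefix = normalize(namespace)
    return name == prefix or name.startswith(prefix + "-")


print(matches_namespace("foo-bar", "foo"))  # True
print(matches_namespace("foobar", "foo"))   # False
```

Appending the hyphen before the prefix test is what keeps foobar out of the foo namespace.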

Namespaces are per-package repository and SHALL NOT be shared between repositories. For example, if PyPI has a namespace microsoft that is owned by the company Microsoft, packages starting with microsoft- that come from other non-PyPI mirror repositories do not confer the same level of trust.

Grants MUST NOT overlap. For example, if there is an existing grant for foo-bar then a new grant for foo would be forbidden. An overlap is determined by comparing the normalized proposed namespace with the normalized namespace of every existing root grant. Every comparison must append a hyphen to the end of the proposed and existing namespace. An overlap is detected when any existing namespace starts with the proposed namespace.
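
The overlap check described above can be sketched as follows. This is a literal reading of the paragraph, not the reference implementation; the names normalize and grants_overlap, and the representation of existing root grants as a list of strings, are assumptions made for illustration.

```python
import re
from typing import Iterable


def normalize(name: str) -> str:
    # Standard project name normalization: collapse '-', '_', '.' runs, lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def grants_overlap(proposed: str, existing_root_grants: Iterable[str]) -> bool:
    """Detect an overlap between a proposed namespace and existing root grants.

    A hyphen is appended to both sides of every comparison, and an overlap is
    reported when any existing namespace starts with the proposed namespace.
    """
    proposed_key = normalize(proposed) + "-"
    return any(
        (normalize(existing) + "-").startswith(proposed_key)
        for existing in existing_root_grants
    )


print(grants_overlap("foo", ["foo-bar"]))  # True
print(grants_overlap("foo", ["foobar"]))   # False
```

The trailing hyphen prevents foobar from being treated as overlapping with foo. Note that this sketch only tests the direction stated in the text (an existing namespace starting with the proposed one).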

Uploads

If the name of a package being uploaded matches a reserved namespace and either of the following criteria is true:

  • The project does not yet exist.
  • The project is not owned by an organization with an active grant for the namespace.

Then the upload MUST fail with a 403 HTTP status code.

Open Namespaces

The owner of a grant may choose to allow others the ability to release new projects with the associated namespace. Doing so MUST allow uploads for new projects matching the namespace from any user.

It is possible for the owner of a namespace to both make it open and allow other organizations to use the grant. In this case, the authorized organizations have no special permissions and are equivalent to an open grant without ownership.

Hidden Grants

Repositories MAY create hidden grants that are not visible to the public which prevent their namespaces from being claimed by others. Such grants MUST NOT be open and SHOULD NOT be exposed in the API.

Hidden grants are useful for repositories that wish to enforce upload restrictions without the need to expose the namespace to the public.

Repository Metadata

The JSON API version will be incremented from 1.2 to 1.3. The following API changes MUST be implemented by repositories that support this PEP. Repositories that do not support this PEP MUST NOT implement these changes so that consumers of the API are able to determine whether the repository supports this PEP.

Project Detail

The project detail response will be modified as follows.

The namespace key MUST be null if the project does not match an active namespace grant. If the project does match a namespace grant, the value MUST be a mapping with the following keys:

  • prefix: This is the associated normalized namespace e.g. foo-bar. If the owner of the project owns multiple matching grants then this MUST be the namespace with the greatest number of characters. For example, if the project name matched both foo-bar and foo-bar-baz then this key would be the latter.
  • authorized: This is a boolean and will be true if the project owner is an organization and is one of the current owners of the grant. This is useful for tools that wish to make a distinction between official and community packages.
  • open: This is a boolean indicating whether the namespace is open.
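
The keys above can be illustrated with a small sketch that computes the value of the namespace key for a project. The Grant dataclass, the namespace_field function and the organization name "acme" are hypothetical; a real repository would read grants from its own data store. When several grants match, the sketch picks the longest prefix.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    prefix: str              # normalized namespace, e.g. "foo-bar"
    owners: frozenset[str]   # organizations that currently own the grant
    open: bool = False       # whether the namespace is open


def normalize(name: str) -> str:
    # Standard project name normalization: collapse '-', '_', '.' runs, lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_field(
    project_name: str, owner_org: str | None, grants: list[Grant]
) -> dict | None:
    """Build the value of the ``namespace`` key in the project detail response.

    ``owner_org`` is None when the project is owned by a user, not an organization.
    """
    name = normalize(project_name)
    matching = [
        g for g in grants if name == g.prefix or name.startswith(g.prefix + "-")
    ]
    if not matching:
        return None  # serialized as null in JSON
    grant = max(matching, key=lambda g: len(g.prefix))  # longest prefix wins
    return {
        "prefix": grant.prefix,
        "authorized": owner_org is not None and owner_org in grant.owners,
        "open": grant.open,
    }


grants = [
    Grant("foo-bar", frozenset({"acme"})),
    Grant("foo-bar-baz", frozenset({"acme"}), open=True),
]
print(namespace_field("foo-bar-baz-qux", "acme", grants))
```

A client could use the authorized flag to distinguish official packages from community packages published under an open namespace.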

Namespace Detail

The format of this URL is /namespace/<namespace> where <namespace> is the normalized namespace. For example, the URL for the namespace foo.bar would be /namespace/foo-bar.

The response will be a mapping with the following keys:

  • prefix: This is the normalized version of the namespace e.g. foo-bar.
  • owner: This is the organization that is responsible for the namespace.
  • open: This is a boolean indicating whether the namespace is open.
  • parent: This is the parent namespace if it exists. For example, if the namespace is foo-bar and there is an active grant for foo, then this would be "foo". If there is no parent then this key will be null.
  • children: This is an array of any child namespaces. For example, if the namespace is foo and there are active grants for foo-bar and foo-bar-baz then this would be ["foo-bar", "foo-bar-baz"].
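
A rough sketch of how the URL and response could be assembled is shown below. The grants mapping (normalized prefix to an (owner, open) pair), the organization name "acme" and both function names are assumptions for illustration. The parent is computed by dropping the last hyphenated component, following the Terminology section, and is reported only if that namespace has an active grant.

```python
import re


def normalize(name: str) -> str:
    # Standard project name normalization: collapse '-', '_', '.' runs, lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_url(namespace: str) -> str:
    """Relative URL of the namespace detail endpoint."""
    return "/namespace/" + normalize(namespace)


def namespace_detail(namespace: str, grants: dict) -> dict:
    """Build the namespace detail response from a mapping of active grants.

    ``grants`` maps each normalized prefix to an ``(owner, open)`` tuple.
    """
    prefix = normalize(namespace)
    owner, is_open = grants[prefix]
    parent = prefix.rpartition("-")[0]  # "" when there is no hyphen
    return {
        "prefix": prefix,
        "owner": owner,
        "open": is_open,
        "parent": parent if parent in grants else None,
        "children": sorted(p for p in grants if p.startswith(prefix + "-")),
    }


grants = {
    "foo": ("acme", False),
    "foo-bar": ("acme", False),
    "foo-bar-baz": ("acme", True),
}
print(namespace_url("foo.bar"))
print(namespace_detail("foo", grants)["children"])
```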

Grant Removal

When a reserved namespace becomes unclaimed, repositories MUST set the namespace key to null in the API.

Namespaces that were previously claimed but are no longer claimed SHOULD be eligible for claiming again by any organization.

Community Buy-in

Representatives from the following organizations have expressed support for this PEP (with a link to the discussion):

Backwards Compatibility

There are no intrinsic concerns because there is still a flat namespace and installers need no modification. Additionally, many projects have already chosen to signal a shared purpose with a prefix like typeshed has done.

Security Implications

  • There is an opportunity to build on top of PEP 740 and PEP 480 so that one could prove cryptographically that a specific release came from an owner of the associated namespace. This PEP makes no effort to describe how this will happen other than that work is planned for the future.

How to Teach This

For consumers of packages we will document how metadata is exposed in the API and potentially, in the future, note tooling that supports utilizing namespaces to provide extra security guarantees during installation.

Reference Implementation

A complete reference implementation of this PEP is available in PR #17691.

Rejected Ideas

Granting Reservations to Users

As package repositories have a flat namespace, allowing any user to reserve a namespace would be untenable not just because there would be contention for a finite resource, but also because no repository has enough human operators to manage the vetting of an arbitrary number of users.

Artifact-level Namespace Association

An earlier version of this PEP proposed that metadata be associated with individual artifacts at the point of release. This was rejected because it had the potential to cause confusion for users who would expect the namespace authorization guarantee to be at the project level based on current grants rather than the time at which a given release occurred.

Organization Scoping

The primary motivation for this PEP is to reduce dependency confusion attacks and NPM-style scoping with an allowance of the legacy flat namespace would increase the risk. If documentation instructed a user to install bar in the namespace foo then the user must be careful to install @foo/bar and not foo-bar, or vice versa. The Python packaging ecosystem has normalization rules for names in order to maximize the ease of communication and this would be a regression.

The runtime environment of Python is also not conducive to scoping. Whereas multiple versions of the same JavaScript package may coexist, Python only allows a single global namespace. Barring major changes to the language itself, this is nearly impossible to change. Additionally, users have come to expect that the package name is usually the same as what they would import and eliminating the flat namespace would do away with that convention.

Scoping would be particularly affected by organization changes which are bound to happen over time. An organization may change their name due to internal shuffling, an acquisition, or any other reason. Whenever this happens every project they own would in effect be renamed which would cause unnecessary confusion for users, frequently.

Finally, the disruption to the community would be massive because it would require an update from every package manager, security scanner, IDE, etc. New packages released with the scoping would be incompatible with older tools and would cause confusion for users along with frustration from maintainers having to triage such complaints.

Encourage Dedicated Package Repositories

Critically, this imposes a burden on projects to maintain their own infra. This is an unrealistic expectation for the vast majority of companies and a complete non-starter for community projects.

This does not help in most cases because the default behavior of most package managers is to use PyPI so users attempting to perform a simple pip install would already be vulnerable to malicious packages.

In this theoretical future every project must document how to add their repository to dependency resolution, which would be different for each package manager. Few package managers are able to download specific dependencies from specific repositories and would require users to use verbose configuration in the common case.

The ones that do not support this would instead find a given package using an ordered enumeration of repositories, leading to dependency confusion. For example, say a user wants two packages from two custom repositories X and Y. If each repository has both packages but one is malicious on X and the other is malicious on Y then the user would be unable to satisfy their requirements without encountering a malicious package.

Exclusive Reliance on Provenance Assertions

The idea here [5] would be to design a general purpose way for clients to make provenance assertions to verify certain properties of dependencies, each with custom syntax. Some examples:

  • The package was uploaded by a specific organization or user name e.g. pip install "azure-loganalytics from microsoft"
  • The package was uploaded by an owner of a specific domain name e.g. pip install "google-cloud-compute from cloud.google.com"
  • The package was uploaded by a user with a specific email address e.g. pip install "aws-cdk-lib from contact@amazon.com"
  • The package matching a namespace was uploaded by an authorized party (this PEP)

A fundamental downside is that it doesn’t play well with multiple repositories. For example, say a user wants the azure-loganalytics package and wants to ensure it comes from the organization named microsoft. If Microsoft’s organization name on PyPI is microsoft then a package manager that defaults to PyPI could accept azure-loganalytics from microsoft. However, if multiple repositories are used for dependency resolution then the user would have to specify the repository as part of the definition which is unrealistic for reasons outlined in the dedicated section on asserting package owner names.

Another general weakness with this approach is that a user attempting to perform a simple pip install without special syntax, which is the most common scenario, would already be vulnerable to malicious packages. In order to overcome this there would have to be some default trust mechanism, which in all cases would impose certain UX or resolver logic upon every tool.

For example, package managers could be changed such that the first time a package is installed the user would receive a confirmation prompt displaying the provenance details. This would be very confusing and noisy, especially for new users, and would be a breaking UX change for existing users. Many methods of installation wouldn’t work for this scenario such as running in CI or installing from a requirements file where the user would potentially be getting hundreds of prompts.

One solution to make this less disruptive for users would be to manually maintain a list of trustworthy details (organization/user names, domain names, email addresses, etc.). This could be discoverable by packages providing entry points which package managers could learn to detect and which corporate environments could install by default. This has the major downside of not providing automatic guarantees which would limit the usefulness for the average user who is more likely to be affected.

There are two ideas that could be used to provide automatic protection, which could be based on PEP 740 attestations or a new mechanism for utilizing third-party APIs that host the metadata.

First, each repository could offer a service that verifies the owner of a package using whatever criteria they deem appropriate. After verification, the repository would add the details to a dedicated package that would be installed by default.

This would require dedicated maintenance which is unrealistic for most repositories, even PyPI currently. It’s unclear how community projects without the resources for something like a domain name would be supported. Critically, this solution would cause extra confusion for users in the case of multiple repositories as each might have their own verification processes, attestation criteria and default package containing the verified details. It would be challenging to get community buy-in of every package manager to be aware of each repository’s chosen verification package and install that by default before dependency resolution.

Should digital attestations become the chosen mechanism, a downside is that implementing this in custom package repositories would require a significant amount of work. In the case of PyPI, the prerequisite work on Trusted Publishing and then the PEP 740 implementation itself took the equivalent of a full-time engineer one year whose time was paid for by a corporate sponsor. Other organizations are unlikely to implement similar work because simpler mechanisms make it possible to implement reproducible builds. When everything is internally managed, attestations are also not very useful. Community projects are unlikely to undertake this effort because they would likely lack the resources to maintain the necessary infrastructure themselves and moreover there are significant downsides to encouraging dedicated package repositories.

The other idea would be to host provenance assertions externally and push more logic client-side. A possible implementation might be to specify a provenance API that could be hosted at a designated relative path like /provenance. Projects on each repository could then be configured to point to a particular domain and this information would be passed on to clients during installation.

While this distributed approach does impose less of an infrastructure burden on repositories, it has the potential to be a security risk. If an external provenance API is compromised, it could lead to malicious packages being installed. If an external API is down, it could lead to package installation failing or package managers might only emit warnings in which case there is no security benefit.

Additionally, this disadvantages community projects that do not have the resources to maintain such an API. They could use free hosting solutions such as what many do for documentation but they do not technically own the infrastructure and they would be compromised should the generous offerings be restricted.

Finally, while both of these theoretical approaches are not yet prescriptive, they imply assertions at the artifact level which was already a rejected idea.

Asserting Package Owner Names

This is about asserting that the package came from a specific organization or user name. It’s quite similar to the organization scoping idea except that a flat namespace is the base assumption.

This would require modifications to the JSON API of each supported repository and could be implemented by exposing extra metadata or as proper provenance assertions.

As with the organization scoping idea, a new syntax would be required like microsoft::azure-loganalytics where microsoft is the organization and azure-loganalytics is the package. Although this plays well with the existing flat namespace in comparison, it retains the critical downside of being a disruption for the community with the number of changes required.

A unique downside is that names are an implementation detail of repositories. On PyPI, the names of organizations are separate from user names so there is potential for conflicts. In the case of multiple repositories, users might run into cases of dependency confusion similar to the one at the end of the Encourage Dedicated Package Repositories rejected idea.

To ameliorate this, it was suggested that the syntax be expanded to also include the expected repository URL like microsoft@pypi.org::azure-loganalytics. This syntax or something like it is so verbose that it could lead to user confusion, and even worse, frustration should it gain increased adoption among those able to maintain dedicated infrastructure (community projects would not benefit).

The expanded syntax is an attempt to standardize resolver behavior and configuration within dependency specifiers. Not only would this be mandating the UX of tools, it lacks precedent in package managers for language ecosystems with or without the concept of package repositories. In such cases, the resolver configuration is separate from the dependency definition.

Language | Tool     | Resolution behavior
---------|----------|--------------------
Rust     | Cargo    | Dependency resolution can be modified within Cargo.toml using the [patch] table.
JS       | Yarn     | Although they have the concept of protocols (which are similar to the URL schemes of our direct references), users configure the resolutions field in the package.json file.
JS       | npm      | Users can configure the overrides field in the package.json file.
Ruby     | Bundler  | The Gemfile allows for specifying an explicit source for a gem.
C#       | NuGet    | It’s possible to override package versions by configuring the Directory.Packages.props file.
PHP      | Composer | The composer.json file allows for specifying repository sources for specific packages.
Go       | go       | The go.mod file allows for specifying a replace directive. Note that this is used for direct dependencies as well as transitive dependencies.

Use Fixed Prefixes

The idea here would be to have one or more top-level fixed prefixes that are used for namespace reservations:

  • com-: Reserved for corporate organizations.
  • org-: Reserved for community organizations.

Organizations would then apply for a namespace prefixed by the type of their organization.

This would cause perpetual disruption because when projects begin it is unknown whether a user base will be large enough to warrant a namespace reservation. Whenever that happens the project would have to be renamed which would put a high maintenance burden on the project maintainers and would cause confusion for users who have to learn a new way to reference the project’s packages. The potential for this deterring projects from reserving namespaces at all is high.

Another issue with this approach is that projects often have branding in mind (example) and would be reluctant to change their package names.

It’s unrealistic to expect every company and project to voluntarily change their existing and future package names.

Use DNS

The idea here is to add a new metadata field to projects in the API called domain-authority. Repositories would support a new endpoint for verifying the domain via HTTPS. Clients would then support options to allow certain domains.

This does not solve the problem for the target audience who do not check where their packages are coming from and is more about checking for the integrity of uploads which is already supported in a more secure way by PEP 740.

Most projects do not have a domain and could not benefit from this, unfairly favoring organizations that have the financial means to acquire one.

Open Issues

None at this time.

Footnotes

[1]
Additional examples of projects with restricted namespaces:
  • Typeshed is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by types-. For example, the package requests has a stub that users would depend on called types-requests. Unofficial stubs are not supposed to use the types- prefix and are expected to use a -stubs suffix instead.
  • Sphinx is a documentation framework popular for large technical projects such as Swift and Python itself. They have the concept of extensions which are prefixed by sphinxcontrib-, many of which are maintained within a dedicated organization.
  • Apache Airflow is a platform to programmatically orchestrate tasks as directed acyclic graphs (DAGs). They have the concept of plugins, and also providers which are prefixed by apache-airflow-providers-.
[2]
Additional examples of projects with open namespaces:
  • pytest is Python’s most popular testing framework. They have the concept of plugins which may be developed by anyone and by convention are prefixed by pytest-.
  • MkDocs is a documentation framework based on Markdown files. They also have the concept of plugins which may be developed by anyone and are usually prefixed by mkdocs-.
  • Datadog offers observability as a service. The Datadog Agent ships out-of-the-box with official integrations for many products, like various databases and web servers, which are distributed as Python packages that are prefixed by datadog-. There is support for creating third-party integrations which customers may run.
[3]
The following shows the package prefixes for the major cloud providers:
[4]
Examples of typosquatting attacks targeting Python users:
  • django- namespace was squatted, among other packages, leading to a postmortem by PyPI.
  • cupy- namespace was squatted by a malicious actor thousands of times.
  • scikit- namespace was squatted, among other packages. Notice how packages with a known prefix are much more prone to successful attacks.
  • typing- namespace was squatted and this would be useful to prevent as a hidden grant.
[5]
Detailed write-up of the potential for provenance assertions.
[6]
As an example, PyPI’s concept of organizations is described here.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0752.rst

Last modified: 2025-03-29 21:57:33 GMT

