This PEP specifies a way for organizations to reserve package name prefixes for future uploads.
“Namespaces are one honking great idea – let’s do more of those!” - PEP 20
The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Such projects fall into two categories.
The first category is projects [1] that want complete control over their namespace. A few examples:
- google-cloud- e.g. google-cloud-compute for using virtual machines.
- opentelemetry- with child prefixes in the form opentelemetry-<component>-<name>-. The contrib packages live in a central repository and they are the only ones with the ability to publish.
- apache-airflow-providers-.
The second category is projects [2] that want to share their namespace such that some packages are officially maintained and third-party developers are encouraged to participate by publishing their own. Some examples:
- jupyter-.
- django- or dj-.
Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.
For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package so in the interim an attacker may create a package that appears legitimate which would execute malicious code at runtime. Not only are users more likely to install such packages but doing so taints the perception of the entire project.
Although PEP 708 attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases.
Namespacing also would drastically reduce the incidence of typosquatting because typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-. In recent years, typosquatting has become a popular attack vector [4].
The current protection against typosquatting used by PyPI is to normalize similar characters but that is insufficient for these use cases.
Another problem that namespacing would solve is the issue of choosing new names for packages following the agreed patterns of naming. Often (this is the case for Apache Airflow for example), there are public discussions that precede the decision to create a new package. The decision is based on the agreed name and follows the pattern of the existing packages. If more package names are considered during the discussion, all the names have to be reserved via a PyPI interface before the discussion is public, otherwise the names can be taken by other users. This has happened in the past as explained in the associated discussion.
Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility.
@google-cloud/storage where @google-cloud/ is the scope. Regular user accounts (non-organization) may publish unscoped packages for public use.
This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes.
This PEP specifies the NuGet approach of authorized reservation across a flat namespace. Any solution that requires new package syntax must be built atop the existing flat namespace and therefore implicit namespaces acquired via a reservation mechanism would be a prerequisite to such explicit namespaces.
Although existing packages matching a reserved namespace would be untouched, preventing future unauthorized uploads and strategically applying PEP 541 takedown requests for malicious cases would reduce risks to users to a negligible level.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
foo-bar is foo.
foo-bar is a valid child of foo as is foo-bar-baz.
Any package repository that allows for the creation of projects (e.g. non-mirrors) MAY offer the concept of organizations [6]. Organizations are entities that own projects and have various users associated with them.
Organizations MAY reserve one or more namespaces. Such reservations neither confer ownership nor grant special privileges to existing projects.
A namespace MUST be a valid project name and normalized internally e.g. foo.bar would become foo-bar.
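As a rough illustration of the rule above, the following sketch validates and normalizes a candidate namespace. It assumes the usual Python packaging rules for names (runs of `-`, `_` and `.` collapse to one hyphen and the result is lowercased); the function names `is_valid_name` and `normalize` are hypothetical and not part of this PEP.

```python
import re

# Pattern for a valid project name under the usual packaging rules:
# ASCII letters and digits, with '.', '_' and '-' allowed only in the middle.
_VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_name(name: str) -> bool:
    """Return True if `name` is acceptable as a project (and namespace) name."""
    return _VALID_NAME.match(name) is not None


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one hyphen and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for candidate in ("foo.bar", "Foo__Bar", "-foo"):
        if is_valid_name(candidate):
            print(candidate, "->", normalize(candidate))
        else:
            print(candidate, "is not a valid project name")
```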
A namespace grant bestows ownership over the following:
foo would match the normalized project name foo-bar but not the project name foobar.
Package name matching acts upon the normalized namespace.
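The matching rule can be sketched as a prefix check on normalized names. The helper names here are hypothetical, and treating a project whose name equals the namespace itself as a match is an assumption of this sketch.

```python
import re


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one hyphen and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def matches_namespace(project_name: str, namespace: str) -> bool:
    """Return True if the project falls under the namespace.

    Both sides are normalized first. The trailing hyphen is what keeps
    'foobar' from matching the namespace 'foo'.
    """
    project = normalize(project_name)
    prefix = normalize(namespace)
    # Assumption: a project named exactly like the namespace also matches.
    return project == prefix or project.startswith(prefix + "-")


if __name__ == "__main__":
    for project in ("foo-bar", "Foo.Bar", "foobar"):
        print(project, matches_namespace(project, "foo"))
```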
Namespaces are per-package repository and SHALL NOT be shared between repositories. For example, if PyPI has a namespace microsoft that is owned by the company Microsoft, packages starting with microsoft- that come from other non-PyPI mirror repositories do not confer the same level of trust.
Grants MUST NOT overlap. For example, if there is an existing grant for foo-bar then a new grant for foo would be forbidden. An overlap is determined by comparing the normalized proposed namespace with the normalized namespace of every existing root grant. Every comparison must append a hyphen to the end of the proposed and existing namespace. An overlap is detected when any existing namespace starts with the proposed namespace.
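A minimal sketch of the overlap check as worded above: normalize both sides, append a hyphen to each, and test whether any existing root grant starts with the proposal. The function name `overlaps` and the list-of-strings representation of grants are assumptions made for illustration.

```python
import re
from typing import Iterable


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one hyphen and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def overlaps(proposed: str, existing_root_grants: Iterable[str]) -> bool:
    """Return True if `proposed` would overlap an existing root grant."""
    candidate = normalize(proposed) + "-"
    return any(
        (normalize(existing) + "-").startswith(candidate)
        for existing in existing_root_grants
    )


if __name__ == "__main__":
    grants = ["foo-bar"]
    print(overlaps("foo", grants))     # proposal would swallow foo-bar
    print(overlaps("foob", grants))    # shares letters only, not a prefix
```

The appended hyphen is what distinguishes a true prefix (`foo` against `foo-bar`) from a name that merely shares leading characters (`foob` against `foo-bar`).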
If the name of a package being uploaded matches a reserved namespace and either of the following criteria are true:
Then the upload MUST fail with a 403 HTTP status code.
The owner of a grant may choose to allow others the ability to release new projects with the associated namespace. Doing so MUST allow uploads for new projects matching the namespace from any user.
It is possible for the owner of a namespace to both make it open and allow other organizations to use the grant. In this case, the authorized organizations have no special permissions and are equivalent to an open grant without ownership.
Repositories MAY create hidden grants that are not visible to the public which prevent their namespaces from being claimed by others. Such grants MUST NOT be open and SHOULD NOT be exposed in the API.
Hidden grants are useful for repositories that wish to enforce upload restrictions without the need to expose the namespace to the public.
The JSON API version will be incremented from 1.2 to 1.3.
The following API changes MUST be implemented by repositories that support this PEP. Repositories that do not support this PEP MUST NOT implement these changes so that consumers of the API are able to determine whether the repository supports this PEP.
The project detail response will be modified as follows.
The namespace key MUST be null if the project does not match an active namespace grant. If the project does match a namespace grant, the value MUST be a mapping with the following keys:
- prefix: This is the associated normalized namespace e.g. foo-bar. If the owner of the project owns multiple matching grants then this MUST be the namespace with the most number of characters. For example, if the project name matched both foo-bar and foo-bar-baz then this key would be the latter.
- authorized: This is a boolean and will be true if the project owner is an organization and is one of the current owners of the grant. This is useful for tools that wish to make a distinction between official and community packages.
- open: This is a boolean indicating whether the namespace is open.
The format of this URL is /namespace/<namespace> where <namespace> is the normalized namespace. For example, the URL for the namespace foo.bar would be /namespace/foo-bar.
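To make the shape of the namespace key and the namespace URL concrete, here is a hedged sketch of how a repository might compute them. The `Grant` class, the `grants` mapping and both function names are hypothetical. Picking the longest matching prefix among all grants, rather than only among grants held by the project owner, is a simplification of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one hyphen and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of an active grant, keyed elsewhere by its prefix."""
    owners: FrozenSet[str]  # organization names that currently own the grant
    open: bool = False


def namespace_entry(
    project_name: str, owner_org: Optional[str], grants: Dict[str, Grant]
) -> Optional[dict]:
    """Build the value of the `namespace` key for a project detail response."""
    project = normalize(project_name)
    matching = [p for p in grants if project == p or project.startswith(p + "-")]
    if not matching:
        return None  # serialized as JSON null
    prefix = max(matching, key=len)  # simplification: longest match overall
    grant = grants[prefix]
    return {
        "prefix": prefix,
        # True only when the project is owned by an organization that is
        # one of the current owners of the grant.
        "authorized": owner_org is not None and owner_org in grant.owners,
        "open": grant.open,
    }


def namespace_url(namespace: str) -> str:
    """Return the relative URL of the namespace detail endpoint."""
    return f"/namespace/{normalize(namespace)}"


if __name__ == "__main__":
    grants = {
        "foo-bar": Grant(owners=frozenset({"acme"})),
        "foo-bar-baz": Grant(owners=frozenset({"acme"}), open=True),
    }
    print(namespace_entry("foo-bar-baz-qux", "acme", grants))
    print(namespace_entry("unrelated", None, grants))
    print(namespace_url("foo.bar"))
```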
The response will be a mapping with the following keys:
- prefix: This is the normalized version of the namespace e.g. foo-bar.
- owner: This is the organization that is responsible for the namespace.
- open: This is a boolean indicating whether the namespace is open.
- parent: This is the parent namespace if it exists. For example, if the namespace is foo-bar and there is an active grant for foo, then this would be "foo". If there is no parent then this key will be null.
- children: This is an array of any child namespaces. For example, if the namespace is foo and there are active grants for foo-bar and foo-bar-baz then this would be ["foo-bar", "foo-bar-baz"].
When a reserved namespace becomes unclaimed, repositories MUST set the namespace key to null in the API.
Namespaces that were previously claimed but are now not SHOULD be eligible for claiming again by any organization.
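The parent and children keys of the namespace detail response can be derived from the set of active grants, and a namespace that becomes unclaimed simply drops out of that set. The sketch below assumes grants are stored as a set of normalized prefixes and that the parent is the nearest active ancestor; both are assumptions, as are the function names.

```python
from typing import List, Optional, Set


def parent_of(namespace: str, active: Set[str]) -> Optional[str]:
    """Return the nearest active ancestor of `namespace`, or None."""
    ancestors = [g for g in active if namespace.startswith(g + "-")]
    # Assumption: with several ancestors, the longest one is the parent.
    return max(ancestors, key=len) if ancestors else None


def children_of(namespace: str, active: Set[str]) -> List[str]:
    """Return every active grant nested under `namespace`, sorted by name."""
    return sorted(g for g in active if g.startswith(namespace + "-"))


if __name__ == "__main__":
    active = {"foo", "foo-bar", "foo-bar-baz"}
    print(children_of("foo", active))
    print(parent_of("foo-bar", active))

    # Once the grant for 'foo' is released it is no longer active, so it
    # stops appearing as a parent and could be claimed again.
    active.discard("foo")
    print(parent_of("foo-bar", active))
```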
Representatives from the following organizations have expressed support for this PEP (with a link to the discussion):
There are no intrinsic concerns because there is still a flat namespace and installers need no modification. Additionally, many projects have already chosen to signal a shared purpose with a prefix like typeshed has done.
For consumers of packages we will document how metadata is exposed in the API and potentially in future note tooling that supports utilizing namespaces to provide extra security guarantees during installation.
A complete reference implementation of this PEP is available in PR #17691.
As package repositories have a flat namespace, allowing any user to reserve a namespace would be untenable not just because there would be contention for a finite resource, but also because no repository has enough human operators to manage the vetting of an arbitrary number of users.
An earlier version of this PEP proposed that metadata be associated with individual artifacts at the point of release. This was rejected because it had the potential to cause confusion for users who would expect the namespace authorization guarantee to be at the project level based on current grants rather than the time at which a given release occurred.
The primary motivation for this PEP is to reduce dependency confusion attacks and NPM-style scoping with an allowance of the legacy flat namespace would increase the risk. If documentation instructed a user to install bar in the namespace foo then the user must be careful to install @foo/bar and not foo-bar, or vice versa. The Python packaging ecosystem has normalization rules for names in order to maximize the ease of communication and this would be a regression.
The runtime environment of Python is also not conducive to scoping. Whereas multiple versions of the same JavaScript package may coexist, Python only allows a single global namespace. Barring major changes to the language itself, this is nearly impossible to change. Additionally, users have come to expect that the package name is usually the same as what they would import and eliminating the flat namespace would do away with that convention.
Scoping would be particularly affected by organization changes which are bound to happen over time. An organization may change their name due to internal shuffling, an acquisition, or any other reason. Whenever this happens every project they own would in effect be renamed which would cause unnecessary confusion for users, frequently.
Finally, the disruption to the community would be massive because it would require an update from every package manager, security scanner, IDE, etc. New packages released with the scoping would be incompatible with older tools and would cause confusion for users along with frustration from maintainers having to triage such complaints.
Critically, this imposes a burden on projects to maintain their own infra. This is an unrealistic expectation for the vast majority of companies and a complete non-starter for community projects.
This does not help in most cases because the default behavior of most package managers is to use PyPI so users attempting to perform a simple pip install would already be vulnerable to malicious packages.
In this theoretical future every project must document how to add their repository to dependency resolution, which would be different for each package manager. Few package managers are able to download specific dependencies from specific repositories and would require users to use verbose configuration in the common case.
The ones that do not support this would instead find a given package using an ordered enumeration of repositories, leading to dependency confusion. For example, say a user wants two packages from two custom repositories X and Y. If each repository has both packages but one is malicious on X and the other is malicious on Y then the user would be unable to satisfy their requirements without encountering a malicious package.
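The two-repository scenario can be checked mechanically. The toy model below assumes a resolver that takes each package from the first repository in the configured order that hosts it; the package names and the `REPOSITORIES` table are invented for illustration.

```python
from itertools import permutations

# Hypothetical index contents: True marks a trustworthy copy of the package,
# False marks a malicious one.
REPOSITORIES = {
    "X": {"package-a": False, "package-b": True},
    "Y": {"package-a": True, "package-b": False},
}
WANTED = ("package-a", "package-b")


def resolve(package: str, order: tuple) -> str:
    """Return the first repository in `order` that hosts `package`."""
    for repo in order:
        if package in REPOSITORIES[repo]:
            return repo
    raise LookupError(package)


def installs_malicious(order: tuple) -> bool:
    """Return True if resolving WANTED in this order picks a malicious copy."""
    return any(not REPOSITORIES[resolve(pkg, order)][pkg] for pkg in WANTED)


if __name__ == "__main__":
    for order in permutations(REPOSITORIES):
        print(order, "-> malicious package installed:", installs_malicious(order))
```

Whichever repository is listed first serves both packages, so every ordering ends up installing one malicious copy.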
The idea here [5] would be to design a general purpose way for clients to make provenance assertions to verify certain properties of dependencies, each with custom syntax. Some examples:
- pip install "azure-loganalytics from microsoft"
- pip install "google-cloud-compute from cloud.google.com"
- pip install "aws-cdk-lib from contact@amazon.com"
A fundamental downside is that it doesn’t play well with multiple repositories. For example, say a user wants the azure-loganalytics package and wants to ensure it comes from the organization named microsoft. If Microsoft’s organization name on PyPI is microsoft then a package manager that defaults to PyPI could accept azure-loganalytics from microsoft. However, if multiple repositories are used for dependency resolution then the user would have to specify the repository as part of the definition which is unrealistic for reasons outlined in the dedicated section on asserting package owner names.
Another general weakness with this approach is that a user attempting to perform a simple pip install without special syntax, which is the most common scenario, would already be vulnerable to malicious packages. In order to overcome this there would have to be some default trust mechanism, which in all cases would impose certain UX or resolver logic upon every tool.
For example, package managers could be changed such that the first time a package is installed the user would receive a confirmation prompt displaying the provenance details. This would be very confusing and noisy, especially for new users, and would be a breaking UX change for existing users. Many methods of installation wouldn’t work for this scenario such as running in CI or installing from a requirements file where the user would potentially be getting hundreds of prompts.
One solution to make this less disruptive for users would be to manually maintain a list of trustworthy details (organization/user names, domain names, email addresses, etc.). This could be discoverable by packages providing entry points which package managers could learn to detect and which corporate environments could install by default. This has the major downside of not providing automatic guarantees which would limit the usefulness for the average user who is more likely to be affected.
There are two ideas that could be used to provide automatic protection, which could be based on PEP 740 attestations or a new mechanism for utilizing third-party APIs that host the metadata.
First, each repository could offer a service that verifies the owner of a package using whatever criteria they deem appropriate. After verification, the repository would add the details to a dedicated package that would be installed by default.
This would require dedicated maintenance which is unrealistic for most repositories, even PyPI currently. It’s unclear how community projects without the resources for something like a domain name would be supported. Critically, this solution would cause extra confusion for users in the case of multiple repositories as each might have their own verification processes, attestation criteria and default package containing the verified details. It would be challenging to get community buy-in of every package manager to be aware of each repository’s chosen verification package and install that by default before dependency resolution.
Should digital attestations become the chosen mechanism, a downside is that implementing this in custom package repositories would require a significant amount of work. In the case of PyPI, the prerequisite work on Trusted Publishing and then the PEP 740 implementation itself took the equivalent of a full-time engineer one year whose time was paid for by a corporate sponsor. Other organizations are unlikely to implement similar work because simpler mechanisms make it possible to implement reproducible builds. When everything is internally managed, attestations are also not very useful. Community projects are unlikely to undertake this effort because they would likely lack the resources to maintain the necessary infrastructure themselves and moreover there are significant downsides to encouraging dedicated package repositories.
The other idea would be to host provenance assertions externally and push more logic client-side. A possible implementation might be to specify a provenance API that could be hosted at a designated relative path like /provenance. Projects on each repository could then be configured to point to a particular domain and this information would be passed on to clients during installation.
While this distributed approach does impose less of an infrastructure burden on repositories, it has the potential to be a security risk. If an external provenance API is compromised, it could lead to malicious packages being installed. If an external API is down, it could lead to package installation failing or package managers might only emit warnings in which case there is no security benefit.
Additionally, this disadvantages community projects that do not have the resources to maintain such an API. They could use free hosting solutions such as what many do for documentation but they do not technically own the infrastructure and they would be compromised should the generous offerings be restricted.
Finally, while both of these theoretical approaches are not yet prescriptive, they imply assertions at the artifact level which was already a rejected idea.
This is about asserting that the package came from a specific organization or user name. It’s quite similar to the organization scoping idea except that a flat namespace is the base assumption.
This would require modifications to the JSON API of each supported repository and could be implemented by exposing extra metadata or as proper provenance assertions.
As with the organization scoping idea, a new syntax would be required like microsoft::azure-loganalytics where microsoft is the organization and azure-loganalytics is the package. Although this plays well with the existing flat namespace in comparison, it retains the critical downside of being a disruption for the community with the number of changes required.
A unique downside is that names are an implementation detail of repositories. On PyPI, the names of organizations are separate from user names so there is potential for conflicts. In the case of multiple repositories, users might run into cases of dependency confusion similar to the one at the end of the Encourage Dedicated Package Repositories rejected idea.
To ameliorate this, it was suggested that the syntax be expanded to also include the expected repository URL like microsoft@pypi.org::azure-loganalytics. This syntax or something like it is so verbose that it could lead to user confusion, and even worse, frustration should it gain increased adoption among those able to maintain dedicated infrastructure (community projects would not benefit).
The expanded syntax is an attempt to standardize resolver behavior and configuration within dependency specifiers. Not only would this be mandating the UX of tools, it lacks precedent in package managers for language ecosystems with or without the concept of package repositories. In such cases, the resolver configuration is separate from the dependency definition.
| Language | Tool | Resolution behavior |
|---|---|---|
| Rust | Cargo | Dependency resolution can be modified within Cargo.toml using the [patch] table. |
| JS | Yarn | Although they have the concept of protocols (which are similar to the URL schemes of our direct references), users configure the resolutions field in the package.json file. |
| JS | npm | Users can configure the overrides field in the package.json file. |
| Ruby | Bundler | The Gemfile allows for specifying an explicit source for a gem. |
| C# | NuGet | It’s possible to override package versions by configuring the Directory.Packages.props file. |
| PHP | Composer | The composer.json file allows for specifying repository sources for specific packages. |
| Go | go | The go.mod file allows for specifying a replace directive. Note that this is used for direct dependencies as well as transitive dependencies. |
The idea here would be to have one or more top-level fixed prefixes that are used for namespace reservations:
- com-: Reserved for corporate organizations.
- org-: Reserved for community organizations.
Organizations would then apply for a namespace prefixed by the type of their organization.
This would cause perpetual disruption because when projects begin it is unknown whether a user base will be large enough to warrant a namespace reservation. Whenever that happens the project would have to be renamed which would put a high maintenance burden on the project maintainers and would cause confusion for users who have to learn a new way to reference the project’s packages. The potential for this deterring projects from reserving namespaces at all is high.
Another issue with this approach is that projects often have branding in mind (example) and would be reluctant to change their package names.
It’s unrealistic to expect every company and project to voluntarily change their existing and future package names.
The idea here is to add a new metadata field to projects in the API called domain-authority. Repositories would support a new endpoint for verifying the domain via HTTPS. Clients would then support options to allow certain domains.
This does not solve the problem for the target audience who do not check where their packages are coming from and is more about checking for the integrity of uploads which is already supported in a more secure way by PEP 740.
Most projects do not have a domain and could not benefit from this, unfairly favoring organizations that have the financial means to acquire one.
None at this time.
- types-. For example, the package requests has a stub that users would depend on called types-requests. Unofficial stubs are not supposed to use the types- prefix and are expected to use a -stubs suffix instead.
- sphinxcontrib-, many of which are maintained within a dedicated organization.
- apache-airflow-providers-.
- pytest-.
- mkdocs-.
- datadog-. There is support for creating third-party integrations which customers may run.
- google-django- namespace was squatted, among other packages, leading to a postmortem by PyPI.
- cupy- namespace was squatted by a malicious actor thousands of times.
- scikit- namespace was squatted, among other packages. Notice how packages with a known prefix are much more prone to successful attacks.
- typing- namespace was squatted and this would be useful to prevent as a hidden grant.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0752.rst
Last modified: 2025-03-29 21:57:33 GMT