
Python Enhancement Proposals

PEP 420 – Implicit Namespace Packages

Author:
Eric V. Smith <eric at trueblade.com>
Status:
Final
Type:
Standards Track
Created:
19-Apr-2012
Python-Version:
3.3
Post-History:

Resolution:
Python-Dev message


Abstract

Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package’s __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon previous work, documented in PEP 382 and PEP 402. Those PEPs have since been rejected in favor of this one. An implementation of this PEP is at [1].

Terminology

Within this PEP:

  • “package” refers to Python packages as defined by Python’s import statement.
  • “distribution” refers to separately installable sets of Python modules as stored in the Python Package Index, and installed by distutils or setuptools.
  • “vendor package” refers to groups of files installed by an operating system’s packaging mechanism (e.g. Debian or Red Hat packages installed on Linux systems).
  • “regular package” refers to packages as they are implemented in Python 3.2 and earlier.
  • “portion” refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package.
  • “legacy portion” refers to a portion that uses __path__ manipulation in order to implement namespace packages.

This PEP defines a new type of package, the “namespace package”.

Namespace packages today

Python currently provides pkgutil.extend_path to denote a package as a namespace package. The recommended way of using it is to put:

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

in the package’s __init__.py. Every distribution needs to provide the same contents in its __init__.py, so that extend_path is invoked independent of which portion of the package gets imported first. As a consequence, the package’s __init__.py cannot practically define any names, because which portion’s __init__.py actually runs depends on the order of the package fragments on sys.path. As a special feature, extend_path reads files named <packagename>.pkg, which allow additional portions to be declared.

setuptools provides a similar function named pkg_resources.declare_namespace that is used in the form:

import pkg_resources
pkg_resources.declare_namespace(__name__)

In the portion’s __init__.py, no assignment to __path__ is necessary, as declare_namespace modifies the package __path__ through sys.modules. As a special feature, declare_namespace also supports zip files, and registers the package name internally so that future additions to sys.path by setuptools can properly add additional portions to each package.

setuptools allows declaring namespace packages in a distribution’s setup.py, so that distribution developers don’t need to put the magic __path__ modification into __init__.py themselves.

See PEP 402’s “The Problem” section for additional motivations for namespace packages. Note that PEP 402 has been rejected, but the motivating use cases are still valid.

Rationale

The current imperative approach to namespace packages has led to multiple slightly-incompatible mechanisms for providing namespace packages. For example, pkgutil supports *.pkg files; setuptools doesn’t. Likewise, setuptools supports inspecting zip files, and supports adding portions to its _namespace_packages variable, whereas pkgutil doesn’t.

Namespace packages are designed to support being split across multiple directories (and hence found via multiple sys.path entries). In this configuration, it doesn’t matter if multiple portions all provide an __init__.py file, so long as each portion correctly initializes the namespace package. However, Linux distribution vendors (amongst others) prefer to combine the separate portions and install them all into the same file system directory. This creates a potential for conflict, as the portions are now attempting to provide the same file on the target system, which many package managers do not allow. Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit.

A namespace package will not be constrained by a fixed __path__, computed from the parent path at namespace package creation time. Consider the standard library encodings package:

  1. Suppose that encodings becomes a namespace package.
  2. It sometimes gets imported during interpreter startup to initialize the standard io streams.
  3. An application modifies sys.path after startup and wants to contribute additional encodings from new path entries.
  4. An attempt is made to import an encoding from an encodings portion that is found on a path entry added in step 3.

If the import system were restricted to only finding portions along the value of sys.path that existed at the time the encodings namespace package was created, the additional paths added in step 3 would never be searched for the additional portions imported in step 4. In addition, if step 2 were sometimes skipped (due to some runtime flag or other condition), then the path items added in step 3 would indeed be used the first time a portion was imported, so whether step 4 succeeds would depend on whether step 2 happened to run. Thus this PEP requires that the list of path entries be dynamically computed when each portion is loaded. It is expected that the import machinery will do this efficiently by caching __path__ values and only refreshing them when it detects that the parent path has changed. In the case of a top-level package like encodings, this parent path would be sys.path.

Specification

Regular packages will continue to have an __init__.py and will reside in a single directory.

Namespace packages cannot contain an __init__.py. As a consequence, pkgutil.extend_path and pkg_resources.declare_namespace become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifying a namespace package.

During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named “foo”, for each directory in the parent path:

  • If <directory>/foo/__init__.py is found, a regular package is imported and returned.
  • If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extensions varies by platform and whether the -O flag is specified. The list here is representative.
  • If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
  • Otherwise the scan continues with the next directory in the parent path.

If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created. The new namespace package:

  • Has a __path__ attribute set to an iterable of the path strings that were found and recorded during the scan.
  • Does not have a __file__ attribute.
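The per-directory scan and the final namespace step can be modeled as a pure function over the file system. This is a simplified sketch under stated assumptions, not CPython's implementation: the hypothetical scan_parent_path only looks for the representative suffixes listed above, ignores zip files and custom finders, and returns a tagged tuple instead of importing anything.

```python
import os
from typing import List, Optional, Tuple, Union

# Representative suffixes only; the real list varies by platform and the -O flag.
MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")

ScanResult = Union[Tuple[str, str], Tuple[str, List[str]]]


def scan_parent_path(name: str, parent_path: List[str]) -> Optional[ScanResult]:
    """Hypothetical model of the PEP 420 scan for one top-level name."""
    recorded: List[str] = []
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        # A regular package is returned as soon as it is found.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular package", candidate)
        # Otherwise a module is returned as soon as it is found.
        for suffix in MODULE_SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        # A bare directory is only recorded; the scan goes on.
        if os.path.isdir(candidate):
            recorded.append(candidate)
        # Anything else: continue with the next directory.
    # Only once every entry has been checked can a namespace package be created.
    if recorded:
        return ("namespace package", recorded)
    return None
```

With two directories that each contain a bare foo/ directory, the function reports a namespace package spanning both; adding an __init__.py to either one turns the result into a regular package from that directory alone.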

Note that if “import foo” is executed and “foo” is found as a namespace package (using the above rules), then “foo” is immediately created as a package. The creation of the namespace package is not deferred until a sub-level import occurs.

A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package.
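These rules can be observed on a current CPython with a small experiment: two temporary directories each contribute a demo_ns directory with no __init__.py, and a bare import of demo_ns already yields a package whose __path__ lists both portions. The package name demo_ns and the helpers make_portion and demo are hypothetical. Depending on the Python version, a namespace package's __file__ is either missing or set to None, so the sketch checks it with getattr.

```python
import importlib
import os
import sys
import tempfile
from typing import List


def make_portion(root: str, package: str, module: str) -> None:
    """Hypothetical helper: create <root>/<package>/<module>.py with no __init__.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(f"NAME = {module!r}\n")


def demo() -> List[str]:
    """Import a two-portion namespace package and return its __path__ entries."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_portion(first, "demo_ns", "one")
        make_portion(second, "demo_ns", "two")
        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            # The bare import already creates the package; no submodule is needed.
            pkg = importlib.import_module("demo_ns")
            portions = list(pkg.__path__)
            # No __init__.py means no file: __file__ is absent or None.
            assert getattr(pkg, "__file__", None) is None
            # Submodules from either portion are importable.
            assert importlib.import_module("demo_ns.one").NAME == "one"
            assert importlib.import_module("demo_ns.two").NAME == "two"
            return portions
        finally:
            # Undo the experiment so it can be rerun in the same interpreter.
            for name in ("demo_ns.one", "demo_ns.two", "demo_ns"):
                sys.modules.pop(name, None)
            sys.path.remove(first)
            sys.path.remove(second)
```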

Dynamic path computation

The import machinery will behave as if a namespace package’s __path__ is recomputed before each portion is loaded.

For performance reasons, it is expected that this will be achieved by detecting that the parent path has changed. If no change has taken place, then no __path__ recomputation is required. The implementation must ensure that changes to the contents of the parent path are detected, as well as detecting the replacement of the parent path with a new path entry list object.
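One way to meet both detection requirements is to keep a snapshot of the parent path and rescan only when the snapshot no longer matches. This is a minimal sketch, not CPython's actual _NamespacePath class; RecomputingPath, get_parent_path, and find_portions are hypothetical names. Fetching the parent path through a callable on every access catches wholesale replacement, and comparing snapshots catches in-place edits, at the cost of an O(n) comparison per access.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


class RecomputingPath:
    """Hypothetical iterable of portion directories that refreshes on parent changes.

    get_parent_path returns the *current* parent path (e.g. sys.path);
    find_portions maps that parent path to the portion directories.
    """

    def __init__(
        self,
        get_parent_path: Callable[[], Sequence[str]],
        find_portions: Callable[[Sequence[str]], List[str]],
    ) -> None:
        self._get_parent_path = get_parent_path
        self._find_portions = find_portions
        self._snapshot: Optional[Tuple[str, ...]] = None
        self._cached: List[str] = []
        self.recomputations = 0  # exposed only to make the caching observable

    def _current(self) -> List[str]:
        # Re-fetch the parent path each time (catches replacement) and
        # compare it with the last snapshot (catches in-place edits).
        parent = tuple(self._get_parent_path())
        if parent != self._snapshot:
            self._cached = self._find_portions(parent)
            self._snapshot = parent
            self.recomputations += 1
        return self._cached

    def __iter__(self) -> Iterator[str]:
        return iter(list(self._current()))

    def __len__(self) -> int:
        return len(self._current())

    def __repr__(self) -> str:
        return f"RecomputingPath({self._current()!r})"
```

Repeated reads with an unchanged parent path reuse the cached list; appending to the parent list, or replacing it outright, triggers exactly one rescan on the next access.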

Impact on import finders and loaders

PEP 302 defines “finders” that are called to search path elements. These finders’ find_module methods return either a “loader” object or None.

For a finder to contribute to namespace packages, it must implement a new find_loader(fullname) method. fullname has the same meaning as for find_module. find_loader always returns a 2-tuple of (loader, <iterable-of-path-entries>). loader may be None, in which case <iterable-of-path-entries> (which may be empty) is added to the list of recorded path entries and path searching continues. If loader is not None, it is immediately used to load a module or regular package.

Even if loader is returned and is not None, <iterable-of-path-entries> must still contain the path entries for the package. This allows code such as pkgutil.extend_path() to compute path entries for packages that it does not load.

Note that multiple path entries per finder are allowed. This is to support the case where a finder discovers multiple namespace portions for a given fullname. Many finders will support only a single namespace package portion per find_loader call, in which case this iterable will contain only a single string.

The import machinery will call find_loader if it exists, else fall back to find_module. Legacy finders which implement find_module but not find_loader will be unable to contribute portions to a namespace package.
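The finder side of this protocol and the machinery's fallback rule can be sketched as follows. DirectoryFinder, LegacyFinder, PlaceholderLoader, and collect are hypothetical names, and the loader is a stand-in that loads nothing. The sketch models the Python 3.3 protocol described here; find_loader was later deprecated in favor of find_spec() by PEP 451, so this code does not plug into a modern import system.

```python
import os
from typing import Iterable, List, Optional, Tuple


class PlaceholderLoader:
    """Hypothetical stand-in for a real loader; only records what it would load."""

    def __init__(self, fullname: str, location: str) -> None:
        self.fullname = fullname
        self.location = location


class DirectoryFinder:
    """Hypothetical PEP 420-style finder for a single directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory

    def find_loader(self, fullname: str) -> Tuple[Optional[PlaceholderLoader], List[str]]:
        candidate = os.path.join(self.directory, fullname.rpartition(".")[2])
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            # Regular package: a loader *and* its path entries.
            return PlaceholderLoader(fullname, candidate), [candidate]
        if os.path.isfile(candidate + ".py"):
            return PlaceholderLoader(fullname, candidate + ".py"), []
        if os.path.isdir(candidate):
            # Namespace portion: no loader, one recorded path entry.
            return None, [candidate]
        return None, []


class LegacyFinder:
    """Hypothetical pre-PEP 420 finder that only implements find_module."""

    def find_module(self, fullname: str) -> Optional[PlaceholderLoader]:
        return None


def collect(finders: Iterable[object], fullname: str) -> Optional[Tuple[str, object]]:
    """Return ("load", loader), ("namespace", portions), or None."""
    recorded: List[str] = []
    for finder in finders:
        find_loader = getattr(finder, "find_loader", None)
        if find_loader is not None:
            loader, portions = find_loader(fullname)
            if loader is not None:
                return ("load", loader)      # used immediately
            recorded.extend(portions)        # record portions, keep searching
        else:
            # Legacy fallback: can find modules, cannot contribute portions.
            loader = finder.find_module(fullname)
            if loader is not None:
                return ("load", loader)
    return ("namespace", recorded) if recorded else None
```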

The specification expands PEP 302 loaders to include an optional method called module_repr() which, if present, is used to generate module object reprs. See the Module reprs section below for further details.

Differences between namespace packages and regular packages

Namespace packages and regular packages are very similar. The differences are:

  • Portions of namespace packages need not all come from the same directory structure, or even from the same loader. Regular packages are self-contained: all parts live in the same directory hierarchy.
  • Namespace packages have no __file__ attribute.
  • Namespace packages’ __path__ attribute is a read-only iterable of strings, which is automatically updated when the parent path is modified.
  • Namespace packages have no __init__.py module.
  • Namespace packages have a different type of object for their __loader__ attribute.

Namespace packages in the standard library

It is possible, and this PEP explicitly allows, that parts of the standard library be implemented as namespace packages. When and if any standard library packages become namespace packages is outside the scope of this PEP.

Migrating from legacy namespace packages

As described above, prior to this PEP pkgutil.extend_path() was used by legacy portions to create namespace packages. Because it is likely not practical for all existing portions of a namespace package to be migrated to this PEP at once, extend_path() will be modified to also recognize PEP 420 namespace packages. This will allow some portions of a namespace to be legacy portions while others are migrated to PEP 420. These hybrid namespace packages will not have the dynamic path computation that normal namespace packages have, since extend_path() never provided this functionality in the past.
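A hybrid package can be tried on a current CPython with two temporary portions of a hypothetical package hybrid_ns: a legacy portion whose __init__.py calls extend_path(), and a PEP 420 portion with no __init__.py. In current CPython, extend_path() asks each sys.path entry's finder for the package's portions, so the __init__.py-less directory is picked up as well. The names write and demo_hybrid are hypothetical; the resulting __path__ is a plain list computed once, matching the note above that hybrids lack dynamic path computation.

```python
import importlib
import os
import sys
import tempfile

# The legacy idiom, exactly as recommended before this PEP.
LEGACY_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def write(path: str, text: str = "") -> None:
    """Hypothetical helper: create parent directories and write a file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def demo_hybrid():
    with tempfile.TemporaryDirectory() as legacy, tempfile.TemporaryDirectory() as modern:
        # Legacy portion: its __init__.py calls extend_path().
        write(os.path.join(legacy, "hybrid_ns", "__init__.py"), LEGACY_INIT)
        write(os.path.join(legacy, "hybrid_ns", "old.py"), "KIND = 'legacy'\n")
        # PEP 420 portion: no __init__.py at all.
        write(os.path.join(modern, "hybrid_ns", "new.py"), "KIND = 'pep420'\n")
        sys.path[:0] = [legacy, modern]
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("hybrid_ns")
            kinds = (
                importlib.import_module("hybrid_ns.old").KIND,
                importlib.import_module("hybrid_ns.new").KIND,
            )
            return type(pkg.__path__), len(pkg.__path__), kinds
        finally:
            # Clean up so the demo can be rerun in the same interpreter.
            for name in ("hybrid_ns.old", "hybrid_ns.new", "hybrid_ns"):
                sys.modules.pop(name, None)
            sys.path.remove(legacy)
            sys.path.remove(modern)
```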

Packaging Implications

Multiple portions of a namespace package can be installed into the same directory, or into separate directories. For this section, suppose there are two portions which define “foo.bar” and “foo.baz”. “foo” itself is a namespace package.

If these are installed in the same location, a single directory “foo” would be in a directory that is on sys.path. Inside “foo” would be two directories, “bar” and “baz”. If “foo.bar” is removed (perhaps by an OS package manager), care must be taken not to remove the “foo/baz” or “foo” directories. Note that in this case “foo” will be a namespace package (because it lacks an __init__.py), even though all of its portions are in the same directory.

Note that “foo.bar” and “foo.baz” can be installed into the same “foo” directory because they will not have any files in common.

If the portions are installed in different locations, two different “foo” directories would be in directories that are on sys.path. “foo/bar” would be in one of these sys.path entries, and “foo/baz” would be in the other. Upon removal of “foo.bar”, the “foo/bar” and corresponding “foo” directories can be completely removed. But “foo/baz” and its corresponding “foo” directory cannot be removed.

It is also possible to have the “foo.bar” portion installed in a directory on sys.path, and have the “foo.baz” portion provided in a zip file, also on sys.path.
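The mixed directory-plus-zip layout can be checked the same way. The package name zmix_ns, the file names, and demo_zip_portion are hypothetical. The sketch writes an explicit zmix_ns/ directory entry into the archive because at least some older zipimport versions only recognize a namespace portion inside a zip file when such an entry exists.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def demo_zip_portion():
    with tempfile.TemporaryDirectory() as tmp:
        # Portion 1: zmix_ns/bar.py in an ordinary directory (no __init__.py).
        dir_entry = os.path.join(tmp, "site")
        os.makedirs(os.path.join(dir_entry, "zmix_ns"))
        with open(os.path.join(dir_entry, "zmix_ns", "bar.py"), "w") as f:
            f.write("WHERE = 'directory'\n")
        # Portion 2: zmix_ns/baz.py inside a zip file (no __init__.py).
        zip_entry = os.path.join(tmp, "portion.zip")
        with zipfile.ZipFile(zip_entry, "w") as zf:
            zf.writestr("zmix_ns/", "")  # explicit directory entry (see lead-in)
            zf.writestr("zmix_ns/baz.py", "WHERE = 'zip'\n")
        sys.path[:0] = [dir_entry, zip_entry]
        importlib.invalidate_caches()
        try:
            bar = importlib.import_module("zmix_ns.bar")
            baz = importlib.import_module("zmix_ns.baz")
            portions = list(sys.modules["zmix_ns"].__path__)
            return bar.WHERE, baz.WHERE, len(portions)
        finally:
            # Clean up so the demo can be rerun in the same interpreter.
            for name in ("zmix_ns.bar", "zmix_ns.baz", "zmix_ns"):
                sys.modules.pop(name, None)
            sys.path.remove(dir_entry)
            sys.path.remove(zip_entry)
```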

Examples

Nested namespace packages

This example uses the following directory structure:

Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py

Here, both parent and child are namespace packages: portions of them exist in different directories, and they do not have __init__.py files.

Here we add the parent directories to sys.path, and show that the portions are correctly found:

>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
>>> import parent.child.two
>>>

Dynamic path computation

This example uses a similar directory structure, but adds a third portion:

Lib/test/namespace_pkgs
    project1
        parent
            child
                one.py
    project2
        parent
            child
                two.py
    project3
        parent
            child
                three.py

We add project1 and project2 to sys.path, then import parent.child.one and parent.child.two. Then we add project3 to sys.path, and when parent.child.three is imported, project3/parent is automatically added to parent.__path__:

# add the first two parent paths to sys.path
>>> import sys
>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
# parent.child.one can be imported, because project1 was added to sys.path:
>>> import parent.child.one
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
# parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child:
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
# parent.child.two can be imported, because project2 was added to sys.path:
>>> import parent.child.two
# we cannot import parent.child.three, because project3 is not in the path:
>>> import parent.child.three
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1286, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1250, in _find_and_load_unlocked
ImportError: No module named 'parent.child.three'
# now add project3 to sys.path:
>>> sys.path.append('Lib/test/namespace_pkgs/project3')
# and now parent.child.three can be imported:
>>> import parent.child.three
# project3/parent has been added to parent.__path__:
>>> parent.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent'])
# and project3/parent/child has been added to parent.child.__path__
>>> parent.child.__path__
_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child'])
>>>

Discussion

At PyCon 2012, we had a discussion about namespace packages at which PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3].

There is no intention to remove support of regular packages. If a developer knows that her package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an __init__.py). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created.

Note that an ImportWarning will no longer be raised for a directory lacking an __init__.py file. Such a directory will now be imported as a namespace package, whereas in prior Python versions an ImportWarning would be raised.

Alyssa (Nick) Coghlan presented a list of her objections to this proposal [4]. They are:

  1. Implicit package directories go against the Zen of Python.
  2. Implicit package directories pose awkward backwards compatibility challenges.
  3. Implicit package directories introduce ambiguity into file system layouts.
  4. Implicit package directories will permanently entrench current newbie-hostile behavior in __main__.

Alyssa later gave a detailed response to her own objections [5], which is summarized here:

  1. The practicality of this PEP wins over other proposals and the status quo.
  2. Minor backward compatibility issues are okay, as long as they are properly documented.
  3. This will be addressed in PEP 395.
  4. This will also be addressed in PEP 395.

The inclusion of namespace packages in the standard library was motivated by Martin v. Löwis, who wanted the encodings package to become a namespace package [6]. While this PEP allows for standard library packages to become namespaces, it defers a decision on encodings.

find_module versus find_loader

An early draft of this PEP specified a change to the find_module method in order to support namespace packages. It would be modified to return a string in the case where a namespace package portion was discovered.

However, this caused a problem with existing code outside of the standard library which calls find_module. Because this code would not be upgraded in concert with changes required by this PEP, it would fail when it received unexpected return values from find_module. Because of this incompatibility, this PEP now specifies that finders that want to provide namespace portions must implement the find_loader method, described above.

The use case for supporting multiple portions per find_loader call is given in [7].

Dynamic path computation

Guido raised a concern that automatic dynamic path computation was an unnecessary feature [8]. Later in that thread, PJ Eby and Alyssa Coghlan presented arguments as to why dynamic computation would minimize surprise to Python users. The conclusion of that discussion has been included in this PEP’s Rationale section.

An earlier version of this PEP required that dynamic path computation could only take effect if the parent path object were modified in-place. That is, this would work:

sys.path.append('new-dir')

But this would not:

sys.path = sys.path + ['new-dir']

In the same thread [8], it was pointed out that this restriction is not required. If the parent path is looked up by name instead of by holding a reference to it, then there is no restriction on how the parent path is modified or replaced. For a top-level namespace package, the lookup would be the module named "sys" then its attribute "path". For a namespace package nested inside a package foo, the lookup would be for the module named "foo" then its attribute "__path__".
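The by-name lookup can be sketched as two small functions. parent_path_names and lookup_parent_path are hypothetical helpers written for illustration, and demo_replacement registers a throwaway module named fake_parent_pkg only so the example runs standalone. It shows why a held reference goes stale when the parent path is replaced, while a fresh lookup does not.

```python
import sys
import types
from typing import List, Sequence, Tuple


def parent_path_names(fullname: str) -> Tuple[str, str]:
    """Return (module name, attribute name) of the object holding the parent path."""
    parent, dot, _ = fullname.rpartition(".")
    if not dot:
        # Top-level namespace package: the parent path is sys.path.
        return "sys", "path"
    # Nested namespace package: the parent path is the parent's __path__.
    return parent, "__path__"


def lookup_parent_path(fullname: str) -> Sequence[str]:
    """Fetch the current parent path on every call instead of caching a reference."""
    module_name, attr = parent_path_names(fullname)
    return getattr(sys.modules[module_name], attr)


def demo_replacement() -> Tuple[List[str], List[str]]:
    """Compare a reference held once with a fresh by-name lookup after replacement."""
    pkg = types.ModuleType("fake_parent_pkg")  # hypothetical parent package
    pkg.__path__ = ["dir-a"]
    sys.modules[pkg.__name__] = pkg
    try:
        held = lookup_parent_path("fake_parent_pkg.child")  # reference taken once
        pkg.__path__ = pkg.__path__ + ["dir-b"]             # replaced, not mutated
        return list(held), list(lookup_parent_path("fake_parent_pkg.child"))
    finally:
        del sys.modules[pkg.__name__]
```

demo_replacement() returns (['dir-a'], ['dir-a', 'dir-b']): the held list never sees "dir-b", while the lookup by name does.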

Module reprs

Previously, module reprs were hard coded based on assumptions about a module’s __file__ attribute. If this attribute existed and was a string, it was assumed to be a file system path, and the module object’s repr would include this in its value. The only exception was that PEP 302 reserved missing __file__ attributes to built-in modules, and in CPython, this assumption was baked into the module object’s implementation. Because of this restriction, some modules contained contrived __file__ values that did not reflect file system paths, and which could cause unexpected problems later (e.g. os.path.join() on a non-path __file__ would return gibberish).

This PEP relaxes this constraint, and leaves the setting of __file__ to the purview of the loader producing the module. Loaders may opt to leave __file__ unset if no file system path is appropriate. Loaders may also set additional reserved attributes on the module if useful. This means that the definitive way to determine the origin of a module is to check its __loader__ attribute.

For example, namespace packages as described in this PEP will have no __file__ attribute because no corresponding file exists. In order to provide flexibility and descriptiveness in the reprs of such modules, a new optional protocol is added to PEP 302 loaders. Loaders can implement a module_repr() method which takes a single argument, the module object. This method should return the string to be used verbatim as the repr of the module. The rules for producing a module repr are now standardized as:

  • If the module has an __loader__ and that loader has a module_repr() method, call it with a single argument, which is the module object. The value returned is used as the module’s repr.
  • If an exception occurs in module_repr(), the exception is caught and discarded, and the calculation of the module’s repr continues as if module_repr() did not exist.
  • If the module has an __file__ attribute, this is used as part of the module’s repr.
  • If the module has no __file__ but does have an __loader__, then the loader’s repr is used as part of the module’s repr.
  • Otherwise, just use the module’s __name__ in the repr.

Here is a snippet showing how namespace module reprs are calculated from their loader:

class NamespaceLoader:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (namespace)>".format(module.__name__)

Built-in module reprs would no longer need to be hard-coded, but instead would come from their loader as well:

class BuiltinImporter:
    @classmethod
    def module_repr(cls, module):
        return "<module '{}' (built-in)>".format(module.__name__)

Here are some example reprs of different types of modules with different sets of the related attributes:

>>> import email
>>> email
<module 'email' from '/home/barry/projects/python/pep-420/Lib/email/__init__.py'>
>>> m = type(email)('foo')
>>> m
<module 'foo'>
>>> m.__file__ = 'zippy:/de/do/dah'
>>> m
<module 'foo' from 'zippy:/de/do/dah'>
>>> class Loader: pass
...
>>> m.__loader__ = Loader
>>> del m.__file__
>>> m
<module 'foo' (<class '__main__.Loader'>)>
>>> class NewLoader:
...   @classmethod
...   def module_repr(cls, module):
...      return '<mystery module!>'
...
>>> m.__loader__ = NewLoader
>>> m
<mystery module!>
>>>
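The five repr rules can be condensed into a single function. compute_module_repr is a hypothetical name; it applies the rules itself rather than calling repr(), because module_repr() was later deprecated by PEP 451 in favor of reprs derived from __spec__, so a modern interpreter does not necessarily follow these steps. As an assumption of this sketch, a __file__ or __loader__ set to None is treated as missing.

```python
import types


def compute_module_repr(module: types.ModuleType) -> str:
    """Hypothetical helper applying the PEP 420 module repr rules in order."""
    name = getattr(module, "__name__", "?")
    loader = getattr(module, "__loader__", None)
    # Rules 1 and 2: ask the loader, but discard any exception it raises.
    module_repr = getattr(loader, "module_repr", None)
    if module_repr is not None:
        try:
            return module_repr(module)
        except Exception:
            pass
    # Rule 3: a __file__ attribute (None treated as missing) names the origin.
    filename = getattr(module, "__file__", None)
    if filename is not None:
        return "<module {!r} from {!r}>".format(name, filename)
    # Rule 4: no __file__, so fall back to the loader's own repr.
    if loader is not None:
        return "<module {!r} ({!r})>".format(name, loader)
    # Rule 5: just the module's name.
    return "<module {!r}>".format(name)
```

Replaying the interactive session above with compute_module_repr in place of the interpreter's repr gives the same strings at each step, and a loader whose module_repr() raises simply falls through to rules 3 to 5.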

References

[1]
PEP 420 branch (http://hg.python.org/features/pep-420)
[3]
PyCon 2012 Namespace Package discussion outcome (https://mail.python.org/pipermail/import-sig/2012-March/000421.html)
[4]
Alyssa Coghlan’s objection to the lack of marker files or directories (https://mail.python.org/pipermail/import-sig/2012-March/000423.html)
[5]
Alyssa Coghlan’s response to her initial objections (https://mail.python.org/pipermail/import-sig/2012-April/000464.html)
[6]
Martin v. Löwis’s suggestion to make encodings a namespace package (https://mail.python.org/pipermail/import-sig/2012-May/000540.html)
[7]
Use case for multiple portions per find_loader call (https://mail.python.org/pipermail/import-sig/2012-May/000585.html)
[8]
Discussion about dynamic path computation (https://mail.python.org/pipermail/python-dev/2012-May/119560.html)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0420.rst

Last modified: 2025-02-01 08:59:27 GMT


