Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 402 – Simplified Package Layout and Partitioning

Author:
Phillip J. Eby
Status:
Rejected
Type:
Standards Track
Topic:
Packaging
Created:
12-Jul-2011
Python-Version:
3.3
Post-History:
20-Jul-2011
Replaces:
382

Table of Contents

Rejection Notice

On the first day of sprints at US PyCon 2012 we had a long andfruitful discussion aboutPEP 382 andPEP 402. We ended up rejectingboth but a new PEP will be written to carry on in the spirit of PEP402. Martin von Löwis wrote up a summary:[3].

Abstract

This PEP proposes an enhancement to Python’s package importingto:

  • Surprise users of other languages less,
  • Make it easier to convert a module into a package, and
  • Support dividing packages into separately installed components(ala “namespace packages”, as described inPEP 382)

The proposed enhancements do not change the semantics of anycurrently-importable directory layouts, but make it possible forpackages to use a simplified directory layout (that is not importablecurrently).

However, the proposed changes do NOT add any performance overhead tothe importing of existing modules or packages, and performance for thenew directory layout should be about the same as that of previous“namespace package” solutions (such aspkgutil.extend_path()).

The Problem

“Most packages are like modules. Their contents are highlyinterdependent and can’t be pulled apart. [However,] somepackages exist to provide a separate namespace. … It shouldbe possible to distribute sub-packages or submodules of these[namespace packages] independently.”

—Jim Fulton, shortly before the release of Python 2.3[1]

When new users come to Python from other languages, they are oftenconfused by Python’s package import semantics. At Google, for example,Guido received complaints from “a large crowd with pitchforks”[2]that the requirement for packages to contain an__init__ modulewas a “misfeature”, and should be dropped.

In addition, users coming from languages like Java or Perl aresometimes confused by a difference in Python’s import path searching.

In most other languages that have a similar path mechanism to Python’ssys.path, a package is merely a namespace that contains modulesor classes, and can thus be spread across multiple directories inthe language’s path. In Perl, for instance, aFoo::Bar modulewill be searched for inFoo/ subdirectories all along the moduleinclude path, not just in the first such subdirectory found.

Worse, this is not just a problem for new users: it preventsanyonefrom easily splitting a package into separately-installablecomponents. In Perl terms, it would be as if every possibleNet::module on CPAN had to be bundled up and shipped in a single tarball!

For that reason, various workarounds for this latter limitation exist,circulated under the term “namespace packages”. The Python standardlibrary has provided one such workaround since Python 2.3 (via thepkgutil.extend_path() function), and the “setuptools” packageprovides another (viapkg_resources.declare_namespace()).

The workarounds themselves, however, fall prey to athird issue withPython’s way of laying out packages in the filesystem.

Because a packagemust contain an__init__ module, any attemptto distribute modules for that package must necessarily include that__init__ module, if those modules are to be importable.

However, the very fact that each distribution of modules for a packagemust contain this (duplicated)__init__ module, means that OSvendors who package up these module distributions must somehow handlethe conflict caused by several module distributions installing that__init__ module to the same location in the filesystem.

This led to the proposing ofPEP 382 (“Namespace Packages”) - a wayto signal to Python’s import machinery that a directory wasimportable, using unique filenames per module distribution.

However, there was more than one downside to this approach.Performance for all import operations would be affected, and theprocess of designating a package became even more complex. Newterminology had to be invented to explain the solution, and so on.

As terminology discussions continued on the Import-SIG, it soon becameapparent that the main reason it was so difficult to explain theconcepts related to “namespace packages” was because Python’scurrent way of handling packages is somewhat underpowered, whencompared to other languages.

That is, in other popular languages with package systems, no specialterm is needed to describe “namespace packages”, becauseallpackages generally behave in the desired fashion.

Rather than being an isolated single directory with a special markermodule (as in Python), packages in other languages are typically justtheunion of appropriately-named directories across theentireimport or inclusion path.

In Perl, for example, the moduleFoo is always found in aFoo.pm file, and a moduleFoo::Bar is always found in aFoo/Bar.pm file. (In other words, there is One Obvious Way tofind the location of a particular module.)

This is because Perl considers a module to bedifferent from apackage: the package is purely anamespace in which other modulesmay reside, and is onlycoincidentally the name of a module as well.

In current versions of Python, however, the module and the package aremore tightly bound together.Foo is always a module – whether itis found inFoo.py orFoo/__init__.py – and it is tightlylinked to its submodules (if any), whichmust reside in the exactsame directory where the__init__.py was found.

On the positive side, this design choice means that a package is quiteself-contained, and can be installed, copied, etc. as a unit just byperforming an operation on the package’s root directory.

On the negative side, however, it is non-intuitive for beginners, andrequires a more complex step to turn a module into a package. IfFoo begins its life asFoo.py, then it must be moved andrenamed toFoo/__init__.py.

Conversely, if you intend to create aFoo.Bar module from thestart, but have no particular module contents to put inFooitself, then you have to create an empty and seemingly-irrelevantFoo/__init__.py file, just so thatFoo.Bar can be imported.

(And these issues don’t just confuse newcomers to the language,either: they annoy many experienced developers as well.)

So, after some discussion on the Import-SIG, this PEP was createdas an alternative to PEP 382, in an attempt to solveall of theabove problems, not just the “namespace package” use cases.

And, as a delightful side effect, the solution proposed in this PEPdoes not affect the import performance of ordinary modules orself-contained (i.e.__init__-based) packages.

The Solution

In the past, various proposals have been made to allow more intuitiveapproaches to package directory layout. However, most of them failedbecause of an apparent backward-compatibility problem.

That is, if the requirement for an__init__ module were simplydropped, it would open up the possibility for a directory named, say,string onsys.path, to block importing of the standard librarystring module.

Paradoxically, however, the failure of this approach doesnot arisefrom the elimination of the__init__ requirement!

Rather, the failure arises because the underlying approach takes forgranted that a package is just ONE thing, instead of two.

In truth, a package comprises two separate, but related entities: amodule (with its own, optional contents), and anamespace whereother modules or packages can be found.

In current versions of Python, however, the module part (found in__init__) and the namespace for submodule imports (representedby the__path__ attribute) are both initialized at the same time,when the package is first imported.

And, if you assume this is theonly way to initialize these twothings, then there is no way to drop the need for an__init__module, while still being backwards-compatible with existing directorylayouts.

After all, as soon as you encounter a directory onsys.pathmatching the desired name, that means you’ve “found” the package, andmust stop searching, right?

Well, not quite.

A Thought Experiment

Let’s hop into the time machine for a moment, and pretend we’re backin the early 1990s, shortly before Python packages and__init__.pyhave been invented. But, imagine that weare familiar withPerl-like package imports, and we want to implement a similar systemin Python.

We’d still have Python’smodule imports to build on, so we couldcertainly conceive of havingFoo.py as a parentFoo modulefor aFoo package. But how would we implement submodule andsubpackage imports?

Well, if we didn’t have the idea of__path__ attributes yet,we’d probably just searchsys.path looking forFoo/Bar.py.

But we’donly do it when someone actually tried toimportFoo.Bar.

NOT when they importedFoo.

Andthat lets us get rid of the backwards-compatibility problemof dropping the__init__ requirement, back here in 2011.

How?

Well, when weimportFoo, we’re not evenlooking forFoo/directories onsys.path, because we don’tcare yet. The onlypoint at which we care, is the point when somebody tries to actuallyimport a submodule or subpackage ofFoo.

That means that ifFoo is a standard library module (for example),and I happen to have aFoo directory onsys.path (withoutan__init__.py, of course), thennothing breaks. TheFoomodule is still just a module, and it’s still imported normally.

Self-Contained vs. “Virtual” Packages

Of course, in today’s Python, trying toimportFoo.Bar willfail ifFoo is just aFoo.py module (and thus lacks a__path__ attribute).

So, this PEP proposes todynamically create a__path__, in thecase where one is missing.

That is, if I try toimportFoo.Bar the proposed change to theimport machinery will notice that theFoo module lacks a__path__, and will therefore try tobuild one before proceeding.

And it will do this by making a list of all the existingFoo/subdirectories of the directories listed insys.path.

If the list is empty, the import will fail withImportError, justlike today. But if the list isnot empty, then it is saved ina newFoo.__path__ attribute, making the module a “virtualpackage”.

That is, because it now has a valid__path__, we can proceedto import submodules or subpackages in the normal way.

Now, notice that this change does not affect “classic”, self-containedpackages that have an__init__ module in them. Such packagesalreadyhave a__path__ attribute (initialized at import time)so the import machinery won’t try to create another one later.

This means that (for example) the standard libraryemail packagewill not be affected in any way by you having a bunch of unrelateddirectories namedemail onsys.path. (Even if they contain*.py files.)

But itdoes mean that if you want to turn yourFoo module intoaFoo package, all you have to do is add aFoo/ directorysomewhere onsys.path, and start adding modules to it.

But what if you only want a “namespace package”? That is, a packagethat isonly a namespace for various separately-distributedsubmodules and subpackages?

For example, if you’re Zope Corporation, distributing dozens ofseparate tools likezc.buildout, each in packages under thezcnamespace, you don’t want to have to make and include an emptyzc.py in every tool you ship. (And, if you’re a Linux or otherOS vendor, you don’t want to deal with the package installationconflicts created by trying to install ten copies ofzc.py to thesame location!)

No problem. All we have to do is make one more minor tweak to theimport process: if the “classic” import process fails to find aself-contained module or package (e.g., ifimportzc fails to findazc.py orzc/__init__.py), then we once more try to build a__path__ by searching for all thezc/ directories onsys.path, and putting them in a list.

If this list is empty, we raiseImportError. But if it’snon-empty, we create an emptyzc module, and put the list inzc.__path__. Congratulations:zc is now a namespace-only,“pure virtual” package! It has no module contents, but you can stillimport submodules and subpackages from it, regardless of where they’relocated onsys.path.

(By the way, both of these additions to the import protocol (i.e. thedynamically-added__path__, and dynamically-created modules)apply recursively to child packages, using the parent package’s__path__ in place ofsys.path as a basis for generating achild__path__. This means that self-contained and virtualpackages can contain each other without limitation, with the caveatthat if you put a virtual package inside a self-contained one, it’sgonna have a really short__path__!)

Backwards Compatibility and Performance

Notice that these two changesonly affect import operations thattoday would result inImportError. As a result, the performanceof imports that do not involve virtual packages is unaffected, andpotential backward compatibility issues are very restricted.

Today, if you try to import submodules or subpackages from a modulewith no__path__, it’s an immediate error. And of course, if youdon’t have azc.py orzc/__init__.py somewhere onsys.pathtoday,importzc would likewise fail.

Thus, the only potential backwards-compatibility issues are:

  1. Tools that expect package directories to have an__init__module, that expect directories without an__init__ moduleto be unimportable, or that expect__path__ attributes to bestatic, will not recognize virtual packages as packages.

    (In practice, this just means that tools will need updating tosupport virtual packages, e.g. by usingpkgutil.walk_modules()instead of using hardcoded filesystem searches.)

  2. Code thatexpects certain imports to fail may now do somethingunexpected. This should be fairly rare in practice, as most sane,non-test code does not import things that are expected not toexist!

The biggest likely exception to the above would be when a piece ofcode tries to check whether some package is installed by importingit. If this is doneonly by importing a top-level module (i.e., notchecking for a__version__ or some other attribute),and thereis a directory of the same name as the sought-for package onsys.path somewhere,and the package is not actually installed,then such code could be fooled into thinking a package is installedthat really isn’t.

For example, suppose someone writes a script (datagen.py)containing the following code:

try:importjsonexceptImportError:importsimplejsonasjson

And runs it in a directory laid out like this:

datagen.pyjson/foo.jsbar.js

Ifimportjson succeeded due to the mere presence of thejson/subdirectory, the code would incorrectly believe that thejsonmodule was available, and proceed to fail with an error.

However, we can prevent corner cases like these from arising, simplyby making one small change to the algorithm presented so far. Insteadof allowing you to import a “pure virtual” package (likezc),we allow only importing of thecontents of virtual packages.

That is, a statement likeimportzc should raiseImportErrorif there is nozc.py orzc/__init__.py onsys.path. But,doingimportzc.buildout should still succeed, as long as there’sazc/buildout.py orzc/buildout/__init__.py onsys.path.

In other words, we don’t allow pure virtual packages to be importeddirectly, only modules and self-contained packages. (This is anacceptable limitation, because there is nofunctional value toimporting such a package by itself. After all, the module objectwill have nocontents until you import at least one of itssubpackages or submodules!)

Oncezc.buildout has been successfully imported, though, therewill be azc module insys.modules, and trying to import itwill of course succeed. We are only preventing aninitial importfrom succeeding, in order to prevent false-positive import successeswhen clashing subdirectories are present onsys.path.

So, with this slight change, thedatagen.py example above willwork correctly. When it doesimportjson, the mere presence of ajson/ directory will simply not affect the import process at all,even if it contains.py files. Thejson/ directory will stillonly be searched in the case where an import likeimportjson.converter is attempted.

Meanwhile, tools that expect to locate packages and modules bywalking a directory tree can be updated to use the existingpkgutil.walk_modules() API, and tools that need to inspectpackages in memory should use the other APIs described in theStandard Library Changes/Additions section below.

Specification

A change is made to the existing import process, when importingnames containing at least one. – that is, imports of modulesthat have a parent package.

Specifically, if the parent package does not exist, or exists butlacks a__path__ attribute, an attempt is first made to create a“virtual path” for the parent package (following the algorithmdescribed in the section onvirtual paths, below).

If the computed “virtual path” is empty, anImportError results,just as it would today. However, if a non-empty virtual path isobtained, the normal import of the submodule or subpackage proceeds,using that virtual path to find the submodule or subpackage. (Justas it would have with the parent’s__path__, if the parent packagehad existed and had a__path__.)

When a submodule or subpackage is found (but not yet loaded),the parent package is created and added tosys.modules (if itdidn’t exist before), and its__path__ is set to the computedvirtual path (if it wasn’t already set).

In this way, when the actual loading of the submodule or subpackageoccurs, it will see a parent package existing, and any relativeimports will work correctly. However, if no submodule or subpackageexists, then the parent package willnot be created, nor will astandalone module be converted into a package (by the addition of aspurious__path__ attribute).

Note, by the way, that this change must be appliedrecursively: thatis, iffoo andfoo.bar are pure virtual packages, thenimportfoo.bar.baz must wait untilfoo.bar.baz is found beforecreating module objects forbothfoo andfoo.bar, and thencreate both of them together, properly setting thefoo module’s.bar attribute to point to thefoo.bar module.

In this way, pure virtual packages are never directly importable:animportfoo orimportfoo.bar by itself will fail, and thecorresponding modules will not appear insys.modules until theyare needed to point to asuccessfully imported submodule orself-contained subpackage.

Virtual Paths

A virtual path is created by obtaining aPEP 302 “importer” object foreach of the path entries found insys.path (for a top-levelmodule) or the parent__path__ (for a submodule).

(Note: becausesys.meta_path importers are not associated withsys.path or__path__ entry strings, such importers donotparticipate in this process.)

Each importer is checked for aget_subpath() method, and ifpresent, the method is called with the full name of the module/packagethe path is being constructed for. The return value is either astring representing a subdirectory for the requested package, orNone if no such subdirectory exists.

The strings returned by the importers are added to the path listbeing built, in the same order as they are found. (None valuesand missingget_subpath() methods are simply skipped.)

The resulting list (whether empty or not) is then stored in asys.virtual_package_paths dictionary, keyed by module name.

This dictionary has two purposes. First, it serves as a cache, inthe event that more than one attempt is made to import a submoduleof a virtual package.

Second, and more importantly, the dictionary can be used by code thatextendssys.path at runtime toupdate imported packages’__path__ attributes accordingly. (SeeStandard LibraryChanges/Additions below for more details.)

In Python code, the virtual path construction algorithm would looksomething like this:

defget_virtual_path(modulename,parent_path=None):ifmodulenameinsys.virtual_package_paths:returnsys.virtual_package_paths[modulename]ifparent_pathisNone:parent_path=sys.pathpath=[]forentryinparent_path:# Obtain a PEP 302 importer object - see pkgutil moduleimporter=pkgutil.get_importer(entry)ifhasattr(importer,'get_subpath'):subpath=importer.get_subpath(modulename)ifsubpathisnotNone:path.append(subpath)sys.virtual_package_paths[modulename]=pathreturnpath

And a function like this one should be exposed in the standardlibrary as e.g.imp.get_virtual_path(), so that people creating__import__ replacements orsys.meta_path hooks can reuse it.

Standard Library Changes/Additions

Thepkgutil module should be updated to handle thisspecification appropriately, including any necessary changes toextend_path(),iter_modules(), etc.

Specifically the proposed changes and additions topkgutil are:

  • A newextend_virtual_paths(path_entry) function, to extendexisting, already-imported virtual packages’__path__ attributesto include any portions found in a newsys.path entry. Thisfunction should be called by applications extendingsys.pathat runtime, e.g. when adding a plugin directory or an egg to thepath.

    The implementation of this function does a simple top-down traversalofsys.virtual_package_paths, and performs any necessaryget_subpath() calls to identify what path entries need to beadded to the virtual path for that package, given thatpath_entryhas been added tosys.path. (Or, in the case of sub-packages,adding a derived subpath entry, based on their parent package’svirtual path.)

    (Note: this function must update both the path values insys.virtual_package_paths as well as the__path__ attributesof any corresponding modules insys.modules, even though in thecommon case they will both be the samelist object.)

  • A newiter_virtual_packages(parent='') function to allowtop-down traversal of virtual packages fromsys.virtual_package_paths, by yielding the child virtualpackages ofparent. For example, callingiter_virtual_packages("zope") might yieldzope.appandzope.products (if they are virtual packages listed insys.virtual_package_paths), butnotzope.foo.bar.(This function is needed to implementextend_virtual_paths(),but is also potentially useful for other code that needs to inspectimported virtual packages.)
  • ImpImporter.iter_modules() should be changed to also detect andyield the names of modules found in virtual packages.

In addition to the above changes, thezipimport importer shouldhave itsiter_modules() implementation similarly changed. (Note:current versions of Python implement this via a shim inpkgutil,so technically this is also a change topkgutil.)

Last, but not least, theimp module (orimportlib, ifappropriate) should expose the algorithm described in thevirtualpaths section above, as aget_virtual_path(modulename,parent_path=None) function, so thatcreators of__import__ replacements can use it.

Implementation Notes

For users, developers, and distributors of virtual packages:

  • While virtual packages are easy to set up and use, there is stilla time and place for using self-contained packages. While it’s notstrictly necessary, adding an__init__ module to yourself-contained packages lets users of the package (and Pythonitself) know thatall of the package’s code will be found inthat single subdirectory. In addition, it lets you define__all__, expose a public API, provide a package-level docstring,and do other things that make more sense for a self-containedproject than for a mere “namespace” package.
  • sys.virtual_package_paths is allowed to contain entries fornon-existent or not-yet-imported package names; code that uses itscontents should not assume that every key in this dictionary is alsopresent insys.modules or that importing the name willnecessarily succeed.
  • If you are changing a currently self-contained package into avirtual one, it’s important to note that you can no longer use its__file__ attribute to locate data files stored in a packagedirectory. Instead, you must search__path__ or use the__file__ of a submodule adjacent to the desired files, orof a self-contained subpackage that contains the desired files.

    (Note: this caveat is already true for existing users of “namespacepackages” today. That is, it is an inherent result of being ableto partition a package, that you must knowwhich partition thedesired data file lives in. We mention it here simply so thatnew users converting from self-contained to virtual packages willalso be aware of it.)

  • XXX what is the __file__ of a “pure virtual” package?None?Some arbitrary string? The path of the first directory with atrailing separator? No matter what we put,some code isgoing to break, but the last choice might allow some code toaccidentally work. Is that good or bad?

For those implementingPEP 302 importer objects:

  • Importers that support theiter_modules() method (used bypkgutil to locate importable modules and packages) and want toadd virtual package support should modify theiriter_modules()method so that it discovers and lists virtual packages as well asstandard modules and packages. To do this, the importer shouldsimply list all immediate subdirectory names in its jurisdictionthat are valid Python identifiers.

    XXX This might list a lot of not-really-packages. Should werequire importable contents to exist? If so, how deep do wesearch, and how do we prevent e.g. link loops, or traversing ontodifferent filesystems, etc.? Ick. Also, if virtual packages arelisted, they still can’t beimported, which is a problem for theway thatpkgutil.walk_modules() is currently implemented.

  • “Meta” importers (i.e., importers placed onsys.meta_path) donot need to implementget_subpath(), because the methodis only called on importers corresponding tosys.path entriesand__path__ entries. If a meta importer wishes to supportvirtual packages, it must do so entirely within its ownfind_module() implementation.

    Unfortunately, it is unlikely that any such implementation will beable to merge its package subpaths with those of other metaimporters orsys.path importers, so the meaning of “supportingvirtual packages” for a meta importer is currently undefined!

    (However, since the intended use case for meta importers is toreplace Python’s normal import process entirely for some subset ofmodules, and the number of such importers currently implemented isquite small, this seems unlikely to be a big issue in practice.)

References

[1]
“namespace” vs “module” packages (mailing list thread)(http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
[2]
“Dropping __init__.py requirement for subpackages”(https://mail.python.org/pipermail/python-dev/2006-April/064400.html)
[3]
Namespace Packages resolution(https://mail.python.org/pipermail/import-sig/2012-March/000421.html)

Copyright

This document has been placed in the public domain.


Source:https://github.com/python/peps/blob/main/peps/pep-0402.rst

Last modified:2025-02-01 08:55:40 GMT


[8]ページ先頭

©2009-2025 Movatter.jp