Python Enhancement Proposals

PEP 395 – Qualified Names for Modules

Author: Alyssa Coghlan <ncoghlan at gmail.com>
Status: Withdrawn
Type: Standards Track
Created: 04-Mar-2011
Python-Version: 3.4
Post-History: 05-Mar-2011, 19-Nov-2011

PEP Withdrawal

This PEP was withdrawn by the author in December 2013, as other significant changes in the time since it was written have rendered several aspects obsolete. Most notably, PEP 420 namespace packages rendered some of the proposals related to package detection unworkable, and PEP 451 module specifications resolved the multiprocessing issues and provide a possible means to tackle the pickle compatibility issues.

A future PEP to resolve the remaining issues would still be appropriate, but it’s worth starting any such effort as a fresh PEP restating the remaining problems in an updated context rather than trying to build on this one directly.

Abstract

This PEP proposes new mechanisms that eliminate some longstanding traps for the unwary when dealing with Python’s import system, as well as serialisation and introspection of functions and classes.

It builds on the “Qualified Name” concept defined in PEP 3155.

Relationship with Other PEPs

Most significantly, this PEP is currently deferred as it requires significant changes in order to be made compatible with the removal of mandatory __init__.py files in PEP 420 (which has been implemented and released in Python 3.3).

This PEP builds on the “qualified name” concept introduced by PEP 3155, and also shares in that PEP’s aim of fixing some ugly corner cases when dealing with serialisation of arbitrary functions and classes.

It also builds on PEP 366, which took initial tentative steps towards making explicit relative imports from the main module work correctly in at least some circumstances.

Finally, PEP 328 eliminated implicit relative imports from imported modules. This PEP proposes that the de facto implicit relative imports from main modules that are provided by the current initialisation behaviour for sys.path[0] also be eliminated.

What’s in a __name__?

Over time, a module’s __name__ attribute has come to be used to handle a number of different tasks.

The key use cases identified for this module attribute are:

  1. Flagging the main module in a program, using the if __name__ == "__main__": convention
  2. As the starting point for relative imports
  3. To identify the location of function and class definitions within the running application
  4. To identify the location of classes for serialisation into pickle objects which may be shared with other interpreter instances

Traps for the Unwary

The overloading of the semantics of __name__, along with some historically associated behaviour in the initialisation of sys.path[0], has resulted in several traps for the unwary. These traps can be quite annoying in practice, as they are highly unobvious (especially to beginners) and can cause quite confusing behaviour.

Why are my imports broken?

There’s a general principle that applies when modifying sys.path: never put a package directory directly on sys.path. The reason this is problematic is that every module in that directory is now potentially accessible under two different names: as a top level module (since the package directory is on sys.path) and as a submodule of the package (if the higher level directory containing the package itself is also on sys.path).
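The duplication is easy to reproduce. The following is a minimal sketch, assuming a throwaway package named dup_demo_pkg with a submodule dup_demo_state (both names are invented for this illustration). It places both the package directory and its parent on sys.path and imports the same file under each name:

```python
import importlib
import os
import sys
import tempfile


def import_under_two_names():
    """Import one source file as both a submodule and a top level module."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: <root>/dup_demo_pkg/{__init__,dup_demo_state}.py
        pkg_dir = os.path.join(root, "dup_demo_pkg")
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "dup_demo_state.py"), "w") as f:
            f.write("registry = []\n")

        saved_path = list(sys.path)
        # The mistake under discussion: the package directory itself is
        # on sys.path alongside its parent directory
        sys.path[:0] = [root, pkg_dir]
        importlib.invalidate_caches()
        try:
            as_submodule = importlib.import_module("dup_demo_pkg.dup_demo_state")
            as_top_level = importlib.import_module("dup_demo_state")
        finally:
            # Undo the changes so the demonstration leaves no trace
            sys.path[:] = saved_path
            for name in ("dup_demo_pkg", "dup_demo_pkg.dup_demo_state",
                         "dup_demo_state"):
                sys.modules.pop(name, None)
    return as_submodule, as_top_level


sub, top = import_under_two_names()
print(sub.__name__, top.__name__, sub is top, sub.registry is top.registry)
```

The two module objects are distinct, so anything appended to registry through one name is invisible through the other.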

As an example, Django (up to and including version 1.3) is guilty of setting up exactly this situation for site-specific applications - the application ends up being accessible as both app and site.app in the module namespace, and these are actually two different copies of the module. This is a recipe for confusion if there is any meaningful mutable module level state, so this behaviour is being eliminated from the default site set up in version 1.4 (site-specific apps will always be fully qualified with the site name).

However, it’s hard to blame Django for this, when the same part of Python responsible for setting __name__ = "__main__" in the main module commits the exact same error when determining the value for sys.path[0].

The impact of this can be seen relatively frequently if you follow the “python” and “import” tags on Stack Overflow. When I had the time to follow it myself, I regularly encountered people struggling to understand the behaviour of straightforward package layouts like the following (I actually use package layouts along these lines in my own projects):

    project/
        setup.py
        example/
            __init__.py
            foo.py
            tests/
                __init__.py
                test_foo.py

While I would often see it without the __init__.py files first, that’s a trivial fix to explain. What’s hard to explain is that all of the following ways to invoke test_foo.py probably won’t work due to broken imports (either failing to find example for absolute imports, complaining about relative imports in a non-package or beyond the toplevel package for explicit relative imports, or issuing even more obscure errors if some other submodule happens to shadow the name of a top-level module, such as an example.json module that handled serialisation or an example.tests.unittest test runner):

    # These commands will most likely *FAIL*, even if the code is correct

    # working directory: project/example/tests
    ./test_foo.py
    python test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/example
    tests/test_foo.py
    python tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project
    example/tests/test_foo.py
    python example/tests/test_foo.py

    # working directory: project/..
    project/example/tests/test_foo.py
    python project/example/tests/test_foo.py
    # The -m and -c approaches don't work from here either, but the failure
    # to find 'example' correctly is easier to explain in this case

That’s right, that long list is of all the methods of invocation that will almost certainly break if you try them, and the error messages won’t make any sense if you’re not already intimately familiar not only with the way Python’s import system works, but also with how it gets initialised.

For a long time, the only way to get sys.path right with that kind of setup was to either set it manually in test_foo.py itself (hardly something a novice, or even many veteran, Python programmers are going to know how to do) or else to make sure to import the module instead of executing it directly:

    # working directory: project
    python -c "from example.tests.test_foo import main; main()"

Since the implementation of PEP 366 (which defined a mechanism that allows relative imports to work correctly when a module inside a package is executed via the -m switch), the following also works properly:

    # working directory: project
    python -m example.tests.test_foo

The fact that most methods of invoking Python code from the command line break when that code is inside a package, and the two that do work are highly sensitive to the current working directory, is all thoroughly confusing for a beginner. I personally believe it is one of the key factors leading to the perception that Python packages are complicated and hard to get right.

This problem isn’t even limited to the command line - if test_foo.py is open in Idle and you attempt to run it by pressing F5, or if you try to run it by clicking on it in a graphical file browser, then it will fail in just the same way it would if run directly from the command line.

There’s a reason the general “no package directories on sys.path” guideline exists, and the fact that the interpreter itself doesn’t follow it when determining sys.path[0] is the root cause of all sorts of grief.

In the past, this couldn’t be fixed due to backwards compatibility concerns. However, scripts potentially affected by this problem will already require fixes when porting to Python 3.x (due to the elimination of implicit relative imports when importing modules normally). This provides a convenient opportunity to implement a corresponding change in the initialisation semantics for sys.path[0].

Importing the main module twice

Another venerable trap is the issue of importing __main__ twice. This occurs when the main module is also imported under its real name, effectively creating two instances of the same module under different names.

If the state stored in __main__ is significant to the correct operation of the program, or if there is top-level code in the main module that has non-idempotent side effects, then this duplication can cause obscure and surprising errors.
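The double execution can be observed directly. This is a small sketch, assuming a throwaway script named dual_demo.py (a name made up for the illustration) that imports itself under its real name. It is run in a child interpreter so that the script really is the main module:

```python
import os
import subprocess
import sys
import tempfile

# The script's top-level code runs once per module object created from it
SCRIPT = 'print("executing as", __name__)\nimport dual_demo\n'


def run_dual_import_demo():
    """Run the script directly and return the lines it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "dual_demo.py")
        with open(script, "w") as f:
            f.write(SCRIPT)
        result = subprocess.run([sys.executable, script],
                                capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


for line in run_dual_import_demo():
    print(line)
```

With the default sys.path[0] behaviour the top-level code executes twice: once as __main__ and once as dual_demo.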

In a bit of a pickle

Something many users may not realise is that the pickle module sometimes relies on the __module__ attribute when serialising instances of arbitrary classes. So instances of classes defined in __main__ are pickled that way, and won’t be unpickled correctly by another Python instance that only imported that module instead of running it directly. This behaviour is the underlying reason for the advice from many Python veterans to do as little as possible in the __main__ module in any application that involves any form of object serialisation and persistence.

Similarly, when creating a pseudo-module (see next paragraph), pickles rely on the name of the module where a class is actually defined, rather than the officially documented location for that class in the module hierarchy.

For the purposes of this PEP, a “pseudo-module” is a package designed like the Python 3.2 unittest and concurrent.futures packages. These packages are documented as if they were single modules, but are in fact internally implemented as a package. This is supposed to be an implementation detail that users and other implementations don’t need to worry about, but, thanks to pickle (and serialisation in general), the details are often exposed and can effectively become part of the public API.

While this PEP focuses specifically on pickle as the principal serialisation scheme in the standard library, this issue may also affect other mechanisms that support serialisation of arbitrary class instances and rely on __module__ attributes to determine how to handle deserialisation.
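The leak described above can be seen with the unittest pseudo-module itself. The sketch below (the helper name leaked_module_name is invented here) pickles a class by reference and checks whether the name of the implementation submodule appears in the payload:

```python
import pickle
import unittest


def leaked_module_name(cls):
    """Return cls.__module__ if that name is embedded in the pickle of cls."""
    payload = pickle.dumps(cls)
    name = cls.__module__
    return name if name.encode("utf-8") in payload else None


# TestCase is documented as unittest.TestCase, but it is defined in a
# submodule, and that submodule's name is what ends up in the pickle
print(unittest.TestCase.__module__)
print(leaked_module_name(unittest.TestCase))
```

Any consumer of such a pickle therefore depends on the internal layout of the package rather than on its documented public name.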

Where’s the source?

Some sophisticated users of the pseudo-module technique described above recognise the problem with implementation details leaking out via the pickle module, and choose to address it by altering __name__ to refer to the public location for the module before defining any functions or classes (or else by modifying the __module__ attributes of those objects after they have been defined).

This approach is effective at eliminating the leakage of information via pickling, but comes at the cost of breaking introspection for functions and classes (as their __module__ attribute now points to the wrong place).

Forkless Windows

To get around the lack of os.fork on Windows, the multiprocessing module attempts to re-execute Python with the same main module, but skipping over any code guarded by if __name__ == "__main__": checks. It does the best it can with the information it has, but is forced to make assumptions that simply aren’t valid whenever the main module isn’t an ordinary directly executed script or top-level module. Packages and non-top-level modules executed via the -m switch, as well as directly executed zipfiles or directories, are likely to make multiprocessing on Windows do the wrong thing (either quietly or noisily, depending on application details) when spawning a new process.

While this issue currently only affects Windows directly, it also impacts any proposals to provide Windows-style “clean process” invocation via the multiprocessing module on other platforms.

Qualified Names for Modules

To make it feasible to fix these problems once and for all, it is proposed to add a new module level attribute: __qualname__. This abbreviation of “qualified name” is taken from PEP 3155, where it is used to store the naming path to a nested class or function definition relative to the top level module.

For modules, __qualname__ will normally be the same as __name__, just as it is for top-level functions and classes in PEP 3155. However, it will differ in some situations so that the above problems can be addressed.

Specifically, whenever __name__ is modified for some other purpose (such as to denote the main module), then __qualname__ will remain unchanged, allowing code that needs it to access the original unmodified value.

If a module loader does not initialise __qualname__ itself, then the import system will add it automatically (setting it to the same value as __name__).
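The fallback rule in the previous paragraph amounts to very little code. The sketch below is only an illustration: module level __qualname__ is a proposal from this PEP, and ensure_module_qualname is a hypothetical helper name rather than anything provided by the import system.

```python
import types


def ensure_module_qualname(module):
    """Default the proposed __qualname__ to __name__ when it is missing."""
    if not hasattr(module, "__qualname__"):
        module.__qualname__ = module.__name__
    return module.__qualname__


# An ordinary module: the loader set nothing, so both names agree
ordinary = types.ModuleType("example.foo")
print(ensure_module_qualname(ordinary))

# A main module whose loader recorded the real name before the
# interpreter renamed it to "__main__"
main = types.ModuleType("__main__")
main.__qualname__ = "example.tests.test_foo"
print(main.__name__, ensure_module_qualname(main))
```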

Alternative Names

Two alternative names were also considered for the new attribute: “full name” (__fullname__) and “implementation name” (__implname__).

Either of those would actually be valid for the use case in this PEP. However, as a meta-issue, PEP 3155 is also adding a new attribute (for functions and classes) that is “like __name__, but different in some cases where __name__ is missing necessary information” and those terms aren’t accurate for the PEP 3155 function and class use case.

PEP 3155 deliberately omits the module information, so the term “full name” is simply untrue, and “implementation name” implies that it may specify an object other than that specified by __name__, and that is never the case for PEP 3155 (in that PEP, __name__ and __qualname__ always refer to the same function or class, it’s just that __name__ is insufficient to accurately identify nested functions and classes).

Since it seems needlessly inconsistent to add two new terms for attributes that only exist because backwards compatibility concerns keep us from changing the behaviour of __name__ itself, this PEP instead chose to adopt the PEP 3155 terminology.

If the relative inscrutability of “qualified name” and __qualname__ encourages interested developers to look them up at least once rather than assuming they know what they mean just from the name and guessing wrong, that’s not necessarily a bad outcome.

Besides, 99% of Python developers should never need to even care these extra attributes exist - they’re really an implementation detail to let us fix a few problematic behaviours exhibited by imports, pickling and introspection, not something people are going to be dealing with on a regular basis.

Eliminating the Traps

The following changes are interrelated and make the most sense when considered together. They collectively either completely eliminate the traps for the unwary noted above, or else provide straightforward mechanisms for dealing with them.

A rough draft of some of the concepts presented here was first posted on the python-ideas list ([1]), but they have evolved considerably since first being discussed in that thread. Further discussion has subsequently taken place on the import-sig mailing list ([2], [3]).

Fixing main module imports inside packages

To eliminate this trap, it is proposed that an additional filesystem check be performed when determining a suitable value for sys.path[0]. This check will look for Python’s explicit package directory markers and use them to find the appropriate directory to add to sys.path.

The current algorithm for setting sys.path[0] in relevant cases is roughly as follows:

    # Interactive prompt, -m switch, -c switch
    sys.path.insert(0, '')

    # Valid sys.path entry execution (i.e. directory and zip execution)
    sys.path.insert(0, sys.argv[0])

    # Direct script execution
    sys.path.insert(0, os.path.dirname(sys.argv[0]))

It is proposed that this initialisation process be modified to take package details stored on the filesystem into account:

    # Interactive prompt, -m switch, -c switch
    in_package, path_entry, _ignored = split_path_module(os.getcwd(), '')
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')

    # Start interactive prompt or run -c command as usual
    #   __main__.__qualname__ is set to "__main__"

    # The -m switch uses the same sys.path[0] calculation, but:
    #   modname is the argument to the -m switch
    #   modname is passed to ``runpy._run_module_as_main()`` as usual
    #   __main__.__qualname__ is set to modname

    # Valid sys.path entry execution (i.e. directory and zip execution)
    modname = "__main__"
    # split_path_module() returns a 3-tuple; the package depth is unused here
    _ignored, path_entry, modname = split_path_module(sys.argv[0], modname)
    sys.path.insert(0, path_entry)

    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to modname

    # Direct script execution
    in_package, path_entry, modname = split_path_module(sys.argv[0])
    sys.path.insert(0, path_entry)
    if in_package:
        ...  # Pass modname to ``runpy._run_module_as_main()``
    else:
        ...  # Run script directly
    # __main__.__qualname__ is set to modname

The split_path_module() supporting function used in the above pseudo-code would have the following semantics:

    import importlib.machinery
    import os.path

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    def _is_package_dir(fspath):
        # A directory counts as a package if it holds an __init__ file
        # with any recognised module suffix
        return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
                   for suffix in importlib.machinery.all_suffixes())

    def split_path_module(fspath, modname=None):
        """Given a filesystem path and a relative module name, determine an
           appropriate sys.path entry and a fully qualified module name.

           Returns a 3-tuple of (package_depth, fspath, modname). A reported
           package depth of 0 indicates that this would be a top level import.

           If no relative module name is given, it is derived from the final
           component in the supplied path with the extension stripped.
        """
        # Work with an absolute path so the upwards walk always terminates
        fspath = os.path.abspath(fspath)
        if modname is None:
            fspath, modname = _splitmodname(fspath)
        package_depth = 0
        while _is_package_dir(fspath):
            fspath, pkg = _splitmodname(fspath)
            modname = pkg + '.' + modname
            package_depth += 1
        return package_depth, fspath, modname

This PEP also proposes that the split_path_module() functionality be exposed directly to Python users via the runpy module.
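To make the intended results concrete, here is a self-contained sketch that applies the proposed semantics to the project/example/tests layout described earlier, built in a temporary directory. Several things are assumptions of this sketch rather than part of the proposal: the use of importlib.machinery.all_suffixes() to recognise __init__ files, the conversion to an absolute path, and the demo() helper itself. No such function was ever added to runpy.

```python
import importlib.machinery
import os
import tempfile


def _splitmodname(fspath):
    path_entry, fname = os.path.split(fspath)
    return path_entry, os.path.splitext(fname)[0]


def _is_package_dir(fspath):
    # Package marker: an __init__ file with any recognised module suffix
    return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
               for suffix in importlib.machinery.all_suffixes())


def split_path_module(fspath, modname=None):
    """Return (package_depth, sys.path entry, qualified module name)."""
    fspath = os.path.abspath(fspath)
    if modname is None:
        fspath, modname = _splitmodname(fspath)
    package_depth = 0
    # Walk upwards while the current directory is still a package
    while _is_package_dir(fspath):
        fspath, pkg = _splitmodname(fspath)
        modname = pkg + "." + modname
        package_depth += 1
    return package_depth, fspath, modname


def demo():
    """Build the example layout and split a script path and a directory."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        tests_dir = os.path.join(root, "project", "example", "tests")
        os.makedirs(tests_dir)
        for pkg_dir in (os.path.dirname(tests_dir), tests_dir):
            open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        script = os.path.join(tests_dir, "test_foo.py")
        open(script, "w").close()
        # Direct script execution, then the "-m/-c from this directory" case
        return root, split_path_module(script), split_path_module(tests_dir, "")


root, for_script, for_cwd = demo()
print(for_script[0], os.path.relpath(for_script[1], root), for_script[2])
print(for_cwd[0], os.path.relpath(for_cwd[1], root), for_cwd[2])
```

In both cases the sys.path entry is the project directory rather than the tests directory. When the relative module name is empty, the returned name keeps a trailing dot; the relative import handling for the -m switch relies on that.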

With this fix in place, and the same simple package layout described earlier, all of the following commands would invoke the test suite correctly:

    # working directory: project/example/tests
    ./test_foo.py
    python test_foo.py
    python -m example.tests.test_foo
    python -c "from .test_foo import main; main()"
    python -c "from ..tests.test_foo import main; main()"
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/example
    tests/test_foo.py
    python tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from .tests.test_foo import main; main()"
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project
    example/tests/test_foo.py
    python example/tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/..
    project/example/tests/test_foo.py
    python project/example/tests/test_foo.py
    # The -m and -c approaches still don't work from here, but the failure
    # to find 'example' correctly is pretty easy to explain in this case

With these changes, clicking Python modules in a graphical file browser should always execute them correctly, even if they live inside a package. Depending on the details of how it invokes the script, Idle would likely also be able to run test_foo.py correctly with F5, without needing any Idle specific fixes.

Optional addition: command line relative imports

With the above changes in place, it would be a fairly minor addition to allow explicit relative imports as arguments to the -m switch:

    # working directory: project/example/tests
    python -m .test_foo
    python -m ..tests.test_foo

    # working directory: project/example/
    python -m .tests.test_foo

With this addition, system initialisation for the -m switch would change as follows:

    # -m switch (permitting explicit relative imports)
    in_package, path_entry, pkg_name = split_path_module(os.getcwd(), '')
    qualname = ...  # the argument to the -m switch
    if qualname.startswith('.'):
        modname = qualname
        while modname.startswith('.'):
            modname = modname[1:]
            pkg_name, sep, _ignored = pkg_name.rpartition('.')
            if not sep:
                raise ImportError("Attempted relative import beyond top level package")
        qualname = pkg_name + '.' + modname
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')

    # qualname is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to qualname
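The name resolution step can be isolated and exercised on its own. In this sketch, resolve_dash_m_argument is a hypothetical helper name, and pkg_prefix stands for the third value returned by split_path_module(os.getcwd(), ''): the dotted package name of the working directory with a trailing dot, or an empty string outside a package.

```python
def resolve_dash_m_argument(arg, pkg_prefix):
    """Turn a possibly relative -m argument into an absolute module name."""
    if not arg.startswith("."):
        return arg  # already absolute
    modname = arg
    pkg_name = pkg_prefix
    # Each leading dot strips one trailing component from the prefix; the
    # first dot only removes the prefix's trailing dot, i.e. it refers to
    # the current package
    while modname.startswith("."):
        modname = modname[1:]
        pkg_name, sep, _ignored = pkg_name.rpartition(".")
        if not sep:
            raise ImportError(
                "Attempted relative import beyond top level package")
    return pkg_name + "." + modname


# working directory: project/example/tests
print(resolve_dash_m_argument(".test_foo", "example.tests."))
print(resolve_dash_m_argument("..tests.test_foo", "example.tests."))
# working directory: project/example
print(resolve_dash_m_argument(".tests.test_foo", "example."))
```

All three calls resolve to example.tests.test_foo, matching the command lines shown above.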

Compatibility with PEP 382

Making this proposal compatible with the PEP 382 namespace packaging PEP is trivial. The semantics of _is_package_dir() are merely changed to be:

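    import importlib.machinery
    import os.path

    def _is_package_dir(fspath):
        # PEP 382 package directories are marked by a ".pyp" suffix
        return (fspath.endswith(".pyp") or
                any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
                    for suffix in importlib.machinery.all_suffixes()))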

Incompatibility with PEP 402

PEP 402 proposes the elimination of explicit markers in the file system for Python packages. This fundamentally breaks the proposed concept of being able to take a filesystem path and a Python module name and work out an unambiguous mapping to the Python module namespace. Instead, the appropriate mapping would depend on the current values in sys.path, rendering it impossible to ever fix the problems described above with the calculation of sys.path[0] when the interpreter is initialised.

While some aspects of this PEP could probably be salvaged if PEP 402 were adopted, the core concept of making import semantics from main and other modules more consistent would no longer be feasible.

This incompatibility is discussed in more detail in the relevant import-sig threads ([2], [3]).

Potential incompatibilities with scripts stored in packages

The proposed change to sys.path[0] initialisation may break some existing code. Specifically, it will break scripts stored in package directories that rely on the implicit relative imports from __main__ in order to run correctly under Python 3.

While such scripts could be imported in Python 2 (due to implicit relative imports) it is already the case that they cannot be imported in Python 3, as implicit relative imports are no longer permitted when a module is imported.

By disallowing implicit relative imports from the main module as well, such modules won’t even work as scripts with this PEP. Switching them over to explicit relative imports will then get them working again as both executable scripts and as importable modules.

To support earlier versions of Python, a script could be written to use different forms of import based on the Python version:

    import sys

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer  # Implicit relative import
    else:
        from . import peer  # Explicit relative import

Fixing dual imports of the main module

Given the above proposal to get __qualname__ consistently set correctly in the main module, one simple change is proposed to eliminate the problem of dual imports of the main module: the addition of a sys.meta_path hook that detects attempts to import __main__ under its real name and returns the original main module instead:

    class AliasImporter:
        def __init__(self, module, alias):
            self.module = module
            self.alias = alias

        def __repr__(self):
            fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
            return fmt.format(self)

        def find_module(self, fullname, path=None):
            # Only claim top level imports of the aliased name
            if path is None and fullname == self.alias:
                return self
            return None

        def load_module(self, fullname):
            if fullname != self.alias:
                raise ImportError("{!r} cannot load {!r}".format(self, fullname))
            # Hand back the module that is already loaded
            return self.module

This meta path hook would be added automatically during import system initialisation based on the following logic:

    main = sys.modules["__main__"]
    if main.__name__ != main.__qualname__:
        sys.meta_path.append(AliasImporter(main, main.__qualname__))
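The find_module()/load_module() protocol used above predates PEP 451. As a rough sketch only, the same aliasing idea could be expressed with the spec-based finder and loader protocol. AliasFinder and import_via_alias are names invented for this illustration, and the sketch ignores the side effect that the import machinery may overwrite attributes such as __spec__ on the module handed back by create_module().

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class AliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical spec-based hook that maps an alias to a loaded module."""

    def __init__(self, module, alias):
        self.module = module
        self.alias = alias

    def find_spec(self, fullname, path=None, target=None):
        # Only claim top level imports of the aliased name
        if path is None and fullname == self.alias:
            return importlib.machinery.ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        # Reuse the existing module object instead of creating a new one
        return self.module

    def exec_module(self, module):
        # Nothing to execute: the module has already been initialised
        pass


def import_via_alias(module, alias):
    """Import `alias` with the hook installed, then clean up again."""
    finder = AliasFinder(module, alias)
    sys.meta_path.insert(0, finder)
    try:
        return importlib.import_module(alias)
    finally:
        sys.meta_path.remove(finder)
        sys.modules.pop(alias, None)


# Stand-in for a main module whose real name is "alias_demo_real_name"
fake_main = types.ModuleType("__main__")
fake_main.state = ["set by the main module"]
print(import_via_alias(fake_main, "alias_demo_real_name") is fake_main)
```

Because create_module() returns the existing object, importing the alias yields the very same module and its state, rather than a second copy.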

This is probably the least important proposal in the PEP - it just closes off the last mechanism that is likely to lead to module duplication after the configuration of sys.path[0] at interpreter startup is addressed.

Fixing pickling without breaking introspection

To fix this problem, it is proposed to make use of the new module level __qualname__ attributes to determine the real module location when __name__ has been modified for any reason.

In the main module, __qualname__ will automatically be set to the main module’s “real” name (as described above) by the interpreter.

Pseudo-modules that adjust __name__ to point to the public namespace will leave __qualname__ untouched, so the implementation location remains readily accessible for introspection.

If __name__ is adjusted at the top of a module, then this will automatically adjust the __module__ attribute for all functions and classes subsequently defined in that module.
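The effect of adjusting __name__ on later definitions is existing behaviour and can be checked today. The sketch below executes some source text in a fresh namespace standing in for an implementation submodule. The names demo_pkg, demo_pkg._impl and define_in_module are made up for the example.

```python
SOURCE = '''
class Before:
    pass

__name__ = "demo_pkg"  # switch to the public name

class After:
    pass

def helper():
    return After()
'''


def define_in_module(impl_name):
    """Execute SOURCE as if it were the body of a module named impl_name."""
    namespace = {"__name__": impl_name}
    exec(SOURCE, namespace)
    return namespace


ns = define_in_module("demo_pkg._impl")
# Definitions made before the adjustment keep the implementation name,
# while those made afterwards pick up the public name
print(ns["Before"].__module__, ns["After"].__module__, ns["helper"].__module__)
```

Only objects defined after the assignment report the public name, which is why the adjustment has to happen at the top of the module.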

Since multiple submodules may be set to use the same “public” namespace, functions and classes will be given a new __qualmodule__ attribute that refers to the __qualname__ of their module.

This isn’t strictly necessary for functions (you could find out their module’s qualified name by looking in their globals dictionary), but it is needed for classes, since they don’t hold a reference to the globals of their defining module. Once a new attribute is added to classes, it is more convenient to keep the API consistent and add a new attribute to functions as well.

These changes mean that adjusting __name__ (and, either directly or indirectly, the corresponding function and class __module__ attributes) becomes the officially sanctioned way to implement a namespace as a package, while exposing the API as if it were still a single module.

All serialisation code that currently uses __name__ and __module__ attributes will then avoid exposing implementation details by default.

To correctly handle serialisation of items from the main module, the class and function definition logic will be updated to also use __qualname__ for the __module__ attribute in the case where __name__ == "__main__".

With __name__ and __module__ being officially blessed as being used for the public names of things, the introspection tools in the standard library will be updated to use __qualname__ and __qualmodule__ where appropriate. For example:

  • pydoc will report both public and qualified names for modules
  • inspect.getsource() (and similar tools) will use the qualified names that point to the implementation of the code
  • additional pydoc and/or inspect APIs may be provided that report all modules with a given public __name__.

Fixing multiprocessing on Windows

With __qualname__ now available to tell multiprocessing the real name of the main module, it will be able to simply include it in the serialised information passed to the child process, eliminating the need for the current dubious introspection of the __file__ attribute.

For older Python versions, multiprocessing could be improved by applying the split_path_module() algorithm described above when attempting to work out how to execute the main module based on its __file__ attribute.

Explicit relative imports

This PEP proposes that __package__ be unconditionally defined in the main module as __qualname__.rpartition('.')[0]. Aside from that, it proposes that the behaviour of explicit relative imports be left alone.
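As a quick worked example of that expression (package_from_qualname is just an illustrative wrapper, not a proposed API):

```python
def package_from_qualname(qualname):
    """Derive the proposed __package__ value from a module's qualified name."""
    return qualname.rpartition(".")[0]


# A module inside a package, and a plain top level script
print(repr(package_from_qualname("example.tests.test_foo")))
print(repr(package_from_qualname("__main__")))
```

A main module inside a package gets its parent package name, while a top level main module gets an empty string.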

In particular, if __package__ is not set in a module when an explicit relative import occurs, the automatically cached value will continue to be derived from __name__ rather than __qualname__. This minimises any backwards incompatibilities with existing code that deliberately manipulates relative imports by adjusting __name__ rather than setting __package__ directly.

This PEP does not propose that __package__ be deprecated. While it is technically redundant following the introduction of __qualname__, it just isn’t worth the hassle of deprecating it within the lifetime of Python 3.x.

Reference Implementation

None as yet.

References

[1] Module aliases and/or “real names”
[2] PEP 395 (Module aliasing) and the namespace PEPs
[3] Updated PEP 395 (aka “Implicit Relative Imports Must Die!”)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0395.rst

Last modified: 2025-02-01 08:59:27 GMT
