Python Enhancement Proposals

PEP 302 – New Import Hooks

Author:
Just van Rossum <just at letterror.com>, Paul Moore <p.f.moore at gmail.com>
Status:
Final
Type:
Standards Track
Created:
19-Dec-2002
Python-Version:
2.3
Post-History:
19-Dec-2002


Warning

The language reference for import [10] and the importlib documentation [11] now supersede this PEP. This document is no longer updated and is provided for historical purposes only.

Abstract

This PEP proposes to add a new set of import hooks that offer better customization of the Python import mechanism. Unlike the current __import__ hook, a new-style hook can be injected into the existing scheme, allowing finer-grained control of how modules are found and how they are loaded.

Motivation

Currently, the only way to customize the import mechanism is to override the built-in __import__ function. However, overriding __import__ has many problems. To begin with:

  • An __import__ replacement needs to fully reimplement the entire import mechanism, or call the original __import__ before or after the custom code.
  • It has very complex semantics and responsibilities.
  • __import__ gets called even for modules that are already in sys.modules, which is almost never what you want, unless you’re writing some sort of monitoring tool.

The situation gets worse when you need to extend the import mechanism from C: it’s currently impossible, apart from hacking Python’s import.c or reimplementing much of import.c from scratch.

There is a fairly long history of tools written in Python that allow extending the import mechanism in various ways, based on the __import__ hook. The Standard Library includes two such tools: ihooks.py (by GvR) and imputil.py [1] (Greg Stein), but perhaps the most famous is iu.py by Gordon McMillan, available as part of his Installer package. Their usefulness is somewhat limited because they are written in Python; bootstrapping issues need to be worked around, as you can’t load the module containing the hook with the hook itself. So if you want the entire Standard Library to be loadable from an import hook, the hook must be written in C.

Use cases

This section lists several existing applications that depend on import hooks. Among these, a lot of duplicate work was done that could have been saved if there had been a more flexible import hook at the time. This PEP should make life a lot easier for similar projects in the future.

Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include modules that are bundled together in an archive; byte code that is not stored in a pyc formatted file; modules that are loaded from a database over a network.

The work on this PEP was partly triggered by the implementation of PEP 273, which adds imports from Zip archives as a built-in feature to Python. While the PEP itself was widely accepted as a must-have feature, the implementation left something to be desired. For one thing, it went to great lengths to integrate itself with import.c, adding lots of code that was either specific to Zip file imports or not specific to Zip imports, yet was not generally useful (or even desirable) either. Yet the PEP 273 implementation can hardly be blamed for this: it is simply extremely hard to do, given the current state of import.c.

Packaging applications for end users is a typical use case for import hooks, if not the typical use case. Distributing lots of source or pyc files around is not always appropriate (let alone a separate Python installation), so there is a frequent desire to package all needed modules in a single file. So frequent, in fact, that multiple solutions have been implemented over the years.

The oldest one is included with the Python source code: Freeze [2]. It puts marshalled byte code into static objects in C source code. Freeze’s “import hook” is hard-wired into import.c, and has a couple of issues. Later solutions include Fredrik Lundh’s Squeeze, Gordon McMillan’s Installer, and Thomas Heller’s py2exe [3]. MacPython ships with a tool called BuildApplication.

Squeeze, Installer and py2exe use an __import__-based scheme (py2exe currently uses Installer’s iu.py, Squeeze used ihooks.py); MacPython has two Mac-specific import hooks hard-wired into import.c that are similar to the Freeze hook. The hooks proposed in this PEP enable us (at least in theory; it’s not a short-term goal) to get rid of the hard-coded hooks in import.c, and would allow the __import__-based tools to get rid of most of their import.c emulation code.

Before work on the design and implementation of this PEP was started, a new BuildApplication-like tool for Mac OS X prompted one of the authors of this PEP (JvR) to expose the table of frozen modules to Python, in the imp module. The main reason was to be able to use the freeze import hook (avoiding fancy __import__ support), yet also to be able to supply a set of modules at runtime. This resulted in issue #642578 [4], which was mysteriously accepted (mostly because nobody seemed to care either way ;-). Yet it is completely superfluous when this PEP gets accepted, as it offers a much nicer and more general way to do the same thing.

Rationale

While experimenting with alternative implementation ideas to get built-in Zip import, it was discovered that achieving this is possible with only a fairly small number of changes to import.c. This made it possible to factor out the Zip-specific code into a new source file, while at the same time creating a general new import hook scheme: the one you’re reading about now.

An earlier design allowed non-string objects on sys.path. Such an object would have the necessary methods to handle an import. This has two disadvantages: 1) it breaks code that assumes all items on sys.path are strings; 2) it is not compatible with the PYTHONPATH environment variable. The latter is directly needed for Zip imports. A compromise came from Jython: allow string subclasses on sys.path, which would then act as importer objects. This avoids some breakage, and seems to work well for Jython (where it is used to load modules from .jar files), but it was perceived as an “ugly hack”.

This led to a more elaborate scheme (mostly copied from McMillan’s iu.py) in which each of a list of candidates is asked whether it can handle the sys.path item, until one is found that can. This list of candidates is a new object in the sys module: sys.path_hooks.

Traversing sys.path_hooks for each path item for each new import can be expensive, so the results are cached in another new object in the sys module: sys.path_importer_cache. It maps sys.path entries to importer objects.

To minimize the impact on import.c as well as to avoid adding extra overhead, it was chosen not to add an explicit hook and importer object for the existing file system import logic (as iu.py has), but to simply fall back to the built-in logic if no hook on sys.path_hooks could handle the path item. If this is the case, a None value is stored in sys.path_importer_cache, again to avoid repeated lookups. (Later we can go further and add a real importer object for the built-in mechanism; for now, the None fallback scheme should suffice.)

A question was raised: what about importers that don’t need any entry on sys.path? (Built-in and frozen modules fall into that category.) Again, Gordon McMillan to the rescue: iu.py contains a thing he calls the metapath. In this PEP’s implementation, it’s a list of importer objects that is traversed before sys.path. This list is yet another new object in the sys module: sys.meta_path. Currently, this list is empty by default, and frozen and built-in module imports are done after traversing sys.meta_path, but still before sys.path.

Specification part 1: The Importer Protocol

This PEP introduces a new protocol: the “Importer Protocol”. It is important to understand the context in which the protocol operates, so here is a brief overview of the outer shells of the import mechanism.

When an import statement is encountered, the interpreter looks up the __import__ function in the built-in name space. __import__ is then called with four arguments, amongst which are the name of the module being imported (may be a dotted name) and a reference to the current global namespace.

The built-in __import__ function (known as PyImport_ImportModuleEx() in import.c) will then check to see whether the module doing the import is a package or a submodule of a package. If it is indeed a (submodule of a) package, it first tries to do the import relative to the package (the parent package for a submodule). For example, if a package named “spam” does “import eggs”, it will first look for a module named “spam.eggs”. If that fails, the import continues as an absolute import: it will look for a module named “eggs”. Dotted name imports work pretty much the same: if package “spam” does “import eggs.bacon” (and “spam.eggs” exists and is itself a package), “spam.eggs.bacon” is tried. If that fails, “eggs.bacon” is tried. (There are more subtleties that are not described here, but these are not relevant for implementers of the Importer Protocol.)
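The relative-then-absolute lookup order can be modelled as a small helper that lists the names tried, in order. This is a sketch of the Python 2 era behaviour described above (Python 3 removed implicit relative imports); candidate_names is a hypothetical helper, and the existence checks mentioned above (for example whether “spam.eggs” is a package) are left out.

```python
def candidate_names(importing_package, name):
    """Return the module names tried, in order, for ``import name``.

    `importing_package` is the package doing the import (for a submodule,
    pass its parent package), or None for a top-level module. Checks such as
    "does spam.eggs exist and is it a package?" are deliberately omitted.
    """
    candidates = []
    if importing_package:
        # First try relative to the package: "spam" + "eggs" -> "spam.eggs".
        candidates.append(importing_package + "." + name)
    # Then continue as an absolute import.
    candidates.append(name)
    return candidates


print(candidate_names("spam", "eggs"))        # ['spam.eggs', 'eggs']
print(candidate_names("spam", "eggs.bacon"))  # ['spam.eggs.bacon', 'eggs.bacon']
print(candidate_names(None, "eggs"))          # ['eggs']
```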

Deeper down in the mechanism, a dotted name import is split up by its components. For “import spam.ham”, first an “import spam” is done, and only when that succeeds is “ham” imported as a submodule of “spam”.

The Importer Protocol operates at this level of individual imports. By the time an importer gets a request for “spam.ham”, module “spam” has already been imported.
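A minimal sketch of how a dotted name is broken into individual imports, each of which must succeed before the next one is attempted. import_steps, import_dotted and the stand-in fake_import_one are hypothetical names used only for illustration.

```python
def import_steps(fullname):
    """Split a dotted name into the individual imports, outermost first."""
    parts = fullname.split(".")
    # "spam.ham.eggs" -> ["spam", "spam.ham", "spam.ham.eggs"]
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def import_dotted(fullname, import_one):
    """Import each step in turn; an ImportError aborts the remaining steps."""
    module = None
    for name in import_steps(fullname):
        module = import_one(name)
    return module


# Hypothetical stand-in for importing a single, non-dotted step.
available = {"spam": "<module spam>", "spam.ham": "<module spam.ham>"}
attempted = []

def fake_import_one(name):
    attempted.append(name)
    if name not in available:
        raise ImportError("No module named %s" % name)
    return available[name]


print(import_steps("spam.ham.eggs"))               # ['spam', 'spam.ham', 'spam.ham.eggs']
print(import_dotted("spam.ham", fake_import_one))  # <module spam.ham>
```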

The protocol involves two objects: a finder and a loader. A finder object has a single method:

finder.find_module(fullname, path=None)

This method will be called with the fully qualified name of the module. If the finder is installed on sys.meta_path, it will receive a second argument, which is None for a top-level module, or package.__path__ for submodules or subpackages [5]. It should return a loader object if the module was found, or None if it wasn’t. If find_module() raises an exception, it will be propagated to the caller, aborting the import.
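A sketch of a finder that knows a fixed set of module names, assuming the names would in practice come from something like an archive index. KnownNamesFinder and StubLoader are hypothetical. Since Python 3.12 the import system no longer calls find_module() (finders provide find_spec(), introduced by PEP 451), so the example calls the method directly instead of installing the finder on sys.meta_path.

```python
class StubLoader:
    """Placeholder loader; a real one would implement load_module()."""

    def __init__(self, fullname):
        self.fullname = fullname


class KnownNamesFinder:
    """Finder that only "finds" modules whose fully qualified name it knows."""

    def __init__(self, names):
        self.names = set(names)

    def find_module(self, fullname, path=None):
        # On sys.meta_path, `path` is None for a top-level module and the
        # parent package's __path__ for a submodule; this finder ignores it.
        if fullname in self.names:
            return StubLoader(fullname)
        return None  # not found: the import machinery moves on


finder = KnownNamesFinder(["spam", "spam.eggs"])
print(finder.find_module("spam.eggs", ["/hypothetical/spam"]).fullname)  # spam.eggs
print(finder.find_module("ham"))                                         # None
```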

A loader object also has one method:

loader.load_module(fullname)

This method returns the loaded module or raises an exception, preferably ImportError if an existing exception is not being propagated. If load_module() is asked to load a module that it cannot, ImportError is to be raised.

In many cases the finder and loader can be one and the same object: finder.find_module() would just return self.
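When the finder and loader are the same object, find_module() simply returns self. The sketch below assumes modules are held as source strings in a dict (a hypothetical storage choice) and calls the methods directly, since current Python versions do not call find_module() during imports. For brevity it does not remove a failed module from sys.modules.

```python
import sys
import types


class DictImporter:
    """Finder and loader in one object, serving modules from source strings."""

    def __init__(self, sources):
        self.sources = sources  # maps fully qualified name -> source text

    def find_module(self, fullname, path=None):
        # This object is also the loader, so it returns itself.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        if fullname not in self.sources:
            # Asked to load a module it cannot: ImportError is raised.
            raise ImportError("cannot load %r" % fullname)
        # Reuse an existing module, or register a new one before running code.
        # (Cleanup of sys.modules on failure is omitted in this sketch.)
        mod = sys.modules.setdefault(fullname, types.ModuleType(fullname))
        mod.__file__ = "<dict>"
        mod.__loader__ = self
        mod.__package__ = fullname.rpartition(".")[0]
        exec(self.sources[fullname], mod.__dict__)
        return mod


importer = DictImporter({"pep302_demo_greeting": "message = 'hello'"})
loader = importer.find_module("pep302_demo_greeting")
print(loader is importer)                                  # True
print(loader.load_module("pep302_demo_greeting").message)  # hello
```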

The fullname argument of both methods is the fully qualified module name, for example “spam.eggs.ham”. As explained above, when finder.find_module("spam.eggs.ham") is called, “spam.eggs” has already been imported and added to sys.modules. However, the find_module() method isn’t necessarily always called during an actual import: meta tools that analyze import dependencies (such as freeze, Installer or py2exe) don’t actually load modules, so a finder shouldn’t depend on the parent package being available in sys.modules.

The load_module() method has a few responsibilities that it must fulfill before it runs any code:

  • If there is an existing module object named ‘fullname’ in sys.modules, the loader must use that existing module. (Otherwise, the reload() builtin will not work correctly.) If a module named ‘fullname’ does not exist in sys.modules, the loader must create a new module object and add it to sys.modules.

    Note that the module object must be in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.

    If the load fails, the loader needs to remove any module it may have inserted into sys.modules. If the module was already in sys.modules then the loader should leave it alone.

  • The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example “<frozen>”. The privilege of not having a __file__ attribute at all is reserved for built-in modules.
  • The __name__ attribute must be set. If one uses imp.new_module() then the attribute is set automatically.
  • If it’s a package, the __path__ variable must be set. This must be a list, but may be empty if __path__ has no further significance to the importer (more on this later).
  • The __loader__ attribute must be set to the loader object. This is mostly for introspection and reloading, but can be used for importer-specific extras, for example getting data associated with an importer.
  • The __package__ attribute must be set (PEP 366).

    If the module is a Python module (as opposed to a built-in module or a dynamically loaded extension), it should execute the module’s code in the module’s global name space (module.__dict__).

    Here is a minimal pattern for a load_module() method:

    # Consider using importlib.util.module_for_loader() to handle
    # most of these details for you.
    import imp  # Python 2 era API; the imp module was removed in Python 3.12
    import sys

    def load_module(self, fullname):
        code = self.get_code(fullname)
        ispkg = self.is_package(fullname)
        # Reuse an existing module, or add a new one before running any code.
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if ispkg:
            mod.__path__ = []
            mod.__package__ = fullname
        else:
            mod.__package__ = fullname.rpartition('.')[0]
        exec(code, mod.__dict__)
        return mod
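The minimal pattern above does not show the sys.modules cleanup required when loading fails. Below is a sketch of a helper that follows the listed rules for a plain (non-package) module: it reuses an existing module, inserts a new one before running code, and on failure removes only a module it inserted itself. It uses types.ModuleType, which plays the role of imp.new_module(); load_source_module and DemoLoader are hypothetical names.

```python
import sys
import types


def load_source_module(loader, fullname, source):
    """Run `source` as non-package module `fullname` per the load_module() rules."""
    existing = sys.modules.get(fullname)
    if existing is not None:
        mod = existing  # reuse the existing module so reload() keeps working
    else:
        mod = types.ModuleType(fullname)  # __name__ is set automatically
        # Register before executing, so a self-import finds this module.
        sys.modules[fullname] = mod
    mod.__file__ = "<%s>" % type(loader).__name__
    mod.__loader__ = loader
    mod.__package__ = fullname.rpartition(".")[0]
    try:
        exec(compile(source, mod.__file__, "exec"), mod.__dict__)
    except BaseException:
        # Remove only a module this call inserted; leave pre-existing ones alone.
        if existing is None:
            sys.modules.pop(fullname, None)
        raise
    return mod


class DemoLoader:
    """Hypothetical loader used only to exercise the helper."""


mod = load_source_module(DemoLoader(), "pep302_demo_mod", "answer = 42\n")
print(mod.answer, mod.__file__)  # 42 <DemoLoader>
```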

Specification part 2: Registering Hooks

There are two types of import hooks: Meta hooks and Path hooks. Meta hooks are called at the start of import processing, before any other import processing (so that meta hooks can override sys.path processing, frozen modules, or even built-in modules). To register a meta hook, simply add the finder object to sys.meta_path (the list of registered meta hooks).

Path hooks are called as part of sys.path (or package.__path__) processing, at the point where their associated path item is encountered. A path hook is registered by adding an importer factory to sys.path_hooks.

sys.path_hooks is a list of callables, which will be checked in sequence to determine if they can handle a given path item. The callable is called with one argument, the path item. The callable must raise ImportError if it is unable to handle the path item, and return an importer object if it can handle the path item. Note that if the callable returns an importer object for a specific sys.path entry, the builtin import machinery will not be invoked to handle that entry any longer, even if the importer object later fails to find a specific module. The callable is typically the class of the import hook, and hence the class __init__() method is called. (This is also the reason why it should raise ImportError: an __init__() method can’t return anything. This would be possible with a __new__() method in a new-style class, but we don’t want to require anything about how a hook is implemented.)
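A path hook is usually a class whose __init__() refuses unsuitable path items by raising ImportError. The sketch assumes a hypothetical “.demoarc” archive format; ArchiveImporter is not a real library class. It is exercised directly rather than registered, since in current Python versions an importer on sys.path_hooks would also be expected to provide find_spec().

```python
class ArchiveImporter:
    """Path hook for path items naming a hypothetical '.demoarc' archive."""

    def __init__(self, path_item):
        if not (isinstance(path_item, str) and path_item.endswith(".demoarc")):
            # __init__() can't return None, so refusal is signalled by raising.
            raise ImportError("ArchiveImporter can't handle %r" % (path_item,))
        self.archive = path_item

    def find_module(self, fullname, path=None):
        # A real importer would look `fullname` up in self.archive.
        return None


importer = ArchiveImporter("/opt/app/lib.demoarc")
print(importer.archive)  # /opt/app/lib.demoarc
try:
    ArchiveImporter("/usr/lib/python3")
except ImportError as exc:
    print("refused:", exc)
```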

The results of path hook checks are cached in sys.path_importer_cache, which is a dictionary mapping path entries to importer objects. The cache is checked before sys.path_hooks is scanned. If it is necessary to force a rescan of sys.path_hooks, it is possible to manually clear all or part of sys.path_importer_cache.
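The lookup described above (cache first, then each hook in order, with None recorded when no hook accepts the item) can be modelled with a plain list and dict standing in for sys.path_hooks and sys.path_importer_cache. This is a sketch of the PEP’s rules, not CPython’s actual code; lookup_importer and the two hooks are hypothetical.

```python
def lookup_importer(path_item, path_hooks, cache):
    """Return the importer for `path_item`, or None meaning "use built-in import"."""
    if path_item in cache:
        return cache[path_item]        # the cache is consulted first
    importer = None
    for hook in path_hooks:            # hooks are tried in order
        try:
            importer = hook(path_item)
        except ImportError:
            continue                   # this hook can't handle the item
        break                          # the first hook that accepts wins
    cache[path_item] = importer        # None records the built-in fallback
    return importer


calls = []

def reject_all(item):
    calls.append(item)
    raise ImportError(item)

def accept_demoarc(item):
    if not item.endswith(".demoarc"):
        raise ImportError(item)
    return "importer for " + item


hooks = [reject_all, accept_demoarc]
cache = {}
print(lookup_importer("/lib.demoarc", hooks, cache))  # importer for /lib.demoarc
print(lookup_importer("/usr/lib", hooks, cache))      # None
print(lookup_importer("/usr/lib", hooks, cache))      # None (served from the cache)
print(len(calls))                                     # 2
```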

Just like sys.path itself, the new sys variables must have specific types:

  • sys.meta_path and sys.path_hooks must be Python lists.
  • sys.path_importer_cache must be a Python dict.

Modifying these variables in place is allowed, as is replacing them with new objects.

Packages and the role of __path__

If a module has a __path__ attribute, the import mechanism will treat it as a package. The __path__ variable is used instead of sys.path when importing submodules of the package. The rules for sys.path therefore also apply to pkg.__path__. So sys.path_hooks is also consulted when pkg.__path__ is traversed. Meta importers don’t necessarily use sys.path at all to do their work and may therefore ignore the value of pkg.__path__. In this case it is still advised to set it to a list, which can be empty.
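A sketch of how the search path for a module could be chosen under these rules: sys.path for a top-level module, and the parent package’s __path__ for a submodule. search_path is hypothetical; its optional modules argument replaces sys.modules so the example can be tried with fake modules.

```python
import sys
import types


def search_path(fullname, modules=None):
    """Return the list of path items to scan when importing `fullname`."""
    if modules is None:
        modules = sys.modules
    parent = fullname.rpartition(".")[0]
    if not parent:
        return sys.path                 # top-level module: use sys.path
    package = modules[parent]           # the parent is imported first
    try:
        return package.__path__         # submodule: use the package's __path__
    except AttributeError:
        # Without __path__ the parent is not a package.
        raise ImportError("%r is not a package" % parent) from None


pkg = types.ModuleType("demo_pkg")
pkg.__path__ = ["/hypothetical/demo_pkg"]
fake_modules = {"demo_pkg": pkg, "demo_plain": types.ModuleType("demo_plain")}
print(search_path("demo_pkg.sub", fake_modules))  # ['/hypothetical/demo_pkg']
```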

Optional Extensions to the Importer Protocol

The Importer Protocol defines three optional extensions. One is to retrieve data files, the second is to support module packaging tools and/or tools that analyze module dependencies (for example Freeze), while the last is to support execution of modules as scripts. The latter two categories of tools usually don’t actually load modules; they only need to know if and where they are available. All three extensions are highly recommended for general purpose importers, but may safely be left out if those features aren’t needed.

To retrieve the data for arbitrary “files” from the underlying storage backend, loader objects may supply a method named get_data():

loader.get_data(path)

This method returns the data as a string, or raises IOError if the “file” wasn’t found. The data is always returned as if “binary” mode was used; there is no CRLF translation of text files, for example. It is meant for importers that have some file-system-like properties. The ‘path’ argument is a path that can be constructed by munging module.__file__ (or pkg.__path__ items) with the os.path.* functions, for example:

d = os.path.dirname(__file__)
data = __loader__.get_data(os.path.join(d, "logo.gif"))
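A sketch of a loader-side get_data(), assuming a hypothetical archive whose contents are held in a dict mapping paths to data. It returns bytes, the Python 3 counterpart of the binary-mode string described above, and raises IOError (an alias of OSError in Python 3) for missing files. DictDataLoader is an illustrative name.

```python
import os


class DictDataLoader:
    """Loader exposing get_data() over an in-memory {path: bytes} mapping."""

    def __init__(self, files):
        # Normalise keys once so lookups tolerate equivalent path spellings.
        self.files = {os.path.normpath(p): data for p, data in files.items()}

    def get_data(self, path):
        try:
            return self.files[os.path.normpath(path)]
        except KeyError:
            raise IOError("no such file: %r" % (path,)) from None


loader = DictDataLoader({"/app.demoarc/pkg/logo.gif": b"GIF89a..."})
module_file = "/app.demoarc/pkg/__init__.py"  # what __file__ might look like
d = os.path.dirname(module_file)
print(loader.get_data(os.path.join(d, "logo.gif")))  # b'GIF89a...'
```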

The following set of methods may be implemented if support for (for example) Freeze-like tools is desirable. It consists of three additional methods which, to make it easier for the caller, should either all be implemented or none at all:

loader.is_package(fullname)
loader.get_code(fullname)
loader.get_source(fullname)

All three methods should raise ImportError if the module wasn’t found.

The loader.is_package(fullname) method should return True if the module specified by ‘fullname’ is a package and False if it isn’t.

The loader.get_code(fullname) method should return the code object associated with the module, or None if it’s a built-in or extension module. If the loader doesn’t have the code object but it does have the source code, it should return the compiled source code. (This is so that our caller doesn’t also need to check get_source() if all it needs is the code object.)

The loader.get_source(fullname) method should return the source code for the module as a string (using newline characters for line endings) or None if the source is not available (yet it should still raise ImportError if the module can’t be found by the importer at all).
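A sketch of a loader providing all three methods, assuming modules are stored in memory as (source, is_package) pairs. A module stored without source stands in for a built-in or extension module. InMemoryLoader is a hypothetical class.

```python
class InMemoryLoader:
    """Implements is_package(), get_code() and get_source() together."""

    def __init__(self, modules):
        # Maps fully qualified name -> (source text or None, is_package).
        self.modules = modules

    def _lookup(self, fullname):
        try:
            return self.modules[fullname]
        except KeyError:
            # All three methods raise ImportError for unknown modules.
            raise ImportError("no module named %r" % fullname) from None

    def is_package(self, fullname):
        return self._lookup(fullname)[1]

    def get_source(self, fullname):
        source, _ = self._lookup(fullname)
        return source  # None: the module exists but has no source

    def get_code(self, fullname):
        source = self.get_source(fullname)
        if source is None:
            return None  # treated like a built-in or extension module
        # Only source is stored, so compile it for the caller.
        return compile(source, "<%s>" % fullname, "exec")


loader = InMemoryLoader({
    "pkg": ("", True),
    "pkg.mod": ("x = 6 * 7\n", False),
    "native": (None, False),
})
namespace = {}
exec(loader.get_code("pkg.mod"), namespace)
print(loader.is_package("pkg"), namespace["x"])  # True 42
```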

To support execution of modules as scripts (PEP 338), the above three methods for finding the code associated with a module must be implemented. In addition to those methods, the following method may be provided in order to allow the runpy module to correctly set the __file__ attribute:

loader.get_filename(fullname)

This method should return the value that __file__ would be set to if the named module was loaded. If the module is not found, then ImportError should be raised.
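A sketch of get_filename() for a hypothetical archive-backed loader, assuming the archive lays modules out like a directory tree (pkg/__init__.py, pkg/mod.py). ArchiveFilenameLoader and its constructor arguments are illustrative only.

```python
import os


class ArchiveFilenameLoader:
    """get_filename() sketch for a hypothetical archive-backed loader."""

    def __init__(self, archive, packages, modules):
        self.archive = archive         # path of the hypothetical archive
        self.packages = set(packages)  # fully qualified package names
        self.modules = set(modules)    # fully qualified plain-module names

    def get_filename(self, fullname):
        parts = fullname.split(".")
        if fullname in self.packages:
            parts.append("__init__.py")
        elif fullname in self.modules:
            parts[-1] += ".py"
        else:
            raise ImportError("no module named %r" % fullname)
        # The value __file__ would get if the module were loaded.
        return os.path.join(self.archive, *parts)


loader = ArchiveFilenameLoader("/opt/app/lib.demoarc", ["pkg"], ["pkg.mod", "top"])
print(loader.get_filename("pkg.mod"))  # /opt/app/lib.demoarc/pkg/mod.py on POSIX
```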

Integration with the ‘imp’ module

The new import hooks are not easily integrated into the existing imp.find_module() and imp.load_module() calls. It’s questionable whether it’s possible at all without breaking code; it is better to simply add a new function to the imp module. The meaning of the existing imp.find_module() and imp.load_module() calls changes from “they expose the built-in import mechanism” to “they expose the basic unhooked built-in import mechanism”. They simply won’t invoke any import hooks. A new imp module function is proposed (but not yet implemented) under the name get_loader(), which is used as in the following pattern:

loader = imp.get_loader(fullname, path)
if loader is not None:
    loader.load_module(fullname)

In the case of a “basic” import, one that the imp.find_module() function would handle, the loader object would be a wrapper for the current output of imp.find_module(), and loader.load_module() would call imp.load_module() with that output.

Note that this wrapper is not yet implemented, although a Python prototype exists in the test_importhooks.py script (the ImpWrapper class) included with the patch.

Forward Compatibility

Existing __import__ hooks will not invoke new-style hooks by magic, unless they call the original __import__ function as a fallback. For example, ihooks.py, iu.py and imputil.py are in this sense not forward compatible with this PEP.

Open Issues

Modules often need supporting data files to do their job, particularly in the case of complex packages or full applications. Current practice is generally to locate such files via sys.path (or a package.__path__ attribute). This approach will not work, in general, for modules loaded via an import hook.

There are a number of possible ways to address this problem:

  • “Don’t do that”. If a package needs to locate data files via its __path__, it is not suitable for loading via an import hook. The package can still be located on a directory in sys.path, as at present, so this should not be seen as a major issue.
  • Locate data files from a standard location, rather than relative to the module file. A relatively simple approach (which is supported by distutils) would be to locate data files based on sys.prefix (or sys.exec_prefix). For example, looking in os.path.join(sys.prefix, "data", package_name).
  • Import hooks could offer a standard way of getting at data files relative to the module file. The standard zipimport object provides a method get_data(name) which returns the content of the “file” called name, as a string. To allow modules to get at the importer object, zipimport also adds an attribute __loader__ to the module, containing the zipimport object used to load the module. If such an approach is used, it is important that client code takes care not to break if the get_data() method is not available, so it is not clear that this approach offers a general answer to the problem.
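Client code following the third option can guard against loaders without get_data(). A minimal sketch, assuming the module passes in its own globals(); read_package_data is a hypothetical helper that falls back to ordinary file access when no suitable loader is present.

```python
import os


def read_package_data(module_globals, name):
    """Return the bytes of data file `name` stored next to a module."""
    module_dir = os.path.dirname(module_globals["__file__"])
    path = os.path.join(module_dir, name)
    loader = module_globals.get("__loader__")
    get_data = getattr(loader, "get_data", None)
    if get_data is not None:
        # The loader (zipimport, for example) knows how to read its storage.
        return get_data(path)
    # No get_data(): assume an ordinary file on disk next to the module.
    with open(path, "rb") as f:
        return f.read()
```

Inside a package’s __init__.py, a call might look like read_package_data(globals(), "logo.gif").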

It was suggested on python-dev that it would be useful to be able to receive a list of available modules from an importer and/or a list of available data files for use with the get_data() method. The protocol could grow two additional extensions, say list_modules() and list_files(). The latter makes sense on loader objects with a get_data() method. However, it’s a bit unclear which object should implement list_modules(): the importer or the loader or both?

This PEP is biased towards loading modules from alternative places: it currently doesn’t offer dedicated solutions for loading modules from alternative file formats or with alternative compilers. In contrast, the ihooks module from the standard library does have a fairly straightforward way to do this. The Quixote project [7] uses this technique to import PTL files as if they are ordinary Python modules. Doing the same with the new hooks would mean either adding a new module implementing a subset of ihooks as a new-style importer, or adding a hookable built-in path importer object.

There is no specific support within this PEP for “stacking” hooks. For example, it is not obvious how to write a hook to load modules from tar.gz files by combining separate hooks to load modules from .tar and .gz files. However, there is no support for such stacking in the existing hook mechanisms (either the basic “replace __import__” method, or any of the existing import hook modules) and so this functionality is not an obvious requirement of the new mechanism. It may be worth considering as a future enhancement, however.

It is possible (via sys.meta_path) to add hooks which run before sys.path is processed. However, there is no equivalent way of adding hooks to run after sys.path is processed. For now, if a hook is required after sys.path has been processed, it can be simulated by adding an arbitrary “cookie” string at the end of sys.path, and having the required hook associated with this cookie, via the normal sys.path_hooks processing. In the longer term, the path handling code will become a “real” hook on sys.meta_path, and at that stage it will be possible to insert user-defined hooks either before or after it.
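The cookie workaround can be sketched as follows. COOKIE, AfterPathImporter and install() are hypothetical; install() takes the lists to modify so the example can be tried without touching the real sys.path and sys.path_hooks. As with the other sketches here, a hook registered with current Python versions would also need a find_spec() method.

```python
COOKIE = "<after-sys.path>"  # hypothetical marker; any unique string works


class AfterPathImporter:
    """Path hook that claims only the cookie entry at the end of sys.path."""

    def __init__(self, path_item):
        if path_item != COOKIE:
            raise ImportError("only handles the cookie entry")

    def find_module(self, fullname, path=None):
        # Consulted only after the earlier sys.path entries have been tried;
        # a real hook would look in its fallback location here.
        return None


def install(path_list, hook_list):
    """Register the hook and append the cookie (e.g. sys.path, sys.path_hooks)."""
    hook_list.append(AfterPathImporter)
    path_list.append(COOKIE)


paths = ["/usr/lib/python"]
hooks = []
install(paths, hooks)
print(paths)  # ['/usr/lib/python', '<after-sys.path>']
```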

Implementation

The PEP 302 implementation has been integrated with Python as of 2.3a1. An earlier version is available as patch #652586 [9], but more interestingly, the issue contains a fairly detailed history of the development and design.

PEP 273 has been implemented using PEP 302’s import hooks.

References and Footnotes

[1]
imputil module http://docs.python.org/library/imputil.html
[2]
The Freeze tool. See also the Tools/freeze/ directory in a Python source distribution
[3]
py2exe by Thomas Heller http://www.py2exe.org/
[4]
imp.set_frozenmodules() patch http://bugs.python.org/issue642578
[5]
The path argument to finder.find_module() is there because the pkg.__path__ variable may be needed at this point. It may either come from the actual parent module or be supplied by imp.find_module() or the proposed imp.get_loader() function.
[7]
Quixote, a framework for developing Web applications http://www.mems-exchange.org/software/quixote/
[9]
New import hooks + Import from Zip files http://bugs.python.org/issue652586
[10]
Language reference for imports http://docs.python.org/3/reference/import.html
[11]
importlib documentation http://docs.python.org/3/library/importlib.html#module-importlib

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0302.rst

Last modified: 2025-02-01 08:59:27 GMT

