Python code in one module gains access to the code in another module by the process of importing it. The import statement is the most common way of invoking the import machinery, but it is not the only way. Functions such as importlib.import_module() and built-in __import__() can also be used to invoke the import machinery.
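As a minimal sketch of the three entry points named above (the choice of the standard library module json is only an illustration), each call ends up with the same cached module object:

```python
import importlib

# 1. The import statement: searches for the module and binds a local name.
import json as via_statement

# 2. importlib.import_module(): returns the module; binding is up to the caller.
via_function = importlib.import_module("json")

# 3. Built-in __import__(): the lower-level entry point.
via_builtin = __import__("json")

# All three routes resolve to the one module object held in the import cache.
same_object = via_statement is via_function is via_builtin
```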
The import statement combines two operations: it searches for the named module, then it binds the results of that search to a name in the local scope. The search operation of the import statement is defined as a call to the __import__() function, with the appropriate arguments. The return value of __import__() is used to perform the name binding operation of the import statement. See the reference documentation for the import statement for the exact details of that name binding operation.
A direct call to __import__() performs only the module search and, if found, the module creation operation. While certain side-effects may occur, such as the importing of parent packages, and the updating of various caches (including sys.modules), only the import statement performs a name binding operation.
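The side effects described above can be observed with a short sketch. It assumes the standard library package email is available; note that for a dotted name and an empty fromlist, __import__() returns the top-level package rather than the submodule, while importlib.import_module() returns the submodule itself:

```python
import importlib
import sys

# __import__() searches for and creates the modules but binds no name itself;
# with a dotted name it hands back the top-level package.
top = __import__("email.mime.text")

# importlib.import_module() returns the module that was actually named.
leaf = importlib.import_module("email.mime.text")

# Parent packages were imported as a side effect and are now cached.
cached = [name in sys.modules
          for name in ("email", "email.mime", "email.mime.text")]
```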
When calling __import__() as part of an import statement, the import system first checks the module global namespace for a function by that name. If it is not found, then the standard builtin __import__() is called. Other mechanisms for invoking the import system (such as importlib.import_module()) do not perform this check and will always use the standard import system.
When a module is first imported, Python searches for the module and if found, it creates a module object [1], initializing it. If the named module cannot be found, an ImportError is raised. Python implements various strategies to search for the named module when the import machinery is invoked. These strategies can be modified and extended by using various hooks described in the sections below.
Changed in version 3.3: The import system has been updated to fully implement the second phase of PEP 302. There is no longer any implicit import machinery; the full import system is exposed through sys.meta_path. In addition, native namespace package support has been implemented (see PEP 420).
The importlib module provides a rich API for interacting with the import system. For example, importlib.import_module() provides a recommended, simpler API than built-in __import__() for invoking the import machinery. Refer to the importlib library documentation for additional detail.
Python has only one type of module object, and all modules are of this type, regardless of whether the module is implemented in Python, C, or something else. To help organize modules and provide a naming hierarchy, Python has a concept of packages.
You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system. For the purposes of this documentation, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages, as well as regular modules.
It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.
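The __path__ rule is easy to check directly. The helper is_package() below is a hypothetical name introduced for this sketch, which assumes the standard library modules sys and email:

```python
import sys
import email.mime.text  # binds the name `email`; the submodules hang off it


def is_package(module):
    """Apply the rule from the text: a module with __path__ is a package."""
    return hasattr(module, "__path__")


results = {
    "sys": is_package(sys),                          # plain module
    "email": is_package(email),                      # package
    "email.mime": is_package(email.mime),            # subpackage
    "email.mime.text": is_package(email.mime.text),  # module in a subpackage
}
```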
All modules have a name. Subpackage names are separated from their parent package name by dots, akin to Python’s standard attribute access syntax. Thus you might have a module called sys and a package called email, which in turn has a subpackage called email.mime and a module within that subpackage called email.mime.text.
Python defines two types of packages, regular packages and namespace packages. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
For example, the following file system layout defines a top level parent package with three subpackages:
    parent/
        __init__.py
        one/
            __init__.py
        two/
            __init__.py
        three/
            __init__.py
Importing parent.one will implicitly execute parent/__init__.py and parent/one/__init__.py. Subsequent imports of parent.two or parent.three will execute parent/two/__init__.py and parent/three/__init__.py respectively.
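The execution order described above can be reproduced with a throwaway package tree. This is only a sketch: the package is named demo_parent (a hypothetical name, chosen to avoid clashing with a real parent package) and is built in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Build demo_parent/ with subpackages one/, two/ and three/ on disk.
root = tempfile.mkdtemp()
for parts in (("demo_parent",), ("demo_parent", "one"),
              ("demo_parent", "two"), ("demo_parent", "three")):
    directory = os.path.join(root, *parts)
    os.makedirs(directory)
    with open(os.path.join(directory, "__init__.py"), "w") as handle:
        handle.write("EXECUTED = True\n")


def loaded():
    """Names of the demo packages currently present in sys.modules."""
    return sorted(name for name in sys.modules if name.startswith("demo_parent"))


sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    importlib.import_module("demo_parent.one")
    after_first = loaded()    # parent and .one have been executed
    importlib.import_module("demo_parent.two")
    after_second = loaded()   # only .two is newly executed
finally:
    sys.path.remove(root)
```

Nothing imports demo_parent.three, so its __init__.py is never run.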
A namespace package is a composite of various portions, where each portion contributes a subpackage to the parent package. Portions may reside in different locations on the file system. Portions may also be found in zip files, on the network, or anywhere else that Python searches during import. Namespace packages may or may not correspond directly to objects on the file system; they may be virtual modules that have no concrete representation.
Namespace packages do not use an ordinary list for their __path__ attribute. They instead use a custom iterable type which will automatically perform a new search for package portions on the next import attempt within that package if the path of their parent package (or sys.path for a top level package) changes.
With namespace packages, there is no parent/__init__.py file. In fact, there may be multiple parent directories found during import search, where each one is provided by a different portion. Thus parent/one may not be physically located next to parent/two. In this case, Python will create a namespace package for the top-level parent package whenever it or one of its subpackages is imported.
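A sketch of two portions living in unrelated directories follows. The package name demo_ns and the temporary directories are assumptions made for the illustration; neither directory contains a demo_ns/__init__.py.

```python
import importlib
import os
import sys
import tempfile

# Two separate locations, each contributing one subpackage to demo_ns.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
for root, sub in ((root_a, "one"), (root_b, "two")):
    directory = os.path.join(root, "demo_ns", sub)
    os.makedirs(directory)
    open(os.path.join(directory, "__init__.py"), "w").close()

sys.path[:0] = [root_a, root_b]
importlib.invalidate_caches()
try:
    importlib.import_module("demo_ns.one")
    importlib.import_module("demo_ns.two")
    namespace = sys.modules["demo_ns"]
    portions = list(namespace.__path__)  # one entry per contributing location
finally:
    sys.path.remove(root_a)
    sys.path.remove(root_b)
```

Here portions holds both demo_ns directories, and namespace.__path__ is the custom iterable described above rather than a plain list.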
See also PEP 420 for the namespace package specification.
To begin the search, Python needs the fully qualified name of the module (or package, but for the purposes of this discussion, the difference is immaterial) being imported. This name may come from various arguments to the import statement, or from the parameters to the importlib.import_module() or __import__() functions.
This name will be used in various phases of the import search, and it may be the dotted path to a submodule, e.g. foo.bar.baz. In this case, Python first tries to import foo, then foo.bar, and finally foo.bar.baz. If any of the intermediate imports fail, an ImportError is raised.
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So if foo.bar.baz was previously imported, sys.modules will contain entries for foo, foo.bar, and foo.bar.baz. Each key will have as its value the corresponding module object.
During import, the module name is looked up in sys.modules and if present, the associated value is the module satisfying the import, and the process completes. However, if the value is None, then an ImportError is raised. If the module name is missing, Python will continue searching for the module.
sys.modules is writable. Deleting a key may not destroy the associated module (as other modules may hold references to it), but it will invalidate the cache entry for the named module, causing Python to search anew for the named module upon its next import. The key can also be assigned to None, forcing the next import of the module to result in an ImportError.
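Assigning None to a key can be tried out as follows. This is a sketch only; colorsys is an arbitrary standard library module picked for the demonstration, and any previously cached copy is put back afterwards.

```python
import importlib
import sys

name = "colorsys"                       # any importable module would do
saved = sys.modules.pop(name, None)     # remember a cached copy, if any
sys.modules[name] = None                # mark the name as "do not import"
try:
    try:
        importlib.import_module(name)
        blocked = False
    except ImportError:
        blocked = True                  # the None entry halts the import
finally:
    del sys.modules[name]               # drop the None entry again
    if saved is not None:
        sys.modules[name] = saved

# With the entry gone (or restored), the import succeeds as usual.
module = importlib.import_module(name)
```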
Beware though: if you keep a reference to the module object, invalidate its cache entry in sys.modules, and then re-import the named module, the two module objects will not be the same. By contrast, imp.reload() will reuse the same module object, and simply reinitialise the module contents by rerunning the module’s code.
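The difference between re-importing and reloading can be sketched as below. One assumption to flag: the sketch calls importlib.reload(), which exists on Python 3.4 and later; on the version described here the equivalent call is imp.reload(). The module colorsys is again just a convenient example.

```python
import importlib
import sys

import colorsys
first = colorsys                  # keep a reference to the original object

del sys.modules["colorsys"]       # invalidate the cache entry
import colorsys as second         # searched for and created anew

distinct_after_reimport = second is not first

# Reloading re-runs the module's code inside the existing module object.
# Assumption: importlib.reload() stands in for imp.reload() on newer Pythons.
reloaded = importlib.reload(second)
same_after_reload = reloaded is second
```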
If the named module is not found in sys.modules, then Python’s import protocol is invoked to find and load the module. This protocol consists of two conceptual objects, finders and loaders. A finder’s job is to determine whether it can find the named module using whatever strategy it knows about. Objects that implement both of these interfaces are referred to as importers; they return themselves when they find that they can load the requested module.
Python includes a number of default finders and importers. The first one knows how to locate built-in modules, and the second knows how to locate frozen modules. A third default finder searches an import path for modules. The import path is a list of locations that may name file system paths or zip files. It can also be extended to search for any locatable resource, such as those identified by URLs.
The import machinery is extensible, so new finders can be added to extend the range and scope of module searching.
Finders do not actually load modules. If they can find the named module, they return a loader, which the import machinery then invokes to load the module and create the corresponding module object.
The following sections describe the protocol for finders and loaders in more detail, including how you can create and register new ones to extend the import machinery.
The import machinery is designed to be extensible; the primary mechanism for this is the import hooks. There are two types of import hooks: meta hooks and import path hooks.
Meta hooks are called at the start of import processing, before any other import processing has occurred, other than sys.modules cache look up. This allows meta hooks to override sys.path processing, frozen modules, or even built-in modules. Meta hooks are registered by adding new finder objects to sys.meta_path, as described below.
Import path hooks are called as part of sys.path (or package.__path__) processing, at the point where their associated path item is encountered. Import path hooks are registered by adding new callables to sys.path_hooks as described below.
When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_module() which takes two arguments, a name and an import path. The meta path finder can use any strategy it wants to determine whether it can handle the named module or not.
If the meta path finder knows how to handle the named module, it returns a loader object. If it cannot handle the named module, it returns None. If sys.meta_path processing reaches the end of its list without returning a loader, then an ImportError is raised. Any other exceptions raised are simply propagated up, aborting the import process.
The find_module() method of meta path finders is called with two arguments. The first is the fully qualified name of the module being imported, for example foo.bar.baz. The second argument is the path entries to use for the module search. For top-level modules, the second argument is None, but for submodules or subpackages, the second argument is the value of the parent package’s __path__ attribute. If the appropriate __path__ attribute cannot be accessed, an ImportError is raised.
The meta path may be traversed multiple times for a single import request. For example, assuming none of the modules involved has already been cached, importing foo.bar.baz will first perform a top level import, calling mpf.find_module("foo", None) on each meta path finder (mpf). After foo has been imported, foo.bar will be imported by traversing the meta path a second time, calling mpf.find_module("foo.bar", foo.__path__). Once foo.bar has been imported, the final traversal will call mpf.find_module("foo.bar.baz", foo.bar.__path__).
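The repeated traversal can be made visible with a finder that only records how it is called and then declines. Everything named here is hypothetical: RecordingFinder, the trace_foo package tree, and the temporary directory. One assumption beyond the text: newer Python versions query find_spec() rather than find_module(), so the sketch aliases one to the other in order to run on both.

```python
import importlib
import os
import sys
import tempfile


class RecordingFinder:
    """Hypothetical meta path finder that records each query and declines."""

    def __init__(self):
        self.calls = []

    def find_module(self, fullname, path=None):
        self.calls.append((fullname, None if path is None else list(path)))
        return None  # "cannot handle it": the next finder is tried

    def find_spec(self, fullname, path=None, target=None):
        # Assumption: newer Pythons call find_spec() instead of find_module().
        return self.find_module(fullname, path)


# Lay out trace_foo/bar/baz.py in a temporary directory.
root = tempfile.mkdtemp()
bar_dir = os.path.join(root, "trace_foo", "bar")
os.makedirs(bar_dir)
for directory in (os.path.join(root, "trace_foo"), bar_dir):
    open(os.path.join(directory, "__init__.py"), "w").close()
open(os.path.join(bar_dir, "baz.py"), "w").close()

recorder = RecordingFinder()
sys.path.insert(0, root)
sys.meta_path.insert(0, recorder)
importlib.invalidate_caches()
try:
    importlib.import_module("trace_foo.bar.baz")
finally:
    sys.meta_path.remove(recorder)
    sys.path.remove(root)

names = [name for name, _ in recorder.calls]
paths = [path for _, path in recorder.calls]
```

The recorded path is None for the top level query and the parent package’s __path__ contents for the two deeper ones.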
Some meta path finders only support top level imports. These importers will always return None when anything other than None is passed as the second argument.
Python’s default sys.meta_path has three meta path finders, one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an import path (i.e. the path based finder).
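The finders can be listed by name. This sketch assumes CPython, where the three defaults appear as the classes BuiltinImporter, FrozenImporter and PathFinder; those names are an implementation detail, and tools loaded at startup may have registered further finders.

```python
import sys

# Default entries are classes, so prefer __name__ and fall back to the type.
finder_names = [getattr(finder, "__name__", type(finder).__name__)
                for finder in sys.meta_path]

for position, name in enumerate(finder_names):
    print(position, name)
```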
If and when a module loader is found, its load_module() method is called, with a single argument, the fully qualified name of the module being imported. This method has several responsibilities, and should return the module object it has loaded [2]. If it cannot load the module, it should raise an ImportError, although any other exception raised during load_module() will be propagated.
In many cases, the finder and loader can be the same object; in such cases the finder.find_module() would just return self.
Loaders must satisfy the following requirements:
If there is an existing module object with the given name in sys.modules, the loader must use that existing module. (Otherwise, imp.reload() will not work correctly.) If the named module does not exist in sys.modules, the loader must create a new module object and add it to sys.modules.
Note that the module must exist in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.
If loading fails, the loader must remove any modules it has inserted into sys.modules, but it must remove only the failing module, and only if the loader itself has loaded it explicitly. Any module already in the sys.modules cache, and any module that was successfully loaded as a side-effect, must remain in the cache.
The loader may set the __file__ attribute of the module. If set, this attribute’s value must be a string. The loader may opt to leave __file__ unset if it has no semantic meaning (e.g. a module loaded from a database). If __file__ is set, it may also be appropriate to set the __cached__ attribute which is the path to any compiled version of the code (e.g. byte-compiled file). The file does not need to exist to set this attribute; the path can simply point to where the compiled file would exist (see PEP 3147).
The loader may set the __name__ attribute of the module. While not required, setting this attribute is highly recommended so that the repr() of the module is more informative.
If the module is a package (either regular or namespace), the loader must set the module object’s __path__ attribute. The value must be iterable, but may be empty if __path__ has no further significance to the loader. If __path__ is not empty, it must produce strings when iterated over. More details on the semantics of __path__ are given below.
The __loader__ attribute must be set to the loader object that loaded the module. This is mostly for introspection and reloading, but can be used for additional loader-specific functionality, for example getting data associated with a loader.
The module’s __package__ attribute should be set. Its value must be a string, but it can be the same value as its __name__. If the attribute is set to None or is missing, the import system will fill it in with a more appropriate value. When the module is a package, its __package__ value should be set to its __name__. When the module is not a package, __package__ should be set to the empty string for top-level modules, or for submodules, to the parent package’s name. See PEP 366 for further details.
This attribute is used instead of __name__ to calculate explicit relative imports for main modules, as defined in PEP 366.
If the module is a Python module (as opposed to a built-in module or a dynamically loaded extension), the loader should execute the module’s code in the module’s global name space (module.__dict__).
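The requirements above can be gathered into a small loader. This is a sketch under stated assumptions, not a reference implementation: SourceStringLoader is a hypothetical class that loads non-package modules from in-memory source strings, and load_module() is called by hand here instead of being invoked by the import machinery.

```python
import sys
import types


class SourceStringLoader:
    """Hypothetical loader for non-package modules held as source strings."""

    def __init__(self, sources):
        self.sources = sources  # maps fully qualified name -> source code

    def load_module(self, fullname):
        if fullname not in self.sources:
            raise ImportError("no source for {!r}".format(fullname))
        # Reuse an existing module object, otherwise create and register one
        # *before* any module code runs.
        created = fullname not in sys.modules
        if created:
            sys.modules[fullname] = types.ModuleType(fullname)
        module = sys.modules[fullname]
        module.__loader__ = self
        # Not a package: empty string at top level, else the parent's name.
        module.__package__ = fullname.rpartition(".")[0]
        try:
            exec(self.sources[fullname], module.__dict__)
        except BaseException:
            # Remove only the failing module, and only if we inserted it.
            if created:
                sys.modules.pop(fullname, None)
            raise
        return module


loader = SourceStringLoader({
    "demo_ok": "VALUE = 42\n",
    "demo_bad": "raise RuntimeError('boom')\n",
})
good = loader.load_module("demo_ok")
try:
    loader.load_module("demo_bad")
    failure = None
except RuntimeError as error:
    failure = error
```

The module that fails to load leaves no entry behind in sys.modules, while the successful one stays cached.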
By default, all modules have a usable repr. However, depending on the attributes set above, and hooks in the loader, you can more explicitly control the repr of module objects.
Loaders may implement a module_repr() method which takes a single argument, the module object. When repr(module) is called for a module with a loader supporting this protocol, whatever is returned from module.__loader__.module_repr(module) is returned as the module’s repr without further processing. This return value must be a string.
If the module has no __loader__ attribute, or the loader has no module_repr() method, then the module object implementation itself will craft a default repr using whatever information is available. It will try to use the module.__name__, module.__file__, and module.__loader__ as input into the repr, with defaults for whatever information is missing.
Here are the exact rules used:
- If the module has a __loader__ and that loader has a module_repr() method, call it with a single argument, which is the module object. The value returned is used as the module’s repr.
- If an exception occurs in module_repr(), the exception is caught and discarded, and the calculation of the module’s repr continues as if module_repr() did not exist.
- If the module has a __file__ attribute, this is used as part of the module’s repr.
- If the module has no __file__ but does have a __loader__, then the loader’s repr is used as part of the module’s repr.
- Otherwise, just use the module’s __name__ in the repr.
This example, from PEP 420, shows how a loader can craft its own module repr:
    class NamespaceLoader:
        @classmethod
        def module_repr(cls, module):
            return "<module '{}' (namespace)>".format(module.__name__)
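The default-repr rules listed earlier (the ones that apply when no module_repr() hook is involved) can be observed on hand-built module objects. A sketch, in which the module names and the file path are made up and nothing is read from disk:

```python
import types

# A module with only a name: the repr falls back to __name__.
bare = types.ModuleType("demo_bare")

# A module with __file__ set: the file location becomes part of the repr.
with_file = types.ModuleType("demo_file")
with_file.__file__ = "/hypothetical/demo_file.py"

bare_repr = repr(bare)
file_repr = repr(with_file)
print(bare_repr)
print(file_repr)
```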
By definition, if a module has a __path__ attribute, it is a package, regardless of its value.
A package’s __path__ attribute is used during imports of its subpackages. Within the import machinery, it functions much the same as sys.path, i.e. providing a list of locations to search for modules during import. However, __path__ is typically much more constrained than sys.path.
__path__ must be an iterable of strings, but it may be empty. The same rules used for sys.path also apply to a package’s __path__, and sys.path_hooks (described below) are consulted when traversing a package’s __path__.
A package’s __init__.py file may set or alter the package’s __path__ attribute, and this was typically the way namespace packages were implemented prior to PEP 420. With the adoption of PEP 420, namespace packages no longer need to supply __init__.py files containing only __path__ manipulation code; the namespace loader automatically sets __path__ correctly for the namespace package.
As mentioned previously, Python comes with several default meta path finders. One of these, called the path based finder, searches an import path, which contains a list of path entries. Each path entry names a location to search for modules.
The path based finder itself doesn’t know how to import anything. Instead, it traverses the individual path entries, associating each of them with a path entry finder that knows how to handle that particular kind of path.
The default set of path entry finders implement all the semantics for finding modules on the file system, handling special file types such as Python source code (.py files), Python byte code (.pyc and .pyo files) and shared libraries (e.g. .so files). When supported by the zipimport module in the standard library, the default path entry finders also handle loading all of these file types (other than shared libraries) from zip files.
Path entries need not be limited to file system locations. They can refer to URLs, database queries, or any other location that can be specified as a string.
The path based finder provides additional hooks and protocols so that you can extend and customize the types of searchable path entries. For example, if you wanted to support path entries as network URLs, you could write a hook that implements HTTP semantics to find modules on the web. This hook (a callable) would return a path entry finder supporting the protocol described below, which would then be used to get a loader for the module from the web.
A word of warning: this section and the previous both use the term finder, distinguishing between them by using the terms meta path finder and path entry finder. These two types of finders are very similar, support similar protocols, and function in similar ways during the import process, but it’s important to keep in mind that they are subtly different. In particular, meta path finders operate at the beginning of the import process, as keyed off the sys.meta_path traversal.
By contrast, path entry finders are in a sense an implementation detail of the path based finder, and in fact, if the path based finder were to be removed from sys.meta_path, none of the path entry finder semantics would be invoked.
The path based finder is responsible for finding and loading Python modules and packages whose location is specified with a string path entry. Most path entries name locations in the file system, but they need not be limited to this.
As a meta path finder, the path based finder implements the find_module() protocol previously described; however, it exposes additional hooks that can be used to customize how modules are found and loaded from the import path.
Three variables are used by the path based finder: sys.path, sys.path_hooks and sys.path_importer_cache. The __path__ attributes on package objects are also used. These provide additional ways that the import machinery can be customized.
sys.path contains a list of strings providing search locations for modules and packages. It is initialized from the PYTHONPATH environment variable and various other installation- and implementation-specific defaults. Entries in sys.path can name directories on the file system, zip files, and potentially other “locations” (see the site module) that should be searched for modules, such as URLs, or database queries. Only strings and bytes should be present on sys.path; all other data types are ignored. The encoding of bytes entries is determined by the individual path entry finders.
The path based finder is a meta path finder, so the import machinery begins the import path search by calling the path based finder’s find_module() method as described previously. When the path argument to find_module() is given, it will be a list of string paths to traverse, typically a package’s __path__ attribute for an import within that package. If the path argument is None, this indicates a top level import and sys.path is used.
The path based finder iterates over every entry in the search path, and for each of these, looks for an appropriate path entry finder for the path entry. Because this can be an expensive operation (e.g. there may be stat() call overheads for this search), the path based finder maintains a cache mapping path entries to path entry finders. This cache is maintained in sys.path_importer_cache (despite the name, this cache actually stores finder objects rather than being limited to importer objects). In this way, the expensive search for a particular path entry location’s path entry finder need only be done once. User code is free to remove cache entries from sys.path_importer_cache, forcing the path based finder to perform the path entry search again [3].
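The cache behaviour can be sketched with a temporary directory used as a path entry. The module names cache_demo_a and cache_demo_b are hypothetical, and both files are empty:

```python
import importlib
import os
import sys
import tempfile

# A fresh directory with two empty modules, used as a new path entry.
root = tempfile.mkdtemp()
for filename in ("cache_demo_a.py", "cache_demo_b.py"):
    open(os.path.join(root, filename), "w").close()

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    importlib.import_module("cache_demo_a")
    first_finder = sys.path_importer_cache[root]   # cached on first use

    del sys.path_importer_cache[root]              # force a new hook search
    importlib.import_module("cache_demo_b")
    second_finder = sys.path_importer_cache[root]  # freshly created finder
finally:
    sys.path.remove(root)
```

Because the entry was deleted between the two imports, second_finder is a different object from first_finder.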
If the path entry is not present in the cache, the path based finder iterates over every callable in sys.path_hooks. Each of the path entry hooks in this list is called with a single argument, the path entry to be searched. This callable may either return a path entry finder that can handle the path entry, or it may raise ImportError. An ImportError is used by the path based finder to signal that the hook cannot find a path entry finder for that path entry. The exception is ignored and import path iteration continues. The hook should expect either a string or bytes object; the encoding of bytes objects is up to the hook (e.g. it may be a file system encoding, UTF-8, or something else), and if the hook cannot decode the argument, it should raise ImportError.
If sys.path_hooks iteration ends with no path entry finder being returned, then the path based finder’s find_module() method will store None in sys.path_importer_cache (to indicate that there is no finder for this path entry) and return None, indicating that this meta path finder could not find the module.
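The hook protocol can be exercised with a path entry hook that declines everything. In this sketch, declining_hook, the path entry demo://not-a-real-location and the module name are all hypothetical; it also assumes that none of the other registered hooks accepts that entry, which holds for the default file system and zip hooks.

```python
import importlib
import sys

entry = "demo://not-a-real-location"   # hypothetical path entry
hook_calls = []


def declining_hook(path_entry):
    """Record the path entry, then signal "cannot handle" via ImportError."""
    hook_calls.append(path_entry)
    raise ImportError("declining_hook handles no path entries")


sys.path_hooks.insert(0, declining_hook)
sys.path.append(entry)
sys.path_importer_cache.pop(entry, None)
try:
    try:
        importlib.import_module("demo_module_that_does_not_exist")
        found = True
    except ImportError:
        found = False
    # No hook produced a finder, so None was cached for the entry.
    cached_finder = sys.path_importer_cache.get(entry, "missing")
finally:
    sys.path.remove(entry)
    sys.path_hooks.remove(declining_hook)
    sys.path_importer_cache.pop(entry, None)
```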
If a path entry finder is returned by one of the path entry hook callables on sys.path_hooks, then the following protocol is used to ask the finder for a module loader, which is then used to load the module.
In order to support imports of modules and initialized packages and also to contribute portions to namespace packages, path entry finders must implement the find_loader() method.
find_loader() takes one argument, the fully qualified name of the module being imported. find_loader() returns a 2-tuple where the first item is the loader and the second item is a namespace portion. When the first item (i.e. the loader) is None, this means that while the path entry finder does not have a loader for the named module, it knows that the path entry contributes to a namespace portion for the named module. This will almost always be the case where Python is asked to import a namespace package that has no physical presence on the file system. When a path entry finder returns None for the loader, the second item of the 2-tuple return value must be a sequence, although it can be empty.
If find_loader() returns a non-None loader value, the portion is ignored and the loader is returned from the path based finder, terminating the search through the path entries.
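The three possible shapes of the find_loader() return value can be sketched with a toy finder. DictPathEntryFinder, its constructor arguments and the demo://store path entry are hypothetical; the loader is a stand-in object, and the method is called by hand here, not by the import machinery.

```python
class DictPathEntryFinder:
    """Hypothetical path entry finder backed by in-memory tables."""

    def __init__(self, path_entry, loaders, namespace_names):
        self.path_entry = path_entry
        self.loaders = loaders                  # fullname -> loader object
        self.namespace_names = namespace_names  # names with a portion here

    def find_loader(self, fullname):
        if fullname in self.loaders:
            # A loader is available; any portion would be ignored.
            return self.loaders[fullname], []
        if fullname in self.namespace_names:
            # No loader, but this entry contributes a namespace portion.
            tail = fullname.rpartition(".")[2]
            return None, ["{}/{}".format(self.path_entry, tail)]
        # Nothing known: no loader and an empty sequence of portions.
        return None, []


stand_in_loader = object()  # placeholder, not a working loader
finder = DictPathEntryFinder("demo://store", {"alpha": stand_in_loader}, {"beta"})

with_loader = finder.find_loader("alpha")
with_portion = finder.find_loader("beta")
with_nothing = finder.find_loader("gamma")
```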
For backwards compatibility with other implementations of the import protocol, many path entry finders also support the same, traditional find_module() method that meta path finders support. However, path entry finder find_module() methods are never called with a path argument (they are expected to record the appropriate path information from the initial call to the path hook).
The find_module() method on path entry finders is deprecated, as it does not allow the path entry finder to contribute portions to namespace packages. Instead, path entry finders should implement the find_loader() method as described above. If it exists on the path entry finder, the import system will always call find_loader() in preference to find_module().
The most reliable mechanism for replacing the entire import system is to delete the default contents of sys.meta_path, replacing them entirely with a custom meta path hook.
If it is acceptable to only alter the behaviour of import statements without affecting other APIs that access the import system, then replacing the builtin __import__() function may be sufficient. This technique may also be employed at the module level to only alter the behaviour of import statements within that module.
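Replacing the builtin can be sketched as a thin logging wrapper. The names logging_import and seen are hypothetical, and the original function is restored once the demonstration is over:

```python
import builtins
import importlib

seen = []
original_import = builtins.__import__


def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Record the requested name, then defer to the standard machinery."""
    seen.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = logging_import
try:
    import colorsys                        # import statement: wrapper runs
    importlib.import_module("colorsys")    # other API: wrapper is bypassed
finally:
    builtins.__import__ = original_import  # always undo the replacement
```

Only the import statement is recorded; the importlib.import_module() call never reaches the wrapper.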
To selectively prevent import of some modules from a hook early on the meta path (rather than disabling the standard import system entirely), it is sufficient to raise ImportError directly from find_module() instead of returning None. The latter indicates that the meta path search should continue, while raising an exception terminates it immediately.
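A blocking hook of this kind might look like the sketch below. BlockingFinder is a hypothetical class, and colorsys is an arbitrary module chosen as the one to veto. One assumption beyond the text: newer Python versions query find_spec() rather than find_module(), so the sketch forwards one to the other.

```python
import sys


class BlockingFinder:
    """Hypothetical meta path finder that vetoes a fixed set of names."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def find_module(self, fullname, path=None):
        if fullname in self.blocked:
            # Raising ends the meta path search immediately.
            raise ImportError("import of {!r} is blocked".format(fullname))
        return None  # let the remaining finders handle everything else

    def find_spec(self, fullname, path=None, target=None):
        # Assumption: newer Pythons call find_spec() instead of find_module().
        return self.find_module(fullname, path)


finder = BlockingFinder(["colorsys"])
saved = sys.modules.pop("colorsys", None)  # make sure the cache is not hit
sys.meta_path.insert(0, finder)            # early on the meta path
try:
    try:
        import colorsys
        blocked = False
    except ImportError:
        blocked = True
finally:
    sys.meta_path.remove(finder)
    if saved is not None:
        sys.modules["colorsys"] = saved

import colorsys  # works again once the hook has been removed
```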
XXX It would be really nice to have a diagram.
XXX * (import_machinery.rst) how about a section devoted just to the attributes of modules and packages, perhaps expanding upon or supplanting the related entries in the data model reference page?
XXX runpy, pkgutil, et al in the library manual should all get “See Also” links at the top pointing to the new import system section.
The import machinery has evolved considerably since Python’s early days. The original specification for packages is still available to read, although some details have changed since the writing of that document.
The original specification for sys.meta_path was PEP 302, with subsequent extension in PEP 420.
PEP 420 introduced namespace packages for Python 3.3. PEP 420 also introduced the find_loader() protocol as an alternative to find_module().
PEP 366 describes the addition of the __package__ attribute for explicit relative imports in main modules.
PEP 328 introduced absolute and explicit relative imports and initially proposed __name__ for the semantics that PEP 366 would eventually specify for __package__.
PEP 338 defines executing modules as scripts.
Footnotes
| [1] | See types.ModuleType. |
| [2] | The importlib implementation avoids using the return value directly. Instead, it gets the module object by looking the module name up in sys.modules. The indirect effect of this is that an imported module may replace itself in sys.modules. This is implementation-specific behavior that is not guaranteed to work in other Python implementations. |
| [3] | In legacy code, it is possible to find instances of imp.NullImporter in the sys.path_importer_cache. It is recommended that code be changed to use None instead. See Porting Python code for more details. |