PEP 338 – Executing modules as scripts

Author: Alyssa Coghlan <ncoghlan at gmail.com>
Status:

Abstract

This PEP defines semantics for executing any Python module as a script, either with the -m command line switch, or by invoking it via runpy.run_module(modulename).

The -m switch implemented in Python 2.4 is quite limited. This PEP proposes making use of the PEP 302 import hooks to allow any module which provides access to its code object to be executed.

Python 2.4 adds the command line switch -m to allow modules to be located using the Python module namespace for execution as scripts. The motivating examples were standard library modules such as pdb and profile, and the Python 2.4 implementation is fine for this limited purpose.

A number of users and developers have requested extension of the feature to also support running modules located inside packages. One example provided is pychecker’s pychecker.checker module. This capability was left out of the Python 2.4 implementation because the implementation of this was significantly more complicated, and the most appropriate strategy was not at all clear.

The opinion on python-dev was that it was better to postpone the extension to Python 2.5, and go through the PEP process to help make sure we got it right.

Since that time, it has also been pointed out that the current version of -m does not support zipimport or any other kind of alternative import behaviour (such as frozen modules).

Providing this functionality as a Python module is significantly easier than writing it in C, and makes the functionality readily available to all Python programs, rather than being specific to the CPython interpreter. CPython’s command line switch can then be rewritten to make use of the new module.

Scripts which execute other scripts (e.g. profile, pdb) also have the option to use the new module to provide -m style support for identifying the script to be executed.

Scope of this proposal

In Python 2.4, a module located using -m is executed just as if its filename had been provided on the command line. The goal of this PEP is to get as close as possible to making that statement also hold true for modules inside packages, or accessed via alternative import mechanisms (such as zipimport).

Prior discussions suggest it should be noted that this PEP is not about changing the idiom for making Python modules also useful as scripts (see PEP 299). That issue is considered orthogonal to the specific feature addressed by this PEP.

Current Behaviour

Before describing the new semantics, it’s worth covering the existing semantics for Python 2.4 (as they are currently defined only by the source code and the command line help).

When -m is used on the command line, it immediately terminates the option list (like -c). The argument is interpreted as the name of a top-level Python module (i.e. one which can be found on sys.path).

If the module is found, and is of type PY_SOURCE or PY_COMPILED, then the command line is effectively reinterpreted from python <options> -m <module> <args> to python <options> <filename> <args>. This includes setting sys.argv[0] correctly (some scripts rely on this; Python’s own regrtest.py is one example).

If the module is not found, or is not of the correct type, an error is printed.

Proposed Semantics

The semantics proposed are fairly simple: if -m is used to execute a module, the PEP 302 import mechanisms are used to locate the module and retrieve its compiled code, before executing the module in accordance with the semantics for a top-level module. The interpreter does this by invoking a new standard library function runpy.run_module.
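The locate-then-execute sequence can be sketched in modern Python 3, where importlib.util.find_spec plays the role of the PEP 302 finder machinery. This is an illustration under stated assumptions, not the reference implementation: the helper name run_module_sketch is hypothetical, and it ignores packages, alter_sys and the extra module attributes that later Python versions define.

```python
import builtins
import importlib.util


def run_module_sketch(mod_name, init_globals=None, run_name=None):
    """Locate mod_name via the import system and execute it in a fresh namespace.

    Hypothetical helper for illustration only: unlike runpy.run_module it does
    not handle packages, alter_sys, or the __spec__/__package__ attributes.
    """
    # find_spec consults sys.meta_path (the modern form of the PEP 302 hooks);
    # for a dotted name it imports the containing package first.
    spec = importlib.util.find_spec(mod_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named {mod_name!r}")
    # Ask the loader for a code object instead of opening a file, so that
    # zipimport, frozen modules and custom loaders are all supported.
    get_code = getattr(spec.loader, "get_code", None)
    code = get_code(mod_name) if get_code is not None else None
    if code is None:
        raise ImportError(f"Loader for {mod_name!r} provides no code object")
    # Copy init_globals so the caller's dictionary is never mutated, then
    # override the special variables (unknown filename information is None).
    namespace = dict(init_globals or {})
    namespace.update(
        __name__=run_name if run_name is not None else mod_name,
        __file__=spec.origin if spec.has_location else None,
        __loader__=spec.loader,
        __builtins__=builtins.__dict__,
    )
    exec(code, namespace)
    return namespace
```

Asking the loader for a code object, rather than for a filename, is what lets the same path serve source files, zip archives and frozen modules alike.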

This is necessary due to the way Python’s import machinery locates modules inside packages. A package may modify its own __path__ variable during initialisation. In addition, paths may be affected by *.pth files, and some packages will install custom loaders on sys.meta_path. Accordingly, the only way for Python to reliably locate the module is by importing the containing package and using the PEP 302 import hooks to gain access to the Python code.

Note that the process of locating the module to be executed may require importing the containing package. The effects of such a package import that will be visible to the executed module are:

the containing package will be in sys.modules
any external effects of the package initialisation (e.g. installed import hooks, loggers, atexit handlers, etc.)
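Both visible effects can be observed with modern importlib: locating a submodule imports (and so initialises) its parent package, while the submodule itself is found but not yet run. The sketch below builds a throwaway package with hypothetical names; setting an attribute on sys merely stands in for a real side effect such as installing an import hook.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def demo_parent_import():
    """Locate pep338_pkg.child and report what the search left behind."""
    # Build a throwaway package on disk (all names are hypothetical).
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "pep338_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # Package initialisation with an observable external effect.
        f.write("import sys\nsys.pep338_pkg_initialised = True\n")
    with open(os.path.join(pkg_dir, "child.py"), "w") as f:
        f.write("print('child running')\n")
    sys.path.insert(0, root)
    importlib.invalidate_caches()

    # Finding a dotted name imports the containing package first.
    spec = importlib.util.find_spec("pep338_pkg.child")
    return {
        "found": spec is not None,
        "package_in_sys_modules": "pep338_pkg" in sys.modules,
        "child_in_sys_modules": "pep338_pkg.child" in sys.modules,
        "side_effect_visible": getattr(sys, "pep338_pkg_initialised", False),
    }
```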

Reference Implementation

A reference implementation is available on SourceForge ([2]), along with documentation for the library reference ([5]). There are two parts to this implementation. The first is a proposed standard library module runpy. The second is a modification to the code implementing the -m switch to always delegate to runpy.run_module instead of trying to run the module directly. The delegation has the form:

runpy.run_module(sys.argv[0], run_name="__main__", alter_sys=True)

run_module is the only function runpy exposes in its public API.

run_module(mod_name[, init_globals][, run_name][, alter_sys])

Execute the code of the specified module and return the resulting module globals dictionary. The module’s code is first located using the standard import mechanism (refer to PEP 302 for details) and then executed in a fresh module namespace.
The optional dictionary argument init_globals may be used to pre-populate the globals dictionary before the code is executed. The supplied dictionary will not be modified. If any of the special global variables below are defined in the supplied dictionary, those definitions are overridden by the run_module function.
The special global variables __name__, __file__, __loader__ and __builtins__ are set in the globals dictionary before the module code is executed.
__name__ is set to run_name if this optional argument is supplied, and to the original mod_name argument otherwise.
__loader__ is set to the PEP 302 module loader used to retrieve the code for the module (this loader may be a wrapper around the standard import mechanism).
__file__ is set to the name provided by the module loader. If the loader does not make filename information available, this variable is set to None.
__builtins__ is automatically initialised with a reference to the top level namespace of the __builtin__ module.
If the argument alter_sys is supplied and evaluates to True, then sys.argv[0] is updated with the value of __file__ and sys.modules[__name__] is updated with a temporary module object for the module being executed. Both sys.argv[0] and sys.modules[__name__] are restored to their original values before this function returns.
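The behaviour described above can be checked against runpy.run_module as it exists in modern Python 3, which keeps this signature (later versions also set a few extra attributes such as __spec__). The sketch writes a throwaway module with a hypothetical name that records what it sees while running under alter_sys=True.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Source for a throwaway module (hypothetical name "pep338_runme") that
# records what it observes while it runs.
MODULE_SOURCE = """\
import sys
seen_argv0 = sys.argv[0]
owns_modules_entry = sys.modules[__name__].__dict__ is globals()
"""


def demo_run_module():
    """Run a temporary module through runpy.run_module with alter_sys=True."""
    tmp = tempfile.mkdtemp()
    with open(os.path.join(tmp, "pep338_runme.py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    seed = {"caller_value": 1}
    argv0_before = sys.argv[0]
    main_before = sys.modules.get("__main__")
    result = runpy.run_module("pep338_runme", init_globals=seed,
                              run_name="__main__", alter_sys=True)
    return {
        "result": result,
        # init_globals must come back untouched.
        "seed_unchanged": seed == {"caller_value": 1},
        # alter_sys changes are undone before run_module returns.
        "argv0_restored": sys.argv[0] == argv0_before,
        "main_restored": sys.modules.get("__main__") is main_before,
    }
```

While the module runs, sys.argv[0] holds its __file__ and sys.modules["__main__"] is the temporary module whose dictionary is the executing globals; both are put back afterwards.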

When invoked as a script, the runpy module finds and executes the module supplied as the first argument. It adjusts sys.argv by deleting sys.argv[0] (which refers to the runpy module itself) and then invokes run_module(sys.argv[0], run_name="__main__", alter_sys=True).
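The argument handling can be mirrored in-process with a small helper. The name launch is hypothetical and the function only restates the steps above on top of the real runpy.run_module; it also restores the caller’s sys.argv afterwards so that it can be called from tests.

```python
import runpy
import sys


def launch(argv):
    """Mimic the runpy-as-a-script steps described above (hypothetical helper).

    argv is a complete command line, e.g. ["runpy", "pkg.mod", "--verbose"].
    """
    if len(argv) < 2:
        raise SystemExit("No module specified for execution")
    saved_argv = sys.argv
    # Drop argv[0] (the runpy entry itself) so the target module name becomes
    # argv[0] and the remaining entries are the module's own arguments.
    sys.argv = list(argv[1:])
    try:
        # alter_sys=True then replaces sys.argv[0] with the module's __file__.
        return runpy.run_module(sys.argv[0], run_name="__main__",
                                alter_sys=True)
    finally:
        sys.argv = saved_argv
```

The executed module therefore sees its own filename in sys.argv[0], followed by exactly the arguments given after the module name, just as for a script named on the command line.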

Import Statements and the Main Module

The release of 2.5b1 showed a surprising (although obvious in retrospect) interaction between this PEP and PEP 328: explicit relative imports don’t work from a main module. This is due to the fact that relative imports rely on __name__ to determine the current module’s position in the package hierarchy. In a main module, the value of __name__ is always '__main__', so explicit relative imports will always fail (as they only work for a module inside a package).
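The failure can be reproduced in modern Python 3 by executing an explicit relative import in a namespace whose only package hint is __name__ == '__main__'. Modern runpy also sets __package__ (PEP 366, Python 2.6 onwards), so this sketch deliberately omits __package__ and __spec__ to approximate the 2.5 situation; the module name sibling is hypothetical.

```python
import warnings


def try_relative_import(module_name):
    """Run `from . import sibling` where __name__ is the only package hint.

    Returns the ImportError message, or None if the import succeeded.
    """
    namespace = {"__name__": module_name}
    with warnings.catch_warnings():
        # The import system warns when it has to fall back on __name__.
        warnings.simplefilter("ignore", ImportWarning)
        try:
            exec("from . import sibling", namespace)
        except ImportError as exc:
            return str(exc)
    return None
```

With module_name set to '__main__', the derived package name is empty, so there is no parent package against which the leading dot can be resolved.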

Investigation into why implicit relative imports appear to work when a main module is executed directly but fail when executed using -m showed that such imports are actually always treated as absolute imports. Because of the way direct execution works, the directory of the package containing the executed module is added to sys.path, so its sibling modules are actually imported as top level modules. This can easily lead to multiple copies of the sibling modules in the application if implicit relative imports are used in modules that may be directly executed (e.g. test modules or utility scripts).
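The duplicate-copy problem is easy to trigger deliberately. The sketch below (hypothetical package and module names) puts both the package’s parent directory and the package directory itself on sys.path. The latter is what direct execution of a module inside the package adds. The same file is then imported under two names and yields two independent module objects.

```python
import importlib
import os
import sys
import tempfile


def demo_duplicate_copies():
    """Import one file under two names, as described above (sketch).

    Hypothetical layout: <root>/pep338_dup_pkg/__init__.py and
    <root>/pep338_dup_pkg/pep338_sibling.py.
    """
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "pep338_dup_pkg")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "pep338_sibling.py"), "w") as f:
        f.write("registry = []\n")
    # root makes the package importable; pkg_dir is what direct execution
    # of a module inside the package would add to sys.path.
    sys.path[:0] = [root, pkg_dir]
    importlib.invalidate_caches()

    top_level = importlib.import_module("pep338_sibling")
    in_package = importlib.import_module("pep338_dup_pkg.pep338_sibling")
    top_level.registry.append("seen")
    return {
        "same_file": os.path.samefile(top_level.__file__, in_package.__file__),
        "same_module": top_level is in_package,
        "state_shared": in_package.registry == ["seen"],
    }
```

Module-level state such as the registry list is not shared between the two copies, which is the kind of subtle bug the recommendation to use absolute imports avoids.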

For the 2.5 release, the recommendation is to always use absolute imports in any module that is intended to be used as a main module. The -m switch provides a benefit here, as it inserts the current directory into sys.path, instead of the directory containing the main module. This means that it is possible to run a module from inside a package using -m so long as the current directory contains the top level directory for the package. Absolute imports will work correctly even if the package isn’t installed anywhere else on sys.path. If the module is executed directly and uses absolute imports to retrieve its sibling modules, then the top level package directory needs to be installed somewhere on sys.path (since the current directory won’t be added automatically).

Here’s an example file layout:

devel/
    pkg/
        __init__.py
        moduleA.py
        moduleB.py
        test/
            __init__.py
            test_A.py
            test_B.py

So long as the current directory is devel, or devel is already on sys.path, and the test modules use absolute imports (such as from pkg import moduleA to retrieve the module under test), PEP 338 allows the tests to be run as:

python -m pkg.test.test_A
python -m pkg.test.test_B
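The commands above can be exercised end to end with a modern Python 3 interpreter. The sketch builds the example layout in a temporary devel directory (the module contents are hypothetical) and runs each test module with -m from inside devel. PYTHONSAFEPATH, available from Python 3.11, stops the current directory from being prepended, so the sketch clears it for the child process.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical contents for the example layout.
FILES = {
    "pkg/__init__.py": "",
    "pkg/moduleA.py": "def double(x):\n    return 2 * x\n",
    "pkg/moduleB.py": "def triple(x):\n    return 3 * x\n",
    "pkg/test/__init__.py": "",
    "pkg/test/test_A.py": (
        "from pkg import moduleA  # absolute import of the module under test\n"
        "assert moduleA.double(21) == 42\n"
        "print('test_A passed')\n"
    ),
    "pkg/test/test_B.py": (
        "from pkg import moduleB\n"
        "assert moduleB.triple(14) == 42\n"
        "print('test_B passed')\n"
    ),
}


def build_layout():
    """Create devel/pkg/... in a temporary directory and return devel."""
    devel = tempfile.mkdtemp()
    for rel_path, text in FILES.items():
        path = os.path.join(devel, *rel_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(text)
    return devel


def run_test_module(devel, module):
    """Run `python -m <module>` with devel as the current directory."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # keep the current directory on sys.path
    return subprocess.run([sys.executable, "-m", module], cwd=devel, env=env,
                          capture_output=True, text=True)
```

Running python pkg/test/test_A.py directly from devel would instead put devel/pkg/test at the front of sys.path, so from pkg import moduleA would only succeed if pkg were installed elsewhere on sys.path.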

The question of whether or not relative imports should be supported when a main module is executed with -m is something that will be revisited for Python 2.6. Permitting it would require changes to either Python’s import semantics or the semantics used to indicate when a module is the main module, so it is not a decision to be made hastily.

Resolved Issues

There were some key design decisions that influenced the development of the runpy module. These are listed below.

The special variables __name__, __file__ and __loader__ are set in a module’s global namespace before the module is executed. As run_module alters these values, it does not mutate the supplied dictionary. If it did, then passing globals() to this function could have nasty side effects.
Sometimes, the information needed to populate the special variables simply isn’t available. Rather than trying to be too clever, these variables are simply set to None when the relevant information cannot be determined.
There is no special protection on the alter_sys argument. This may result in sys.argv[0] being set to None if filename information is not available.
The import lock is NOT used to guard against potential threading issues that arise when alter_sys is set to True. Instead, it is recommended that threaded code simply avoid using this flag.

Alternatives

The first alternative implementation considered ignored packages’ __path__ variables, and looked only in the main package directory. A Python script with this behaviour can be found in the discussion of the execmodule cookbook recipe [3].

The execmodule cookbook recipe itself was the proposed mechanism in an earlier version of this PEP (before the PEP’s author read PEP 302).

Both approaches were rejected as they do not meet the main goal of the -m switch: to allow the full Python namespace to be used to locate modules for execution from the command line.

An earlier version of this PEP included some mistaken assumptions about the way exec handled locals dictionaries and code from function objects. These mistaken assumptions led to some unneeded design complexity which has now been removed; run_code shares all of the quirks of exec.

Earlier versions of the PEP also exposed a broader API than just the single run_module() function needed to implement the updates to the -m switch. In the interests of simplicity, those extra functions have been dropped from the proposed API.

After the original implementation in SVN, it became clear that holding the import lock when executing the initial application script was not correct (e.g. python -m test.regrtest test_threadedimport failed). So the run_module function only holds the import lock during the actual search for the module, and releases it before execution, even if alter_sys is set.
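The reason for releasing the lock before execution can be illustrated with an ordinary threading.RLock standing in for the interpreter’s import lock. Everything here is a hypothetical sketch and not the real lock or runpy internals. A simulated application script starts a worker thread that needs the lock, much as a thread performing an import would. The worker uses a short timeout so that the blocked case reports failure instead of hanging.

```python
import threading

# Hypothetical stand-in for the interpreter's import lock.
import_lock = threading.RLock()


def run_lock_during_search(locate, execute):
    """Hold the lock only while locating the module, then release it."""
    with import_lock:
        code = locate()
    return execute(code)


def run_lock_during_execution(locate, execute):
    """The rejected variant: the lock stays held while the script runs."""
    with import_lock:
        return execute(locate())


def script_that_starts_a_thread(code):
    """Simulated application script whose worker thread needs the lock.

    Returns True if the worker acquired the lock. A real import would simply
    block; the timeout keeps this demonstration from hanging.
    """
    results = []

    def worker():
        got_lock = import_lock.acquire(timeout=0.5)
        if got_lock:
            import_lock.release()
        results.append(got_lock)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results[0]
```

Under these assumptions, the search-only variant lets the worker proceed. When the lock is held for the whole run, the worker cannot acquire it while the main thread waits for the worker, which is the same shape of deadlock that test_threadedimport exposed.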