
Python Enhancement Proposals

PEP 489 – Multi-phase extension module initialization

Author:
Petr Viktorin <encukou at gmail.com>, Stefan Behnel <stefan_ml at behnel.de>, Alyssa Coghlan <ncoghlan at gmail.com>
BDFL-Delegate:
Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To:
Import-SIG list
Status:
Final
Type:
Standards Track
Created:
11-Aug-2013
Python-Version:
3.5
Post-History:
23-Aug-2013, 20-Feb-2015, 16-Apr-2015, 07-May-2015, 18-May-2015
Resolution:
Python-Dev message

Important

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Initializing C modules. For Python 3.14+, see Defining extension modules and Module definitions.

See PEP 1 for how to propose changes.

Abstract

This PEP proposes a redesign of the way in which built-in and extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve import-related problems by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to only define features they need, and to allow future additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

The proposal also allows extension modules with non-ASCII names.

Not all problems tackled in PEP 3121 are solved in this proposal. In particular, problems with run-time module lookup (PyState_FindModule) are left to a future PEP.

Motivation

Python modules and extension modules are not being set up in the same way. For Python modules, the module object is created and set up first, then the module code is being executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module init function is executed straight away and does both the creation and initialization. The initialization function is not passed the ModuleSpec, or any information it contains, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading.

In Python 3, modules are also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without access to the fully-qualified module name, it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython-generated modules, for which it’s not uncommon that the module init code has the same level of complexity as that of any ‘regular’ Python module. Also, the lack of __file__ and __name__ information hinders the compilation of “__init__.py” modules, i.e. packages, especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting.

The current process

Currently, extension and built-in modules export an initialization function named “PyInit_modulename”, named after the file name of the shared library. This function is executed by the import machinery and must return a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef object. It then continues to initialize it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

The proposal

The initialization function (PyInit_modulename) will be allowed to return a pointer to a PyModuleDef object. The import machinery will be in charge of constructing the module object, calling hooks provided in the PyModuleDef in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase initialization, the current practice of returning a fully initialized module object, will still be accepted, so existing code will work unchanged, including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots, similarly to PEP 384’s PyType_Spec for types. To keep binary compatibility, and avoid needing to introduce a new structure (which would introduce additional supporting functions and per-module storage), the currently unused m_reload pointer of PyModuleDef will be changed to hold the slots. The structures are defined as:

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The m_slots member must be either NULL, or point to an array of PyModuleDef_Slot structures, terminated by a slot with id set to 0 (i.e. {0, NULL}).

To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x.

A slot’s value pointer may not be NULL, unless specified otherwise in the slot’s documentation.

The following slots are currently available, and described later:

  • Py_mod_create
  • Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
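
The slot-array rules above can be sketched as a small Python model. This is only an illustration, not CPython code: the function name validate_slots, the tuple representation of PyModuleDef_Slot, the use of None for NULL, and the numeric slot IDs are all assumptions made for the example.

```python
# Illustrative Python model of slot-array validation; not CPython code.
Py_mod_create = 1  # assumed numeric IDs, for illustration only
Py_mod_exec = 2
KNOWN_SLOTS = {Py_mod_create, Py_mod_exec}


def validate_slots(slots):
    """Return the (slot_id, value) pairs that precede the {0, NULL} sentinel.

    `slots` is a list of (slot_id, value) tuples standing in for a
    PyModuleDef_Slot array; None stands in for a NULL pointer.
    """
    if slots is None:  # m_slots == NULL: nothing to validate
        return []
    accepted = []
    for slot_id, value in slots:
        if slot_id == 0:  # sentinel terminates the array
            return accepted
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:
            raise SystemError(f"slot {slot_id} has a NULL value")
        accepted.append((slot_id, value))
    # A Python list has a known end, so the model can report this;
    # C code cannot detect a missing terminator.
    raise SystemError("slot array is missing its {0, NULL} terminator")
```

Entries after the sentinel are never inspected, which is why the terminator is mandatory in the C array.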

When using multi-phase initialization, the m_name field of PyModuleDef will not be used during importing; the module name will be taken from the ModuleSpec.

Before it is returned from PyInit_*, the PyModuleDef object must be initialized using the newly added PyModuleDef_Init function. This sets the object type (which cannot be done statically on certain compilers), refcount, and internal bookkeeping data (m_index). For example, an extension module “example” would be exported as:

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created from it – usually, it will be declared statically.

Pseudo-code Overview

Here is an overview of how the modified importers will operate. Details such as logging or handling of errors and invalid states are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in PEP 451.
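
For readers unfamiliar with that framework, the two phases can be observed from pure Python. The sketch below uses a hypothetical loader class (TwoPhaseLoader) and module name (demo_two_phase), both invented for illustration; only the importlib calls are real API.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import types

calls = []  # records the order in which the hooks run


class TwoPhaseLoader(importlib.abc.Loader):
    """Hypothetical loader used only to show the PEP 451 call order."""

    def create_module(self, spec):
        calls.append("create")
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        calls.append("exec")
        module.food = "spam"


spec = importlib.machinery.ModuleSpec("demo_two_phase", TwoPhaseLoader())
module = importlib.util.module_from_spec(spec)  # creation phase
spec.loader.exec_module(module)                 # execution phase
print(calls, module.food)
```

The loaders in the pseudo-code below fill the same two hooks, with the work delegated to C.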

importlib/_bootstrap.py:

    class BuiltinImporter:
        def create_module(self, spec):
            module = _imp.create_builtin(spec)
            return module

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

importlib/_bootstrap_external.py:

    class ExtensionFileLoader:
        def create_module(self, spec):
            module = _imp.create_dynamic(spec)
            return module

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

Python/import.c (the _imp module):

    def create_dynamic(spec):
        name = spec.name
        path = spec.origin

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, name)
        if mod:
            return mod

        return _PyImport_LoadDynamicModuleWithSpec(spec)

    def exec_dynamic(module):
        if not isinstance(module, types.ModuleType):
            # non-modules are skipped -- PyModule_GetDef fails on them
            return

        def = PyModule_GetDef(module)
        state = PyModule_GetState(module)
        if state is NULL:
            PyModule_ExecDef(module, def)

    def create_builtin(spec):
        name = spec.name

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, name)
        if mod:
            return mod

        for initname, initfunc in PyImport_Inittab:
            if name == initname:
                m = initfunc()
                if isinstance(m, PyModuleDef):
                    def = m
                    return PyModule_FromDefAndSpec(def, spec)
                else:
                    # fall back to single-phase initialization
                    module = m
                    _PyImport_FixupExtensionObject(module, name, name)
                    return module

Python/importdl.c:

    def _PyImport_LoadDynamicModuleWithSpec(spec):
        path = spec.origin
        package, dot, name = spec.name.rpartition('.')

        # see the "Export Hook Name" section for export_hook_name
        hook_name = export_hook_name(name)

        # call platform-specific function for loading exported function
        # from shared library
        exportfunc = _find_shared_funcptr(hook_name, path)

        m = exportfunc()
        if isinstance(m, PyModuleDef):
            def = m
            return PyModule_FromDefAndSpec(def, spec)

        module = m

        # fall back to single-phase initialization
        ....

Objects/moduleobject.c:

    def PyModule_FromDefAndSpec(def, spec):
        name = spec.name
        create = None
        for slot, value in def.m_slots:
            if slot == Py_mod_create:
                create = value
        if create:
            m = create(spec, def)
        else:
            m = PyModule_New(name)

        if isinstance(m, types.ModuleType):
            m.md_state = None
            m.md_def = def

        if def.m_methods:
            PyModule_AddFunctions(m, def.m_methods)
        if def.m_doc:
            PyModule_SetDocString(m, def.m_doc)

        return m

    def PyModule_ExecDef(module, def):
        if isinstance(module, types.ModuleType):
            if module.md_state is NULL:
                # allocate a block of zeroed-out memory
                module.md_state = _alloc(module.md_size)

        if def.m_slots is NULL:
            return

        for slot, value in def.m_slots:
            if slot == Py_mod_exec:
                value(module)

Module Creation Phase

Creation of the module object – that is, the implementation of ExecutionLoader.create_module – is governed by the Py_mod_create slot.

The Py_mod_create slot

The Py_mod_create slot is used to support custom module subclasses. The value pointer must point to a function with the following signature:

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDef structure. It should return a new module object, or set an error and return NULL.

This function is not responsible for setting import-related attributes specified in PEP 451 (such as __name__ or __loader__) on the new module.

There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes. However, only ModuleType instances support module-specific functionality such as per-module state and processing of execution slots. If something other than a ModuleType subclass is returned, no execution slots may be defined; if any are, a SystemError is raised.

Note that when this function is called, the module’s entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively), may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal module object using PyModule_New. The name is taken from spec.
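
The division of labour described in this section has a pure-Python analogue in Loader.create_module. The sketch below is an illustration under assumptions: VersionedModule, SubclassLoader and the module name demo_subclass are hypothetical names, and the loader stands in for a C-level Py_mod_create function.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import types


class VersionedModule(types.ModuleType):
    """Hypothetical module subclass with a computed attribute."""

    @property
    def version_string(self):
        return ".".join(str(part) for part in self.version_info)


class SubclassLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Only construct the object; __loader__ and __spec__ are
        # filled in afterwards by the import machinery.
        return VersionedModule(spec.name)

    def exec_module(self, module):
        module.version_info = (1, 2, 3)


loader = SubclassLoader()
spec = importlib.machinery.ModuleSpec("demo_subclass", loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
print(type(module).__name__, module.version_string)
```

The creation hook stays minimal here: it runs no user code and touches nothing outside the new object, in line with the advice above.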

Post-creation steps

If the Py_mod_create function returns an instance of types.ModuleType or a subclass (or if a Py_mod_create slot is not present), the import machinery will associate the PyModuleDef with the module. This also makes the PyModuleDef accessible to the execution phase, the PyModule_GetDef function, and garbage collection routines (traverse, clear, free).

If the Py_mod_create function does not return a module subclass, then m_size must be 0, and m_traverse, m_clear and m_free must all be NULL. Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the module object, regardless of its type:

  • The docstring is set from m_doc, if non-NULL.
  • The module’s functions are initialized from m_methods, if any.

Module Execution Phase

Module execution – that is, the implementation of ExecutionLoader.exec_module – is governed by “execution slots”. This PEP only adds one, Py_mod_exec, but others may be added in the future.

The execution phase is done on the PyModuleDef associated with the module object. For objects that are not a subclass of PyModule_Type (for which PyModule_GetDef would fail), the execution phase is skipped.

Execution slots may be specified multiple times, and are processed in the order they appear in the slots array. When using the default import machinery, they are processed after import-related attributes specified in PEP 451 (such as __name__ or __loader__) are set and the module is added to sys.modules.

Pre-Execution steps

Before processing the execution slots, per-module state is allocated for the module. From this point on, per-module state is accessible through PyModule_GetState.

The Py_mod_exec slot

The entry in this slot must point to a function with the following signature:

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to setting the module’s initial attributes. The “module” argument receives the module object to initialize.

The function must return 0 on success, or, on error, set an exception and return -1.
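
The ordering and return-code conventions for execution slots can be modelled in Python. This is a sketch, not the CPython implementation: run_exec_slots, add_food, add_menu and the numeric slot ID are assumptions made for the example, and the model returns -1 without setting an exception.

```python
import types

Py_mod_exec = 2  # assumed numeric ID, for illustration only


def run_exec_slots(module, slots):
    """Call each Py_mod_exec function in array order.

    Mirrors the C convention: 0 on success, -1 at the first failure.
    """
    for slot_id, func in slots:
        if slot_id == 0:  # {0, NULL} sentinel
            break
        if slot_id == Py_mod_exec and func(module) != 0:
            return -1
    return 0


def add_food(module):
    module.food = "spam"
    return 0


def add_menu(module):
    # Relies on add_food having run first.
    module.menu = [module.food, "eggs"]
    return 0


demo = types.ModuleType("demo_exec_slots")
status = run_exec_slots(
    demo, [(Py_mod_exec, add_food), (Py_mod_exec, add_menu), (0, None)]
)
print(status, demo.menu)
```

Because slots run in array order, a later exec function may depend on attributes set by an earlier one, as add_menu does here.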

If PyModuleExec replaces the module’s entry in sys.modules, the new object will be used and returned by importlib machinery after all execution slots are processed. This is a feature of the import machinery itself. The slots themselves are all processed using the module returned from the creation phase; sys.modules is not consulted during the execution phase. (Note that for extension modules, implementing Py_mod_create is usually a better solution for using custom module objects.)
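
Since this behaviour belongs to the import machinery rather than to extension modules, it can be observed with a pure-Python loader. In the sketch below, Replacement, ReplacingLoader, DemoFinder and the module name are hypothetical names used only for illustration.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys

NAME = "demo_replaced_module"  # hypothetical module name


class Replacement:
    """Arbitrary object that takes the module's place in sys.modules."""
    food = "spam"


class ReplacingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Runs after the machinery has added `module` to sys.modules.
        assert sys.modules[module.__name__] is module
        sys.modules[module.__name__] = Replacement()


class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == NAME:
            return importlib.machinery.ModuleSpec(fullname, ReplacingLoader())
        return None


finder = DemoFinder()
sys.meta_path.insert(0, finder)
try:
    result = importlib.import_module(NAME)
finally:
    sys.meta_path.remove(finder)

print(type(result).__name__, result.food)
```

The import returns the Replacement instance rather than the module object that was passed to exec_module.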

Legacy Init

The backwards-compatible single-phase initialization continues to be supported. In this scheme, the PyInit function returns a fully initialized module rather than a PyModuleDef object. In this case, the PyInit hook implements the creation phase, and the execution phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to single-phase initialization, because the benefits of multi-phase initialization can’t be back-ported. Here is an example of a module that supports multi-phase initialization, and falls back to single-phase when compiled for an older version of CPython. It is included mainly as an illustration of the changes needed to enable multi-phase init:

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        PyModule_AddStringConstant(module, "food", "spam");
        return 0;
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }

Built-In modules

Any extension module can be used as a built-in module by linking it into the executable, and including it in the inittab (either at runtime with PyImport_AppendInittab, or at configuration time, using tools like freeze).

To keep this possibility, all changes to extension module loading introduced in this PEP will also apply to built-in modules. The only exception is non-ASCII module names, explained below.

Subinterpreters and Interpreter Reloading

Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly, avoiding the issues mentioned in Python documentation [6]. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object’s storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.

Functions incompatible with multi-phase initialization

The PyModule_Create function will fail when used on a PyModuleDef structure with a non-NULL m_slots pointer. The function doesn’t have access to the ModuleSpec object necessary for multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule and PyState_RemoveModule will also fail on modules with non-NULL m_slots. PyState registration is disabled because multiple module objects may be created from the same PyModuleDef.

Module state and C-level callbacks

Due to the unavailability of PyState_FindModule, any function that needs access to module-level state (including functions, classes or exceptions defined at the module level) must receive a reference to the module object (or the particular object it needs), either directly or indirectly. This is currently difficult in two situations:

  • Methods of classes, which receive a reference to the class, but not to the class’s module
  • Libraries with C-level callbacks, unless the callbacks can receive custom data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for the new mechanism to be useful to all modules. Proper fixes have been discussed on the import-sig mailing list [5].

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, not good candidates for porting to the new mechanism.

New Functions

A new function and macro implementing the module creation phase will be added. These are similar to PyModule_Create and PyModule_Create2, except they take an additional ModuleSpec argument, and handle module definitions with non-NULL slots:

    PyObject *PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject *PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                       int module_api_version)

A new function implementing the module execution phase will be added. This allocates per-module state (if not allocated already), and always processes execution slots. The import machinery calls this function when a module is executed, unless the module is being reloaded:

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object. This idempotent function fills in the type, refcount, and module index. It returns its argument cast to PyObject*, so it can be returned directly from a PyInit function:

    PyObject *PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and methods on a module:

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)

Export Hook Name

As portable C identifiers are limited to ASCII, module names must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named PyInitU_<encodedname>, where the name is encoded using CPython’s “punycode” encoding (Punycode with a lowercase suffix), with hyphens (“-”) replaced by underscores (“_”).

In Python:

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

    Module name    Init hook name
    spam           PyInit_spam
    lančmít        PyInitU_lanmt_2sa6t
    スパム         PyInitU_zck5b2b
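
The table entries can be reproduced with a short script. The function body repeats the logic given in this section so that the check runs standalone; the loop and the decoding to str for printing are additions made for illustration.

```python
def export_hook_name(name):
    """Same logic as the function in this section, repeated for a standalone check."""
    try:
        suffix = b"_" + name.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII name: Punycode-encode it and swap hyphens for underscores.
        suffix = b"U_" + name.encode("punycode").replace(b"-", b"_")
    return b"PyInit" + suffix


for module_name in ("spam", "lančmít", "スパム"):
    print(module_name, export_hook_name(module_name).decode("ascii"))
```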

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII names will not be supported.

Module Reloading

Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it.

Multiple modules in one library

To support multiple Python modules in one shared library, the library can export additional PyInit* symbols besides the one that corresponds to the library’s filename.

Note that this mechanism can currently only be used to load extra modules, but not to find them. (This is a limitation of the loader mechanism, which this PEP does not try to modify.) To work around the lack of a suitable finder, code like the following can be used:

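    import importlib.machinery
    import importlib.util

    # The wrapper name is hypothetical; the body is the original snippet.
    def load_extension_module(name, path):
        # name: name of the module to load; path: path of the shared library
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module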

On platforms that support symbolic links, these may be used to install one library under multiple names, exposing all exported modules to normal import machinery.

Testing and initial implementations

For testing, a new built-in module _testmultiphase will be created. The library will export several additional modules using the mechanism described in “Multiple modules in one library”.

The _testcapi module will be unchanged, and will use single-phase initialization indefinitely (or until it is no longer supported).

The array and xx* modules will be converted to use multi-phase initialization as part of the initial implementation.

Summary of API Changes and Additions

New functions:

  • PyModule_FromDefAndSpec (macro)
  • PyModule_FromDefAndSpec2
  • PyModule_ExecDef
  • PyModule_SetDocString
  • PyModule_AddFunctions
  • PyModuleDef_Init

New macros:

  • Py_mod_create
  • Py_mod_exec

New types:

  • PyModuleDef_Type will be exposed

New structures:

  • PyModuleDef_Slot

Other changes:

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

BuiltinImporter and ExtensionFileLoader will now implement create_module and exec_module.

The internal _imp module will have backwards incompatible changes: create_builtin, create_dynamic, and exec_dynamic will be added; init_builtin, load_dynamic will be removed.

The undocumented functions imp.load_dynamic and imp.init_builtin will be replaced by backwards-compatible shims.

Backwards Compatibility

Existing modules will continue to be source- and binary-compatible with new versions of Python. Modules that use multi-phase initialization will not be compatible with versions of Python that do not implement this PEP.

The functions init_builtin and load_dynamic will be removed from the _imp module (but not from the imp module).

All changed loaders (BuiltinImporter and ExtensionFileLoader) will remain backwards-compatible; the load_module method will be replaced by a shim.

Internal functions of Python/import.c and Python/importdl.c will be removed. (Specifically, these are _PyImport_GetDynLoadFunc, _PyImport_GetDynLoadWindows, and _PyImport_LoadDynamicModule.)

Possible Future Extensions

The slots mechanism, inspired by PyType_Slot from PEP 384, allows later extensions.

Some extension modules export many constants; for example _ssl has a long list of calls in the form:

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef, would reduce boilerplate, and provide free error-checking which is often missing.

String constants and types can be handled similarly. (Note that non-default bases for types cannot be portably specified statically; this case would need a Py_mod_exec function that runs before the slots are added. The free error-checking would still be beneficial, though.)

Another possibility is providing a “main” function that would be run when the module is given to Python’s -m switch. For this to work, the runpy module will need to be modified to take advantage of ModuleSpec-based loading introduced in PEP 451. Also, it will be necessary to add a mechanism for setting up a module according to slots it wasn’t originally defined with.

Implementation

Work-in-progress implementation is available in a GitHub repository [3]; a patchset is at [4].

Previous Approaches

Stefan Behnel’s initial proto-PEP [1] had a “PyInit_modulename” hook that would create a module class, whose __init__ would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects.

Alyssa (Nick) Coghlan proposed “Create” and “Exec” hooks, and wrote a prototype implementation [2]. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed loading into arbitrary pre-constructed objects with the Exec hook. The proposal made extension module initialization closer to how Python modules are initialized, but it was later recognized that this isn’t an important goal. The current PEP describes a simpler solution.

A further iteration used a “PyModuleExport” hook as an alternative to PyInit, where PyInit was used for the existing scheme, and PyModuleExport for multi-phase. However, not being able to determine the hook name based on module name complicated automatic generation of PyImport_Inittab by tools like freeze. Keeping only the PyInit hook name, even if it’s not entirely appropriate for exporting a definition, yielded a much simpler solution.

References

[1] https://mail.python.org/pipermail/python-dev/2013-August/128087.html
[2] https://mail.python.org/pipermail/python-dev/2013-August/128101.html
[3] https://github.com/encukou/cpython/commits/pep489
[4] https://github.com/encukou/cpython/compare/master…encukou:pep489.patch
[5] https://mail.python.org/pipermail/import-sig/2015-April/000959.html
[6] https://docs.python.org/3/c-api/init.html#sub-interpreter-support

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0489.rst

Last modified: 2025-10-07 15:05:23 GMT

