Defining extension modules¶
A C extension for CPython is a shared library (for example, a .so file on Linux or a .pyd DLL on Windows) which is loadable into the Python process (for example, it is compiled with compatible compiler settings), and which exports an initialization function.
To be importable by default (that is, by importlib.machinery.ExtensionFileLoader), the shared library must be available on sys.path, and must be named after the module name plus an extension listed in importlib.machinery.EXTENSION_SUFFIXES.
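The naming rule can be explored from Python. The following is a minimal sketch that lists the file names the default loader would accept for a top-level module; the helper name candidate_filenames and the module name spam are chosen for illustration only, and the suffixes printed depend on the platform and Python build.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """Return the file names that match a top-level extension module.

    Hypothetical helper: it only appends each suffix from
    importlib.machinery.EXTENSION_SUFFIXES to the module name.
    """
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


names = candidate_filenames("spam")
for filename in names:
    print(filename)
```

A file with one of these names still has to be located in a directory listed on sys.path to be found by a plain import statement.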
Note
Building, packaging and distributing extension modules is best done with third-party tools, and is out of scope of this document. One suitable tool is Setuptools, whose documentation can be found at https://setuptools.pypa.io/en/latest/setuptools.html.
Normally, the initialization function returns a module definition initialized using PyModuleDef_Init(). This allows splitting the creation process into several phases:
- Before any substantial code is executed, Python can determine which capabilities the module supports, and it can adjust the environment or refuse loading an incompatible extension.
- By default, Python itself creates the module object; that is, it does the equivalent of object.__new__() for classes. It also sets initial attributes like __package__ and __loader__.
- Afterwards, the module object is initialized using extension-specific code, the equivalent of __init__() on classes.
This is called multi-phase initialization to distinguish it from the legacy (but still supported) single-phase initialization scheme, where the initialization function returns a fully constructed module. See the Legacy single-phase initialization section below for details.
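The split between creating and initializing a module can be imitated in pure Python. This is only an analogy under stated assumptions, not how CPython loads extensions: the functions create_phase and exec_phase and the module name demo are hypothetical, and a spec without a loader is used so that importlib builds a plain module object.

```python
import importlib.machinery
import importlib.util


def create_phase(name):
    """Creation: the import system builds the module object and
    sets its import-related attributes from the spec."""
    spec = importlib.machinery.ModuleSpec(name, loader=None)
    return importlib.util.module_from_spec(spec)


def exec_phase(module):
    """Initialization: module-specific code populates the object,
    comparable to __init__() on a class instance."""
    module.answer = 42


module = create_phase("demo")
exec_phase(module)
print(module.__name__, module.answer)
```

Keeping the two steps separate is what lets the import system inspect or reject a module before any module-specific code runs.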
Changed in version 3.5: Added support for multi-phase initialization (PEP 489).
Multiple module instances¶
By default, extension modules are not singletons. For example, if the sys.modules entry is removed and the module is re-imported, a new module object is created, and typically populated with fresh method and type objects. The old module is subject to normal garbage collection. This mirrors the behavior of pure-Python modules.
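The pure-Python behavior that extension modules mirror can be observed directly. The sketch below writes a throwaway module to a temporary directory and imports it twice; the module name demo_reimport_mod and its value attribute are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_reimport_mod"  # hypothetical throwaway module

with tempfile.TemporaryDirectory() as tmp:
    # The module body creates a fresh object every time it is executed.
    Path(tmp, MODULE_NAME + ".py").write_text("value = object()\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        first = importlib.import_module(MODULE_NAME)
        # Drop the cached entry so the next import runs the module again.
        del sys.modules[MODULE_NAME]
        second = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(first is second)              # two distinct module objects
print(first.value is second.value)  # and distinct contents
```

Both comparisons are false: the second import produces a new module object with freshly created contents, which is the behavior a multi-phase extension module is expected to share.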
Additional module instances may be created in sub-interpreters or after Python runtime reinitialization (Py_Finalize() and Py_Initialize()). In these cases, sharing Python objects between module instances would likely cause crashes or undefined behavior.
To avoid such issues, each instance of an extension module should be isolated: changes to one instance should not implicitly affect the others, and all state owned by the module, including references to Python objects, should be specific to a particular module instance. See Isolating Extension Modules for more details and a practical guide.
A simpler way to avoid these issues is raising an error on repeated initialization.
All modules are expected to support sub-interpreters, or otherwise explicitly signal a lack of support. This is usually achieved by isolation or blocking repeated initialization, as above. A module may also be limited to the main interpreter using the Py_mod_multiple_interpreters slot.
Initialization function¶
The initialization function defined by an extension module has the following signature:
PyObject *PyInit_modulename(void)
Its name should be PyInit_<name>, with <name> replaced by the name of the module.
For modules with ASCII-only names, the function must be named exactly this way. When using Multi-phase initialization, non-ASCII module names are allowed. In this case, the initialization function name is PyInitU_<name>, with <name> encoded using Python’s punycode encoding with hyphens replaced by underscores. In Python:
def initfunc_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix
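As a usage sketch, the function can be applied to one ASCII and one non-ASCII name. The function from the text is repeated so that the snippet runs on its own; the module names spam and bücher are arbitrary examples.

```python
def initfunc_name(name):
    """Return the expected initialization symbol (as bytes) for a module name."""
    try:
        # ASCII-only names map directly to PyInit_<name>.
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        # Other names use the PyInitU_ prefix and a punycode-encoded
        # name in which hyphens are replaced by underscores.
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


ascii_symbol = initfunc_name("spam")
unicode_symbol = initfunc_name("bücher")
print(ascii_symbol)
print(unicode_symbol)
```

The result is a bytes object because it names a C symbol; the non-ASCII case never contains a hyphen, which would not be valid in a C identifier.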
It is recommended to define the initialization function using a helper macro:
- PyMODINIT_FUNC¶
Declare an extension module initialization function. This macro:
- specifies the PyObject* return type,
- adds any special linkage declarations required by the platform, and
- for C++, declares the function as extern "C".
For example, a module called spam would be defined like this:
static struct PyModuleDef spam_module = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "spam",
    ...
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModuleDef_Init(&spam_module);
}
It is possible to export multiple modules from a single shared library by defining multiple initialization functions. However, importing them requires using symbolic links or a custom importer, because by default only the function corresponding to the filename is found. See the Multiple modules in one library section in PEP 489 for details.
The initialization function is typically the only non-static item defined in the module’s C source.
Multi-phase initialization¶
Normally, the initialization function (PyInit_modulename) returns a PyModuleDef instance with non-NULL m_slots. Before it is returned, the PyModuleDef instance must be initialized using the following function:
- PyObject *PyModuleDef_Init(PyModuleDef *def)¶
- Part of the Stable ABI since version 3.5.
Ensure a module definition is a properly initialized Python object that correctly reports its type and a reference count.
Return def cast to PyObject*, or NULL if an error occurred. Calling this function is required for Multi-phase initialization. It should not be used in other contexts.
Note that Python assumes that PyModuleDef structures are statically allocated. This function may return either a new reference or a borrowed one; this reference must not be released.
Added in version 3.5.
Legacy single-phase initialization¶
Attention
Single-phase initialization is a legacy mechanism to initialize extension modules, with known drawbacks and design flaws. Extension module authors are encouraged to use multi-phase initialization instead.
In single-phase initialization, the initialization function (PyInit_modulename) should create, populate and return a module object. This is typically done using PyModule_Create() and functions like PyModule_AddObjectRef().
Single-phase initialization differs from the default in the following ways:
- Single-phase modules are, or rather contain, “singletons”.
  When the module is first initialized, Python saves the contents of the module’s __dict__ (that is, typically, the module’s functions and types). For subsequent imports, Python does not call the initialization function again. Instead, it creates a new module object with a new __dict__, and copies the saved contents to it. For example, given a single-phase module _testsinglephase [1] that defines a function sum and an exception class error:
  >>> import sys
  >>> import _testsinglephase as one
  >>> del sys.modules['_testsinglephase']
  >>> import _testsinglephase as two
  >>> one is two
  False
  >>> one.__dict__ is two.__dict__
  False
  >>> one.sum is two.sum
  True
  >>> one.error is two.error
  True
  The exact behavior should be considered a CPython implementation detail.
- To work around the fact that PyInit_modulename does not take a spec argument, some state of the import machinery is saved and applied to the first suitable module created during the PyInit_modulename call. Specifically, when a sub-module is imported, this mechanism prepends the parent package name to the name of the module.
  A single-phase PyInit_modulename function should create “its” module object as soon as possible, before any other module objects can be created.
- Non-ASCII module names (PyInitU_modulename) are not supported.
- Single-phase modules support module lookup functions like PyState_FindModule().
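The “singleton” behavior from the first item above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not CPython’s implementation: the cache _saved, the functions _init_demo and import_single_phase, and the module name demo are all hypothetical.

```python
import types

_saved = {}     # hypothetical cache: module name -> saved __dict__ contents
init_calls = 0  # counts how often the "initialization function" runs


def _init_demo(module):
    """Stand-in for a single-phase initialization function."""
    global init_calls
    init_calls += 1
    module.sum = lambda a, b: a + b
    module.error = type("error", (Exception,), {})


def import_single_phase(name):
    """Model of importing a single-phase module by name."""
    module = types.ModuleType(name)
    if name not in _saved:
        # First import: run the initializer and save the contents.
        _init_demo(module)
        _saved[name] = dict(module.__dict__)
    else:
        # Later imports: new module object, copied contents, no init call.
        module.__dict__.update(_saved[name])
    return module


one = import_single_phase("demo")
two = import_single_phase("demo")
print(one is two, one.__dict__ is two.__dict__)
print(one.sum is two.sum, one.error is two.error)
print(init_calls)
```

The module objects and their dictionaries differ while the function and exception class are shared, matching the pattern in the interactive session shown in the first item.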
[1] _testsinglephase is an internal module used in CPython’s self-test suite; your installation may or may not include it.