Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 630 – Isolating Extension Modules

PEP 630 – Isolating Extension Modules

Author:
Petr Viktorin <encukou at gmail.com>
Discussions-To:
Capi-SIG list
Status:
Final
Type:
Informational
Created:
25-Aug-2020
Post-History:
16-Jul-2020

Table of Contents

Important

This PEP is a historical document. The up-to-date, canonical documentation can now be found atIsolating Extension Modules HOWTO.

×

SeePEP 1 for how to propose changes.

Abstract

Traditionally, state belonging to Python extension modules was kept in Cstatic variables, which have process-wide scope. This documentdescribes problems of such per-process state and efforts to makeper-module state—a better default—possible and easy to use.

The document also describes how to switch to per-module state wherepossible. This transition involves allocating space for that state, potentiallyswitching from static types to heap types, and—perhaps mostimportantly—accessing per-module state from code.

About This Document

As aninformational PEP,this document does not introduce any changes; those should be done intheir own PEPs (or issues, if small enough). Rather, it covers themotivation behind an effort that spans multiple releases, and instructsearly adopters on how to use the finished features.

Once support is reasonably complete, this content can be moved to Python’sdocumentation as aHOWTO.Meanwhile, in the spirit of documentation-driven development,gaps identified in this PEP can show where to focus the effort,and it can be updated as new features are implemented.

Whenever this PEP mentionsextension modules, the advice alsoapplies tobuilt-in modules.

Note

This PEP contains generic advice. When following it, always take intoaccount the specifics of your project.

For example, while much of this advice applies to the C parts ofPython’s standard library, the PEP does not factor in stdlib specifics(unusual backward compatibility issues, access to private API, etc.).

PEPs related to this effort are:

  • PEP 384Defining a Stable ABI, which added a C API for creatingheap types
  • PEP 489Multi-phase extension module initialization
  • PEP 573Module State Access from C Extension Methods

This document is concerned with Python’s public C API, which is notoffered by all implementations of Python. However, nothing in this PEP isspecific to CPython.

As with any Informational PEP, this text does not necessarily representa Python community consensus or recommendation.

Motivation

Aninterpreter is the context in which Python code runs. It containsconfiguration (e.g. the import path) and runtime state (e.g. the set ofimported modules).

Python supports running multiple interpreters in one process. There aretwo cases to think about—users may run interpreters:

  • in sequence, with severalPy_InitializeEx/Py_FinalizeExcycles, and
  • in parallel, managing “sub-interpreters” usingPy_NewInterpreter/Py_EndInterpreter.

Both cases (and combinations of them) would be most useful whenembedding Python within a library. Libraries generally shouldn’t makeassumptions about the application that uses them, which includesassuming a process-wide “main Python interpreter”.

Currently, CPython doesn’t handle this use case well. Many extensionmodules (and even some stdlib modules) useper-process global state,because Cstatic variables are extremely easy to use. Thus, datathat should be specific to an interpreter ends up being shared betweeninterpreters. Unless the extension developer is careful, it is very easyto introduce edge cases that lead to crashes when a module is loaded inmore than one interpreter in the same process.

Unfortunately,per-interpreter state is not easy to achieve—extensionauthors tend to not keep multiple interpreters in mind when developing,and it is currently cumbersome to test the behavior.

Rationale for Per-module State

Instead of focusing on per-interpreter state, Python’s C API is evolvingto better support the more granularper-module state. By default,C-level data will be attached to amodule object. Each interpreterwill then create its own module object, keeping the data separate. Fortesting the isolation, multiple module objects corresponding to a singleextension can even be loaded in a single interpreter.

Per-module state provides an easy way to think about lifetime andresource ownership: the extension module will initialize when amodule object is created, and clean up when it’s freed. In this regard,a module is just like any otherPyObject*; there are no “oninterpreter shutdown” hooks to think—or forget—about.

Goal: Easy-to-Use Module State

It is currently cumbersome or impossible to do everything the C APIoffers while keeping modules isolated. Enabled byPEP 384, changes inPEP 489 andPEP 573 (and future planned ones) aim to first make itpossible to build modules this way, and then to make iteasy towrite new modules this way and to convert old ones, so that it canbecome a natural default.

Even if per-module state becomes the default, there will be use casesfor different levels of encapsulation: per-process, per-interpreter,per-thread or per-task state. The goal is to treat these as exceptionalcases: they should be possible, but extension authors will need tothink more carefully about them.

Non-goals: Speedups and the GIL

There is some effort to speed up CPython on multi-core CPUs by making the GILper-interpreter. While isolating interpreters helps that effort,defaulting to per-module state will be beneficial even if no speedup isachieved, as it makes supporting multiple interpreters safer by default.

Making Modules Safe with Multiple Interpreters

There are many ways to correctly support multiple interpreters inextension modules. The rest of this text describes the preferred way towrite such a module, or to convert an existing one.

Note that support is a work in progress; the API for some features yourmodule needs might not yet be ready.

A full example module is available asxxlimited.

This section assumes that “you” are an extension module author.

Isolated Module Objects

The key point to keep in mind when developing an extension module isthat several module objects can be created from a single shared library.For example:

>>>importsys>>>importbinascii>>>old_binascii=binascii>>>delsys.modules['binascii']>>>importbinascii# create a new module object>>>old_binascii==binasciiFalse

As a rule of thumb, the two modules should be completely independent.All objects and state specific to the module should be encapsulatedwithin the module object, not shared with other module objects, andcleaned up when the module object is deallocated. Exceptions arepossible (seeManaging Global State), but they will need morethought and attention to edge cases than code that follows this rule ofthumb.

While some modules could do with less stringent restrictions, isolatedmodules make it easier to set clear expectations (and guidelines) thatwork across a variety of use cases.

Surprising Edge Cases

Note that isolated modules do create some surprising edge cases. Mostnotably, each module object will typically not share its classes andexceptions with other similar modules. Continuing from theexample above,note thatold_binascii.Error andbinascii.Error areseparate objects. In the following code, the exception isnot caught:

>>>old_binascii.Error==binascii.ErrorFalse>>>try:...old_binascii.unhexlify(b'qwertyuiop')...exceptbinascii.Error:...print('boo')...Traceback (most recent call last):  File"<stdin>", line2, in<module>binascii.Error:Non-hexadecimal digit found

This is expected. Notice that pure-Python modules behave the same way:it is a part of how Python works.

The goal is to make extension modules safe at the C level, not to makehacks behave intuitively. Mutatingsys.modules “manually” countsas a hack.

Managing Global State

Sometimes, state of a Python module is not specific to that module, butto the entire process (or something else “more global” than a module).For example:

  • Thereadline module managesthe terminal.
  • A module running on a circuit board wants to controlthe on-boardLED.

In these cases, the Python module should provideaccess to the globalstate, rather thanown it. If possible, write the module so thatmultiple copies of it can access the state independently (along withother libraries, whether for Python or other languages).

If that is not possible, consider explicit locking.

If it is necessary to use process-global state, the simplest way toavoid issues with multiple interpreters is to explicitly prevent amodule from being loaded more than once per process—seeOpt-Out: Limiting to One Module Object per Process.

Managing Per-Module State

To use per-module state, usemulti-phase extension module initializationintroduced inPEP 489. This signals that your module supports multipleinterpreters correctly.

SetPyModuleDef.m_size to a positive number to request that manybytes of storage local to the module. Usually, this will be set to thesize of some module-specificstruct, which can store all of themodule’s C-level state. In particular, it is where you should putpointers to classes (including exceptions, but excluding static types)and settings (e.g.csv’sfield_size_limit)which the C code needs to function.

Note

Another option is to store state in the module’s__dict__,but you must avoid crashing when users modify__dict__ fromPython code. This means error- and type-checking at the C level,which is easy to get wrong and hard to test sufficiently.

If the module state includesPyObject pointers, the module objectmust hold references to those objects and implement the module-level hooksm_traverse,m_clear andm_free. These work liketp_traverse,tp_clear andtp_free of a class. Adding them willrequire some work and make the code longer; this is the price formodules which can be unloaded cleanly.

An example of a module with per-module state is currently available asxxlimited;example module initialization shown at the bottom of the file.

Opt-Out: Limiting to One Module Object per Process

A non-negativePyModuleDef.m_size signals that a module supportsmultiple interpreters correctly. If this is not yet the case for yourmodule, you can explicitly make your module loadable only once perprocess. For example:

staticintloaded=0;staticintexec_module(PyObject*module){if(loaded){PyErr_SetString(PyExc_ImportError,"cannot load module more than once per process");return-1;}loaded=1;// ... rest of initialization}

Module State Access from Functions

Accessing the state from module-level functions is straightforward.Functions get the module object as their first argument; for extractingthe state, you can usePyModule_GetState:

staticPyObject*func(PyObject*module,PyObject*args){my_struct*state=(my_struct*)PyModule_GetState(module);if(state==NULL){returnNULL;}// ... rest of logic}

Note

PyModule_GetState may return NULL without setting anexception if there is no module state, i.e.PyModuleDef.m_size waszero. In your own module, you’re in control ofm_size, so this iseasy to prevent.

Heap Types

Traditionally, types defined in C code arestatic; that is,staticPyTypeObject structures defined directly in code andinitialized usingPyType_Ready().

Such types are necessarily shared across the process. Sharing thembetween module objects requires paying attention to any state they ownor access. To limit the possible issues, static types are immutable atthe Python level: for example, you can’t setstr.myattribute=123.

Note

Sharing truly immutable objects between interpreters is fine,as long as they don’t provide access to mutable objects.However, in CPython, every Python object has a mutable implementationdetail: the reference count. Changes to the refcount are guarded by the GIL.Thus, code that shares any Python objects across interpreters implicitlydepends on CPython’s current, process-wide GIL.

Because they are immutable and process-global, static types cannot access“their” module state.If any method of such a type requires access to module state,the type must be converted to aheap-allocated type, orheap typefor short. These correspond more closely to classes created by Python’sclass statement.

For new modules, using heap types by default is a good rule of thumb.

Static types can be converted to heap types, but note thatthe heap type API was not designed for “lossless” conversionfrom static types – that is, creating a type that works exactly like a givenstatic type. Unlike static types, heap type objects are mutable by default.Also, when rewriting the class definition in a new API,you are likely to unintentionally change a few details (e.g. pickle-abilityor inherited slots). Always test the details that are important to you.

Defining Heap Types

Heap types can be created by filling aPyType_Spec structure, adescription or “blueprint” of a class, and callingPyType_FromModuleAndSpec() to construct a new class object.

Note

Other functions, likePyType_FromSpec(), can also createheap types, butPyType_FromModuleAndSpec() associates the modulewith the class, allowing access to the module state from methods.

The class should generally be stored inboth the module state (forsafe access from C) and the module’s__dict__ (for access fromPython code).

Garbage Collection Protocol

Instances of heap types hold a reference to their type.This ensures that the type isn’t destroyed before all its instances are,but may result in reference cycles that need to be broken by thegarbage collector.

To avoid memory leaks, instances of heap types must implement thegarbage collection protocol.That is, heap types should:

  • Have thePy_TPFLAGS_HAVE_GC flag.
  • Define a traverse function usingPy_tp_traverse, whichvisits the type (e.g. usingPy_VISIT(Py_TYPE(self));).

Please refer to thedocumentation ofPy_TPFLAGS_HAVE_GC andtp_traversefor additional considerations.

If your traverse function delegates to thetp_traverse of its base class(or another type), ensure thatPy_TYPE(self) is visited only once.Note that only heap type are expected to visit the type intp_traverse.

For example, if your traverse function includes:

base->tp_traverse(self,visit,arg)

…andbase may be a static type, then it should also include:

if(base->tp_flags&Py_TPFLAGS_HEAPTYPE){// a heap type's tp_traverse already visited Py_TYPE(self)}else{Py_VISIT(Py_TYPE(self));}

It is not necessary to handle the type’s reference count intp_newandtp_clear.

Module State Access from Classes

If you have a type object defined withPyType_FromModuleAndSpec(),you can callPyType_GetModule to get the associated module, and thenPyModule_GetState to get the module’s state.

To save a some tedious error-handling boilerplate code, you can combinethese two steps withPyType_GetModuleState, resulting in:

my_struct*state=(my_struct*)PyType_GetModuleState(type);if(state===NULL){returnNULL;}

Module State Access from Regular Methods

Accessing the module-level state from methods of a class is somewhat morecomplicated, but is possible thanks to the changes introduced inPEP 573.To get the state, you need to first get thedefining class, and thenget the module state from it.

The largest roadblock is gettingthe class a method was defined in, orthat method’s “defining class” for short. The defining class can have areference to the module it is part of.

Do not confuse the defining class withPy_TYPE(self). If the methodis called on asubclass of your type,Py_TYPE(self) will refer tothat subclass, which may be defined in different module than yours.

Note

The following Python code can illustrate the concept.Base.get_defining_class returnsBase eveniftype(self)==Sub:

classBase:defget_defining_class(self):return__class__classSub(Base):pass

For a method to get its “defining class”, it must use theMETH_METHOD|METH_FASTCALL|METH_KEYWORDScalling conventionand the correspondingPyCMethod signature:

PyObject*PyCMethod(PyObject*self,// object the method was called onPyTypeObject*defining_class,// defining classPyObject*const*args,// C array of argumentsPy_ssize_tnargs,// length of "args"PyObject*kwnames)// NULL, or dict of keyword arguments

Once you have the defining class, callPyType_GetModuleState to getthe state of its associated module.

For example:

staticPyObject*example_method(PyObject*self,PyTypeObject*defining_class,PyObject*const*args,Py_ssize_tnargs,PyObject*kwnames){my_struct*state=(my_struct*)PyType_GetModuleState(defining_class);if(state===NULL){returnNULL;}...// rest of logic}PyDoc_STRVAR(example_method_doc,"...");staticPyMethodDefmy_methods[]={{"example_method",(PyCFunction)(void(*)(void))example_method,METH_METHOD|METH_FASTCALL|METH_KEYWORDS,example_method_doc}{NULL},}

Module State Access from Slot Methods, Getters and Setters

Note

This is new in Python 3.11.

Slot methods – the fast C equivalents for special methods, such asnb_addfor__add__ ortp_newfor initialization – have a very simple API that doesn’t allowpassing in the defining class, unlike withPyCMethod.The same goes for getters and setters defined withPyGetSetDef.

To access the module state in these cases, use thePyType_GetModuleByDeffunction, and pass in the module definition.Once you have the module, callPyModule_GetStateto get the state:

PyObject*module=PyType_GetModuleByDef(Py_TYPE(self),&module_def);my_struct*state=(my_struct*)PyModule_GetState(module);if(state===NULL){returnNULL;}

PyType_GetModuleByDef works by searching theMRO(i.e. all superclasses) for the first superclass that has a correspondingmodule.

Note

In very exotic cases (inheritance chains spanning multiple modulescreated from the same definition),PyType_GetModuleByDef might notreturn the module of the true defining class. However, it will alwaysreturn a module with the same definition, ensuring a compatibleC memory layout.

Lifetime of the Module State

When a module object is garbage-collected, its module state is freed.For each pointer to (a part of) the module state, you must hold a referenceto the module object.

Usually this is not an issue, because types created withPyType_FromModuleAndSpec, and their instances, hold a referenceto the module.However, you must be careful in reference counting when you referencemodule state from other places, such as callbacks for externallibraries.

Open Issues

Several issues around per-module state and heap types are still open.

Discussions about improving the situation are best held on thecapi-sigmailing list.

Type Checking

Currently (as of Python 3.10), heap types have no good API to writePy*_Check functions (likePyUnicode_Check exists forstr, astatic type), and so it is not easy to ensure that instances have aparticular C layout.

Metaclasses

Currently (as of Python 3.10), there is no good API to specify themetaclass of a heap type; that is, theob_type field of the typeobject.

Per-Class Scope

It is also not possible to attach state totypes. WhilePyHeapTypeObject is a variable-size object (PyVarObject),its variable-size storage is currently consumed by slots. Fixing thisis complicated by the fact that several classes in an inheritancehierarchy may need to reserve some state.

Lossless Conversion to Heap Types

The heap type API was not designed for “lossless” conversion from static types;that is, creating a type that works exactly like a given static type.The best way to address it would probably be to write a guide that coversknown “gotchas”.

Copyright

This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.


Source:https://github.com/python/peps/blob/main/peps/pep-0630.rst

Last modified:2025-10-25 21:23:03 GMT


[8]ページ先頭

©2009-2026 Movatter.jp