Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 573 – Module State Access from C Extension Methods

Author:
Petr Viktorin <encukou at gmail.com>,Alyssa Coghlan <ncoghlan at gmail.com>,Eric Snow <ericsnowcurrently at gmail.com>,Marcel Plch <gmarcel.plch at gmail.com>
BDFL-Delegate:
Stefan Behnel
Discussions-To:
Import-SIG list
Status:
Final
Type:
Standards Track
Created:
02-Jun-2016
Python-Version:
3.9
Post-History:


Table of Contents

Abstract

This PEP proposes to add a way for CPython extension methods to access context,such as the state of the modules they are defined in.

This will allow extension methods to use direct pointer dereferencesrather than PyState_FindModule for looking up module state, reducing oreliminating the performance cost of using module-scoped state over processglobal state.

This fixes one of the remaining roadblocks for adoption ofPEP 3121 (Extensionmodule initialization and finalization) andPEP 489(Multi-phase extension module initialization).

While this PEP takes an additional step towards fully solving the problems thatPEP 3121 andPEP 489 started tackling, it does not attempt to resolveallremaining concerns. In particular, access to the module statefrom slot methods (nb_add, etc) is not solved.

Terminology

Process-Global State

C-level static variables. Since this is very low-levelmemory storage, it must be managed carefully.

Per-module State

State local to a module object, allocated dynamically as part of amodule object’s initialization. This isolates the state from otherinstances of the module (including those in other subinterpreters).

Accessed byPyModule_GetState().

Static Type

A type object defined as a C-level static variable, i.e. a compiled-in typeobject.

A static type needs to be shared between module instances and has noinformation of what module it belongs to.Static types do not have__dict__ (although their instances might).

Heap Type

A type object created at run time.

Defining Class

The defining class of a method (either bound or unbound) is the class on whichthe method was defined.A class that merely inherits the method from its base is not the defining class.

For example,int is the defining class ofTrue.to_bytes,True.__floor__ andint.__repr__.

In C, the defining class is the one defined with the correspondingtp_methods or “tp slots”[1] entry.For methods defined in Python, the defining class is saved in the__class__ closure cell.

C-API

The “Python/C API” as described in Python documentation.CPython implements the C-API, but other implementations exist.

Rationale

PEP 489 introduced a new way to initialize extension modules, which bringsseveral advantages to extensions that implement it:

  • The extension modules behave more like their Python counterparts.
  • The extension modules can easily support loading into pre-existingmodule objects, which paves the way for extension module support forrunpy or for systems that enable extension module reloading.
  • Loading multiple modules from the same extension is possible, whichmakes it possible to test module isolation (a key feature for propersub-interpreter support) from a single interpreter.

The biggest hurdle for adoption ofPEP 489 is allowing access to module statefrom methods of extension types.Currently, the way to access this state from extension methods is by looking upthe module viaPyState_FindModule (in contrast to module level functions inextension modules, which receive a module reference as an argument).However,PyState_FindModule queries the thread-local state, making itrelatively costly compared to C level process global access and consequentlydeterring module authors from using it.

Also,PyState_FindModule relies on the assumption that in eachsubinterpreter, there is at most one module corresponding toa givenPyModuleDef. This assumption does not hold for modules that usePEP 489’s multi-phase initialization, soPyState_FindModule is unavailablefor these modules.

A faster, safer way of accessing module-level state from extension methodsis needed.

Background

The implementation of a Python method may need access to one or more ofthe following pieces of information:

  • The instance it is called on (self)
  • The underlying function
  • Thedefining class, i. e. the class the method was defined in
  • The corresponding module
  • The module state

In Python code, the Python-level equivalents may be retrieved as:

importsysclassFoo:defmeth(self):instance=selfmodule_globals=globals()module_object=sys.modules[__name__]# (1)underlying_function=Foo.meth# (1)defining_class=Foo# (1)defining_class=__class__# (2)

Note

The defining class is nottype(self), sincetype(self) mightbe a subclass ofFoo.

The statements marked (1) implicitly rely on name-based lookup via thefunction’s__globals__: either theFoo attribute to access the definingclass and Python function object, or__name__ to find the module object insys.modules.

In Python code, this is feasible, as__globals__ is set appropriately whenthe function definition is executed, and even if the namespace has beenmanipulated to return a different object, at worst an exception will be raised.

The__class__ closure, (2), is a safer way to get the defining class, but itstill relies on__closure__ being set appropriately.

By contrast, extension methods are typically implemented as normal C functions.This means that they only have access to their arguments and C level thread-localand process-global states. Traditionally, many extension modules have storedtheir shared state in C-level process globals, causing problems when:

  • running multiple initialize/finalize cycles in the same process
  • reloading modules (e.g. to test conditional imports)
  • loading extension modules in subinterpreters

PEP 3121 attempted to resolve this by offering thePyState_FindModule API,but this still has significant problems when it comes to extension methods(rather than module level functions):

  • it is markedly slower than directly accessing C-level process-global state
  • there is still some inherent reliance on process global state that means itstill doesn’t reliably handle module reloading

It’s also the case that when looking up a C-level struct such as module state,supplying an unexpected object layout can crash the interpreter, so it’ssignificantly more important to ensure that extension methods receive the kindof object they expect.

Proposal

Currently, a bound extension method (PyCFunction orPyCFunctionWithKeywords) receives onlyself, and (if applicable) thesupplied positional and keyword arguments.

While module-level extension functions already receive access to the definingmodule object via theirself argument, methods of extension types don’t havethat luxury: they receive the bound instance viaself, and hence have nodirect access to the defining class or the module level state.

The additional module level context described above can be made available withtwo changes.Both additions are optional; extension authors need to opt in to startusing them:

  • Add a pointer to the module to heap type objects.
  • Pass the defining class to the underlying C function.

    In CPython, the defining class is readily available at the time the built-inmethod object (PyCFunctionObject) is created, so it can be storedin a new struct that extendsPyCFunctionObject.

The module state can then be retrieved from the module object viaPyModule_GetState.

Note that this proposal implies that any type whose methods need to accessper-module state must be a heap type, rather than a static type. This isnecessary to support loading multiple module objects from a singleextension: a static type, as a C-level global, has no information aboutwhich module object it belongs to.

Slot methods

The above changes don’t cover slot methods, such astp_iter ornb_add.

The problem with slot methods is that their C API is fixed, so we can’tsimply add a new argument to pass in the defining class.Two possible solutions have been proposed to this problem:

  • Look up the class through walking the MRO.This is potentially expensive, but will be usable if performance is nota problem (such as when raising a module-level exception).
  • Storing a pointer to the defining class of each slot in a separate table,__typeslots__[2]. This is technically feasible and fast,but quite invasive.

Modules affected by this concern also have the option of usingthread-local state orPEP 567 context variables as a caching mechanism, orelse defining their own reload-friendly lookup caching scheme.

Solving the issue generally is deferred to a future PEP.

Specification

Adding module references to heap types

A new factory method will be added to the C-API for creating modules:

PyObject*PyType_FromModuleAndSpec(PyObject*module,PyType_Spec*spec,PyObject*bases)

This acts the same asPyType_FromSpecWithBases, and additionally associatesthe provided module object with the new type. (In CPython, this will setht_module described below.)

Additionally, an accessor,PyObject*PyType_GetModule(PyTypeObject*)will be provided.It will return the type’s associated module if one is set,otherwise it will setTypeError and return NULL.When given a static type, it will always setTypeError and return NULL.

To implement this in CPython, thePyHeapTypeObject struct will get anew member,PyObject*ht_module, that will store a pointer to theassociated module.It will beNULL by default and should not be modified after the typeobject is created.

Theht_module member will not be inherited by subclasses; it needs to beset usingPyType_FromSpecWithBases for each individual type that needs it.

Usually, creating a class withht_module set will create a referencecycle involving the class and the module.This is not a problem, as tearing down modules is not a performance-sensitiveoperation, and module-level functions typically also create reference cycles.The existing “set all module globals to None” code that breaks function cyclesthroughf_globals will also break the new cycles throughht_module.

Passing the defining class to extension methods

A new signature flag,METH_METHOD, will be added for use inPyMethodDef.ml_flags. Conceptually, it addsdefining_classto the function signature.To make the initial implementation easier, the flag can only be used as(METH_FASTCALL|METH_KEYWORDS|METH_METHOD).(It can’t be used with other flags likeMETH_O or bareMETH_FASTCALL,though it may be combined withMETH_CLASS orMETH_STATIC).

C functions for methods defined using this flag combination will be calledusing a new C signature calledPyCMethod:

PyObject*PyCMethod(PyObject*self,PyTypeObject*defining_class,PyObject*const*args,size_tnargsf,PyObject*kwnames)

Additional combinations like(METH_VARARGS|METH_METHOD) may be addedin the future (or even in the initial implementation of this PEP).However,METH_METHOD should always be anadditional flag, i.e., thedefining class should only be passed in if needed.

In CPython, a new structure extendingPyCFunctionObject will be addedto hold the extra information:

typedefstruct{PyCFunctionObjectfunc;PyTypeObject*mm_class;/*Passedas'defining_class'argtotheCfunc*/}PyCMethodObject;

ThePyCFunction implementation will passmm_class into aPyCMethod C function when it finds theMETH_METHOD flag being set.A new macroPyCFunction_GET_CLASS(cls) will be added for easier accesstomm_class.

C methods may continue to use the otherMETH_* signatures if they donot require access to their defining class/module.IfMETH_METHOD is not set, casting toPyCMethodObject is invalid.

Argument Clinic

To support passing the defining class to methods using Argument Clinic,a new converter calleddefining_class will be added to CPython’s ArgumentClinic tool.

Each method may only have one argument using this converter, and it mustappear afterself, or, ifself is not used, as the first argument.The argument will be of typePyTypeObject*.

When used, Argument Clinic will selectMETH_FASTCALL|METH_KEYWORDS|METH_METHOD as the calling convention.The argument will not appear in__text_signature__.

The new converter will initially not be compatible with__init__ and__new__ methods, which cannot use theMETH_METHOD convention.

Helpers

Getting toper-module state from a heap type is a very common task. To makethis easier, a helper will be added:

void*PyType_GetModuleState(PyObject*type)

This function takes a heap type and on success, it returns pointer to the stateof the module that the heap type belongs to.

On failure, two scenarios may occur. When a non-type object, or a type without amodule is passed in,TypeError is set andNULL returned. If the moduleis found, the pointer to the state, which may beNULL, is returned withoutsetting any exception.

Modules Converted in the Initial Implementation

To validate the approach, the_elementtree module will be modified duringthe initial implementation.

Summary of API Changes and Additions

The following will be added to Python C-API:

  • PyType_FromModuleAndSpec function
  • PyType_GetModule function
  • PyType_GetModuleState function
  • METH_METHOD call flag
  • PyCMethod function signature

The following additions will be added as CPython implementation details,and won’t be documented:

  • PyCFunction_GET_CLASS macro
  • PyCMethodObject struct
  • ht_module member of_heaptypeobject
  • defining_class converter in Argument Clinic

Backwards Compatibility

One new pointer is added to all heap types.All other changes are adding new functions and structures,or changes to private implementation details.

Implementation

An initial implementation is available in a Github repository[3];a patchset is at[4].

Possible Future Extensions

Slot methods

A way of passing defining class (or module state) to slot methods may beadded in the future.

A previous version of this PEP proposed a helper function that would determinea defining class by searching the MRO for a class that defines a slot to aparticular function. However, this approach would fail if a class is mutated(which is, for heap types, possible from Python code).Solving this problem is left to future discussions.

Easy creation of types with module references

It would be possible to add aPEP 489 execution slot type to makecreating heap types significantly easier than callingPyType_FromModuleAndSpec.This is left to a future PEP.

It may be good to add a good way to create static exception types from thelimited API. Such exception types could be shared between subinterpreters,but instantiated without needing specific module state.This is also left to possible future discussions.

Optimization

As proposed here, methods defined with theMETH_METHOD flag only supportone specific signature.

If it turns out that other signatures are needed for performance reasons,they may be added.

References

[1]
https://docs.python.org/3/c-api/typeobj.html#tp-slots
[2]
[Import-SIG] On singleton modules, heap types, and subinterpreters(https://mail.python.org/pipermail/import-sig/2015-July/001035.html)
[3]
https://github.com/Dormouse759/cpython/tree/pep-c-rebase_newer
[4]
https://github.com/Dormouse759/cpython/compare/master…Dormouse759:pep-c-rebase_newer

Copyright

This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.


Source:https://github.com/python/peps/blob/main/peps/pep-0573.rst

Last modified:2025-02-01 08:59:27 GMT


[8]ページ先頭

©2009-2025 Movatter.jp