Introduce C API incompatible changes to hide implementation details.
Once most implementation details are hidden, the evolution of CPython internals will be less limited by C API backward compatibility issues. It will be much easier to add new features.
It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers.
Define a process to reduce the number of broken C extensions.
The implementation of this PEP is expected to be done carefully over multiple Python versions. It already started in Python 3.7 and most changes are already completed. The Process to reduce the number of broken C extensions dictates the rhythm.
This PEP was withdrawn by its author since the scope is too broad and the work is distributed over multiple Python versions, which makes it difficult to make a decision on the overall PEP. It was split into new PEPs with narrower and better defined scopes, like PEP 670.
Adding or removing members of C structures causes multiple backward compatibility issues.
Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. static PyTypeObject MyType = {...};). In Python 3.4, PEP 442 “Safe object finalization” added the tp_finalize member at the end of the PyTypeObject structure. For ABI backward compatibility, a new Py_TPFLAGS_HAVE_FINALIZE type flag was required to announce if the type structure contains the tp_finalize member. The flag was removed in Python 3.8 (bpo-32388).
The PyTypeObject.tp_print member, deprecated since Python 3.0 (released in December 2008), was removed in the Python 3.8 development cycle. But the change broke too many C extensions and had to be reverted before the 3.8 final release. Finally, the member was removed again in Python 3.9.
C extensions rely on the ability to access structure members, indirectly through the C API, or even directly. Modifying structures like PyListObject cannot even be considered.
The PyTypeObject structure is the one which evolved the most, simply because there was no other way to evolve CPython than modifying it.
A C extension can technically dereference a PyObject* pointer and access PyObject members. This prevents experiments like tagged pointers (storing small values as PyObject* which does not point to a valid PyObject structure).
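To make the tagged pointer idea concrete, here is a minimal sketch in plain C. It does not use the CPython API: `Handle`, `BoxedInt` and the `handle_*` helpers are hypothetical names invented for this illustration, and the choice of the lowest bit as the tag is an assumption. The point is that a caller who only goes through `handle_as_long()` cannot tell whether the value lives in the handle itself or in a heap structure, whereas a caller who dereferences the handle directly would break on tagged values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical heap object, standing in for a PyObject-like structure. */
typedef struct {
    long refcnt;
    long value;
} BoxedInt;

/* A handle is either a real pointer or a small integer stored in place. */
typedef uintptr_t Handle;

#define TAG_BIT ((uintptr_t)1)

/* Store a small non-negative integer in the handle; lowest bit set. */
static inline Handle handle_from_small_int(intptr_t n)
{
    return ((uintptr_t)n << 1) | TAG_BIT;
}

/* Wrap a pointer to a heap object; alignment keeps its lowest bit at 0. */
static inline Handle handle_from_boxed(BoxedInt *obj)
{
    assert(((uintptr_t)obj & TAG_BIT) == 0);
    return (uintptr_t)obj;
}

static inline int handle_is_tagged(Handle h)
{
    return (h & TAG_BIT) != 0;
}

/* Accessor hiding the representation: callers never dereference. */
static inline long handle_as_long(Handle h)
{
    if (handle_is_tagged(h)) {
        /* Sketch limited to non-negative values: logical shift is enough. */
        return (long)(h >> 1);
    }
    return ((BoxedInt *)h)->value;
}
```

An accessor-only API is what makes such an experiment possible: code written as `((BoxedInt *)h)->value` would read invalid memory for a tagged handle.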
Replacing the Python garbage collector with a tracing garbage collector would also require removing the PyObject.ob_refcnt reference counter, whereas currently the Py_INCREF() and Py_DECREF() macros directly access PyObject.ob_refcnt.
When the CPython project was created, it was written with one principle: keep the implementation simple enough so it can be maintained by a single developer. CPython complexity grew a lot and many micro-optimizations have been implemented, but CPython core design has not changed.
Members of the PyObject and PyTupleObject structures have not changed since the “Initial revision” commit (1990):
    #define OB_HEAD \
        unsigned int ob_refcnt; \
        struct _typeobject *ob_type;

    typedef struct _object {
        OB_HEAD
    } object;

    typedef struct {
        OB_VARHEAD
        object *ob_item[1];
    } tupleobject;
Only names changed: object was renamed to PyObject and tupleobject was renamed to PyTupleObject.
CPython still tracks the lifetime of Python objects using reference counting, both internally and for third party C extensions (through the Python C API).
All Python objects must be allocated on the heap and cannot be moved.
The PyPy project is a Python implementation which is 4.2x faster than CPython on average. PyPy developers chose not to fork CPython, but to start from scratch to have more freedom in terms of optimization choices.
PyPy does not use reference counting, but a tracing garbage collector which moves objects. Objects can be allocated on the stack (or even not at all), rather than always having to be allocated on the heap.
Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than objects.
Moreover, PyPy also has a JIT compiler which emits fast code thanks to the efficient PyPy design.
While PyPy is far more efficient than CPython at running pure Python code, it is as efficient as or slower than CPython at running C extensions.
Since the C API requires PyObject* and allows direct access to structure members, PyPy has to associate a CPython object with each PyPy object and keep both consistent. Converting a PyPy object to a CPython object is inefficient. Moreover, reference counting also has to be implemented on top of the PyPy tracing garbage collector.
These conversions are required because the Python C API is too close to the CPython implementation: there is no high-level abstraction. For example, structure members are part of the public C API and nothing prevents a C extension from getting or setting PyTupleObject.ob_item[0] directly (the first item of a tuple).
See Inside cpyext: Why emulating CPython C API is so Hard (Sept 2018) by Antonio Cuni for more details.
Hiding implementation details from the C API has multiple advantages:
PEP 384 “Defining a Stable ABI” was implemented in Python 3.2. It introduces the “limited C API”: a subset of the C API. When the limited C API is used, it becomes possible to build a C extension only once and use it on multiple Python versions: that’s the stable ABI.
The main limitation of PEP 384 is that C extensions have to opt in to the limited C API. Only very few projects made this choice, usually to ease distribution of binaries, especially on Windows.
This PEP moves the C API towards the limited C API.
Ideally, the C API will become the limited C API and all C extensions will use the stable ABI, but this is outside the scope of this PEP.
* Reorganize the C API header files: create Include/cpython/ and Include/internal/ subdirectories.
* Add new functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE(). The Py_TYPE(), Py_REFCNT() and Py_SIZE() macros become functions which cannot be used as l-value.
* Provide the pythoncapi_compat.h header file.
* Deprecate PySequence_Fast_ITEMS().
* Convert PyTuple_GET_ITEM() and PyList_GET_ITEM() macros to static inline functions.
The first consumer of the C API was Python itself. There is no clear separation between APIs which must not be used outside Python, and APIs which are public on purpose.
Header files must be reorganized into 3 APIs:
* Include/ directory is the limited C API: no implementation details, structures are opaque. C extensions using it get a stable ABI.
* Include/cpython/ directory is the CPython C API: less “portable” API, depends more on the Python version, exposes some implementation details, few incompatible changes can happen.
* Include/internal/ directory is the internal C API: implementation details, incompatible changes are likely at each Python release.
The creation of the Include/cpython/ directory is fully backward compatible. Include/cpython/ header files cannot be included directly and are included automatically by Include/ header files when the Py_LIMITED_API macro is not defined.
The internal C API is installed and can be used for specific usages like debuggers and profilers which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled for each Python version.
STATUS: Completed (in Python 3.8)
The reorganization of header files started in Python 3.7 and was completed in Python 3.8.
Private functions which expose implementation details must be moved to the internal C API.
If a C extension relies on a CPython private function which exposes CPython implementation details, other Python implementations have to re-implement this private function to support this C extension.
STATUS: Completed (in Python 3.9)
Private functions moved to the internal C API in Python 3.8:
* _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK()
Macros and functions excluded from the limited C API in Python 3.9:
* _PyObject_SIZE(), _PyObject_VAR_SIZE()
* PyThreadState_DeleteCurrent()
* PyFPE_START_PROTECT(), PyFPE_END_PROTECT()
* _Py_NewReference(), _Py_ForgetReference()
* _PyTraceMalloc_NewReference()
* _Py_GetRefTotal()
Private functions moved to the internal C API in Python 3.9:
* GC functions like _Py_AS_GC(), _PyObject_GC_IS_TRACKED() and _PyGCHead_NEXT()
* _Py_AddToAllObjects() (not exported)
* _PyDebug_PrintTotalRefs(), _Py_PrintReferences(), _Py_PrintReferenceAddresses() (not exported)
Public “clear free list” functions moved to the internal C API and renamed to private functions in Python 3.9:
* PyAsyncGen_ClearFreeLists()
* PyContext_ClearFreeList()
* PyDict_ClearFreeList()
* PyFloat_ClearFreeList()
* PyFrame_ClearFreeList()
* PyList_ClearFreeList()
* PyTuple_ClearFreeList()
* PyMethod_ClearFreeList() and PyCFunction_ClearFreeList(): bound method free list removed in Python 3.9.
* PySet_ClearFreeList(): set free list removed in Python 3.4.
* PyUnicode_ClearFreeList(): Unicode free list removed in Python 3.3.
Converting macros to static inline functions has multiple advantages:
Converting macros to static inline functions should only impact very few C extensions that use macros in unusual ways.
For backward compatibility, functions must continue to accept any type, not only PyObject*, to avoid compiler warnings, since most macros cast their parameters to PyObject*.
Python 3.6 requires C compilers to support static inline functions: PEP 7 requires a subset of C99.
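One classic difference between a macro and a static inline function is how many times the argument is evaluated. The sketch below is an illustration only: `Obj`, `OBJ_INCREF_MACRO`, `obj_incref`, `OBJ_INCREF` and `lookup` are hypothetical names, not CPython definitions. It also shows the wrapper-macro pattern described above, where a cast keeps the function usable with any pointer type.

```c
#include <assert.h>

/* Hypothetical object header used only for this sketch. */
typedef struct {
    long refcnt;
} Obj;

/* Macro version: the argument expression is pasted twice. */
#define OBJ_INCREF_MACRO(op) ((op)->refcnt = (op)->refcnt + 1)

/* Static inline version: the argument is evaluated exactly once and
   the parameter has a real type that the compiler can check. */
static inline void obj_incref(Obj *op)
{
    op->refcnt++;
}

/* Wrapper macro: the cast keeps accepting any pointer type, as the old
   macro did, while the work happens in the typed function. */
#define OBJ_INCREF(op) obj_incref((Obj *)(op))

/* Helper with a side effect, to make double evaluation visible. */
static int lookups = 0;

static inline Obj *lookup(Obj *op)
{
    lookups++;
    return op;
}
```

With these definitions, `OBJ_INCREF_MACRO(lookup(&o))` calls `lookup()` twice, while `OBJ_INCREF(lookup(&o))` calls it once.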
STATUS: Completed (in Python 3.9)
Macros converted to static inline functions in Python 3.8:
* Py_INCREF(), Py_DECREF()
* Py_XINCREF(), Py_XDECREF()
* PyObject_INIT(), PyObject_INIT_VAR()
* _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()
Macros converted to regular functions in Python 3.9:
* Py_EnterRecursiveCall(), Py_LeaveRecursiveCall() (added to the limited C API)
* PyObject_INIT(), PyObject_INIT_VAR()
* PyObject_GET_WEAKREFS_LISTPTR()
* PyObject_CheckBuffer()
* PyIndex_Check()
* PyObject_IS_GC()
* PyObject_NEW() (alias to PyObject_New()), PyObject_NEW_VAR() (alias to PyObject_NewVar())
* PyType_HasFeature() (always calls PyType_GetFlags())
* Py_TRASHCAN_BEGIN_CONDITION() and Py_TRASHCAN_END() macros now call functions which hide implementation details, rather than directly accessing members of the PyThreadState structure.
The following structures of the C API become opaque:
* PyInterpreterState
* PyThreadState
* PyGC_Head
* PyTypeObject
* PyObject and PyVarObject
* All structures which “inherit” from PyObject or PyVarObject
C extensions must use getter or setter functions to get or set structure members. For example, tuple->ob_item[0] must be replaced with PyTuple_GET_ITEM(tuple, 0).
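The general technique behind an opaque structure can be sketched in a few lines of standalone C. Everything here is hypothetical (`Pair` and the `Pair_*` functions are invented for the illustration and are not part of the Python C API). The first half plays the role of a public header that only declares an incomplete type and accessor functions; the second half plays the role of the private implementation, the only place that knows the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* "Public header" part: callers only see an incomplete type. */
typedef struct pair Pair;

Pair *Pair_New(int first, int second);
int Pair_GetItem(const Pair *p, size_t index);
void Pair_SetItem(Pair *p, size_t index, int value);
void Pair_Free(Pair *p);

/* "Implementation" part: the only code that knows the layout. */
struct pair {
    int items[2];
};

Pair *Pair_New(int first, int second)
{
    Pair *p = malloc(sizeof(*p));
    if (p == NULL) {
        return NULL;
    }
    p->items[0] = first;
    p->items[1] = second;
    return p;
}

int Pair_GetItem(const Pair *p, size_t index)
{
    assert(index < 2);
    return p->items[index];
}

void Pair_SetItem(Pair *p, size_t index, int value)
{
    assert(index < 2);
    p->items[index] = value;
}

void Pair_Free(Pair *p)
{
    free(p);
}
```

In a real project the structure definition would live in a private source file, so that code written as `p->items[0]` outside the implementation fails to compile and the layout can change without breaking callers.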
To be able to move away from reference counting, PyObject must become opaque. Currently, the reference counter PyObject.ob_refcnt is exposed in the C API. All structures must become opaque, since they “inherit” from PyObject. For example, PyFloatObject inherits from PyObject:
    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;
Making PyObject fully opaque requires converting the Py_INCREF() and Py_DECREF() macros to function calls. This change has an impact on performance. It is likely to be one of the very last changes when making structures opaque.
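The two ideas above, “inheritance” by embedding the base structure as the first member and reference counting through function calls, can be sketched together in standalone C. The names `Base`, `FloatObj`, `Float_New`, `Obj_IncRef`, `Obj_DecRef`, `Obj_RefCnt` and `Float_AsDouble` are hypothetical and chosen for this illustration; this is not how CPython implements them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base header: every object starts with it. */
typedef struct {
    long refcnt;
} Base;

/* "Derived" structure: the base is the first member, so a pointer to
   a FloatObj can be converted to a pointer to its Base and back. */
typedef struct {
    Base ob_base;
    double fval;
} FloatObj;

/* Counter used only to observe deallocation in this sketch. */
static int live_objects = 0;

Base *Float_New(double value)
{
    FloatObj *f = malloc(sizeof(*f));
    if (f == NULL) {
        return NULL;
    }
    f->ob_base.refcnt = 1;
    f->fval = value;
    live_objects++;
    return &f->ob_base;
}

/* Function-call versions: callers never touch the counter directly,
   so the field could later be moved or removed. */
void Obj_IncRef(Base *op)
{
    op->refcnt++;
}

void Obj_DecRef(Base *op)
{
    if (--op->refcnt == 0) {
        live_objects--;
        free(op);  /* same address as the FloatObj allocation */
    }
}

long Obj_RefCnt(const Base *op)
{
    return op->refcnt;
}

double Float_AsDouble(const Base *op)
{
    return ((const FloatObj *)op)->fval;
}
```

The cost mentioned above is visible here: each increment or decrement becomes a real call unless the compiler can inline it, which is not possible across a stable ABI boundary.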
Making the PyTypeObject structure opaque breaks C extensions declaring types statically (e.g. static PyTypeObject MyType = {...};). C extensions must use PyType_FromSpec() to allocate types on the heap instead. Using heap types has other advantages like being compatible with subinterpreters. Combined with PEP 489 “Multi-phase extension module initialization”, it makes a C extension behavior closer to a Python module, like allowing to create more than one module instance.
Making the PyThreadState structure opaque requires adding getter and setter functions for members used by C extensions.
STATUS: In Progress (started in Python 3.8)
The PyInterpreterState structure was made opaque in Python 3.8 (bpo-35886) and the PyGC_Head structure (bpo-40241) was made opaque in Python 3.9.
Issues tracking the work to prepare the C API to make the following structures opaque:
* PyObject: bpo-39573
* PyTypeObject: bpo-40170
* PyFrameObject: bpo-40421. Python 3.9 adds the PyFrame_GetCode() and PyFrame_GetBack() getter functions, and moves PyFrame_GetLineNumber to the limited C API.
* PyThreadState: bpo-39947. Python 3.9 adds getter functions: PyThreadState_GetFrame(), PyThreadState_GetID(), PyThreadState_GetInterpreter().
The Py_TYPE() function gets an object type, its PyObject.ob_type member. It is implemented as a macro which can be used as an l-value to set the type: Py_TYPE(obj) = new_type. This code relies on the assumption that PyObject.ob_type can be modified directly. It prevents making the PyObject structure opaque.
New setter functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE() are added and must be used instead.
The Py_TYPE(), Py_REFCNT() and Py_SIZE() macros must be converted to static inline functions which cannot be used as l-value.
For example, the Py_TYPE() macro:
    #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
becomes:
    #define _PyObject_CAST_CONST(op) ((const PyObject*)(op))

    static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
        return ob->ob_type;
    }

    #define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
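The effect of this conversion on l-value usage can be reproduced without Python headers. The following sketch uses hypothetical stand-ins (`Type`, `Obj`, `OBJ_TYPE_OLD`, `OBJ_TYPE`, `OBJ_SET_TYPE`) that mirror the shape of the definitions above: the old macro expands to a member expression and can be assigned to, while the function-based version returns a value and needs an explicit setter.

```c
#include <assert.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct {
    const char *name;
} Type;

typedef struct {
    Type *type;
} Obj;

/* Old style: expands to a member expression, so it is an l-value. */
#define OBJ_TYPE_OLD(ob) (((Obj *)(ob))->type)

/* New style: a function call yields a value, not an l-value. */
static inline Type *obj_type(const Obj *ob)
{
    return ob->type;
}
#define OBJ_TYPE(ob) obj_type((const Obj *)(ob))

/* Explicit setter replacing the assignment form. */
static inline void obj_set_type(Obj *ob, Type *type)
{
    ob->type = type;
}
#define OBJ_SET_TYPE(ob, type) obj_set_type((Obj *)(ob), (type))
```

With these definitions `OBJ_TYPE_OLD(&o) = &t;` compiles, whereas `OBJ_TYPE(&o) = &t;` is rejected by the compiler because the result of a function call is not assignable; `OBJ_SET_TYPE(&o, &t)` is the replacement.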
STATUS: Completed (in Python 3.10)
New functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE() were added to Python 3.9.
In Python 3.10, Py_TYPE(), Py_REFCNT() and Py_SIZE() can no longer be used as l-value and the new setter functions must be used instead.
When a function returns a borrowed reference, Python cannot track when the caller stops using this reference.
For example, if the Python list type is specialized for small integers and stores “raw” numbers directly rather than Python objects, PyList_GetItem() has to create a temporary Python object. The problem is to decide when it is safe to delete the temporary object.
The general guideline is to avoid returning borrowed references for new C API functions.
No function returning borrowed references is scheduled for removal by this PEP.
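The specialized list scenario can be sketched in standalone C to show why a strong reference solves the lifetime question. All names (`IntObj`, `IntList`, `IntList_GetItemRef`, `IntObj_DecRef`) are hypothetical, and the fixed-size array is a simplification. The list stores raw numbers; the getter boxes the number into a temporary object and hands ownership to the caller, who signals the end of use by releasing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed integer with a reference count. */
typedef struct {
    long refcnt;
    long value;
} IntObj;

/* Hypothetical specialized list: stores raw numbers, not objects. */
typedef struct {
    long items[4];
    size_t size;
} IntList;

/* Counter used only to observe temporaries in this sketch. */
static int live_boxes = 0;

static IntObj *int_box(long value)
{
    IntObj *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    obj->refcnt = 1;
    obj->value = value;
    live_boxes++;
    return obj;
}

void IntObj_DecRef(IntObj *obj)
{
    if (--obj->refcnt == 0) {
        live_boxes--;
        free(obj);
    }
}

/* Returns a new strong reference: the caller owns the temporary box
   and must call IntObj_DecRef() when done. Returns NULL if the index
   is out of range or allocation fails. */
IntObj *IntList_GetItemRef(const IntList *list, size_t index)
{
    if (index >= list->size) {
        return NULL;
    }
    return int_box(list->items[index]);
}
```

With a borrowed reference there would be no `IntObj_DecRef()` call from the caller, so the list would have no way to know when the temporary box can be freed.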
STATUS: Completed (in Python 3.9)
In Python 3.9, new C API functions returning Python objects only return strong references:
* PyFrame_GetBack()
* PyFrame_GetCode()
* PyObject_CallNoArgs()
* PyObject_CallOneArg()
* PyThreadState_GetFrame()
The PySequence_Fast_ITEMS() function gives direct access to an array of PyObject* objects. The function is deprecated in favor of PyTuple_GetItem() and PyList_GetItem().
PyTuple_GET_ITEM() can be abused to directly access the PyTupleObject.ob_item member:
    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);
The PyTuple_GET_ITEM() and PyList_GET_ITEM() macros are converted to static inline functions to disallow that.
STATUS: Not Started
Making structures opaque requires modifying C extensions to use getter and setter functions. The practical issue is how to keep support for old Python versions which don’t have these functions.
For example, in Python 3.10, it is no longer possible to use Py_TYPE() as an l-value. The new Py_SET_TYPE() function must be used instead:
    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif
This code may ring a bell to developers who ported their Python codebase from Python 2 to Python 3.
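The constant 0x030900A4 in the version check follows the PY_VERSION_HEX encoding, which packs major, minor, micro, release level and serial into one 32-bit integer, so it reads as 3.9.0 alpha 4. The decoder below is a small standalone sketch of that layout; the `Version` structure and `decode_version_hex()` are hypothetical helpers written for this illustration, not part of the C API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
typedef struct {
    unsigned major;
    unsigned minor;
    unsigned micro;
    unsigned level;   /* 0xA alpha, 0xB beta, 0xC candidate, 0xF final */
    unsigned serial;
} Version;

/* Split a PY_VERSION_HEX-style value into its fields. */
static inline Version decode_version_hex(uint32_t hex)
{
    Version v;
    v.major  = (hex >> 24) & 0xFF;
    v.minor  = (hex >> 16) & 0xFF;
    v.micro  = (hex >> 8) & 0xFF;
    v.level  = (hex >> 4) & 0xF;
    v.serial = hex & 0xF;
    return v;
}
```

Because the most significant fields come first, two encoded versions can be compared with plain integer comparison, which is what the `#if` test relies on.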
Python will distribute a new pythoncapi_compat.h header file which provides new C API functions to old Python versions. Example:
    #if PY_VERSION_HEX < 0x030900A4
    static inline void
    _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4
Using this header file, Py_SET_TYPE() can be used on old Python versions as well.
Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.
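The backfill pattern itself does not depend on Python and can be sketched with a made-up library. In the sketch below, `MYLIB_VERSION_HEX`, `Obj`, `compat_obj_set_kind`, `Obj_SET_KIND` and `init_obj` are all hypothetical; the version value is hard-coded to simulate building against an old release. The compatibility section defines the setter only when the library is too old to provide it, so the extension code keeps a single code path without `#if` blocks at each call site.

```c
#include <assert.h>

/* Simulated version of the library being built against (assumption). */
#ifndef MYLIB_VERSION_HEX
#  define MYLIB_VERSION_HEX 0x03080000
#endif

/* Hypothetical object layout of the old library version. */
typedef struct {
    int kind;
} Obj;

/* Compatibility section: backfill the setter on old versions only. */
#if MYLIB_VERSION_HEX < 0x030900A4
static inline void compat_obj_set_kind(Obj *ob, int kind)
{
    ob->kind = kind;
}
#define Obj_SET_KIND(ob, kind) compat_obj_set_kind((Obj *)(ob), (kind))
#endif

/* Extension code: one code path for every supported version. */
static inline void init_obj(Obj *ob)
{
    Obj_SET_KIND(ob, 7);
}
```

When the library version is new enough, the guarded block disappears and `Obj_SET_KIND` is expected to come from the library's own headers.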
STATUS: In Progress (implemented but not distributed by CPython yet)
The pythoncapi_compat.h header file is currently developed at: https://github.com/pythoncapi/pythoncapi_compat
Process to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP:
The coordination usually means reporting issues to the projects, or even proposing changes. It does not require waiting for a new release including fixes for every broken project.
Since more and more C extensions are written using Cython, rather than directly using the C API, it is important to ensure that Cython is prepared in advance for incompatible changes. It gives more time for C extension maintainers to release a new version with code generated with the updated Cython (for C extensions distributing the code generated by Cython).
Future incompatible changes can be announced by deprecating a function in the documentation and by annotating the function with Py_DEPRECATED(). But making a structure opaque and preventing the usage of a macro as l-value cannot be deprecated with Py_DEPRECATED().
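A deprecation annotation of this kind is typically a thin wrapper around a compiler attribute. The sketch below shows the general mechanism with a hypothetical `MYLIB_DEPRECATED` macro and hypothetical `Seq` functions; it is not the definition of Py_DEPRECATED(), and the fallback for unknown compilers simply expands to nothing.

```c
#include <assert.h>

/* Hypothetical deprecation macro built on compiler attributes. */
#if defined(__GNUC__) || defined(__clang__)
#  define MYLIB_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(_MSC_VER)
#  define MYLIB_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#  define MYLIB_DEPRECATED(msg)
#endif

/* Hypothetical sequence type with a fixed number of items. */
typedef struct {
    int items[3];
} Seq;

/* Old function exposing the internal array: callers get a warning. */
MYLIB_DEPRECATED("use Seq_GetItem() instead")
int *Seq_FastItems(Seq *s);

/* Replacement that does not expose the storage. */
int Seq_GetItem(const Seq *s, int index);

int *Seq_FastItems(Seq *s)
{
    return s->items;
}

int Seq_GetItem(const Seq *s, int index)
{
    return s->items[index];
}
```

The attribute attaches to a function declaration, so each call site gets a compile-time warning. Nothing comparable can be attached to "assigning to the result of a macro" or to "reading a structure member", which is why those changes cannot be announced this way.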
The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.
If a change is reverted, we move back to the coordination step to better prepare the change. Once more C extensions are ready, the incompatible change can be reconsidered.
pythoncapi_compat.h header and a process is defined to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0620.rst
Last modified: 2025-02-01 08:55:40 GMT