
Python Enhancement Proposals

PEP 620 – Hide implementation details from the C API

Author:
Victor Stinner <vstinner at python.org>
Status:
Withdrawn
Type:
Standards Track
Created:
19-Jun-2020
Python-Version:
3.12


Abstract

Introduce C API incompatible changes to hide implementation details.

Once most implementation details are hidden, the evolution of CPython internals will be less limited by C API backward compatibility issues. It will be much easier to add new features.

It becomes possible to experiment with optimizations in CPython that go beyond micro-optimizations, such as tagged pointers.

Define a process to reduce the number of broken C extensions.

The implementation of this PEP is expected to be done carefully over multiple Python versions. It already started in Python 3.7 and most changes are already completed. The “Process to reduce the number of broken C extensions” section dictates the rhythm.

PEP withdrawn

This PEP was withdrawn by its author since the scope is too broad and the work is distributed over multiple Python versions, which makes it difficult to make a decision on the overall PEP. It was split into new PEPs with narrower and better defined scopes, like PEP 670.

Motivation

The C API blocks CPython evolutions

Adding or removing members of C structures causes multiple backward compatibility issues.

Adding a new member breaks the stable ABI (PEP 384), especially for types declared statically (e.g. static PyTypeObject MyType = {...};). In Python 3.4, PEP 442 “Safe object finalization” added the tp_finalize member at the end of the PyTypeObject structure. For ABI backward compatibility, a new Py_TPFLAGS_HAVE_FINALIZE type flag was required to announce if the type structure contains the tp_finalize member. The flag was removed in Python 3.8 (bpo-32388).

The PyTypeObject.tp_print member, deprecated since Python 3.0 (released in 2008), was removed during the Python 3.8 development cycle. But the change broke too many C extensions and had to be reverted before the 3.8 final release. Finally, the member was removed again in Python 3.9.

C extensions rely on the ability to access structure members, indirectly through the C API, or even directly. Modifying structures like PyListObject cannot even be considered.

The PyTypeObject structure is the one which evolved the most, simply because there was no other way to evolve CPython than modifying it.

A C extension can technically dereference a PyObject* pointer and access PyObject members. This prevents experiments like tagged pointers (storing small values as PyObject* which does not point to a valid PyObject structure).
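To make the tagged pointer idea concrete, here is a minimal sketch in plain C. It does not use the Python C API: demo_object, demo_handle and all demo_* functions are hypothetical names invented for this illustration, and the encoding (low bit set means "small unsigned integer") is only one possible scheme, not one chosen by CPython.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object header, standing in for PyObject. */
typedef struct demo_object {
    long refcnt;
    unsigned long value;
} demo_object;

/* A handle is either a real pointer (low bit 0) or a small unsigned
   integer shifted left by one with the low bit set to 1. */
typedef uintptr_t demo_handle;

static inline int demo_is_tagged(demo_handle h) {
    return (h & 1u) != 0;
}

static inline demo_handle demo_from_small_uint(uintptr_t v) {
    /* The value must fit in the pointer width minus one bit. */
    return (v << 1) | 1u;
}

static inline demo_handle demo_from_object(demo_object *ob) {
    /* Assumption: objects are at least 2-byte aligned, so bit 0 is free. */
    assert(((uintptr_t)ob & 1u) == 0);
    return (uintptr_t)ob;
}

/* Accessor that works for both encodings. Code that casts the handle
   to demo_object* and dereferences it would read invalid memory when
   the handle is tagged. */
static inline unsigned long demo_as_ulong(demo_handle h) {
    if (demo_is_tagged(h)) {
        return (unsigned long)(h >> 1);
    }
    return ((demo_object *)h)->value;
}
```

Callers that only go through demo_as_ulong() never notice which encoding is in use. A caller that dereferences the handle directly, as a C extension may do today with PyObject*, breaks as soon as a tagged value appears.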

Replacing the Python garbage collector with a tracing garbage collector would also require removing the PyObject.ob_refcnt reference counter, whereas currently the Py_INCREF() and Py_DECREF() macros directly access PyObject.ob_refcnt.

Same CPython design since 1990: structures and reference counting

When the CPython project was created, it was written with one principle: keep the implementation simple enough so it can be maintained by a single developer. CPython complexity grew a lot and many micro-optimizations have been implemented, but CPython core design has not changed.

Members of PyObject and PyTupleObject structures have not changed since the “Initial revision” commit (1990):

    #define OB_HEAD \
        unsigned int ob_refcnt; \
        struct _typeobject *ob_type;

    typedef struct _object {
        OB_HEAD
    } object;

    typedef struct {
        OB_VARHEAD
        object *ob_item[1];
    } tupleobject;

Only names changed: object was renamed to PyObject and tupleobject was renamed to PyTupleObject.

CPython still tracks the lifetime of Python objects using reference counting, both internally and for third party C extensions (through the Python C API).

All Python objects must be allocated on the heap and cannot be moved.

Why is PyPy more efficient than CPython?

The PyPy project is a Python implementation which is 4.2x faster than CPython on average. PyPy developers chose not to fork CPython, but to start from scratch to have more freedom in terms of optimization choices.

PyPy does not use reference counting, but a tracing garbage collector which moves objects. Objects can be allocated on the stack (or even not at all), rather than always having to be allocated on the heap.

Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than objects.

Moreover, PyPy also has a JIT compiler which emits fast code thanks to the efficient PyPy design.

PyPy bottleneck: the Python C API

While PyPy is much more efficient than CPython at running pure Python code, it is only as efficient as, or slower than, CPython at running C extensions.

Since the C API requires PyObject* and allows direct access to structure members, PyPy has to associate a CPython object with each PyPy object and keep both consistent. Converting a PyPy object to a CPython object is inefficient. Moreover, reference counting also has to be implemented on top of the PyPy tracing garbage collector.

These conversions are required because the Python C API is too close to the CPython implementation: there is no high-level abstraction. For example, structure members are part of the public C API and nothing prevents a C extension from directly getting or setting PyTupleObject.ob_item[0] (the first item of a tuple).

See Inside cpyext: Why emulating CPython C API is so Hard (Sept 2018) by Antonio Cuni for more details.

Rationale

Hide implementation details

Hiding implementation details from the C API has multiple advantages:

  • It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations. For example, tagged pointers, and replacing the garbage collector with a tracing garbage collector which can move objects.
  • Adding new features in CPython becomes easier.
  • PyPy should be able to avoid conversions to CPython objects in more cases: keep efficient PyPy objects.
  • It becomes easier to implement the C API for a new Python implementation.
  • More C extensions will be compatible with Python implementations other than CPython.

Relationship with the limited C API

PEP 384 “Defining a Stable ABI” is implemented in Python 3.2. It introduces the “limited C API”: a subset of the C API. When the limited C API is used, it becomes possible to build a C extension only once and use it on multiple Python versions: that’s the stable ABI.

The main limitation of PEP 384 is that C extensions have to opt in to the limited C API. Only very few projects made this choice, usually to ease distribution of binaries, especially on Windows.

This PEP moves the C API towards the limited C API.

Ideally, the C API will become the limited C API and all C extensions will use the stable ABI, but this is out of the scope of this PEP.

Specification

Summary

  • (Completed) Reorganize the C API header files: create Include/cpython/ and Include/internal/ subdirectories.
  • (Completed) Move private functions exposing implementation details to the internal C API.
  • (Completed) Convert macros to static inline functions.
  • (Completed) Add new functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE(). The Py_TYPE(), Py_REFCNT() and Py_SIZE() macros become functions which cannot be used as l-values.
  • (Completed) New C API functions must not return borrowed references.
  • (In Progress) Provide pythoncapi_compat.h header file.
  • (In Progress) Make structures opaque, add getter and setter functions.
  • (Not Started) Deprecate PySequence_Fast_ITEMS().
  • (Not Started) Convert PyTuple_GET_ITEM() and PyList_GET_ITEM() macros to static inline functions.

Reorganize the C API header files

The first consumer of the C API was Python itself. There is no clear separation between APIs which must not be used outside Python, and APIs which are public on purpose.

Header files must be reorganized into 3 APIs:

  • Include/ directory is the limited C API: no implementation details, structures are opaque. C extensions using it get a stable ABI.
  • Include/cpython/ directory is the CPython C API: less “portable” API, depends more on the Python version, exposes some implementation details, few incompatible changes can happen.
  • Include/internal/ directory is the internal C API: implementation details, incompatible changes are likely at each Python release.

The creation of the Include/cpython/ directory is fully backward compatible. Include/cpython/ header files cannot be included directly and are included automatically by Include/ header files when the Py_LIMITED_API macro is not defined.

The internal C API is installed and can be used for specific usages like debuggers and profilers which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled for each Python version.

STATUS: Completed (in Python 3.8)

The reorganization of header files started in Python 3.7 and was completed in Python 3.8:

  • bpo-35134: Add a new Include/cpython/ subdirectory for the “CPython API” with implementation details.
  • bpo-35081: Move internal headers to Include/internal/

Move private functions to the internal C API

Private functions which expose implementation details must be moved to the internal C API.

If a C extension relies on a CPython private function which exposes CPython implementation details, other Python implementations have to re-implement this private function to support this C extension.

STATUS: Completed (in Python 3.9)

Private functions moved to the internal C API in Python 3.8:

  • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK()

Macros and functions excluded from the limited C API in Python 3.9:

  • _PyObject_SIZE(), _PyObject_VAR_SIZE()
  • PyThreadState_DeleteCurrent()
  • PyFPE_START_PROTECT(), PyFPE_END_PROTECT()
  • _Py_NewReference(), _Py_ForgetReference()
  • _PyTraceMalloc_NewReference()
  • _Py_GetRefTotal()

Private functions moved to the internal C API in Python 3.9:

  • GC functions like _Py_AS_GC(), _PyObject_GC_IS_TRACKED() and _PyGCHead_NEXT()
  • _Py_AddToAllObjects() (not exported)
  • _PyDebug_PrintTotalRefs(), _Py_PrintReferences(), _Py_PrintReferenceAddresses() (not exported)

Public “clear free list” functions moved to the internal C API and renamed to private functions in Python 3.9:

  • PyAsyncGen_ClearFreeLists()
  • PyContext_ClearFreeList()
  • PyDict_ClearFreeList()
  • PyFloat_ClearFreeList()
  • PyFrame_ClearFreeList()
  • PyList_ClearFreeList()
  • PyTuple_ClearFreeList()
  • Functions simply removed:
    • PyMethod_ClearFreeList() and PyCFunction_ClearFreeList(): bound method free list removed in Python 3.9.
    • PySet_ClearFreeList(): set free list removed in Python 3.4.
    • PyUnicode_ClearFreeList(): Unicode free list removed in Python 3.3.

Convert macros to static inline functions

Converting macros to static inline functions has multiple advantages:

  • Functions have well defined parameter types and return type.
  • Functions can use variables with a well defined scope (the function).
  • Debuggers can put breakpoints on functions and profilers can display the function name in the call stacks. In most cases, it works even when a static inline function is inlined.
  • Functions don’t have macro pitfalls.
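One classic macro pitfall is double evaluation of an argument. The sketch below is plain C and does not use the Python C API; DEMO_MAX_MACRO, demo_max, demo_next and demo_compare are hypothetical names chosen for this illustration.

```c
#include <assert.h>

/* Macro version: the argument text is pasted in and may be evaluated twice. */
#define DEMO_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Static inline version: each argument is evaluated exactly once and
   has a declared type. */
static inline int demo_max(int a, int b) {
    return a > b ? a : b;
}

static int demo_calls = 0;

/* Hypothetical function with a side effect: counts its own calls. */
static int demo_next(void) {
    demo_calls++;
    return demo_calls * 10;
}

/* Runs both variants and checks how often the argument was evaluated. */
static void demo_compare(void) {
    demo_calls = 0;
    int from_macro = DEMO_MAX_MACRO(demo_next(), 5);
    assert(demo_calls == 2);       /* evaluated in the test and in the result */
    assert(from_macro == 20);      /* the value of the second call is returned */

    demo_calls = 0;
    int from_function = demo_max(demo_next(), 5);
    assert(demo_calls == 1);       /* evaluated once, before the call */
    assert(from_function == 10);
}
```

With the macro, the side effect runs twice and the result comes from the second call. The static inline function behaves like any other function call.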

Converting macros to static inline functions should only impact very few C extensions that use macros in unusual ways.

For backward compatibility, functions must continue to accept any type, not only PyObject*, to avoid compiler warnings, since most macros cast their parameters to PyObject*.

Python 3.6 requires C compilers to support static inline functions: PEP 7 requires a subset of C99.

STATUS: Completed (in Python 3.9)

Macros converted to static inline functions in Python 3.8:

  • Py_INCREF(), Py_DECREF()
  • Py_XINCREF(), Py_XDECREF()
  • PyObject_INIT(), PyObject_INIT_VAR()
  • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()

Macros converted to regular functions in Python 3.9:

  • Py_EnterRecursiveCall(), Py_LeaveRecursiveCall() (added to the limited C API)
  • PyObject_INIT(), PyObject_INIT_VAR()
  • PyObject_GET_WEAKREFS_LISTPTR()
  • PyObject_CheckBuffer()
  • PyIndex_Check()
  • PyObject_IS_GC()
  • PyObject_NEW() (alias to PyObject_New()), PyObject_NEW_VAR() (alias to PyObject_NewVar())
  • PyType_HasFeature() (always calls PyType_GetFlags())
  • Py_TRASHCAN_BEGIN_CONDITION() and Py_TRASHCAN_END() macros now call functions which hide implementation details, rather than directly accessing members of the PyThreadState structure.

Make structures opaque

The following structures of the C API become opaque:

  • PyInterpreterState
  • PyThreadState
  • PyGC_Head
  • PyTypeObject
  • PyObject and PyVarObject
  • All types which inherit from PyObject or PyVarObject

C extensions must use getter or setter functions to get or set structure members. For example, tuple->ob_item[0] must be replaced with PyTuple_GET_ITEM(tuple, 0).
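The opaque structure pattern can be sketched in plain C as follows. This is not CPython code: demo_tuple and its functions are hypothetical, items are plain int values instead of PyObject*, and both the "header" and the "implementation" parts are shown in one file for brevity, whereas a real library would keep the struct definition out of the public header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- public header part: the layout is not visible to callers ---- */
typedef struct demo_tuple demo_tuple;   /* incomplete (opaque) type */

demo_tuple *demo_tuple_new(size_t size);
void demo_tuple_free(demo_tuple *t);
size_t demo_tuple_size(const demo_tuple *t);
int demo_tuple_get_item(const demo_tuple *t, size_t index);
void demo_tuple_set_item(demo_tuple *t, size_t index, int value);

/* ---- implementation part: free to change without breaking callers ---- */
struct demo_tuple {
    size_t size;
    int items[];   /* C99 flexible array member */
};

demo_tuple *demo_tuple_new(size_t size) {
    /* calloc() zero-initializes the items. */
    demo_tuple *t = calloc(1, sizeof(*t) + size * sizeof(int));
    if (t != NULL) {
        t->size = size;
    }
    return t;
}

void demo_tuple_free(demo_tuple *t) {
    free(t);
}

size_t demo_tuple_size(const demo_tuple *t) {
    return t->size;
}

int demo_tuple_get_item(const demo_tuple *t, size_t index) {
    assert(index < t->size);
    return t->items[index];
}

void demo_tuple_set_item(demo_tuple *t, size_t index, int value) {
    assert(index < t->size);
    t->items[index] = value;
}
```

Code that only sees the public part cannot write t->items[0], so the implementation could later switch to a different storage layout without recompiling or breaking its users.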

To be able to move away from reference counting, PyObject must become opaque. Currently, the reference counter PyObject.ob_refcnt is exposed in the C API. All structures must become opaque, since they “inherit” from PyObject. For example, PyFloatObject inherits from PyObject:

    typedef struct {
        PyObject ob_base;
        double ob_fval;
    } PyFloatObject;

Making PyObject fully opaque requires converting Py_INCREF() and Py_DECREF() macros to function calls. This change has an impact on performance. It is likely to be one of the very last changes when making structures opaque.

Making the PyTypeObject structure opaque breaks C extensions declaring types statically (e.g. static PyTypeObject MyType = {...};). C extensions must use PyType_FromSpec() to allocate types on the heap instead. Using heap types has other advantages like being compatible with subinterpreters. Combined with PEP 489 “Multi-phase extension module initialization”, it makes a C extension behave more like a Python module, such as allowing more than one module instance to be created.

Making the PyThreadState structure opaque requires adding getter and setter functions for members used by C extensions.

STATUS: In Progress (started in Python 3.8)

The PyInterpreterState structure was made opaque in Python 3.8 (bpo-35886) and the PyGC_Head structure (bpo-40241) was made opaque in Python 3.9.

Issues tracking the work to prepare the C API to make the following structures opaque:

  • PyObject: bpo-39573
  • PyTypeObject: bpo-40170
  • PyFrameObject: bpo-40421
    • Python 3.9 adds PyFrame_GetCode() and PyFrame_GetBack() getter functions, and moves PyFrame_GetLineNumber() to the limited C API.
  • PyThreadState: bpo-39947
    • Python 3.9 adds 3 getter functions: PyThreadState_GetFrame(), PyThreadState_GetID(), PyThreadState_GetInterpreter().

Disallow using Py_TYPE() as l-value

The Py_TYPE() function gets an object type, its PyObject.ob_type member. It is implemented as a macro which can be used as an l-value to set the type: Py_TYPE(obj) = new_type. This code relies on the assumption that PyObject.ob_type can be modified directly. It prevents making the PyObject structure opaque.

New setter functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE() are added and must be used instead.

The Py_TYPE(), Py_REFCNT() and Py_SIZE() macros must be converted to static inline functions which cannot be used as l-values.

For example, the Py_TYPE() macro:

    #define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)

becomes:

    #define _PyObject_CAST_CONST(op) ((const PyObject*)(op))

    static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
        return ob->ob_type;
    }

    #define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
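The difference between the two forms can be tried out without Python headers. The following sketch uses hypothetical stand-ins (demo_type, demo_object, DEMO_TYPE_MACRO, demo_get_type, demo_set_type) that only mimic the shape of the real definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_type {
    const char *name;
} demo_type;

typedef struct demo_object {
    long refcnt;
    demo_type *type;
} demo_object;

/* Old style: the macro expands to a struct member, so it is an l-value
   and callers can assign to it. */
#define DEMO_TYPE_MACRO(ob) (((demo_object *)(ob))->type)

/* New style: a function call yields a value, not an l-value.
   "demo_get_type(ob) = t;" does not compile. */
static inline demo_type *demo_get_type(const demo_object *ob) {
    return ob->type;
}

/* Explicit setter, playing the role that Py_SET_TYPE() plays in the PEP. */
static inline void demo_set_type(demo_object *ob, demo_type *type) {
    assert(ob != NULL);
    ob->type = type;
}
```

Because every write now goes through demo_set_type(), the implementation keeps the option of storing the type somewhere other than a plain struct member.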

STATUS: Completed (in Python 3.10)

New functions Py_SET_TYPE(), Py_SET_REFCNT() and Py_SET_SIZE() were added to Python 3.9.

In Python 3.10, Py_TYPE(), Py_REFCNT() and Py_SIZE() can no longer be used as l-values and the new setter functions must be used instead.

New C API functions must not return borrowed references

When a function returns a borrowed reference, Python cannot track whenthe caller stops using this reference.

For example, if the Python list type is specialized for small integers and stores “raw” numbers directly rather than Python objects, PyList_GetItem() has to create a temporary Python object. The problem is to decide when it is safe to delete the temporary object.
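The problem can be illustrated with a small model in plain C. Nothing here is CPython code: demo_int, demo_int_list and the demo_* functions are hypothetical, and the list is fixed at four slots to keep the sketch short. The getter returns a strong reference, so the caller's release call tells the implementation when the temporary box can be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed integer with a reference count. */
typedef struct demo_int {
    long refcnt;
    long value;
} demo_int;

static long demo_live_objects = 0;   /* boxes currently allocated */

static demo_int *demo_int_new(long value) {
    demo_int *ob = malloc(sizeof(*ob));
    if (ob == NULL) {
        return NULL;
    }
    ob->refcnt = 1;
    ob->value = value;
    demo_live_objects++;
    return ob;
}

static void demo_decref(demo_int *ob) {
    assert(ob->refcnt > 0);
    if (--ob->refcnt == 0) {
        demo_live_objects--;
        free(ob);
    }
}

/* Specialized list: stores raw numbers, not objects. */
typedef struct demo_int_list {
    size_t size;
    long items[4];
} demo_int_list;

/* Returns a NEW (strong) reference, or NULL if the index is out of
   range or allocation fails. The box is created on demand and the
   caller must release it with demo_decref(). With a borrowed reference
   there would be no call telling the list when to free the box. */
static demo_int *demo_list_get_item(const demo_int_list *list, size_t i) {
    if (i >= list->size) {
        return NULL;
    }
    return demo_int_new(list->items[i]);
}
```

A caller fetches an item, uses item->value, then calls demo_decref(item). At that point the temporary box is freed and demo_live_objects drops back to zero.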

The general guideline is to avoid returning borrowed references from new C API functions.

No function returning borrowed references is scheduled for removal by this PEP.

STATUS: Completed (in Python 3.9)

In Python 3.9, new C API functions returning Python objects only return strong references:

  • PyFrame_GetBack()
  • PyFrame_GetCode()
  • PyObject_CallNoArgs()
  • PyObject_CallOneArg()
  • PyThreadState_GetFrame()

Avoid functions returning PyObject**

The PySequence_Fast_ITEMS() function gives direct access to an array of PyObject* objects. The function is deprecated in favor of PyTuple_GetItem() and PyList_GetItem().

PyTuple_GET_ITEM() can be abused to directly access the PyTupleObject.ob_item member:

    PyObject **items = &PyTuple_GET_ITEM(tuple, 0);

The PyTuple_GET_ITEM() and PyList_GET_ITEM() macros are converted to static inline functions to disallow that.
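Why the conversion closes this hole can be seen in a small stand-alone model. The names below (demo_tuple, DEMO_GET_ITEM_MACRO, demo_get_item) are hypothetical and the items are plain int values rather than PyObject*.

```c
#include <assert.h>

typedef struct demo_tuple {
    int size;
    int items[3];   /* stands in for PyObject *ob_item[] */
} demo_tuple;

/* Macro: expands to an array element, so '&' yields a pointer into the
   tuple's private storage. */
#define DEMO_GET_ITEM_MACRO(t, i) ((t)->items[i])

/* Static inline function: returns a copy of the element. Writing
   &demo_get_item(t, 0) is rejected by the compiler, because the result
   of a function call is not an l-value. */
static inline int demo_get_item(const demo_tuple *t, int i) {
    assert(i >= 0 && i < t->size);
    return t->items[i];
}
```

With the macro, a caller can write int *items = &DEMO_GET_ITEM_MACRO(&t, 0); and then modify items[2] behind the API's back. The function form leaves no expression whose address can be taken.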

STATUS: Not Started

New pythoncapi_compat.h header file

Making structures opaque requires modifying C extensions to use getter and setter functions. The practical issue is how to keep support for old Python versions which don’t have these functions.

For example, in Python 3.10, it is no longer possible to use Py_TYPE() as an l-value. The new Py_SET_TYPE() function must be used instead:

    #if PY_VERSION_HEX >= 0x030900A4
        Py_SET_TYPE(&MyType, &PyType_Type);
    #else
        Py_TYPE(&MyType) = &PyType_Type;
    #endif

This code may ring a bell to developers who ported their Python code base from Python 2 to Python 3.

Python will distribute a new pythoncapi_compat.h header file which provides new C API functions to old Python versions. Example:

    #if PY_VERSION_HEX < 0x030900A4
    static inline void
    _Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
    {
        ob->ob_type = type;
    }
    #define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
    #endif  // PY_VERSION_HEX < 0x030900A4

Using this header file, Py_SET_TYPE() can be used on old Python versions as well.

Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.

STATUS: In Progress (implemented but not distributed by CPython yet)

The pythoncapi_compat.h header file is currently developed at: https://github.com/pythoncapi/pythoncapi_compat

Process to reduce the number of broken C extensions

Process to reduce the number of broken C extensions when introducing the C API incompatible changes listed in this PEP:

  • Estimate how many popular C extensions are affected by the incompatible change.
  • Coordinate with maintainers of broken C extensions to prepare their code for the future incompatible change.
  • Introduce the incompatible changes in Python. The documentation must explain how to port existing code. It is recommended to merge such changes at the beginning of a development cycle to have more time for tests.
  • Changes which are the most likely to break a large number of C extensions should be announced on the capi-sig mailing list to notify C extension maintainers to prepare their projects for the next Python.
  • If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.

The coordination usually means reporting issues to the projects, or even proposing changes. It does not require waiting for a new release including fixes for every broken project.

Since more and more C extensions are written using Cython, rather than directly using the C API, it is important to ensure that Cython is prepared in advance for incompatible changes. It gives more time for C extension maintainers to release a new version with code generated with the updated Cython (for C extensions distributing the code generated by Cython).

Future incompatible changes can be announced by deprecating a function in the documentation and by annotating the function with Py_DEPRECATED(). But making a structure opaque and preventing the usage of a macro as l-value cannot be deprecated with Py_DEPRECATED().

The important part is coordination and finding a balance between CPython evolutions and backward compatibility. For example, breaking a random, old, obscure and unmaintained C extension on PyPI is less severe than breaking numpy.

If a change is reverted, we move back to the coordination step to better prepare the change. Once more C extensions are ready, the incompatible change can be reconsidered.


Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0620.rst

Last modified: 2025-02-01 08:55:40 GMT

