PEP 523 – Adding a frame evaluation API to CPython

Author: Brett Cannon <brett at python.org>, Dino Viehland <dinoviehland at gmail.com>
Status:

Abstract

This PEP proposes to expand CPython’s C API [2] to allow for the specification of a per-interpreter function pointer to handle the evaluation of frames [5]. This proposal also suggests adding a new field to code objects [3] to store arbitrary data for use by the frame evaluation function.

Rationale

One place where flexibility has been lacking in Python is in the direct execution of Python code. While CPython’s C API [2] allows for constructing the data going into a frame object and then evaluating it via PyEval_EvalFrameEx() [5], control over the execution of Python code comes down to individual objects instead of a holistic control of execution at the frame level.

While wanting to have influence over frame evaluation may seem a bit too low-level, it does open the possibility for things such as a method-level JIT to be introduced into CPython without CPython itself having to provide one. By allowing external C code to control frame evaluation, a JIT can participate in the execution of Python code at the key point where evaluation occurs. This then allows for a JIT to conditionally recompile Python bytecode to machine code as desired while still allowing for executing regular CPython bytecode when running the JIT is not desired. This can be accomplished by allowing interpreters to specify what function to call to evaluate a frame. And by placing the API at the frame evaluation level it allows for a complete view of the execution environment of the code for the JIT.

This ability to specify a frame evaluation function also allows for other use-cases beyond just opening CPython up to a JIT. For instance, it would not be difficult to implement a tracing or profiling function at the call level with this API. While CPython does provide the ability to set a tracing or profiling function at the Python level, this would be able to match the data collection of the profiler and quite possibly be faster for tracing by simply skipping per-line tracing support.

It also opens up the possibility of debugging where the frame evaluation function only performs special debugging work when it detects it is about to execute a specific code object. In that instance the bytecode could be theoretically rewritten in-place to inject a breakpoint function call at the proper point for help in debugging while not having to do a heavy-handed approach as required by sys.settrace().

To help facilitate these use-cases, we are also proposing the addition of a “scratch space” on code objects via a new field. This will allow per-code object data to be stored with the code object itself for easy retrieval by the frame evaluation function as necessary. The field itself will simply be a PyObject * type so that any data stored in the field will participate in normal object memory management.

Proposal

All proposed C API changes below will not be part of the stable ABI.

Expanding `PyCodeObject`

One field is to be added to the PyCodeObject struct [3]:

    typedef struct {
        ...
        void *co_extra;  /* "Scratch space" for the code object. */
    } PyCodeObject;

The co_extra will be NULL by default and only filled in as needed. Values stored in the field are expected to not be required in order for the code object to function, allowing the loss of the data of the field to be acceptable.

A private API has been introduced to work with the field:

    PyAPI_FUNC(Py_ssize_t) _PyEval_RequestCodeExtraIndex(freefunc);
    PyAPI_FUNC(int) _PyCode_GetExtra(PyObject *code, Py_ssize_t index,
                                     void **extra);
    PyAPI_FUNC(int) _PyCode_SetExtra(PyObject *code, Py_ssize_t index,
                                     void *extra);

Users of the field are expected to call _PyEval_RequestCodeExtraIndex() to receive (what should be considered) an opaque index value for adding data into co_extra. With that index, users can set data using _PyCode_SetExtra() and later retrieve the data with _PyCode_GetExtra(). The API is purposefully listed as private to communicate the fact that there are no semantic guarantees of the API between Python releases.
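The request/set/get flow described above can be modelled in a few lines of Python. This is only an illustrative sketch, not CPython's implementation: the names `CodeObject`, `request_code_extra_index`, `code_set_extra`, and `code_get_extra` are hypothetical stand-ins for the C functions, the getter returns the value directly instead of writing through an out-parameter, and the registered free function is stored but never invoked because the PEP does not spell out when it runs.

```python
from typing import Any, Callable, List, Optional

# Hypothetical registry: one (possibly None) free function per reserved index.
_free_funcs: List[Optional[Callable[[Any], None]]] = []


class CodeObject:
    """Stand-in for PyCodeObject; co_extra starts out empty, like NULL."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.co_extra: Optional[List[Any]] = None


def request_code_extra_index(free: Optional[Callable[[Any], None]]) -> int:
    """Model of _PyEval_RequestCodeExtraIndex(): reserve a slot, return its index."""
    _free_funcs.append(free)  # stored only; this model never calls it
    return len(_free_funcs) - 1


def code_set_extra(code: CodeObject, index: int, extra: Any) -> int:
    """Model of _PyCode_SetExtra(): 0 on success, -1 for an unknown index."""
    if index < 0 or index >= len(_free_funcs):
        return -1
    if code.co_extra is None:
        code.co_extra = []  # allocate the scratch space lazily
    while len(code.co_extra) <= index:
        code.co_extra.append(None)  # grow so that `index` is addressable
    code.co_extra[index] = extra
    return 0


def code_get_extra(code: CodeObject, index: int) -> Any:
    """Model of _PyCode_GetExtra(): None when nothing was stored at `index`."""
    if code.co_extra is None or index < 0 or index >= len(code.co_extra):
        return None
    return code.co_extra[index]
```

In this model each tool requests its own index once and then treats it as opaque, so two tools can attach data to the same code object without overwriting each other.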

Using a list or tuple was considered but found to be less performant, and with a key use-case being JIT usage the performance consideration won out for using a custom struct instead of a Python object.

A dict was also considered, but once again performance was more important. While a dict will have constant overhead in looking up data, the overhead for the common case of a single object being stored in the data structure leads to a tuple having better performance characteristics (i.e. iterating a tuple of length 1 is faster than the overhead of hashing and looking up an object in a dict).

Expanding `PyInterpreterState`

The entrypoint for the frame evaluation function is per-interpreter:

    // Same type signature as PyEval_EvalFrameEx().
    typedef PyObject* (*_PyFrameEvalFunction)(PyFrameObject*, int);

    typedef struct {
        ...
        _PyFrameEvalFunction eval_frame;
    } PyInterpreterState;

By default, the eval_frame field will be initialized to a function pointer that represents what PyEval_EvalFrameEx() currently is (called _PyEval_EvalFrameDefault(), discussed later in this PEP). Third-party code may then set their own frame evaluation function instead to control the execution of Python code. A pointer comparison can be used to detect if the field is set to _PyEval_EvalFrameDefault() and thus has not been mutated yet.

Changes to `Python/ceval.c`

PyEval_EvalFrameEx() [5] as it currently stands will be renamed to _PyEval_EvalFrameDefault(). The new PyEval_EvalFrameEx() will then become:

    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
    {
        PyThreadState *tstate = PyThreadState_GET();
        return tstate->interp->eval_frame(frame, throwflag);
    }

This allows third-party code to place itself directly in the path of Python code execution while being backwards-compatible with code already using the pre-existing C API.
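The indirection through the per-interpreter field can be sketched in Python to show how little machinery is involved. This is a toy model under stated assumptions, not the C implementation: `InterpreterState`, `eval_frame_ex`, `is_default`, and `counting_eval_frame` are hypothetical names, a "frame" is modelled as a zero-argument callable, and an identity check stands in for the C pointer comparison.

```python
from typing import Any, Callable, List

# Same shape as _PyFrameEvalFunction: (frame, throwflag) -> result.
FrameEvalFunction = Callable[[Any, int], Any]


def eval_frame_default(frame: Callable[[], Any], throwflag: int) -> Any:
    """Stand-in for _PyEval_EvalFrameDefault(): simply runs the frame."""
    return frame()


class InterpreterState:
    """Stand-in for PyInterpreterState with its eval_frame field."""

    def __init__(self) -> None:
        self.eval_frame: FrameEvalFunction = eval_frame_default


def eval_frame_ex(interp: InterpreterState, frame: Callable[[], Any],
                  throwflag: int = 0) -> Any:
    """Model of the new PyEval_EvalFrameEx(): dispatch through the field."""
    return interp.eval_frame(frame, throwflag)


def is_default(interp: InterpreterState) -> bool:
    """Identity check standing in for the C pointer comparison."""
    return interp.eval_frame is eval_frame_default


# Hypothetical third-party hook: record each frame, then defer to the default.
seen_frames: List[Any] = []


def counting_eval_frame(frame: Callable[[], Any], throwflag: int) -> Any:
    seen_frames.append(frame)
    return eval_frame_default(frame, throwflag)
```

Existing callers keep calling the same entry point; only the function stored in the field changes, which is what keeps the scheme backwards-compatible.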

Updating `python-gdb.py`

The generated python-gdb.py file used for Python support in GDB makes some hard-coded assumptions about PyEval_EvalFrameEx(), e.g. the names of local variables. It will need to be updated to work with the proposed changes.

Performance impact

As this PEP is proposing an API to add pluggability, performance impact is considered only in the case where no third-party code has made any changes.

Several runs of pybench [14] consistently showed no performance cost from the API change alone.

A run of the Python benchmark suite [9] showed no measurable cost in performance.

In terms of memory impact, since there are typically not many CPython interpreters executing in a single process that means the impact of co_extra being added to PyCodeObject is the only worry. According to [8], a run of the Python test suite results in about 72,395 code objects being created. On a 64-bit CPU that would result in 579,160 bytes of extra memory being used if all code objects were alive at once and had nothing set in their co_extra fields.
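The memory figure is one pointer per code object, which can be checked directly. The constant names below are illustrative only; the 8-byte pointer size is the usual assumption for a 64-bit build.

```python
CODE_OBJECTS = 72_395     # code objects created by a test-suite run, per [8]
POINTER_SIZE_BYTES = 8    # size of one co_extra pointer on a 64-bit CPU

extra_bytes = CODE_OBJECTS * POINTER_SIZE_BYTES
print(extra_bytes)                    # 579160
print(round(extra_bytes / 1024, 1))   # 565.6 (KiB)
```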

Example Usage

A JIT for CPython

Pyjion

The Pyjion project [1] has used this proposed API to implement a JIT for CPython using the CoreCLR’s JIT [4]. Each code object has its co_extra field set to a PyjionJittedCode object which stores four pieces of information:

Execution count
A boolean representing whether a previous attempt to JIT failed
A function pointer to a trampoline (which can be type tracing or not)
A void pointer to any JIT-compiled machine code

The frame evaluation function has (roughly) the following algorithm:

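    def eval_frame(frame, throw_flag):
        pyjion_code = frame.code.co_extra
        if not pyjion_code:
            # First execution: attach fresh bookkeeping to the code object.
            pyjion_code = frame.code.co_extra = PyjionJittedCode()
        elif not pyjion_code.jit_failed:
            if pyjion_code.jit_code:
                # Already compiled: run the machine code via the trampoline.
                return pyjion_code.eval(pyjion_code.jit_code, frame)
            elif pyjion_code.exec_count > 20_000:
                # Hot code: attempt to JIT-compile it now.
                if jit_compile(frame):
                    return pyjion_code.eval(pyjion_code.jit_code, frame)
                else:
                    pyjion_code.jit_failed = True
        # Otherwise count the execution and use the regular interpreter.
        pyjion_code.exec_count += 1
        return _PyEval_EvalFrameDefault(frame, throw_flag)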

The key point, though, is that all of this work and logic is separate from CPython and yet with the proposed API changes it is able to provide a JIT that is compliant with Python semantics (as of this writing, performance is almost equivalent to CPython without the new API). This means there’s nothing technically preventing others from implementing their own JITs for CPython by utilizing the proposed API.

Other JITs

It should be mentioned that the Pyston team was consulted on an earlier version of this PEP that was more JIT-specific and they were not interested in utilizing the changes proposed because they want control over memory layout and had no interest in directly supporting CPython itself. An informal discussion with a developer on the PyPy team led to a similar comment.

Numba [6], on the other hand, suggested that they would be interested in the proposed change in a post-1.0 future for themselves [7].

The experimental Coconut JIT [13] could have benefitted from this PEP. In private conversations with Coconut’s creator we were told that our API was probably superior to the one they developed for Coconut to add JIT support to CPython.

Debugging

In conversations with the Python Tools for Visual Studio team (PTVS) [12], they thought they would find these API changes useful for implementing more performant debugging. As mentioned in the Rationale section, this API would allow for switching on debugging functionality only in frames where it is needed. This could allow for either skipping information that sys.settrace() normally provides or even going as far as dynamically rewriting bytecode prior to execution to inject e.g. breakpoints in the bytecode.

It also turns out that Google provides a very similar API internally. It has been used for performant debugging purposes.
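The idea of doing debugging work only in the frames that need it can be illustrated with a small Python model. Everything here is hypothetical and chosen for illustration: `Frame`, `breakpoint_targets`, `debug_log`, and `debugging_eval_frame` are not part of any real debugger, code objects are represented by plain names, and the "debugging work" is reduced to appending a log entry rather than rewriting bytecode.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Set


@dataclass
class Frame:
    """Hypothetical stand-in for a frame: a code name plus a callable body."""
    code: str
    body: Callable[[], Any]


# Hypothetical debugger state: code objects that currently have a breakpoint.
breakpoint_targets: Set[str] = set()
debug_log: List[str] = []


def eval_frame_default(frame: Frame, throwflag: int) -> Any:
    """Stand-in for _PyEval_EvalFrameDefault(): runs the frame body."""
    return frame.body()


def debugging_eval_frame(frame: Frame, throwflag: int) -> Any:
    """Do extra work only for targeted code; all other frames pay one lookup."""
    if frame.code in breakpoint_targets:
        debug_log.append(f"break in {frame.code}")
    return eval_frame_default(frame, throwflag)
```

Frames whose code is not targeted go straight to the default evaluator, which is where the potential saving over a global sys.settrace() hook would come from.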

Implementation

A set of patches implementing the proposed API is available through the Pyjion project [1]. In its current form it has more changes to CPython than just this proposed API, but that is for ease of development instead of strict requirements to accomplish its goals.

Open Issues

Allow `eval_frame` to be `NULL`

Currently the frame evaluation function is expected to always be set. It could very easily simply default to NULL instead which would signal to use _PyEval_EvalFrameDefault(). The current proposal of not special-casing the field seemed the most straightforward, but it does require that the field not accidentally be cleared, else a crash may occur.
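For comparison, the alternative raised in this open issue would special-case an empty field at the dispatch point. A minimal Python sketch, assuming `None` plays the role of `NULL` and using hypothetical names (`InterpreterState`, `eval_frame_ex`), might look like this:

```python
from typing import Any, Callable, Optional


def eval_frame_default(frame: Callable[[], Any], throwflag: int) -> Any:
    """Stand-in for _PyEval_EvalFrameDefault(): simply runs the frame."""
    return frame()


class InterpreterState:
    """Variant in which an unset eval_frame (None, i.e. NULL) is legal."""

    def __init__(self) -> None:
        self.eval_frame: Optional[Callable[[Any, int], Any]] = None


def eval_frame_ex(interp: InterpreterState, frame: Callable[[], Any],
                  throwflag: int = 0) -> Any:
    """Dispatch with a fallback: an empty field means 'use the default'."""
    func = interp.eval_frame
    if func is None:
        func = eval_frame_default
    return func(frame, throwflag)
```

The trade-off is one extra branch on every frame evaluation in exchange for a cleared field no longer being fatal.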

Rejected Ideas

A JIT-specific C API

Originally this PEP was going to propose a much larger API change which was more JIT-specific. After soliciting feedback from the Numba team [6], though, it became clear that the API was unnecessarily large. The realization was made that all that was truly needed was the opportunity to provide a trampoline function to handle execution of Python code that had been JIT-compiled and a way to attach that compiled machine code along with other critical data to the corresponding Python code object. Once it was shown that there was no loss in functionality or in performance while minimizing the API changes required, the proposal was changed to its current form.

Is co_extra needed?

While discussing this PEP at PyCon US 2016, some core developers expressed their worry of the co_extra field making code objects mutable. The thinking seemed to be that having a field that was mutated after the creation of the code object made the object seem mutable, even though no other aspect of code objects changed.

The view of this PEP is that the co_extra field doesn’t change the fact that code objects are immutable. The field is specified in this PEP to not contain information required to make the code object usable, making it more of a caching field. It could be viewed as similar to the UTF-8 cache that string objects have internally; strings are still considered immutable even though they have a field that is conditionally set.

Performance measurements were also made where the field was not available for JIT workloads. The loss of the field was deemed too costly to performance when using an unordered map from C++ or Python’s dict to associate a code object with JIT-specific data objects.