This PEP was rejected by its author since the design didn’t show any significant speedup, but also because of the lack of time to implement the most advanced and complex optimizations.
Add functions to the Python C API to specialize pure Python functions: add specialized codes with guards. This allows static optimizers to be implemented while respecting the Python semantics.
Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, … can be modified at runtime. Implementing optimizations that respect the Python semantics requires detecting when “something changes”; we will call these checks “guards”.
This PEP proposes to add a public API to the Python C API to add specialized codes with guards to a function. When the function is called, a specialized code is used if nothing changed; otherwise the original bytecode is used.
Even if guards help to respect most parts of the Python semantics, it’s hard to optimize Python without making subtle changes on the exact behaviour. CPython has a long history and many applications rely on implementation details. A compromise must be found between “everything is mutable” and performance.
Writing an optimizer is out of the scope of this PEP.
There are multiple JIT compilers for Python actively developed, among them PyPy, Pyston, Pyjion and Numba.
Numba is specific to numerical computation. Pyston and Pyjion are still young. PyPy is the most complete Python interpreter: it is generally faster than CPython in micro- and many macro-benchmarks and has very good compatibility with CPython (it respects the Python semantics). There are still issues with Python JIT compilers which prevent them from being widely used instead of CPython.
Many popular libraries like numpy, PyGTK, PyQt, PySide and wxPython are implemented in C or C++ and use the Python C API. To have a small memory footprint and better performance, Python JIT compilers avoid reference counting in favour of a faster garbage collector, do not use the C structures of CPython objects, and manage memory allocations differently. PyPy has a cpyext module which emulates the Python C API, but it performs worse than CPython and does not support the full Python C API.
New features are first developed in CPython. In January 2016, the latest CPython stable version is 3.5, whereas PyPy only supports Python 2.7 and 3.2, and Pyston only supports Python 2.7.
Even if PyPy has very good compatibility with Python, some modules are still not compatible with PyPy: see the PyPy Compatibility Wiki. The incomplete support of the Python C API is part of this problem. There are also subtle differences between PyPy and CPython like reference counting: object destructors are always called in PyPy, but can be called “later” than in CPython. Using context managers helps to control when resources are released.
Even if PyPy is much faster than CPython in a wide range of benchmarks, some users still report worse performance than CPython on some specific use cases, or unstable performance.
When Python is used as a scripting language for programs running less than 1 minute, JIT compilers can be slower because their startup time is higher and the JIT compiler takes time to optimize the code. For example, most Mercurial commands take a few seconds.
Numba now supports ahead-of-time compilation, but it requires a decorator to specify argument types and it only supports numerical types.
CPython 3.5 has almost no optimization: the peephole optimizer only implements basic optimizations. A static compiler is a compromise between CPython 3.5 and PyPy.
Note
There was also the Unladen Swallow project, but it was abandoned in 2011.
The following examples are not written to show powerful optimizations promising important speedup, but to be short and easy to understand, just to explain the principle.
Examples in this PEP use a hypothetical myoptimizer module which provides the following functions and types:

* specialize(func, code, guards): add code to the function func as a specialized code, guarded by the list guards
* get_specialized(func): get the list of specialized codes as a list of (code, guards) tuples where code is a callable or code object and guards is a list of guards
* GuardBuiltins(name): guard watching for builtins.__dict__[name] and globals()[name]. The guard fails if builtins.__dict__[name] is replaced, or if globals()[name] is set.

Add specialized bytecode where the call to the pure builtin function chr(65) is replaced with its result "A":

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    myoptimizer.specialize(func, fast_func.__code__,
                           [myoptimizer.GuardBuiltins("chr")])
    del fast_func

Example showing the behaviour of the guard:

    print("func(): %s" % func())
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

Output:

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0

The first call uses the specialized bytecode which returns the string "A". The second call removes the specialized code because the builtin chr() function was replaced, and executes the original bytecode calling chr(65).
On a microbenchmark, calling the specialized bytecode takes 88 ns, whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.
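
The behaviour of this example can be approximated in pure Python. The following is a minimal sketch, not the proposed C implementation: the module-level registry, the call() helper and the guard's check() method are assumptions made for illustration, since real specialization needs support inside the interpreter.

```python
import builtins
import types

# Hypothetical registry: function -> list of (code, guards) pairs.
_specialized = {}


class GuardBuiltins:
    """Guard that fails for good once a builtin is replaced or shadowed."""

    def __init__(self, name):
        self.name = name
        self.expected = builtins.__dict__[name]

    def check(self, func):
        replaced = builtins.__dict__.get(self.name) is not self.expected
        shadowed = self.name in func.__globals__
        return 2 if (replaced or shadowed) else 0


def specialize(func, code, guards):
    _specialized.setdefault(func, []).append((code, list(guards)))


def get_specialized(func):
    return _specialized.get(func, [])


def call(func, *args):
    """Run the first specialized code whose guards all pass."""
    entries = get_specialized(func)
    for entry in list(entries):
        code, guards = entry
        results = [guard.check(func) for guard in guards]
        if 2 in results:
            entries.remove(entry)      # permanent failure: drop the entry
        elif not any(results):
            if isinstance(code, types.CodeType):
                code = types.FunctionType(code, func.__globals__)
            return code(*args)
    return func(*args)                 # fall back to the original bytecode


def func():
    return chr(65)


def fast_func():
    return "A"


specialize(func, fast_func.__code__, [GuardBuiltins("chr")])
first = call(func)                     # the specialized code runs
count_before = len(get_specialized(func))

original_chr = builtins.chr
builtins.chr = lambda obj: "mock"
try:
    second = call(func)                # guard fails, original bytecode runs
finally:
    builtins.chr = original_chr        # restore the real builtin
count_after = len(get_specialized(func))
```

Unlike the proposal, this sketch needs an explicit call() wrapper; the PEP performs the guard checks inside the function call itself.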
Add the C builtin chr() function as the specialized code instead of a bytecode calling chr(obj):

    import myoptimizer

    def func(arg):
        return chr(arg)

    myoptimizer.specialize(func, chr,
                           [myoptimizer.GuardBuiltins("chr")])

Example showing the behaviour of the guard:

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

Output:

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0

The first call calls the C builtin chr() function (without creating a Python frame). The second call removes the specialized code because the builtin chr() function was replaced, and executes the original bytecode.
On a microbenchmark, calling the C builtin takes 95 ns, whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling chr(65) directly takes 76 ns.
Pseudo-code to choose the specialized code to call a pure Python function:

    def call_func(func, args, kwargs):
        specialized = myoptimizer.get_specialized(func)
        index = 0
        # len() is re-evaluated because entries may be removed below
        while index < len(specialized):
            specialized_code, guards = specialized[index]

            check = 0
            for guard in guards:
                check = guard(args, kwargs)
                if check:
                    break

            if not check:
                # all guards succeeded:
                # use the specialized code
                return specialized_code
            elif check == 1:
                # a guard failed temporarily:
                # try the next specialized code
                index += 1
            else:
                assert check == 2
                # a guard will always fail:
                # remove the specialized code
                del specialized[index]

        # if a guard of each specialized code failed, or if the function
        # has no specialized code, use original bytecode
        return func.__code__

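
The selection loop can be exercised with toy guards. This is a sketch under stated assumptions: choose_code(), the two guard functions and the string placeholders standing in for code objects are hypothetical, and the list of specialized codes is passed in directly rather than read from a function.

```python
GUARD_OK, GUARD_TEMP_FAIL, GUARD_ALWAYS_FAIL = 0, 1, 2


def choose_code(specialized, original, args, kwargs):
    """Return the first specialized code whose guards all succeed.

    specialized is a mutable list of (code, guards) pairs; each guard is a
    callable taking (args, kwargs) and returning 0, 1 or 2.
    """
    index = 0
    while index < len(specialized):
        code, guards = specialized[index]
        check = GUARD_OK
        for guard in guards:
            check = guard(args, kwargs)
            if check != GUARD_OK:
                break
        if check == GUARD_OK:
            return code
        if check == GUARD_TEMP_FAIL:
            index += 1                 # keep the entry, try the next one
        else:
            del specialized[index]     # permanent failure: drop the entry
    return original


def only_ints(args, kwargs):
    # Hypothetical guard: fails temporarily for non-int arguments.
    return GUARD_OK if all(type(a) is int for a in args) else GUARD_TEMP_FAIL


def broken(args, kwargs):
    # Hypothetical guard that can never succeed again.
    return GUARD_ALWAYS_FAIL


specialized = [("dead", [broken]), ("int_fast_path", [only_ints])]
picked_for_ints = choose_code(specialized, "original", (1, 2), {})
picked_for_strs = choose_code(specialized, "original", ("a",), {})
remaining = [code for code, _ in specialized]
```

The first call drops the entry whose guard always fails and picks the integer fast path; the second call keeps that entry but falls back to the original code.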
Changes to the Python C API:

* Add a PyFuncGuardObject object and a PyFuncGuard_Type type
* Add a PySpecializedCode structure
* Add the following fields to the PyFunctionObject structure:

      Py_ssize_t nb_specialized;
      PySpecializedCode *specialized;

* Add function methods:

  * PyFunction_Specialize()
  * PyFunction_GetSpecializedCodes()
  * PyFunction_GetSpecializedCode()
  * PyFunction_RemoveSpecialized()
  * PyFunction_RemoveAllSpecialized()

None of these functions and types are exposed at the Python level.
All these additions are explicitly excluded from the stable ABI.
When a function code is replaced (func.__code__ = new_code), all specialized codes and guards are removed.
Add a function guard object:

    typedef struct {
        PyObject ob_base;
        int (*init) (PyObject *guard, PyObject *func);
        int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
    } PyFuncGuardObject;

The init() function initializes a guard:

* 0 on success
* 1 if the guard will always fail: PyFunction_Specialize() must ignore the specialized code
* -1 on error

The check() function checks a guard:

* 0 on success
* 1 if the guard failed temporarily
* 2 if the guard will always fail: the specialized code must be removed
* -1 on error

stack is an array of arguments: indexed arguments followed by (key, value) pairs of keyword arguments. na is the number of indexed arguments. nk is the number of keyword arguments: the number of (key, value) pairs. stack contains na + nk * 2 objects.
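
The stack layout can be illustrated with a short Python sketch. unpack_stack() is a hypothetical helper, not part of the proposed API; it only shows how na and nk partition the flat array.

```python
def unpack_stack(stack, na, nk):
    """Split a flat stack into positional and keyword arguments."""
    if len(stack) != na + nk * 2:
        raise ValueError("stack must contain na + nk * 2 objects")
    args = tuple(stack[:na])
    pairs = stack[na:]
    kwargs = {pairs[i]: pairs[i + 1] for i in range(0, len(pairs), 2)}
    return args, kwargs


# A call such as func(1, 2, sep="-", end="") would be laid out as:
stack = [1, 2, "sep", "-", "end", ""]
args, kwargs = unpack_stack(stack, na=2, nk=2)
```

Here na is 2 and nk is 2, so the stack holds 2 + 2 * 2 = 6 objects.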
Add a specialized code structure:

    typedef struct {
        PyObject *code;        /* callable or code object */
        Py_ssize_t nb_guard;
        PyObject **guards;     /* PyFuncGuardObject objects */
    } PySpecializedCode;

Add a function method to specialize the function by adding a specialized code with guards:

    int PyFunction_Specialize(PyObject *func, PyObject *code,
                              PyObject *guards)

If code is a Python function, the code object of the code function is used as the specialized code. The specialized Python function must have the same parameter defaults, the same keyword parameter defaults, and must not have specialized code.
If code is a Python function or a code object, a new code object is created and the code name and first line number of the code object of func are copied. The specialized code must have the same cell variables and the same free variables.
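
These compatibility rules can be sketched in Python using attributes that function and code objects already expose. check_compatible() is a hypothetical helper, not part of the proposal; the rule that the specialized function must not itself have specialized code is left out because it cannot be observed from pure Python. The sketch relies on code.replace(), available since Python 3.8.

```python
def check_compatible(func, specialized):
    """Return the code object to register, or raise ValueError."""
    if func.__defaults__ != specialized.__defaults__:
        raise ValueError("parameter defaults differ")
    if func.__kwdefaults__ != specialized.__kwdefaults__:
        raise ValueError("keyword parameter defaults differ")
    orig, new = func.__code__, specialized.__code__
    if orig.co_cellvars != new.co_cellvars:
        raise ValueError("cell variables differ")
    if orig.co_freevars != new.co_freevars:
        raise ValueError("free variables differ")
    # Copy the name and first line number of the original code object.
    return new.replace(co_name=orig.co_name,
                       co_firstlineno=orig.co_firstlineno)


def func(x, sep="-"):
    return sep.join([x, x])


def fast(x, sep="-"):
    return x + sep + x


def other(x, sep="+"):                 # different default: rejected
    return x + sep + x


code = check_compatible(func, fast)
```

The returned code object carries the name and first line number of func, so tracebacks would still point at the original function.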
Result:

* 0 on success
* 1 if the specialization has been ignored
* -1 on error

Add a function method to get the list of specialized codes:

    PyObject* PyFunction_GetSpecializedCodes(PyObject *func)

Return a list of (code, guards) tuples where code is a callable or code object and guards is a list of PyFuncGuard objects. Raise an exception and return NULL on error.
Add a function method checking guards to choose a specialized code:

    PyObject* PyFunction_GetSpecializedCode(PyObject *func,
                                            PyObject **stack,
                                            int na, int nk)

See the check() function of guards for the stack, na and nk arguments. Return a callable or a code object on success. Raise an exception and return NULL on error.
Add a function method to remove a specialized code with its guards by its index:

    int PyFunction_RemoveSpecialized(PyObject *func, Py_ssize_t index)

Return 0 on success or if the index does not exist. Raise an exception and return -1 on error.
Add a function method to remove all specialized codes and guards of a function:

    int PyFunction_RemoveAllSpecialized(PyObject *func)

Return 0 on success. Raise an exception and return -1 if func is not a function.
A microbenchmark was run with python3.6 -m timeit -s 'def f(): pass' 'f()' (best of 3 runs).
According to this microbenchmark, the changes have no overhead on calling a Python function without specialization.
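
The same kind of measurement can be scripted with the standard timeit module. This is only a sketch of how to reproduce the command; the loop count is an arbitrary choice and no particular timing is implied.

```python
import timeit

# Equivalent of: python3 -m timeit -s 'def f(): pass' 'f()'
timer = timeit.Timer("f()", setup="def f(): pass")

number = 100_000                      # arbitrary loop count for this sketch
runs = timer.repeat(repeat=3, number=number)
best_ns = min(runs) / number * 1e9    # best of 3 runs, in ns per call

print("%.0f ns per loop" % best_ns)
```

Running it on an unpatched and a patched interpreter and comparing the two figures gives the overhead of the change on a plain function call.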
The issue #26098: PEP 510: Specialize functions with guards contains a patch which implements this PEP.
This PEP only contains changes to the Python C API, the Python API is unchanged. Other implementations of Python are free to not implement new additions, or implement added functions as no-op:

* PyFunction_Specialize(): always return 1 (the specialization has been ignored)
* PyFunction_GetSpecializedCodes(): always return an empty list
* PyFunction_GetSpecializedCode(): return the function code object, as the existing PyFunction_GET_CODE() macro

Thread on the python-ideas mailing list: RFC: PEP: Specialized functions with guards.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0510.rst
Last modified: 2025-02-01 08:59:27 GMT