
Python Enhancement Proposals

PEP 510 – Specialize functions with guards

Author:
Victor Stinner <vstinner at python.org>
Status:
Rejected
Type:
Standards Track
Created:
04-Jan-2016
Python-Version:
3.6

Rejection Notice

This PEP was rejected by its author since the design didn’t show any significant speedup, but also because of the lack of time to implement the most advanced and complex optimizations.

Abstract

Add functions to the Python C API to specialize pure Python functions: add specialized codes with guards. This makes it possible to implement static optimizers that respect the Python semantics.

Rationale

Python semantics

Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, and so on can be modified at runtime. Implementing optimizations that respect the Python semantics requires detecting when “something changes”; we will call these checks “guards”.

This PEP proposes to add a public API to the Python C API to add specialized codes with guards to a function. When the function is called, a specialized code is used if nothing changed; otherwise the original bytecode is used.
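As a small illustration of the problem, the sketch below replaces a builtin at runtime and compares the original function with a constant-folded version that has no guard. The names unguarded_fast_func and chr_unchanged are hypothetical and only serve this example; they are not part of the proposed API.

```python
import builtins

def func():
    return chr(65)

def unguarded_fast_func():
    # Naive "optimization": chr(65) folded to a constant, with no guard.
    return "A"

original_chr = builtins.chr
try:
    builtins.chr = lambda obj: "mock"   # legal Python: builtins are mutable
    expected = func()                   # original semantics: uses the new chr
    wrong = unguarded_fast_func()       # stale constant, ignores the change
finally:
    builtins.chr = original_chr         # restore the real builtin

def chr_unchanged():
    # A guard in the sense of this PEP: a cheap check that the assumption
    # made by the optimization (chr is still the real builtin) holds.
    return builtins.chr is original_chr
```

Here expected and wrong differ, which is exactly the situation a guard has to detect before the specialized code is allowed to run.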

Even if guards help to respect most parts of the Python semantics, it’s hard to optimize Python without making subtle changes to the exact behaviour. CPython has a long history and many applications rely on implementation details. A compromise must be found between “everything is mutable” and performance.

Writing an optimizer is out of the scope of this PEP.

Why not a JIT compiler?

There are multiple JIT compilers for Python actively developed:

  • PyPy
  • Pyston
  • Numba
  • Pyjion

Numba is specific to numerical computation. Pyston and Pyjion are still young. PyPy is the most complete Python interpreter; it is generally faster than CPython in micro- and many macro-benchmarks and has very good compatibility with CPython (it respects the Python semantics). There are still issues with Python JIT compilers which prevent them from being widely used instead of CPython.

Many popular libraries like numpy, PyGTK, PyQt, PySide and wxPython are implemented in C or C++ and use the Python C API. To have a small memory footprint and better performance, Python JIT compilers do not use reference counting (so that they can use a faster garbage collector), do not use the C structures of CPython objects, and manage memory allocations differently. PyPy has a cpyext module which emulates the Python C API, but it has worse performance than CPython and does not support the full Python C API.

New features are first developed in CPython. In January 2016, the latest CPython stable version is 3.5, whereas PyPy only supports Python 2.7 and 3.2, and Pyston only supports Python 2.7.

Even if PyPy has very good compatibility with Python, some modules are still not compatible with PyPy: see the PyPy Compatibility Wiki. The incomplete support of the Python C API is part of this problem. There are also subtle differences between PyPy and CPython like reference counting: object destructors are always called in PyPy, but can be called “later” than in CPython. Using context managers helps to control when resources are released.

Even if PyPy is much faster than CPython in a wide range of benchmarks, some users still report worse performance than CPython on some specific use cases, or unstable performance.

When Python is used as a scripting language for programs running less than 1 minute, JIT compilers can be slower because their startup time is higher and the JIT compiler takes time to optimize the code. For example, most Mercurial commands take a few seconds.

Numba now supports ahead-of-time compilation, but it requires a decorator to specify argument types and it only supports numerical types.

CPython 3.5 has almost no optimization: the peephole optimizer only implements basic optimizations. A static compiler is a compromise between CPython 3.5 and PyPy.

Note

There was also the Unladen Swallow project, but it was abandoned in 2011.

Examples

The following examples are not written to show powerful optimizations promising significant speedups, but to be short and easy to understand, just to explain the principle.

Hypothetical myoptimizer module

Examples in this PEP use a hypothetical myoptimizer module which provides the following functions and types:

  • specialize(func, code, guards): add the specialized code code with guards guards to the function func
  • get_specialized(func): get the list of specialized codes as a list of (code, guards) tuples where code is a callable or code object and guards is a list of guards
  • GuardBuiltins(name): guard watching for builtins.__dict__[name] and globals()[name]. The guard fails if builtins.__dict__[name] is replaced, or if globals()[name] is set.
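To make the three names above concrete, here is a minimal pure-Python sketch of the bookkeeping they imply. It is an assumption-laden model, not the PEP's implementation: the module-level registry _SPECIALIZED and the bind() hook are invented for this sketch, and returning 2 from a failed guard is inferred from the examples in this PEP, where the specialized code is removed once the builtin is replaced. The sketch only stores specializations; it does not change how func() is executed.

```python
import builtins

# Registry mapping a function to its list of (code, guards) entries.
# A module-level dict is an assumption made for this sketch only.
_SPECIALIZED = {}


class GuardBuiltins:
    """Guard watching builtins.__dict__[name] and the function's globals."""

    def __init__(self, name):
        self.name = name
        self.expected = builtins.__dict__[name]
        self.func_globals = {}

    def bind(self, func):
        # Hypothetical hook: remember which globals() namespace to watch.
        self.func_globals = func.__globals__

    def __call__(self, args, kwargs):
        # 0: guard succeeded; 2: guard will always fail from now on.
        if builtins.__dict__.get(self.name) is not self.expected:
            return 2
        if self.name in self.func_globals:
            return 2
        return 0


def specialize(func, code, guards):
    """Add the specialized code `code` with guards `guards` to `func`."""
    guards = list(guards)
    for guard in guards:
        guard.bind(func)
    _SPECIALIZED.setdefault(func, []).append((code, guards))


def get_specialized(func):
    """Return the (mutable) list of (code, guards) tuples for `func`."""
    return _SPECIALIZED.setdefault(func, [])


def func():
    return chr(65)

def fast_func():
    return "A"

specialize(func, fast_func.__code__, [GuardBuiltins("chr")])
code, guards = get_specialized(func)[0]
```

Returning the stored list itself from get_specialized() is a deliberate choice in this sketch: it lets a dispatcher delete entries whose guards can never succeed again.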

Using bytecode

Add specialized bytecode where the call to the pure builtin function chr(65) is replaced with its result "A":

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    myoptimizer.specialize(func, fast_func.__code__,
                           [myoptimizer.GuardBuiltins("chr")])
    del fast_func

Example showing the behaviour of the guard:

    print("func(): %s" % func())
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

Output:

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0

The first call uses the specialized bytecode which returns the string "A". The second call removes the specialized code because the builtin chr() function was replaced, and executes the original bytecode calling chr(65).

On a microbenchmark, calling the specialized bytecode takes 88 ns, whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.
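A comparable measurement can be approximated on an unpatched interpreter with the standard timeit module. The sketch below only compares the original function with a hand-written constant-returning function; it does not include guard checks, so it gives an upper bound on the possible gain rather than a reproduction of the figures above. The loop count is an arbitrary choice for this sketch.

```python
import timeit

def func():
    return chr(65)

def fast_func():
    # Hand-written stand-in for the specialized bytecode.
    return "A"

number = 100_000  # calls per run, chosen arbitrarily

# Best of 3 runs, converted to seconds per call.
slow = min(timeit.repeat(func, number=number, repeat=3)) / number
fast = min(timeit.repeat(fast_func, number=number, repeat=3)) / number

print("original: %.0f ns, constant: %.0f ns" % (slow * 1e9, fast * 1e9))
```

Absolute timings depend on the machine and the Python version, so only the ratio between the two numbers is meaningful.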

Using builtin function

Add the C builtin chr() function as the specialized code instead of a bytecode calling chr(obj):

    import myoptimizer

    def func(arg):
        return chr(arg)

    myoptimizer.specialize(func, chr,
                           [myoptimizer.GuardBuiltins("chr")])

Example showing the behaviour of the guard:

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(myoptimizer.get_specialized(func)))

Output:

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0

The first call calls the C builtin chr() function (without creating a Python frame). The second call removes the specialized code because the builtin chr() function was replaced, and executes the original bytecode.

On a microbenchmark, calling the C builtin takes 95 ns, whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling chr(65) directly takes 76 ns.

Choose the specialized code

Pseudo-code to choose the specialized code to call a pure Python function:

    def call_func(func, args, kwargs):
        specialized = myoptimizer.get_specialized(func)
        index = 0
        # re-read the length on each pass: entries may be deleted below
        while index < len(specialized):
            specialized_code, guards = specialized[index]

            check = 0   # an entry without guards cannot fail
            for guard in guards:
                check = guard(args, kwargs)
                if check:
                    break

            if not check:
                # all guards succeeded:
                # use the specialized code
                return specialized_code
            elif check == 1:
                # a guard failed temporarily:
                # try the next specialized code
                index += 1
            else:
                assert check == 2
                # a guard will always fail:
                # remove the specialized code
                del specialized[index]

        # if a guard of each specialized code failed, or if the function
        # has no specialized code, use original bytecode
        return func.__code__
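The effect of the three guard results can be traced with a small standalone run. The sketch below is a condensed restatement of the selection loop that works on a plain list instead of myoptimizer; the select() helper, the two toy guards and the string placeholders standing in for code objects are all hypothetical.

```python
# Guard results, mirroring the integer codes used in the pseudo-code above.
OK, TEMPORARY_FAILURE, PERMANENT_FAILURE = 0, 1, 2


def select(specialized, args, kwargs, fallback):
    """Condensed form of the selection loop, operating on a plain list."""
    index = 0
    while index < len(specialized):
        code, guards = specialized[index]
        # First non-zero guard result, or OK when every guard passes.
        results = (guard(args, kwargs) for guard in guards)
        check = next((result for result in results if result), OK)
        if check == OK:
            return code
        if check == TEMPORARY_FAILURE:
            index += 1                  # try the next specialized code
        else:
            del specialized[index]      # this entry can never match again
    return fallback


def first_arg_is_int(args, kwargs):
    # May fail for this call and succeed for a later one.
    return OK if args and isinstance(args[0], int) else TEMPORARY_FAILURE


def always_broken(args, kwargs):
    # Models a guard whose watched object was replaced.
    return PERMANENT_FAILURE


specialized = [("dead", [always_broken]), ("int-path", [first_arg_is_int])]
chosen_for_int = select(specialized, (65,), {}, "original")
chosen_for_str = select(specialized, ("x",), {}, "original")
```

After the first call the "dead" entry has been deleted, while the "int-path" entry survives the second call even though its guard failed, because that failure was only temporary.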

Changes

Changes to the Python C API:

  • Add a PyFuncGuardObject object and a PyFuncGuard_Type type
  • Add a PySpecializedCode structure
  • Add the following fields to the PyFunctionObject structure:

        Py_ssize_t nb_specialized;
        PySpecializedCode *specialized;

  • Add function methods:
    • PyFunction_Specialize()
    • PyFunction_GetSpecializedCodes()
    • PyFunction_GetSpecializedCode()
    • PyFunction_RemoveSpecialized()
    • PyFunction_RemoveAllSpecialized()

None of these functions and types are exposed at the Python level.

All these additions are explicitly excluded from the stable ABI.

When the code of a function is replaced (func.__code__ = new_code), all specialized codes and guards are removed.

Function guard

Add a function guard object:

    typedef struct {
        PyObject ob_base;
        int (*init)(PyObject *guard, PyObject *func);
        int (*check)(PyObject *guard, PyObject **stack, int na, int nk);
    } PyFuncGuardObject;

The init() function initializes a guard:

  • Return 0 on success
  • Return 1 if the guard will always fail: PyFunction_Specialize() must ignore the specialized code
  • Raise an exception and return -1 on error

The check() function checks a guard:

  • Return 0 on success
  • Return 1 if the guard failed temporarily
  • Return 2 if the guard will always fail: the specialized code must be removed
  • Raise an exception and return -1 on error

stack is an array of arguments: indexed arguments followed by (key, value) pairs of keyword arguments. na is the number of indexed arguments. nk is the number of keyword arguments: the number of (key, value) pairs. stack contains na + nk * 2 objects.
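The argument layout can be modelled in Python to check the arithmetic. This is only a sketch of the description above: pack_stack(), unpack_stack() and check_first_arg_is_int() are hypothetical helpers, and a Python list stands in for the C array of PyObject pointers.

```python
def pack_stack(args, kwargs):
    """Flatten a call into (stack, na, nk) as described above."""
    stack = list(args)
    for key, value in kwargs.items():
        stack.extend((key, value))      # each keyword adds two objects
    return stack, len(args), len(kwargs)


def unpack_stack(stack, na, nk):
    """Inverse of pack_stack: recover positional and keyword arguments."""
    assert len(stack) == na + nk * 2
    args = tuple(stack[:na])
    pairs = stack[na:]
    kwargs = {pairs[i]: pairs[i + 1] for i in range(0, 2 * nk, 2)}
    return args, kwargs


def check_first_arg_is_int(stack, na, nk):
    """Toy check(): 0 on success, 1 for a temporary failure."""
    return 0 if na >= 1 and isinstance(stack[0], int) else 1


stack, na, nk = pack_stack((65, "x"), {"sep": ", "})
```

With two indexed arguments and one keyword argument, stack holds 2 + 1 * 2 = 4 objects, and a guard can inspect the indexed arguments without building a tuple or a dict.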

Specialized code

Add a specialized code structure:

    typedef struct {
        PyObject *code;       /* callable or code object */
        Py_ssize_t nb_guard;
        PyObject **guards;    /* PyFuncGuardObject objects */
    } PySpecializedCode;

Function methods

PyFunction_Specialize

Add a function method to specialize the function, adding a specialized code with guards:

    int PyFunction_Specialize(PyObject *func, PyObject *code,
                              PyObject *guards)

If code is a Python function, the code object of the code function is used as the specialized code. The specialized Python function must have the same parameter defaults, the same keyword parameter defaults, and must not have specialized code.

If code is a Python function or a code object, a new code object is created and the code name and first line number of the code object of func are copied. The specialized code must have the same cell variables and the same free variables.
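The compatibility rules above can be approximated in pure Python. The sketch below is not the C implementation: prepare_specialized_code() is a hypothetical helper, it uses the code.replace() method available since Python 3.8, and it skips the "must not have specialized code" rule because plain Python functions carry no such state.

```python
def prepare_specialized_code(func, fast):
    """Check the constraints listed above and return a renamed code object.

    `fast` is a plain Python function standing in for the `code` argument.
    """
    if fast.__defaults__ != func.__defaults__:
        raise ValueError("parameter defaults differ")
    if fast.__kwdefaults__ != func.__kwdefaults__:
        raise ValueError("keyword parameter defaults differ")
    code, original = fast.__code__, func.__code__
    if code.co_cellvars != original.co_cellvars:
        raise ValueError("cell variables differ")
    if code.co_freevars != original.co_freevars:
        raise ValueError("free variables differ")
    # Create a new code object carrying the name and first line number
    # of func's code, as the text describes.
    return code.replace(co_name=original.co_name,
                        co_firstlineno=original.co_firstlineno)


def func(arg=65):
    return chr(arg)

def fast_func(arg=65):
    return "A"

def other_default(arg=66):
    return "B"

new_code = prepare_specialized_code(func, fast_func)
```

Passing other_default instead of fast_func raises ValueError in this sketch, because its parameter default differs from the one of func.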

Result:

  • Return 0 on success
  • Return 1 if the specialization has been ignored
  • Raise an exception and return -1 on error

PyFunction_GetSpecializedCodes

Add a function method to get the list of specialized codes:

    PyObject* PyFunction_GetSpecializedCodes(PyObject *func)

Return a list of (code, guards) tuples where code is a callable or code object and guards is a list of PyFuncGuard objects. Raise an exception and return NULL on error.

PyFunction_GetSpecializedCode

Add a function method checking guards to choose a specialized code:

    PyObject* PyFunction_GetSpecializedCode(PyObject *func,
                                            PyObject **stack,
                                            int na, int nk)

See the check() function of guards for the stack, na and nk arguments. Return a callable or a code object on success. Raise an exception and return NULL on error.

PyFunction_RemoveSpecialized

Add a function method to remove a specialized code with its guards by its index:

    int PyFunction_RemoveSpecialized(PyObject *func, Py_ssize_t index)

Return 0 on success or if the index does not exist. Raise an exception and return -1 on error.

PyFunction_RemoveAllSpecialized

Add a function method to remove all specialized codes and guards of a function:

    int PyFunction_RemoveAllSpecialized(PyObject *func)

Return 0 on success. Raise an exception and return -1 if func is not a function.

Benchmark

Microbenchmark on python3.6 -m timeit -s 'def f(): pass' 'f()' (best of 3 runs):

  • Original Python: 79 ns
  • Patched Python: 79 ns

According to this microbenchmark, the changes have no overhead on calling a Python function without specialization.

Implementation

The issue #26098: PEP 510: Specialize functions with guards contains a patch which implements this PEP.

Other implementations of Python

This PEP only contains changes to the Python C API; the Python API is unchanged. Other implementations of Python are free not to implement the new additions, or to implement the added functions as no-ops:

  • PyFunction_Specialize(): always return 1 (the specialization has been ignored)
  • PyFunction_GetSpecializedCodes(): always return an empty list
  • PyFunction_GetSpecializedCode(): return the function code object, as the existing PyFunction_GET_CODE() macro
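The no-op behaviour listed above can be modelled in a few lines of Python. The snake_case function names below are hypothetical stand-ins for the C functions; another implementation would provide these at the C level, not in Python.

```python
def function_specialize(func, code, guards):
    """Model of a no-op PyFunction_Specialize(): always 'ignored'."""
    return 1


def function_get_specialized_codes(func):
    """Model of a no-op PyFunction_GetSpecializedCodes(): nothing stored."""
    return []


def function_get_specialized_code(func, stack, na, nk):
    """Model of a no-op PyFunction_GetSpecializedCode(): original code."""
    return func.__code__


def func():
    return chr(65)

status = function_specialize(func, chr, [])
selected = function_get_specialized_code(func, [], 0, 0)
```

An optimizer written against the API would still work on such an implementation: every specialization request is reported as ignored and every call runs the original code.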

Discussion

Thread on the python-ideas mailing list: RFC: PEP: Specialized functions with guards.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0510.rst

Last modified: 2025-02-01 08:59:27 GMT

