Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 509 – Add a private version to dict

Author:
Victor Stinner <vstinner at python.org>
Status:
Superseded
Type:
Standards Track
Created:
04-Jan-2016
Python-Version:
3.6
Post-History:
08-Jan-2016,11-Jan-2016,14-Apr-2016,19-Apr-2016
Superseded-By:
699
Resolution:
Python-Dev message

Table of Contents

Abstract

Add a new private version to the builtindict type, incremented ateach dictionary creation and at each dictionary change, to implementfast guards on namespaces.

Rationale

In Python, the builtindict type is used by many instructions. Forexample, theLOAD_GLOBAL instruction looks up a variable in theglobal namespace, or in the builtins namespace (two dict lookups).Python usesdict for the builtins namespace, globals namespace, typenamespaces, instance namespaces, etc. The local namespace (functionnamespace) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtinfunctions, function code, global variables, local variables, … can bemodified at runtime. Implementing optimizations respecting the Pythonsemantics requires to detect when “something changes”: we will callthese checks “guards”.

The speedup of optimizations depends on the speed of guard checks. ThisPEP proposes to add a private version to dictionaries to implement fastguards on namespaces.

Dictionary lookups can be skipped if the version does not change, whichis the common case for most namespaces. The version is globally unique,so checking the version is also enough to verify that the namespacedictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guarddoes not depend on the number of watched dictionary entries: thecomplexity is O(1).

Example of optimization: copy the value of a global variable to functionconstants. This optimization requires a guard on the global variable tocheck if it was modified after it was copied. If the global variable isnot modified, the function uses the cached copy. If the global variableis modified, the function uses a regular lookup, and maybe alsodeoptimizes the function (to remove the overhead of the guard check fornext function calls).

See the510 – Specialized functions with guardsfor concrete usage ofguards to specialize functions and for a more general rationale onPython static optimizers.

Guard example

Pseudo-code of a fast guard to check if a dictionary entry was modified(created, updated or deleted) using a hypotheticaldict_get_version(dict) function:

UNSET=object()classGuardDictKey:def__init__(self,dict,key):self.dict=dictself.key=keyself.value=dict.get(key,UNSET)self.version=dict_get_version(dict)defcheck(self):"""Return True if the dictionary entry did not change        and the dictionary was not replaced."""# read the version of the dictionaryversion=dict_get_version(self.dict)ifversion==self.version:# Fast-path: dictionary lookup avoidedreturnTrue# lookup in the dictionaryvalue=self.dict.get(self.key,UNSET)ifvalueisself.value:# another key was modified:# cache the new dictionary versionself.version=versionreturnTrue# the key was modifiedreturnFalse

Usage of the dict version

Speedup method calls

Yury Selivanov wrote apatch to optimize method calls. The patch depends on the“implement per-opcode cache in ceval” patch which requires dictionaryversions to invalidate the cache if the globals dictionary or thebuiltins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.It is possible to define a function in a namespace and call it in adifferent namespace, usingexec() with theglobals parameter forexample. In this case, the globals dictionary was replaced and the cachemust also be invalidated.

Specialized functions using guards

PEP 510 proposes an API to supportspecialized functions with guards. It allows to implement staticoptimizers for Python without breaking the Python semantics.

Thefatoptimizer of theFATPython projectis an example of a static Python optimizer. It implements manyoptimizations which require guards on namespaces:

  • Call pure builtins: to replacelen("abc") with3, guards onbuiltins.__dict__['len'] andglobals()['len'] are required
  • Loop unrolling: to unroll the loopforiinrange(...):...,guards onbuiltins.__dict__['range'] andglobals()['range']are required
  • etc.

Pyjion

According of Brett Cannon, one of the two main developers of Pyjion,Pyjion can benefit from dictionary version to implement optimizations.

Pyjion is a JIT compiler forPython based upon CoreCLR (Microsoft .NET Core runtime).

Cython

Cython can benefit from dictionary version to implement optimizations.

Cython is an optimising static compiler for boththe Python programming language and the extended Cython programminglanguage.

Unladen Swallow

Even if dictionary version was not explicitly mentioned, optimizingglobals and builtins lookup was part of the Unladen Swallow plan:“Implement one of the several proposed schemes for speeding lookups ofglobals and builtins.” (source:Unladen Swallow ProjectPlan).

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compilerimplemented with LLVM. The project stopped in 2011:Unladen SwallowRetrospective.

Changes

Add ama_version_tag field to thePyDictObject structure withthe C typePY_UINT64_T, 64-bit unsigned integer. Add also a globaldictionary version.

Each time a dictionary is created, the global version is incremented andthe dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must beincremented and copied to the dictionary version. Dictionary methodswhich can modify its content:

  • clear()
  • pop(key)
  • popitem()
  • setdefault(key,value)
  • __delitem__(key)
  • __setitem__(key,value)
  • update(...)

The choice of increasing or not the version when a dictionary methoddoes not change its content is left to the Python implementation. APython implementation can decide to not increase the version to avoiddictionary lookups in guards. Examples of cases when dictionary methodsdon’t modify its content:

  • clear() if the dict is already empty
  • pop(key) if the key does not exist
  • popitem() if the dict is empty
  • setdefault(key,value) if the key already exists
  • __delitem__(key) if the key does not exist
  • __setitem__(key,value) if the new value is identical to thecurrent value
  • update() if called without argument or if new values are identicalto current values

Setting a key to a new value equals to the old value is also consideredas an operation modifying the dictionary content.

Two different empty dictionaries must have a different version to beable to identify a dictionary just by its version. It allows to verifyin a guard that a namespace was not replaced without storing a strongreference to the dictionary. Using a borrowed reference does not work:if the old dictionary is destroyed, it is possible that a new dictionaryis allocated at the same memory address. By the way, dictionaries don’tsupport weak references.

The version increase must be atomic. In CPython, the Global InterpreterLock (GIL) already protectsdict methods to make changes atomic.

Example using a hypotheticaldict_get_version(dict) function:

>>>d={}>>>dict_get_version(d)100>>>d['key']='value'>>>dict_get_version(d)101>>>d['key']='new value'>>>dict_get_version(d)102>>>deld['key']>>>dict_get_version(d)103

The field is calledma_version_tag, rather thanma_version, tosuggest to compare it usingversion_tag==old_version_tag, ratherthanversion<=old_version which becomes wrong after an integeroverflow.

Backwards Compatibility

Since thePyDictObject structure is not part of the stable ABI andthe new dictionary version not exposed at the Python scope, changes arebackward compatible.

Implementation and Performance

Theissue #26058: PEP 509: Add ma_version_tag to PyDictObject contains a patch implementingthis PEP.

On pybench and timeit microbenchmarks, the patch does not seem to addany overhead on dictionary operations. For example, the following timeitmicro-benchmarks takes 318 nanoseconds before and after the change:

python3.6-mtimeit'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change,PyDict_GetItem() takes 14.8 ns fora dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,a guard can watch for multiple keys. For example, for an optimizationusing 10 global variables in a function, 10 dictionary lookups costs 148ns, whereas the guard still only costs 3.8 ns when the version does notchange (39x as fast).

Thefat module implementssuch guards:fat.GuardDict is based on the dictionary version.

Integer overflow

The implementation uses the C typePY_UINT64_T to store the version:a 64 bits unsigned integer. The C code usesversion++. On integeroverflow, the version is wrapped to0 (and then continues to beincremented) according to the C standard.

After an integer overflow, a guard can succeed whereas the watcheddictionary key was modified. The bug only occurs at a guard check ifthere are exactly2**64 dictionary creations or modificationssince the previous guard check.

If a dictionary is modified every nanosecond,2**64 modificationstakes longer than 584 years. Using a 32-bit version, it only takes 4seconds. That’s why a 64-bit unsigned type is also used on 32-bitsystems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.

Alternatives

Expose the version at Python level as a read-only __version__ property

The first version of the PEP proposed to expose the dictionary versionas a read-only__version__ property at Python level, and also to addthe property tocollections.UserDict (since this type must mimicthedict API).

There are multiple issues:

  • To be consistent and avoid bad surprises, the version must be added toall mapping types. Implementing a new mapping type would require extrawork for no benefit, since the version is only required on thedict type in practice.
  • All Python implementations would have to implement this new property,it gives more work to other implementations, whereas they may not usethe dictionary version at all.
  • Exposing the dictionary version at the Python level can lead thefalse assumption on performances. Checkingdict.__version__ atthe Python level is not faster than a dictionary lookup. A dictionarylookup in Python has a cost of 48.7 ns and checking the version has acost of 47.5 ns, the difference is only 1.2 ns (3%):
    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'10000000 loops, best of 3: 0.0487 usec per loop$ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'10000000 loops, best of 3: 0.0475 usec per loop
  • The__version__ can be wrapped on integer overflow. It is errorprone: usingdict.__version__<=guard_version is wrong,dict.__version__==guard_version must be used instead to reducethe risk of bug on integer overflow (even if the integer overflow isunlikely in practice).

Mandatory bikeshedding on the property name:

  • __cache_token__: name proposed by Alyssa Coghlan, name coming fromabc.get_cache_token().
  • __version__
  • __version_tag__
  • __timestamp__

Add a version to each dict entry

A single version per dictionary requires to keep a strong reference tothe value which can keep the value alive longer than expected. If we addalso a version per dictionary entry, the guard can only store the entryversion (a simple integer) to avoid the strong reference to the value:only strong references to the dictionary and to the key are needed.

Changes: add ame_version_tag field to thePyDictKeyEntrystructure, the field has the C typePY_UINT64_T. When a key iscreated or modified, the entry version is set to the dictionary versionwhich is incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modifiedusing hypotheticaldict_get_version(dict) anddict_get_entry_version(dict) functions:

UNSET=object()classGuardDictKey:def__init__(self,dict,key):self.dict=dictself.key=keyself.dict_version=dict_get_version(dict)self.entry_version=dict_get_entry_version(dict,key)defcheck(self):"""Return True if the dictionary entry did not change        and the dictionary was not replaced."""# read the version of the dictionarydict_version=dict_get_version(self.dict)ifdict_version==self.version:# Fast-path: dictionary lookup avoidedreturnTrue# lookup in the dictionary to read the entry versionentry_version=get_dict_key_version(dict,key)ifentry_version==self.entry_version:# another key was modified:# cache the new dictionary versionself.dict_version=dict_versionself.entry_version=entry_versionreturnTrue# the key was modifiedreturnFalse

The main drawback of this option is the impact on the memory footprint.It increases the size of each dictionary entry, so the overhead dependson the number of buckets (dictionary entries, used or not used). Forexample, it increases the size of each dictionary entry by 8 bytes on64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.Examples:

  • PEP 393 – Flexible String Representation
  • PEP 412 – Key-Sharing Dictionary

Add a new dict subtype

Add a newverdict type, subtype ofdict. When guards are needed,use theverdict for namespaces (module namespace, type namespace,instance namespace, etc.) instead ofdict.

Leave thedict type unchanged to not add any overhead (CPU, memoryfootprint) when guards are not used.

Technical issue: a lot of C code in the wild, including CPython core,expecting the exactdict type. Issues:

  • exec() requires adict for globals and locals. A lot of codeuseglobals={}. It is not possible to cast thedict to adict subtype because the caller expects theglobals parameterto be modified (dict is mutable).
  • C functions call directlyPyDict_xxx() functions, instead of callingPyObject_xxx() if the object is adict subtype
  • PyDict_CheckExact() check fails ondict subtype, whereas somefunctions require the exactdict type.
  • Python/ceval.c does not completely supports dict subtypes fornamespaces

Theexec() issue is a blocker issue.

Other issues:

  • The garbage collector has a special code to “untrack”dictinstances. If adict subtype is used for namespaces, the garbagecollector can be unable to break some reference cycles.
  • Some functions have a fast-path fordict which would not be takenfordict subtypes, and so it would make Python a little bitslower.

Prior Art

Method cache and type version tag

In 2007, Armin Rigo wrote a patch to implement a cache of methods. Itwas merged into Python 2.6. The patch adds a “type attribute cacheversion tag” (tp_version_tag) and a “valid version tag” flag totypes (thePyTypeObject structure).

The type version tag is not exposed at the Python level.

The version tag has the C typeunsignedint. The cache is a globalhash table of 4096 entries, shared by all types. The cache is global to“make it fast, have a deterministic and low memory footprint, and beeasy to invalidate”. Each cache entry has a version tag. A globalversion tag is used to create the next version tag, it also has the Ctypeunsignedint.

By default, a type has its “valid version tag” flag cleared to indicatethat the version tag is invalid. When the first method of the type iscached, the version tag and the “valid version tag” flag are set. When atype is modified, the “valid version tag” flag of the type and itssubclasses is cleared. Later, when a cache entry of these types is used,the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global versiontag is reset to0.

SeeMethod cache (issue #1685986) andArmin’s method cacheoptimization updated for Python 2.6 (issue #1700288).

Globals / builtins cache

In 2010, Antoine Pitrou proposed aGlobals / builtins cache (issue#10401) which adds a privatema_version field to thePyDictObject structure (dict type),the field has the C typePy_ssize_t.

The patch adds a “global and builtin cache” to functions and frames, andchangesLOAD_GLOBAL andSTORE_GLOBAL instructions to use thecache.

The change on thePyDictObject structure is very similar to thisPEP.

Cached globals+builtins lookup

In 2006, Andrea Griffini proposed a patch implementing aCachedglobals+builtins lookup optimization. The patch adds a privatetimestamp field to thePyDictObject structure (dict type),the field has the C typesize_t.

Thread on python-dev:About dictionary lookup caching(December 2006).

Guard against changing dict during iteration

In 2013, Serhiy Storchaka proposedGuard against changing dict duringiteration (issue #19332) whichadds ama_count field to thePyDictObject structure (dicttype), the field has the C typesize_t. This field is incrementedwhen the dictionary is modified.

PySizer

PySizer: a memory profiler for Python,Google Summer of Code 2005 project by Nick Smallbone.

This project has a patch for CPython 2.4 which addskey_time andvalue_time fields to dictionary entries. It uses a globalprocess-wide counter for dictionaries, incremented each time that adictionary is modified. The times are used to decide when child objectsfirst appeared in their parent objects.

Discussion

Thread on the mailing lists:

Acceptance

The PEP wasaccepted on 2016-09-07 by Guido van Rossum.The PEP implementation has since been committed to the repository.

Copyright

This document has been placed in the public domain.


Source:https://github.com/python/peps/blob/main/peps/pep-0509.rst

Last modified:2025-02-01 08:55:40 GMT


[8]ページ先頭

©2009-2025 Movatter.jp