The bindings for most global variables and attributes of other modules typically never change during the execution of a Python program, but because of Python’s dynamic nature, code which accesses such global objects must run through a full lookup each time the object is needed. This PEP proposes a mechanism that allows code that accesses most global objects to treat them as local objects and places the burden of updating references on the code that changes the name bindings of such objects.
Consider the workhorse function sre_compile._compile. It is the internal compilation function for the sre module. It consists almost entirely of a loop over the elements of the pattern being compiled, comparing opcodes with known constant values and appending tokens to an output list. Most of the comparisons are with constants imported from the sre_constants module. This means there are lots of LOAD_GLOBAL bytecodes in the compiled output of this module. Just by reading the code it’s apparent that the author intended LITERAL, NOT_LITERAL, OPCODES and many other symbols to be constants. Still, each time they are involved in an expression, they must be looked up anew.
Most global accesses are actually to objects that are “almost constants”. This includes global variables in the current module as well as the attributes of other imported modules. Since they rarely change, it seems reasonable to place the burden of updating references to such objects on the code that changes the name bindings. If sre_constants.LITERAL is changed to refer to another object, perhaps it would be worthwhile for the code that modifies the sre_constants module dict to correct any active references to that object. By doing so, in many cases global variables and the attributes of many objects could be cached as local variables. If the bindings between the names given to the objects and the objects themselves change rarely, the cost of keeping track of such objects should be low and the potential payoff fairly large.
In an attempt to gauge the effect of this proposal, I modified the Pystone benchmark program included in the Python distribution to cache global functions. Its main function, Proc0, makes calls to ten different functions inside its for loop. In addition, Func2 calls Func1 repeatedly inside a loop. If local copies of these 11 global identifiers are made before the functions’ loops are entered, performance on this particular benchmark improves by about two percent (from 5561 pystones to 5685 on my laptop). It gives some indication that performance would be improved by caching most global variable access. Note also that the pystone benchmark makes essentially no accesses of global module attributes, an anticipated area of improvement for this PEP.
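The hand-optimization applied to Pystone amounts to hoisting a global into a local before the loop. A minimal sketch of the same transformation in plain Python (no new opcodes; the function names here are illustrative only):

```python
import math

def sum_sines(n):
    # pays a LOAD_GLOBAL math plus LOAD_ATTR sin on every iteration
    total = 0.0
    for i in range(n):
        total += math.sin(i)
    return total

def sum_sines_cached(n):
    sin = math.sin          # hoist the global attribute into a fast local
    total = 0.0
    for i in range(n):
        total += sin(i)     # a LOAD_FAST from here on
    return total
```

Both functions compute the same result; the cached version simply pays the global/attribute lookup once instead of once per iteration.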
I propose that the Python virtual machine be modified to include TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would associate a global name or attribute of a global name with a slot in the local variable array and perform an initial lookup of the associated object to fill in the slot with a valid value. The association it creates would be noted by the code responsible for changing the name-to-object binding to cause the associated local variable to be updated. The UNTRACK_OBJECT opcode would delete any association between the name and the local variable slot.
Operation of this code in threaded programs will be no different than in unthreaded programs. If you need to lock an object to access it, you would have had to do that before TRACK_OBJECT would have been executed and retain that lock until after you stop using it.
FIXME: I suspect I need more here.
Global variables and attributes rarely change. For example, once a function imports the math module, the binding between the name math and the module it refers to isn’t likely to change. Similarly, if the function that uses the math module refers to its sin attribute, it’s unlikely to change. Still, every time the module wants to call the math.sin function, it must first execute a pair of instructions:
    LOAD_GLOBAL      math
    LOAD_ATTR        sin
If the client module always assumed that math.sin was a local constant and it was the responsibility of “external forces” outside the function to keep the reference correct, we might have code like this:
    TRACK_OBJECT       math.sin
    ...
    LOAD_FAST          math.sin
    ...
    UNTRACK_OBJECT     math.sin
If the LOAD_FAST was in a loop, the payoff in reduced global loads and attribute lookups could be significant.
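The global/local split is easy to see in today’s CPython with the dis module (exact opcode names vary somewhat across versions; this just illustrates the distinction the PEP is drawing):

```python
import dis
import math

def via_global():
    return math.pi          # module-level name: needs a global lookup

def via_local(pi=math.pi):  # parameter access compiles to LOAD_FAST
    return pi

global_ops = {i.opname for i in dis.get_instructions(via_global)}
local_ops = {i.opname for i in dis.get_instructions(via_local)}
# via_global uses LOAD_GLOBAL; via_local gets by with LOAD_FAST alone
```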
This technique could, in theory, be applied to any global variable access or attribute lookup. Consider this code:
    l = []
    for i in range(10):
        l.append(math.sin(i))
    return l
Even though l is a local variable, you still pay the cost of loading l.append ten times in the loop. The compiler (or an optimizer) could recognize that both math.sin and l.append are being called in the loop and decide to generate the tracked local code, avoiding it for the builtin range() function because it’s only called once during loop setup. Performance issues related to accessing local variables make tracking l.append less attractive than tracking globals such as math.sin.
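What the tracked-local code would buy can be hand-simulated today — a sketch of the effect, not of the proposed mechanism itself:

```python
import math

def sines_plain():
    l = []
    for i in range(10):
        l.append(math.sin(i))   # loads l.append and math.sin on each pass
    return l

def sines_hoisted():
    l = []
    append = l.append           # what tracking would do for l.append
    sin = math.sin              # ...and for math.sin
    for i in range(10):
        append(sin(i))          # two LOAD_FASTs per iteration
    return l
```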
According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL opcodes account for over 7% of all instructions executed by the Python virtual machine. This can be a very expensive instruction, at least relative to a LOAD_FAST instruction, which is a simple array index and requires no extra function calls by the virtual machine. I believe many LOAD_GLOBAL instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to LOAD_FAST instructions.
Code that uses global variables heavily often resorts to various tricks to avoid global variable and attribute lookup. The aforementioned sre_compile._compile function caches the append method of the growing output list. Many people commonly abuse functions’ default argument feature to cache global variable lookups. Both of these schemes are hackish and rarely address all the available opportunities for optimization. (For example, sre_compile._compile does not cache the two globals that it uses most frequently: the builtin len function and the global OPCODES array that it imports from sre_constants.py.)
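For reference, the default-argument hack mentioned above looks like this (a common but hackish idiom, shown only for illustration — it is not part of the proposal):

```python
import math

# The globals are evaluated once, at def time, and stored in the
# function's defaults tuple; inside the body they are fast local accesses.
def sin_table(n, sin=math.sin, len=len):
    vals = [sin(i) for i in range(n)]
    assert len(vals) == n      # this len is the cached default, not the global
    return vals
```

The obvious drawbacks: the cached names leak into the function’s signature, and callers can accidentally (or maliciously) override them.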
What if math.sin changes while in cache? I believe the global interpreter lock will protect values from being corrupted. In any case, the situation would be no worse than it is today. If one thread modified math.sin after another thread had already executed LOAD_GLOBAL math, but before it executed LOAD_ATTR sin, the client thread would see the old value of math.sin.
The idea is this. I use a multi-attribute load below as an example, not because it would happen very often, but because by demonstrating the recursive nature with an extra call hopefully it will become clearer what I have in mind. Suppose a function defined in module foo wants to access spam.eggs.ham and that spam is a module imported at the module level in foo:
    import spam
    ...
    def somefunc():
        ...
        x = spam.eggs.ham
Upon entry to somefunc, a TRACK_GLOBAL instruction will be executed:
    TRACK_GLOBAL     spam.eggs.ham     n
spam.eggs.ham is a string literal stored in the function’s constants array. n is a fastlocals index. &fastlocals[n] is a reference to slot n in the executing frame’s fastlocals array, the location in which the spam.eggs.ham reference will be stored. Here’s what I envision happening:
TRACK_GLOBAL instruction locates the object referred to by the namespam and finds it in its module scope. It then executes a C functionlike:_PyObject_TrackName(m,"spam.eggs.ham",&fastlocals[n])
   where m is the module object with an attribute spam.
2. The module object strips the leading spam., stores the (eggs.ham, &fastlocals[n]) info in case its binding for the name eggs changes. It then locates the object referred to by the key eggs in its dict and recursively calls:

       _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
3. The eggs object strips the leading eggs., stores the (ham, &fastlocals[n]) info, locates the object in its namespace called ham and calls _PyObject_TrackName once again:

       _PyObject_TrackName(ham, "ham", &fastlocals[n])
4. The ham object strips the leading string (no “.” this time, but that’s a minor point), sees that the result is empty, then uses its own value (self, probably) to update the location it was handed:

       Py_XDECREF(fastlocals[n]);
       fastlocals[n] = self;
       Py_INCREF(fastlocals[n]);
At this point, each object involved in resolving spam.eggs.ham knows which entry in its namespace needs to be tracked and what location to update if that name changes. Furthermore, if the one name it is tracking in its local storage changes, it can call _PyObject_TrackName using the new object once the change has been made. At the bottom end of the food chain, the last object will always strip a name, see the empty string and know that its value should be stuffed into the location it’s been passed.
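A pure-Python model of this recursive walk may make it concrete. Everything below — track_name, the shape of the registry, the list standing in for &fastlocals[n] — is invented for illustration; the real implementation would live in C:

```python
from types import SimpleNamespace

registry = {}   # (id(namespace_obj), attr_name) -> slot to update on rebinding

def track_name(obj, dotted, slot):
    """Toy model of the PEP's _PyObject_TrackName walk."""
    # the first component of `dotted` names obj itself; strip it
    _, _, rest = dotted.partition(".")
    if not rest:
        slot[0] = obj                       # bottom of the chain: cache obj
        return
    attr = rest.partition(".")[0]
    registry[(id(obj), attr)] = slot        # re-resolve if attr is rebound
    track_name(getattr(obj, attr), rest, slot)

# stand-ins for the spam module and its nested attributes
spam = SimpleNamespace(eggs=SimpleNamespace(ham=42))
slot = [None]                               # plays the role of &fastlocals[n]
track_name(spam, "spam.eggs.ham", slot)
```

After the call, slot holds the resolved object and each intermediate namespace has registered which attribute it must watch.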
When the object referred to by the dotted expression spam.eggs.ham is going to go out of scope, an UNTRACK_GLOBAL spam.eggs.ham n instruction is executed. It has the effect of deleting all the tracking information that TRACK_GLOBAL established.
The tracking operation may seem expensive, but recall that the objects being tracked are assumed to be “almost constant”, so the setup cost will be traded off against hopefully multiple local instead of global loads. For globals with attributes the tracking setup cost grows but is offset by avoiding the extra LOAD_ATTR cost. The TRACK_GLOBAL instruction needs to perform a PyDict_GetItemString for the first name in the chain to determine where the top-level object resides. Each object in the chain has to store a string and an address somewhere, probably in a dict that uses storage locations as keys (e.g. the &fastlocals[n]) and strings as values. (This dict could possibly be a central dict of dicts whose keys are object addresses instead of a per-object dict.) It shouldn’t be the other way around because multiple active frames may want to track spam.eggs.ham, but only one frame will want to associate that name with one of its fast locals slots.
What about this (dumb) code?:
    l = []
    lock = threading.Lock()

    ...

    def fill_l():
        for i in range(1000):
            lock.acquire()
            l.append(math.sin(i))
            lock.release()

    ...

    def consume_l():
        while 1:
            lock.acquire()
            if l:
                elt = l.pop()
            lock.release()
            fiddle(elt)
It’s not clear from a static analysis of the code what the lock is protecting. (You can’t tell at compile-time that threads are even involved, can you?) Would or should it affect attempts to track l.append or math.sin in the fill_l function?
If we annotate the code with mythical track_object and untrack_object builtins (I’m not proposing this, just illustrating where stuff would go!), we get:
    l = []
    lock = threading.Lock()

    ...

    def fill_l():
        track_object("l.append", append)
        track_object("math.sin", sin)
        for i in range(1000):
            lock.acquire()
            append(sin(i))
            lock.release()
        untrack_object("math.sin", sin)
        untrack_object("l.append", append)

    ...

    def consume_l():
        while 1:
            lock.acquire()
            if l:
                elt = l.pop()
            lock.release()
            fiddle(elt)
Is that correct both with and without threads (or at least equally incorrect with and without threads)?
The presence of nested scopes will affect where TRACK_GLOBAL finds a global variable, but shouldn’t affect anything after that. (I think.)
Suppose I am tracking the object referred to by spam.eggs.ham and spam.eggs is rebound to an object that does not have a ham attribute. It’s clear this will be an AttributeError if the programmer attempts to resolve spam.eggs.ham in the current Python virtual machine, but suppose the programmer has anticipated this case:
    if hasattr(spam.eggs, "ham"):
        print spam.eggs.ham
    elif hasattr(spam.eggs, "bacon"):
        print spam.eggs.bacon
    else:
        print "what? no meat?"
You can’t raise an AttributeError when the tracking information is recalculated. If it does not raise AttributeError and instead lets the tracking stand, it may be setting the programmer up for a very subtle error.
One solution to this problem would be to track the shortest possible root of each dotted expression the function refers to directly. In the above example, spam.eggs would be tracked, but spam.eggs.ham and spam.eggs.bacon would not.
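Under that strategy the hasattr example above would cache only the root, roughly like this (an ordinary local stands in for the tracked slot, and return values replace the print statements; the function name is invented):

```python
from types import SimpleNamespace

def describe_meat(spam):
    eggs = spam.eggs                 # the only cached ("tracked") reference
    if hasattr(eggs, "ham"):
        return eggs.ham
    elif hasattr(eggs, "bacon"):
        return eggs.bacon
    return "what? no meat?"
```

Because only spam.eggs is cached, rebinding eggs to an object without ham simply changes which branch runs — no stale ham reference can survive.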
In the Questions section I postulated the existence of a _PyObject_TrackName function. While the API is fairly easy to specify, the implementation behind-the-scenes is not so obvious. A central dictionary could be used to track the name/location mappings, but it appears that all setattr functions might need to be modified to accommodate this new functionality.
If all types used the PyObject_GenericSetAttr function to set attributes, that would localize the update code somewhat. They don’t, however (which is not too surprising), so it seems that all setattrfunc and setattrofunc functions will have to be updated. In addition, this would place an absolute requirement on C extension module authors to call some function when an attribute changes value (PyObject_TrackUpdate?).
Finally, it’s quite possible that some attributes will be set by side effect and not by any direct call to a setattr method of some sort. Consider a device interface module that has an interrupt routine that copies the contents of a device register into a slot in the object’s struct whenever it changes. In these situations, more extensive modifications would have to be made by the module author. To identify such situations at compile time would be impossible. I think an extra slot could be added to PyTypeObject to indicate if an object’s code is safe for global tracking. It would have a default value of 0 (Py_TRACKING_NOT_SAFE). If an extension module author has implemented the necessary tracking support, that field could be initialized to 1 (Py_TRACKING_SAFE). _PyObject_TrackName could check that field and issue a warning if it is asked to track an object that the author has not explicitly said was safe for tracking.
Jeremy Hylton has an alternate proposal on the table [2]. His proposal seeks to create a hybrid dictionary/list object for use in global name lookups that would make global variable access look more like local variable access. While there is no C code available to examine, the Python implementation given in his proposal still appears to require dictionary key lookup. It doesn’t appear that his proposal could speed local variable attribute lookup, which might be worthwhile in some situations if potential performance burdens could be addressed.
I don’t believe there will be any serious issues of backward compatibility. Obviously, Python bytecode that contains TRACK_OBJECT opcodes could not be executed by earlier versions of the interpreter, but breakage at the bytecode level is often assumed between versions.
TBD. This is where I need help. I believe there should be either a central name/location registry or the code that modifies object attributes should be modified, but I’m not sure the best way to go about this. If you look at the code that implements the STORE_GLOBAL and STORE_ATTR opcodes, it seems likely that some changes will be required to PyDict_SetItem and PyObject_SetAttr or their String variants. Ideally, there’d be a fairly central place to localize these changes. If you begin considering tracking attributes of local variables you get into issues of modifying STORE_FAST as well, which could be a problem, since the name bindings for local variables are changed much more frequently. (I think an optimizer could avoid inserting the tracking code for the attributes for any local variables where the variable’s name binding changes.)
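One way to picture the required hook is a namespace whose attribute-store path pushes new bindings into registered slots — a toy Python stand-in for the PyDict_SetItem/PyObject_SetAttr changes the PEP asks for (the class and method names here are invented for illustration):

```python
class TrackingNamespace:
    """Toy namespace: rebinding an attribute updates every tracked slot."""

    def __init__(self, **attrs):
        object.__setattr__(self, "_slots", {})    # name -> list of slots
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def track(self, name, slot):
        # what TRACK_OBJECT's setup would record, plus the initial lookup
        self._slots.setdefault(name, []).append(slot)
        slot[0] = getattr(self, name)

    def __setattr__(self, name, value):
        # the STORE_ATTR hook: perform the binding, then fix up any
        # cached references so trackers never see a stale object
        object.__setattr__(self, name, value)
        for slot in self._slots.get(name, ()):
            slot[0] = value
```

The asymmetry the PEP describes shows up here too: the fast path (reading slot[0]) stays trivial, while all the bookkeeping cost lands on the rare rebinding operation.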
I believe (though I have no code to prove it at this point), that implementing TRACK_OBJECT will generally not be much more expensive than a single LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair. An optimizer should be able to avoid converting LOAD_GLOBAL and LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access occurred within a loop. Further down the line, a register-oriented replacement for the current Python virtual machine [3] could conceivably eliminate most of the LOAD_FAST instructions as well.
The number of tracked objects should be relatively small. All active frames of all active threads could conceivably be tracking objects, but this seems small compared to the number of functions defined in a given application.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0266.rst
Last modified: 2025-02-01 08:55:40 GMT