Python Enhancement Proposals

Python »
PEP Index »
PEP 205

PEP 205 – Weak References

Author:: Fred L. Drake, Jr. <fred at fdrake.net>
Status:

Motivation

There are two basic applications for weak references which havebeen noted by Python programmers: object caches and reduction ofpain from circular references.

There is a need to allow objects to be maintained that representexternal state, mapping a single instance to the externalreality, where allowing multiple instances to be mapped to thesame external resource would create unnecessary difficultymaintaining synchronization among instances. In these cases,a common idiom is to support a cache of instances; a factoryfunction is used to return either a new or existing instance.

The difficulty in this approach is that one of two things mustbe tolerated: either the cache grows without bound, or thereneeds to be explicit management of the cache elsewhere in theapplication. The later can be very tedious and leads to morecode than is really necessary to solve the problem at hand,and the former can be unacceptable for long-running processesor even relatively short processes with substantial memoryrequirements.

External objects that need to be represented by a singleinstance, no matter how many internal users there are. Thiscan be useful for representing files that need to be writtenback to disk in whole rather than locked & modified forevery use.
Objects that are expensive to create, but may be needed bymultiple internal consumers. Similar to the first case, butnot necessarily bound to external resources, and possiblynot an issue for shared state. Weak references are onlyuseful in this case if there is some flavor of “soft”references or if there is a high likelihood that users ofindividual objects will overlap in lifespan.

Circular references

DOMs require a huge amount of circular (to parent & documentnodes) references, but these could be eliminated using a weakdictionary mapping from each node to its parent. Thismight be especially useful in the context of something likexml.dom.pulldom, allowing the.unlink() operation to becomea no-op.

This proposal is divided into the following sections:

Proposed Solution
Implementation Strategy
Possible Applications
Previous Weak Reference Work in Python
Weak References in Java

The full text of one early proposal is included as an appendixsince it does not appear to be available on the net.

Aspects of the Solution Space

There are two distinct aspects to the weak references problem:

Invalidation of weak references
Presentation of weak references to Python code

Invalidation

Past approaches to weak reference invalidation have often hingedon storing a strong reference and being able to examine all theinstances of weak reference objects, and invalidating them whenthe reference count of their referent goes to one (indicating thatthe reference stored by the weak reference is the last remainingreference). This has the advantage that the memory managementmachinery in Python need not change, and that any type can beweakly referenced.

The disadvantage of this approach to invalidation is that itassumes that the management of the weak references is calledsufficiently frequently that weakly-referenced objects are noticedwithin a reasonably short time frame; since this means a scan oversome data structure to invalidate references, an operation whichis O(N) on the number of weakly referenced objects, this is noteffectively amortized for any single object which is weaklyreferenced. This also assumes that the application is callinginto code which handles weakly-referenced objects with somefrequency, which makes weak-references less attractive for librarycode.

An alternate approach to invalidation is that the de-allocationcode to be aware of the possibility of weak references and make aspecific call into the weak-reference management code to allinvalidation whenever an object is deallocated. This requires achange in the tp_dealloc handler for weakly-referencable objects;an additional call is needed at the “top” of the handler forobjects which support weak-referencing, and an efficient way tomap from an object to a chain of weak references for that objectis needed as well.

Presentation

Two ways that weak references are presented to the Python layerhave been as explicit reference objects upon which some operationis required in order to retrieve a usable reference to theunderlying object, and proxy objects which masquerade as theoriginal objects as much as possible.

Reference objects are easy to work with when some additional layerof object management is being added in Python; references can bechecked for liveness explicitly, without having to invokeoperations on the referents and catching some special exceptionraised when an invalid weak reference is used.

However, a number of users favor the proxy approach simply becausethe weak reference looks so much like the original object.

Proposed Solution

Weak references should be able to point to any Python object thatmay have substantial memory size (directly or indirectly), or holdreferences to external resources (database connections, openfiles, etc.).

A new module, weakref, will contain new functions used to createweak references.weakref.ref() will create a “weak referenceobject” and optionally attach a callback which will be called whenthe object is about to be finalized.weakref.mapping() willcreate a “weak dictionary”. A third function,weakref.proxy(),will create a proxy object that behaves somewhat like the originalobject.

A weak reference object will allow access to the referenced objectif it hasn’t been collected and to determine if the object stillexists in memory. Retrieving the referent is done by calling thereference object. If the referent is no longer alive, this willreturn None instead.

A weak dictionary maps arbitrary keys to values, but does not owna reference to the values. When the values are finalized, the(key, value) pairs for which it is a value are removed from allthe mappings containing such pairs. Like dictionaries, weakdictionaries are not hashable.

Proxy objects are weak references that attempt to behave like theobject they proxy, as much as they can. Regardless of theunderlying type, proxies are not hashable since their ability toact as a weak reference relies on a fundamental mutability thatwill cause failures when used as dictionary keys – even if theproper hash value is computed before the referent dies, theresulting proxy cannot be used as a dictionary key since it cannotbe compared once the referent has expired, and comparability isnecessary for dictionary keys. Operations on proxy objects afterthe referent dies cause weakref.ReferenceError to be raised inmost cases. “is” comparisons,type(), andid() will continue towork, but always refer to the proxy and not the referent.

The callbacks registered with weak references must accept a singleparameter, which will be the weak reference or proxy objectitself. The object cannot be accessed or resurrected in thecallback.

Implementation Strategy

The implementation of weak references will include a list ofreference containers that must be cleared for each weakly-referencableobject. If the reference is from a weak dictionary,the dictionary entry is cleared first. Then, any associatedcallback is called with the object passed as a parameter. Onceall callbacks have been called, the object is finalized anddeallocated.

Many built-in types will participate in the weak-referencemanagement, and any extension type can elect to do so. The typestructure will contain an additional field which provides anoffset into the instance structure which contains a list of weakreference structures. If the value of the field is <= 0, theobject does not participate. In this case,weakref.ref(),<weakdict>.__setitem__() and.setdefault(), and item assignment willraiseTypeError. If the value of the field is > 0, a new weakreference can be generated and added to the list.

This approach is taken to allow arbitrary extension types toparticipate, without taking a memory hit for numbers or othersmall types.

Standard types which support weak references include instances,functions, and bound & unbound methods. With the addition ofclass types (“new-style classes”) in Python 2.2, types grewsupport for weak references. Instances of class types are weaklyreferencable if they have a base type which is weakly referencable,the class not specify__slots__, or a slot is named__weakref__.Generators also support weak references.

Possible Applications

PyGTK+ bindings?

Tkinter – could avoid circular references by using weakreferences from widgets to their parents. Objects won’t bediscarded any sooner in the typical case, but there won’t be somuch dependence on the programmer calling.destroy() beforereleasing a reference. This would mostly benefit long-runningapplications.

DOM trees.

Previous Weak Reference Work in Python

Dianne Hackborn has proposed something called “virtual references”.‘vref’ objects are very similar to java.lang.ref.WeakReferenceobjects, except there is no equivalent to the invalidationqueues. Implementing a “weak dictionary” would be just asdifficult as using only weak references (without the invalidationqueue) in Java. Information on this has disappeared from the Web,but is included below as an Appendix.

Marc-André Lemburg’s mx.Proxy package:

http://www.lemburg.com/files/python/mxProxy.html

The weakdict module by Dieter Maurer is implemented in C andPython. It appears that the Web pages have not been updated sincePython 1.5.2a, so I’m not yet sure if the implementation iscompatible with Python 2.0.

http://www.handshake.de/~dieter/weakdict.html

PyWeakReference by Alex Shindich:

http://sourceforge.net/projects/pyweakreference/

Eric Tiedemann has a weak dictionary implementation:

http://www.hyperreal.org/~est/python/weak/

Weak References in Java

http://java.sun.com/j2se/1.3/docs/api/java/lang/ref/package-summary.html

Java provides three forms of weak references, and one interestinghelper class. The three forms are called “weak”, “soft”, and“phantom” references. The relevant classes are defined in thejava.lang.ref package.

For each of the reference types, there is an option to add thereference to a queue when it is invalidated by the memoryallocator. The primary purpose of this facility seems to be thatit allows larger structures to be composed to incorporateweak-reference semantics without having to impose substantialadditional locking requirements. For instance, it would not bedifficult to use this facility to create a “weak” hash table whichremoves keys and referents when a reference is no longer usedelsewhere. Using weak references for the objects without somesort of notification queue for invalidations leads to much moretedious implementation of the various operations required on hashtables. This can be a performance bottleneck if deallocations ofthe stored objects are infrequent.

Java’s “weak” references are most like Dianne Hackborn’s old vrefproposal: a reference object refers to a single Python object,but does not own a reference to that object. When that object isdeallocated, the reference object is invalidated. Users of thereference object can easily determine that the reference has beeninvalidated, or a NullObjectDereferenceError can be raised whenan attempt is made to use the referred-to object.

The “soft” references are similar, but are not invalidated as soonas all other references to the referred-to object have beenreleased. The “soft” reference does own a reference, but allowsthe memory allocator to free the referent if the memory is neededelsewhere. It is not clear whether this means soft references arereleased before themalloc() implementation callssbrk() or itsequivalent, or if soft references are only cleared whenmalloc()returnsNULL.

“Phantom” references are a little different; unlike weak and softreferences, the referent is not cleared when the reference isadded to its queue. When all phantom references for an objectare dequeued, the object is cleared. This can be used to keep anobject alive until some additional cleanup is performed whichneeds to happen before the objects.finalize() method is called.

Unlike the other two reference types, “phantom” references must beassociated with an invalidation queue.

Appendix – Dianne Hackborn’s vref proposal (1995)

[This has been indented and paragraphs reflowed, but there have beno content changes. –Fred]

Proposal: Virtual References

In an attempt to partly address the recurring discussionconcerning reference counting vs. garbage collection, I would liketo propose an extension to Python which should help in thecreation of “well structured” cyclic graphs. In particular, itshould allow at least trees with parent back-pointers anddoubly-linked lists to be created without worry about cycles.

The basic mechanism I’d like to propose is that of a “virtualreference,” or a “vref” from here on out. A vref is essentially ahandle on an object that does not increment the object’s referencecount. This means that holding a vref on an object will not keepthe object from being destroyed. This would allow the Pythonprogrammer, for example, to create the aforementioned treestructure, which is automatically destroyed when itis no longer in use – by making all of the parent back-referencesinto vrefs, they no longer create reference cycles which keep thetree from being destroyed.

In order to implement this mechanism, the Python core must ensurethat no -real- pointers are ever left referencing objects that nolonger exist. The implementation I would like to propose involvestwo basic additions to the current Python system:

A new “vref” type, through which the Python programmer createsand manipulates virtual references. Internally, it isbasically a C-level Python object with a pointer to the Pythonobject it is a reference to. Unlike all other Python code,however, it does not change the reference count of this object.In addition, it includes two pointers to implement adoubly-linked list, which is used below.
The addition of a new field to the basic Python object[PyObject_Head in object.h], which is eitherNULL, or points tothe head of a list of all vref objects that reference it. Whena vref object attaches itself to another object, it adds itselfto this linked list. Then, if an object with any vrefs on itis deallocated, it may walk this list and ensure that all ofthe vrefs on it point to some safe value, e.g. Nothing.

This implementation should hopefully have a minimal impact on thecurrent Python core – when no vrefs exist, it should only add onepointer to all objects, and a check for aNULL pointer every timean object is deallocated.

Back at the Python language level, I have considered two possiblesemantics for the vref object –

Pointer semantics

In this model, a vref behaves essentially like a Python-levelpointer; the Python program must explicitly dereference the vrefto manipulate the actual object it references.

An example vref module using this model could include thefunction “new”; When used as ‘MyVref = vref.new(MyObject)’, itreturns a new vref object such thatMyVref.object==MyObject.MyVref.object would then change to Nothing ifMyObject is ever deallocated.

For a concrete example, we may introduce some new C-style syntax:

& – unary operator, creates a vref on an object, same asvref.new().
* – unary operator, dereference a vref, same asVrefObject.object.

We can then define:

1.type(&MyObject)==vref.VrefType2.*(&MyObject)==MyObject3.(*(&MyObject)).attr==MyObject.attr4.&&MyObject==Nothing5.*MyObject->exception

Rule #4 is subtle, but comes about because we have made a vrefto (a vref with no real references). Thus the outer vref iscleared to Nothing when the inner one inevitably disappears.

Proxy semantics

In this model, the Python programmer manipulates vref objectsjust as if she were manipulating the object it is a referenceof. This is accomplished by implementing the vref so that alloperations on it are redirected to its referenced object. Withthis model, the dereference operator (*) no longer makes sense;instead, we have only the reference operator (&), and define:

1.type(&MyObject)==type(MyObject)2.&MyObject==MyObject3.(&MyObject).attr==MyObject.attr4.&&MyObject==MyObject

Again, rule #4 is important – here, the outer vref is in fact areference to the original object, and -not- the inner vref.This is because all operations applied to a vref actually applyto its object, so that creating a vref of a vref actuallyresults in creating a vref of the latter’s object.

The first, pointer semantics, has the advantage that it would bevery easy to implement; the vref type is extremely simple,requiring at minimum a single attribute, object, and a function tocreate a reference.

However, I really like the proxy semantics. Not only does it putless of a burden on the Python programmer, but it allows you to donice things like use a vref anywhere you would use the actualobject. Unfortunately, it would probably an extreme pain, if notpractically impossible, to implement in the current Pythonimplementation. I do have some thoughts, though, on how to dothis, if it seems interesting; one possibility is to introduce newtype-checking functions which handle the vref. This wouldhopefully older C modules which don’t expect vrefs to simplyreturn a type error, until they can be fixed.

Finally, there are some other additional capabilities that thissystem could provide. One that seems particularly interesting tome involves allowing the Python programmer to add “destructor”function to a vref – this Python function would be calledimmediately prior to the referenced object being deallocated,allowing a Python program to invisibly attach itself to anotherobject and watch for it to disappear. This seems neat, though Ihaven’t actually come up with any practical uses for it, yet… :)

– Dianne

Copyright

This document has been placed in the public domain.

Source:https://github.com/python/peps/blob/main/peps/pep-0205.rst

Last modified:2025-02-01 08:59:27 GMT

Movatterモバイル変換