Python Enhancement Proposals

Python »
PEP Index »
PEP 253

PEP 253 – Subtyping Built-in Types

Author:: Guido van Rossum <guido at python.org>
Status:

Abstract

This PEP proposes additions to the type object API that will allowthe creation of subtypes of built-in types, in C and in Python.

[Editor’s note: the ideas described in this PEP have been incorporatedinto Python. The PEP no longer accurately describes the implementation.]

Traditionally, types in Python have been created statically, bydeclaring a global variable of type PyTypeObject and initializingit with a static initializer. The slots in the type objectdescribe all aspects of a Python type that are relevant to thePython interpreter. A few slots contain dimensional information(like the basic allocation size of instances), others containvarious flags, but most slots are pointers to functions toimplement various kinds of behaviors. A NULL pointer means thatthe type does not implement the specific behavior; in that casethe system may provide a default behavior or raise an exceptionwhen the behavior is invoked for an instance of the type. Somecollections of function pointers that are usually defined togetherare obtained indirectly via a pointer to an additional structurecontaining more function pointers.

While the details of initializing a PyTypeObject structure haven’tbeen documented as such, they are easily gleaned from the examplesin the source code, and I am assuming that the reader issufficiently familiar with the traditional way of creating newPython types in C.

This PEP will introduce the following features:

a type can be a factory function for its instances
types can be subtyped in C
types can be subtyped in Python with the class statement
multiple inheritance from types is supported (insofar aspractical – you still can’t multiply inherit from list anddictionary)
the standard coercion functions (int, tuple, str etc.) willbe redefined to be the corresponding type objects, which serveas their own factory functions
a class statement can contain a__metaclass__ declaration,specifying the metaclass to be used to create the new class
a class statement can contain a__slots__ declaration,specifying the specific names of the instance variablessupported

This PEP builds onPEP 252, which adds standard introspection totypes; for example, when a particular type object initializes thetp_hash slot, that type object has a__hash__ method whenintrospected.PEP 252 also adds a dictionary to type objectswhich contains all methods. At the Python level, this dictionaryis read-only for built-in types; at the C level, it is accessibledirectly (but it should not be modified except as part ofinitialization).

For binary compatibility, a flag bit in the tp_flags slotindicates the existence of the various new slots in the typeobject introduced below. Types that don’t have thePy_TPFLAGS_HAVE_CLASS bit set in theirtp_flags slot are assumedto have NULL values for all the subtyping slots. (Warning: thecurrent implementation prototype is not yet consistent in itschecking of this flag bit. This should be fixed before the finalrelease.)

In current Python, a distinction is made between types andclasses. This PEP together withPEP 254 will remove thatdistinction. However, for backwards compatibility the distinctionwill probably remain for years to come, and withoutPEP 254, thedistinction is still large: types ultimately have a built-in typeas a base class, while classes ultimately derive from auser-defined class. Therefore, in the rest of this PEP, I willuse the word type whenever I can – including base type orsupertype, derived type or subtype, and metatype. However,sometimes the terminology necessarily blends, for example anobject’s type is given by its__class__ attribute, and subtypingin Python is spelled with a class statement. If furtherdistinction is necessary, user-defined classes can be referred toas “classic” classes.

About metatypes

Inevitably the discussion comes to metatypes (or metaclasses).Metatypes are nothing new in Python: Python has always been ableto talk about the type of a type:

>>>a=0>>>type(a)<type 'int'>>>>type(type(a))<type 'type'>>>>type(type(type(a)))<type 'type'>>>>

In this example,type(a) is a “regular” type, andtype(type(a)) isa metatype. While as distributed all types have the same metatype(PyType_Type, which is also its own metatype), this is not arequirement, and in fact a useful and relevant 3rd party extension(ExtensionClasses by Jim Fulton) creates an additional metatype.The type of classic classes, known astypes.ClassType, can also beconsidered a distinct metatype.

A feature closely connected to metatypes is the “Don Beaudryhook”, which says that if a metatype is callable, its instances(which are regular types) can be subclassed (really subtyped)using a Python class statement. I will use this rule to supportsubtyping of built-in types, and in fact it greatly simplifies thelogic of class creation to always simply call the metatype. Whenno base class is specified, a default metatype is called – thedefault metatype is the “ClassType” object, so the class statementwill behave as before in the normal case. (This default can bechanged per module by setting the global variable__metaclass__.)

Python uses the concept of metatypes or metaclasses in a differentway than Smalltalk. In Smalltalk-80, there is a hierarchy ofmetaclasses that mirrors the hierarchy of regular classes,metaclasses map 1-1 to classes (except for some funny business atthe root of the hierarchy), and each class statement creates botha regular class and its metaclass, putting class methods in themetaclass and instance methods in the regular class.

Nice though this may be in the context of Smalltalk, it’s notcompatible with the traditional use of metatypes in Python, and Iprefer to continue in the Python way. This means that Pythonmetatypes are typically written in C, and may be shared betweenmany regular types. (It will be possible to subtype metatypes inPython, so it won’t be absolutely necessary to write C to usemetatypes; but the power of Python metatypes will be limited. Forexample, Python code will never be allowed to allocate raw memoryand initialize it at will.)

Metatypes determine variouspolicies for types, such as whathappens when a type is called, how dynamic types are (whether atype’s__dict__ can be modified after it is created), what themethod resolution order is, how instance attributes are lookedup, and so on.

I’ll argue that left-to-right depth-first is not the bestsolution when you want to get the most use from multipleinheritance.

I’ll argue that with multiple inheritance, the metatype of thesubtype must be a descendant of the metatypes of all base types.

I’ll come back to metatypes later.

Making a type a factory for its instances

Traditionally, for each type there is at least one C factoryfunction that creates instances of the type (PyTuple_New(),PyInt_FromLong() and so on). These factory functions take care ofboth allocating memory for the object and initializing thatmemory. As of Python 2.0, they also have to interface with thegarbage collection subsystem, if the type chooses to participatein garbage collection (which is optional, but strongly recommendedfor so-called “container” types: types that may contain referencesto other objects, and hence may participate in reference cycles).

In this proposal, type objects can be factory functions for theirinstances, making the types directly callable from Python. Thismimics the way classes are instantiated. The C APIs for creatinginstances of various built-in types will remain valid and in somecases more efficient. Not all types will become their own factoryfunctions.

The type object has a new slot, tp_new, which can act as a factoryfor instances of the type. Types are now callable, because thetp_call slot is set inPyType_Type (the metatype); the functionlooks for the tp_new slot of the type that is being called.

Explanation: thetp_call slot of a regular type object (such asPyInt_Type orPyList_Type) defines what happens wheninstancesof that type are called; in particular, thetp_call slot in thefunction type,PyFunction_Type, is the key to making functionscallable. As another example,PyInt_Type.tp_call isNULL, becauseintegers are not callable. The new paradigm makestype objectscallable. Since type objects are instances of their metatype(PyType_Type), the metatype’stp_call slot (PyType_Type.tp_call)points to a function that is invoked when any type object iscalled. Now, since each type has to do something different tocreate an instance of itself,PyType_Type.tp_call immediatelydefers to thetp_new slot of the type that is being called.PyType_Type itself is also callable: itstp_new slot creates a newtype. This is used by the class statement (formalizing the DonBeaudry hook, see above). And what makesPyType_Type callable?Thetp_call slot ofits metatype – but since it is its ownmetatype, that is its owntp_call slot!

If the type’stp_new slot is NULL, an exception is raised.Otherwise, the tp_new slot is called. The signature for thetp_new slot is

PyObject*tp_new(PyTypeObject*type,PyObject*args,PyObject*kwds)

where ‘type’ is the type whosetp_new slot is called, and ‘args’and ‘kwds’ are the sequential and keyword arguments to the call,passed unchanged from tp_call. (The ‘type’ argument is used incombination with inheritance, see below.)

There are no constraints on the object type that is returned,although by convention it should be an instance of the giventype. It is not necessary that a new object is returned; areference to an existing object is fine too. The return valueshould always be a new reference, owned by the caller.

Once thetp_new slot has returned an object, further initializationis attempted by calling thetp_init() slot of the resultingobject’s type, if not NULL. This has the following signature:

inttp_init(PyObject*self,PyObject*args,PyObject*kwds)

It corresponds more closely to the__init__() method of classicclasses, and in fact is mapped to that by the slot/special-methodcorrespondence rules. The difference in responsibilities betweenthetp_new() slot and thetp_init() slot lies in the invariantsthey ensure. Thetp_new() slot should ensure only the mostessential invariants, without which the C code that implements theobjects would break. Thetp_init() slot should be used foroverridable user-specific initializations. Take for example thedictionary type. The implementation has an internal pointer to ahash table which should never be NULL. This invariant is takencare of by thetp_new() slot for dictionaries. The dictionarytp_init() slot, on the other hand, could be used to give thedictionary an initial set of keys and values based on thearguments passed in.

Note that for immutable object types, the initialization cannot bedone by thetp_init() slot: this would provide the Python userwith a way to change the initialization. Therefore, immutableobjects typically have an emptytp_init() implementation and doall their initialization in theirtp_new() slot.

You may wonder why thetp_new() slot shouldn’t call thetp_init()slot itself. The reason is that in certain circumstances (likesupport for persistent objects), it is important to be able tocreate an object of a particular type without initializing it anyfurther than necessary. This may conveniently be done by callingthetp_new() slot without callingtp_init(). It is also possiblethattp_init() is not called, or called more than once – itsoperation should be robust even in these anomalous cases.

For some objects,tp_new() may return an existing object. Forexample, the factory function for integers caches the integers -1through 99. This is permissible only when the type argument totp_new() is the type that defined thetp_new() function (in theexample, iftype==&PyInt_Type), and when thetp_init() slot forthis type does nothing. If the type argument differs, thetp_new() call is initiated by a derived type’stp_new() tocreate the object and initialize the base type portion of theobject; in this casetp_new() should always return a new object(or raise an exception).

Bothtp_new() andtp_init() should receive exactly the same ‘args’and ‘kwds’ arguments, and both should check that the arguments areacceptable, because they may be called independently.

There’s a third slot related to object creation:tp_alloc(). Itsresponsibility is to allocate the memory for the object,initialize the reference count (ob_refcnt) and the type pointer(ob_type), and initialize the rest of the object to all zeros. Itshould also register the object with the garbage collectionsubsystem if the type supports garbage collection. This slotexists so that derived types can override the memory allocationpolicy (like which heap is being used) separately from theinitialization code. The signature is:

PyObject*tp_alloc(PyTypeObject*type,intnitems)

The type argument is the type of the new object. The nitemsargument is normally zero, except for objects with a variableallocation size (basically strings, tuples, and longs). Theallocation size is given by the following expression:

type->tp_basicsize+nitems*type->tp_itemsize

Thetp_alloc slot is only used for subclassable types. Thetp_new()function of the base class must call thetp_alloc() slot of thetype passed in as its first argument. It is thetp_new()function’s responsibility to calculate the number of items. Thetp_alloc() slot will set the ob_size member of the new object ifthetype->tp_itemsize member is nonzero.

(Note: in certain debugging compilation modes, the type structureused to have members namedtp_alloc and atp_free slot already,counters for the number of allocations and deallocations. Theseare renamed totp_allocs andtp_deallocs.)

Standard implementations fortp_alloc() andtp_new() areavailable.PyType_GenericAlloc() allocates an object from thestandard heap and initializes it properly. It uses the aboveformula to determine the amount of memory to allocate, and takescare of GC registration. The only reason not to use thisimplementation would be to allocate objects from a different heap(as is done by some very small frequently used objects like intsand tuples).PyType_GenericNew() adds very little: it just callsthe type’stp_alloc() slot with zero for nitems. But for mutabletypes that do all their initialization in theirtp_init() slot,this may be just the ticket.

Preparing a type for subtyping

The idea behind subtyping is very similar to that of singleinheritance in C++. A base type is described by a structuredeclaration (similar to the C++ class declaration) plus a typeobject (similar to the C++ vtable). A derived type can extend thestructure (but must leave the names, order and type of the membersof the base structure unchanged) and can override certain slots inthe type object, leaving others the same. (Unlike C++ vtables,all Python type objects have the same memory layout.)

The base type must do the following:

Add the flag valuePy_TPFLAGS_BASETYPE totp_flags.
Declare and usetp_new(),tp_alloc() and optionaltp_init()slots.
Declare and usetp_dealloc() andtp_free().
Export its object structure declaration.
Export a subtyping-aware type-checking macro.

The requirements and signatures fortp_new(),tp_alloc() andtp_init() have already been discussed above:tp_alloc() shouldallocate the memory and initialize it to mostly zeros;tp_new()should call thetp_alloc() slot and then proceed to do theminimally required initialization;tp_init() should be used formore extensive initialization of mutable objects.

It should come as no surprise that there are similar conventionsat the end of an object’s lifetime. The slots involved aretp_dealloc() (familiar to all who have ever implemented a Pythonextension type) andtp_free(), the new kid on the block. (Thenames aren’t quite symmetric;tp_free() corresponds totp_alloc(),which is fine, buttp_dealloc() corresponds totp_new(). Maybethe tp_dealloc slot should be renamed?)

Thetp_free() slot should be used to free the memory andunregister the object with the garbage collection subsystem, andcan be overridden by a derived class;tp_dealloc() shoulddeinitialize the object (usually by callingPy_XDECREF() forvarious sub-objects) and then calltp_free() to deallocate thememory. The signature fortp_dealloc() is the same as it alwayswas:

voidtp_dealloc(PyObject*object)

The signature for tp_free() is the same:

voidtp_free(PyObject*object)

(In a previous version of this PEP, there was also a role reservedfor thetp_clear() slot. This turned out to be a bad idea.)

To be usefully subtyped in C, a type must export the structuredeclaration for its instances through a header file, as it isneeded to derive a subtype. The type object for the base typemust also be exported.

If the base type has a type-checking macro (likePyDict_Check()),this macro should be made to recognize subtypes. This can be doneby using the newPyObject_TypeCheck(object,type) macro, whichcalls a function that follows the base class links.

ThePyObject_TypeCheck() macro contains a slight optimization: itfirst comparesobject->ob_type directly to the type argument, andif this is a match, bypasses the function call. This should makeit fast enough for most situations.

Note that this change in the type-checking macro means that Cfunctions that require an instance of the base type may be invokedwith instances of the derived type. Before enabling subtyping ofa particular type, its code should be checked to make sure thatthis won’t break anything. It has proved useful in the prototypeto add another type-checking macro for the built-in Python objecttypes, to check for exact type match too (for example,PyDict_Check(x) is true if x is an instance of dictionary or of adictionary subclass, whilePyDict_CheckExact(x) is true only if xis a dictionary).

Creating a subtype of a built-in type in C

The simplest form of subtyping is subtyping in C. It is thesimplest form because we can require the C code to be aware ofsome of the problems, and it’s acceptable for C code that doesn’tfollow the rules to dump core. For added simplicity, it islimited to single inheritance.

Let’s assume we’re deriving from a mutable base type whosetp_itemsize is zero. The subtype code is not GC-aware, althoughit may inherit GC-awareness from the base type (this isautomatic). The base type’s allocation uses the standard heap.

The derived type begins by declaring a type structure whichcontains the base type’s structure. For example, here’s the typestructure for a subtype of the built-in list type:

typedefstruct{PyListObjectlist;intstate;}spamlistobject;

Note that the base type structure member (herePyListObject) mustbe the first member of the structure; any following members areadditions. Also note that the base type is not referenced via apointer; the actual contents of its structure must be included!(The goal is for the memory layout of the beginning of thesubtype instance to be the same as that of the base typeinstance.)

Next, the derived type must declare a type object and initializeit. Most of the slots in the type object may be initialized tozero, which is a signal that the base type slot must be copiedinto it. Some slots that must be initialized properly:

The object header must be filled in as usual; the type shouldbe&PyType_Type.
The tp_basicsize slot must be set to the size of the subtypeinstance struct (in the above example:sizeof(spamlistobject)).
The tp_base slot must be set to the address of the base type’stype object.
If the derived slot defines any pointer members, thetp_dealloc slot function requires special attention, seebelow; otherwise, it can be set to zero, to inherit the basetype’s deallocation function.
Thetp_flags slot must be set to the usualPy_TPFLAGS_DEFAULTvalue.
Thetp_name slot must be set; it is recommended to settp_docas well (these are not inherited).

If the subtype defines no additional structure members (it onlydefines new behavior, no new data), thetp_basicsize and thetp_dealloc slots may be left set to zero.

The subtype’stp_dealloc slot deserves special attention. If thederived type defines no additional pointer members that need to beDECREF’ed or freed when the object is deallocated, it can be setto zero. Otherwise, the subtype’stp_dealloc() function must callPy_XDECREF() for anyPyObject* members and the correct memoryfreeing function for any other pointers it owns, and then call thebase class’stp_dealloc() slot. This call has to be made via thebase type’s type structure, for example, when deriving from thestandard list type:

PyList_Type.tp_dealloc(self);

If the subtype wants to use a different allocation heap than thebase type, the subtype must override both thetp_alloc() and thetp_free() slots. These will be called by the base class’stp_new() andtp_dealloc() slots, respectively.

To complete the initialization of the type,PyType_InitDict() mustbe called. This replaces slots initialized to zero in the subtypewith the value of the corresponding base type slots. (It alsofills intp_dict, the type’s dictionary, and does various otherinitializations necessary for type objects.)

A subtype is not usable untilPyType_InitDict() is called for it;this is best done during module initialization, assuming thesubtype belongs to a module. An alternative for subtypes added tothe Python core (which don’t live in a particular module) would beto initialize the subtype in their constructor function. It isallowed to callPyType_InitDict() more than once; the second andfurther calls have no effect. To avoid unnecessary calls, a testfortp_dict==NULL can be made.

(During initialization of the Python interpreter, some types areactually used before they are initialized. As long as the slotsthat are actually needed are initialized, especiallytp_dealloc,this works, but it is fragile and not recommended as a generalpractice.)

To create a subtype instance, the subtype’stp_new() slot iscalled. This should first call the base type’stp_new() slot andthen initialize the subtype’s additional data members. To furtherinitialize the instance, thetp_init() slot is typically called.Note that thetp_new() slot shouldnot call thetp_init() slot;this is up totp_new()’s caller (typically a factory function).There are circumstances where it is appropriate not to calltp_init().

If a subtype defines atp_init() slot, thetp_init() slot shouldnormally first call the base type’stp_init() slot.

(XXX There should be a paragraph or two about argument passinghere.)

Subtyping in Python

The next step is to allow subtyping of selected built-in typesthrough a class statement in Python. Limiting ourselves to singleinheritance for now, here is what happens for a simple classstatement:

classC(B):var1=1defmethod1(self):pass# etc.

The body of the class statement is executed in a fresh environment(basically, a new dictionary used as local namespace), and then Cis created. The following explains how C is created.

Assume B is a type object. Since type objects are objects, andevery object has a type, B has a type. Since B is itself a type,we also call its type its metatype. B’s metatype is accessibleviatype(B) orB.__class__ (the latter notation is new for types;it is introduced inPEP 252). Let’s say this metatype is M (forMetatype). The class statement will create a new type, C. SinceC will be a type object just like B, we view the creation of C asan instantiation of the metatype, M. The information that needsto be provided for the creation of a subclass is:

its name (in this example the string “C”);
its bases (a singleton tuple containing B);
the results of executing the class body, in the form of adictionary (for example{"var1":1,"method1":<functionmethod1at...>,...}).

The class statement will result in the following call:

C=M("C",(B,),dict)

where dict is the dictionary resulting from execution of theclass body. In other words, the metatype (M) is called.

Note that even though the example has only one base, we still passin a (singleton) sequence of bases; this makes the interfaceuniform with the multiple-inheritance case.

In current Python, this is called the “Don Beaudry hook” after itsinventor; it is an exceptional case that is only invoked when abase class is not a regular class. For a regular base class (orwhen no base class is specified), current Python callsPyClass_New(), the C level factory function for classes, directly.

Under the new system this is changed so that Pythonalwaysdetermines a metatype and calls it as given above. When one ormore bases are given, the type of the first base is used as themetatype; when no base is given, a default metatype is chosen. Bysetting the default metatype toPyClass_Type, the metatype of“classic” classes, the classic behavior of the class statement isretained. This default can be changed per module by setting theglobal variable__metaclass__.

There are two further refinements here. First, a useful featureis to be able to specify a metatype directly. If the classsuite defines a variable__metaclass__, that is the metatypeto call. (Note that setting__metaclass__ at the module levelonly affects class statements without a base class and without anexplicit__metaclass__ declaration; but setting__metaclass__ in aclass suite overrides the default metatype unconditionally.)

Second, with multiple bases, not all bases need to have the samemetatype. This is called a metaclass conflict[1]. Somemetaclass conflicts can be resolved by searching through the setof bases for a metatype that derives from all other givenmetatypes. If such a metatype cannot be found, an exception israised and the class statement fails.

This conflict resolution can be implemented by the metatypeconstructors: the class statement just calls the metatype of the firstbase (or that specified by the__metaclass__ variable), and thismetatype’s constructor looks for the most derived metatype. Ifthat is itself, it proceeds; otherwise, it calls that metatype’sconstructor. (Ultimate flexibility: another metatype might chooseto require that all bases have the same metatype, or that there’sonly one base class, or whatever.)

(In[1], a new metaclass is automatically derived that is asubclass of all given metaclasses. But since it is questionablein Python how conflicting method definitions of the variousmetaclasses should be merged, I don’t think this is feasible.Should the need arise, the user can derive such a metaclassmanually and specify it using the__metaclass__ variable. It isalso possible to have a new metaclass that does this.)

Note that calling M requires that M itself has a type: themeta-metatype. And the meta-metatype has a type, themeta-meta-metatype. And so on. This is normally cut short atsome level by making a metatype be its own metatype. This isindeed what happens in Python: theob_type reference inPyType_Type is set to&PyType_Type. In the absence of third partymetatypes,PyType_Type is the only metatype in the Pythoninterpreter.

(In a previous version of this PEP, there was one additionalmeta-level, and there was a meta-metatype called “turtle”. Thisturned out to be unnecessary.)

In any case, the work for creating C is done by M’stp_new() slot.It allocates space for an “extended” type structure, containing:the type object; the auxiliary structures (as_sequence etc.); thestring object containing the type name (to ensure that this objectisn’t deallocated while the type object is still referencing it); andsome auxiliary storage (to be described later). It initializes thisstorage to zeros except for a few crucial slots (for example, tp_nameis set to point to the type name) and then sets the tp_base slot topoint to B. ThenPyType_InitDict() is called to inherit B’s slots.Finally, C’stp_dict slot is updated with the contents of thenamespace dictionary (the third argument to the call to M).

Multiple inheritance

The Python class statement supports multiple inheritance, and wewill also support multiple inheritance involving built-in types.

However, there are some restrictions. The C runtime architecturedoesn’t make it feasible to have a meaningful subtype of twodifferent built-in types except in a few degenerate cases.Changing the C runtime to support fully general multipleinheritance would be too much of an upheaval of the code base.

The main problem with multiple inheritance from different built-intypes stems from the fact that the C implementation of built-intypes accesses structure members directly; the C compilergenerates an offset relative to the object pointer and that’sthat. For example, the list and dictionary type structures eachdeclare a number of different but overlapping structure members.A C function accessing an object expecting a list won’t work whenpassed a dictionary, and vice versa, and there’s not much we coulddo about this without rewriting all code that accesses lists anddictionaries. This would be too much work, so we won’t do this.

The problem with multiple inheritance is caused by conflictingstructure member allocations. Classes defined in Python normallydon’t store their instance variables in structure members: theyare stored in an instance dictionary. This is the key to apartial solution. Suppose we have the following two classes:

classA(dictionary):deffoo(self):passclassB(dictionary):defbar(self):passclassC(A,B):pass

(Here, ‘dictionary’ is the type of built-in dictionary objects,a.k.a.type({}) or{}.__class__ ortypes.DictType.) If we look atthe structure layout, we find that an A instance has the layoutof a dictionary followed by the__dict__ pointer, and a B instancehas the same layout; since there are no structure member layoutconflicts, this is okay.

Here’s another example:

classX(object):deffoo(self):passclassY(dictionary):defbar(self):passclassZ(X,Y):pass

(Here, ‘object’ is the base for all built-in types; its structurelayout only contains theob_refcnt andob_type members.) Thisexample is more complicated, because the__dict__ pointer for Xinstances has a different offset than that for Y instances. Whereis the__dict__ pointer for Z instances? The answer is that theoffset for the__dict__ pointer is not hardcoded, it is stored inthe type object.

Suppose on a particular machine an ‘object’ structure is 8 byteslong, and a ‘dictionary’ struct is 60 bytes, and an object pointeris 4 bytes. Then an X structure is 12 bytes (an object structurefollowed by a__dict__ pointer), and a Y structure is 64 bytes (adictionary structure followed by a__dict__ pointer). The Zstructure has the same layout as the Y structure in this example.Each type object (X, Y and Z) has a “__dict__ offset” which isused to find the__dict__ pointer. Thus, the recipe for lookingup an instance variable is:

get the type of the instance
get the__dict__ offset from the type object
add the__dict__ offset to the instance pointer
look in the resulting address to find a dictionary reference
look up the instance variable name in that dictionary

Of course, this recipe can only be implemented in C, and I haveleft out some details. But this allows us to use multipleinheritance patterns similar to the ones we can use with classicclasses.

XXX I should write up the complete algorithm here to determinebase class compatibility, but I can’t be bothered right now. Lookatbest_base() in typeobject.c in the implementation mentionedbelow.

MRO: Method resolution order (the lookup rule)

With multiple inheritance comes the question of method resolutionorder: the order in which a class or type and its bases aresearched looking for a method of a given name.

In classic Python, the rule is given by the following recursivefunction, also known as the left-to-right depth-first rule:

defclassic_lookup(cls,name):ifcls.__dict__.has_key(name):returncls.__dict__[name]forbaseincls.__bases__:try:returnclassic_lookup(base,name)exceptAttributeError:passraiseAttributeError,name

The problem with this becomes apparent when we consider a “diamonddiagram”:

classA:^^defsave(self):.../   \/     \/       \/         \classBclassC:^^defsave(self):...     \/      \/       \/        \/classD

Arrows point from a subtype to its basetype(s). This particulardiagram means B and C derive from A, and D derives from B and C(and hence also, indirectly, from A).

Assume that C overrides the methodsave(), which is defined in thebase A. (C.save() probably callsA.save() and then saves some ofits own state.) B and D don’t overridesave(). When we invokesave() on a D instance, which method is called? According to theclassic lookup rule,A.save() is called, ignoringC.save()!

This is not good. It probably breaks C (its state doesn’t getsaved), defeating the whole purpose of inheriting from C in thefirst place.

Why was this not a problem in classic Python? Diamond diagramsare rarely found in classic Python class hierarchies. Most classhierarchies use single inheritance, and multiple inheritance isusually confined to mix-in classes. In fact, the problem shownhere is probably the reason why multiple inheritance is unpopularin classic Python.

Why will this be a problem in the new system? The ‘object’ typeat the top of the type hierarchy defines a number of methods thatcan usefully be extended by subtypes, for example__getattr__().

(Aside: in classic Python, the__getattr__() method is not reallythe implementation for the get-attribute operation; it is a hookthat only gets invoked when an attribute cannot be found by normalmeans. This has often been cited as a shortcoming – some classdesigns have a legitimate need for a__getattr__() method thatgets called forall attribute references. But then of coursethis method has to be able to invoke the default implementationdirectly. The most natural way is to make the defaultimplementation available asobject.__getattr__(self,name).)

Thus, a classic class hierarchy like this:

classBclassC:^^def__getattr__(self,name):...     \/      \/       \/        \/classD

will change into a diamond diagram under the new system:

object:^^__getattr__()/   \/     \/       \/         \classBclassC:^^def__getattr__(self,name):...     \/      \/       \/        \/classD

and while in the original diagramC.__getattr__() is invoked,under the new system with the classic lookup rule,object.__getattr__() would be invoked!

Fortunately, there’s a lookup rule that’s better. It’s a bitdifficult to explain, but it does the right thing in the diamonddiagram, and it is the same as the classic lookup rule when thereare no diamonds in the inheritance graph (when it is a tree).

The new lookup rule constructs a list of all classes in theinheritance diagram in the order in which they will be searched.This construction is done at class definition time to save time.To explain the new lookup rule, let’s first consider what such alist would look like for the classic lookup rule. Note that inthe presence of diamonds the classic lookup visits some classesmultiple times. For example, in the ABCD diamond diagram above,the classic lookup rule visits the classes in this order:

D,B,A,C,A

Note how A occurs twice in the list. The second occurrence isredundant, since anything that could be found there would alreadyhave been found when searching the first occurrence.

We use this observation to explain our new lookup rule. Using theclassic lookup rule, construct the list of classes that would besearched, including duplicates. Now for each class that occurs inthe list multiple times, remove all occurrences except for thelast. The resulting list contains each ancestor class exactlyonce (including the most derived class, D in the example).

Searching for methods in this order will do the right thing forthe diamond diagram. Because of the way the list is constructed,it does not change the search order in situations where no diamondis involved.

Isn’t this backwards incompatible? Won’t it break existing code?It would, if we changed the method resolution order for allclasses. However, in Python 2.2, the new lookup rule will only beapplied to types derived from built-in types, which is a newfeature. Class statements without a base class create “classicclasses”, and so do class statements whose base classes arethemselves classic classes. For classic classes the classiclookup rule will be used. (To experiment with the new lookup rulefor classic classes, you will be able to specify a differentmetaclass explicitly.) We’ll also provide a tool that analyzes aclass hierarchy looking for methods that would be affected by achange in method resolution order.

XXX Another way to explain the motivation for the new MRO, due toDamian Conway: you never use the method defined in a base class ifit is defined in a derived class that you haven’t explored yet(using the old search order).

XXX To be done

Additional topics to be discussed in this PEP:

backwards compatibility issues!!!
class methods and static methods
cooperative methods andsuper()
mapping between type object slots (tp_foo) and special methods(__foo__) (actually, this may belong inPEP 252)
built-in names for built-in types (object, int, str, list etc.)
__dict__ and__dictoffset__
__slots__
theHEAPTYPE flag bit
GC support
API docs for all the new functions
how to use__new__
writing metaclasses (usingmro() etc.)
high level user overview

open issues

do we need__del__?
assignment to__dict__,__bases__
inconsistent naming(e.g. tp_dealloc/tp_new/tp_init/tp_alloc/tp_free)
add builtin alias ‘dict’ for ‘dictionary’?
when subclasses of dict/list etc. are passed to systemfunctions, the__getitem__ overrides (etc.) aren’t alwaysused

Implementation

A prototype implementation of this PEP (and forPEP 252) isavailable from CVS, and in the series of Python 2.2 alpha and betareleases. For some examples of the features described here, seethe file Lib/test/test_descr.py and the extension moduleModules/xxsubtype.c.