2. Defining Extension Types: Tutorial
Python allows the writer of a C extension module to define new types that can be manipulated from Python code, much like the built-in str and list types. The code for all extension types follows a pattern, but there are some details that you need to understand before you can get started. This document is a gentle introduction to the topic.
2.1. The Basics
The CPython runtime sees all Python objects as variables of type PyObject*, which serves as a “base type” for all Python objects. The PyObject structure itself only contains the object’s reference count and a pointer to the object’s “type object”. This is where the action is; the type object determines which (C) functions get called by the interpreter when, for instance, an attribute gets looked up on an object, a method called, or it is multiplied by another object. These C functions are called “type methods”.
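The idea that behavior lives on the type object, not on the individual object, can be observed from Python without writing any C. The following is a minimal sketch; the class name Loud is hypothetical and exists only for this illustration. It shows that an operator is dispatched through the type even when the instance carries an attribute of the same name.

```python
class Loud:
    """Hypothetical class used only for this illustration."""

    def __mul__(self, other):
        return "from the type"


x = Loud()

# Shadow the method on the instance. Operators ignore this attribute,
# because the interpreter consults the slot on type(x).
x.__mul__ = lambda other: "from the instance"

result = x * 2
print(result)
```

An explicit call such as x.__mul__(2) still finds the instance attribute, which is why the two lookups can disagree.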
So, if you want to define a new extension type, you need to create a new type object.
This sort of thing can only be explained by example, so here’s a minimal, but complete, module that defines a new type named Custom inside a C extension module custom:
Note
What we’re showing here is the traditional way of defining static extension types. It should be adequate for most uses. The C API also allows defining heap-allocated extension types using the PyType_FromSpec() function, which isn’t covered in this tutorial.
#define PY_SSIZE_T_CLEAN
#include <Python.h>

typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} CustomObject;

static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};

static PyModuleDef custommodule = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "custom",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
Now that’s quite a bit to take in at once, but hopefully bits will seem familiar from the previous chapter. This file defines three things:
What a Custom object contains: this is the CustomObject struct, which is allocated once for each Custom instance.
How the Custom type behaves: this is the CustomType struct, which defines a set of flags and function pointers that the interpreter inspects when specific operations are requested.
How to initialize the custom module: this is the PyInit_custom function and the associated custommodule struct.
The first bit is:
typedef struct {
    PyObject_HEAD
} CustomObject;

This is what a Custom object will contain. PyObject_HEAD is mandatory at the start of each object struct and defines a field called ob_base of type PyObject, containing a pointer to a type object and a reference count (these can be accessed using the macros Py_TYPE and Py_REFCNT respectively). The reason for the macro is to abstract away the layout and to enable additional fields in debug builds.
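Both header fields have rough Python-level counterparts: type() reports the type pointer and sys.getrefcount() reports the reference count. The sketch below assumes CPython. Absolute counts are an implementation detail, so it relies only on the difference caused by binding one more name.

```python
import sys

obj = object()

before = sys.getrefcount(obj)
alias = obj                      # one more reference to the same object
after = sys.getrefcount(obj)

delta = after - before           # only the difference is meaningful
type_of_obj = type(obj)          # Python-level view of the type pointer

print(delta, type_of_obj)
```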
Note
There is no semicolon above after the PyObject_HEAD macro. Be wary of adding one by accident: some compilers will complain.
Of course, objects generally store additional data besides the standard PyObject_HEAD boilerplate; for example, here is the definition for standard Python floats:

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;
The second bit is the definition of the type object.
static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};
Note
We recommend using C99-style designated initializers as above, to avoid listing all the PyTypeObject fields that you don’t care about and also to avoid caring about the fields’ declaration order.
The actual definition of PyTypeObject in object.h has many more fields than the definition above. The remaining fields will be filled with zeros by the C compiler, and it’s common practice to not specify them explicitly unless you need them.
We’re going to pick it apart, one field at a time:
.ob_base = PyVarObject_HEAD_INIT(NULL, 0)
This line is mandatory boilerplate to initialize the ob_base field mentioned above.
.tp_name = "custom.Custom",
The name of our type. This will appear in the default textual representation of our objects and in some error messages, for example:

>>> "" + custom.Custom()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "custom.Custom") to str

Note that the name is a dotted name that includes both the module name and the name of the type within the module. The module in this case is custom and the type is Custom, so we set the type name to custom.Custom. Using the real dotted import path is important to make your type compatible with the pydoc and pickle modules.
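To see why the dotted path matters, it helps to look at an existing extension type from the Python side. The sketch below uses array.array from the standard library as a stand-in (an assumption made for illustration; the custom module from this tutorial would need to be built first). It shows that pickle stores a class by its module and qualified name and re-imports it on loading.

```python
import array
import pickle

cls = array.array

# The dotted name is split into __module__ and __qualname__.
dotted = f"{cls.__module__}.{cls.__qualname__}"

# Classes are pickled by reference: only the dotted path is stored,
# so loading succeeds only if that path can really be imported.
restored = pickle.loads(pickle.dumps(cls))

print(dotted, restored is cls)
```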
.tp_basicsize = sizeof(CustomObject),
.tp_itemsize = 0,
This is so that Python knows how much memory to allocate when creating new Custom instances. tp_itemsize is only used for variable-sized objects and should otherwise be zero.
Note
If you want your type to be subclassable from Python, and your type has the same tp_basicsize as its base type, you may have problems with multiple inheritance. A Python subclass of your type will have to list your type first in its __bases__, or else it will not be able to call your type’s __new__() method without getting an error. You can avoid this problem by ensuring that your type has a larger value for tp_basicsize than its base type does. Most of the time, this will be true anyway, because either your base type will be object, or else you will be adding data members to your base type, and therefore increasing its size.
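The two size fields are visible from Python as the __basicsize__ and __itemsize__ attributes of a type. The sketch below assumes CPython and uses a hypothetical Point class. It compares a few built-in types and shows that adding a data member, here through __slots__, makes the basic size grow beyond that of the base type.

```python
# Fixed-size type: float adds one C double to the object header.
object_size = object.__basicsize__
float_size = float.__basicsize__
float_item = float.__itemsize__      # 0: not a variable-sized type

# Variable-sized type: each tuple item takes extra space.
tuple_item = tuple.__itemsize__


class Point:
    """Hypothetical class; the slot adds one pointer to the layout."""
    __slots__ = ("x",)


point_size = Point.__basicsize__

print(object_size, float_size, float_item, tuple_item, point_size)
```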
We set the class flags to Py_TPFLAGS_DEFAULT.
.tp_flags = Py_TPFLAGS_DEFAULT,
All types should include this constant in their flags. It enables all of the members defined until at least Python 3.3. If you need further members, you will need to OR the corresponding flags.
We provide a doc string for the type in tp_doc.
.tp_doc = PyDoc_STR("Custom objects"),
To enable object creation, we have to provide a tp_new handler. This is the equivalent of the Python method __new__(), but has to be specified explicitly. In this case, we can just use the default implementation provided by the API function PyType_GenericNew().
.tp_new = PyType_GenericNew,
Everything else in the file should be familiar, except for some code in PyInit_custom():
if (PyType_Ready(&CustomType) < 0)
    return NULL;
This initializes the Custom type, filling in a number of members to the appropriate default values, including ob_type that we initially set to NULL.

if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
    Py_DECREF(m);
    return NULL;
}

This adds the type to the module dictionary. This allows us to create Custom instances by calling the Custom class:

>>> import custom
>>> mycustom = custom.Custom()

That’s it! All that remains is to build it; put the above code in a file called custom.c,

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "custom"
version = "1"

in a file called pyproject.toml, and

from setuptools import Extension, setup
setup(ext_modules=[Extension("custom", ["custom.c"])])

in a file called setup.py; then typing

$ python -m pip install .

in a shell should produce a file custom.so in a subdirectory and install it. Now fire up Python: you should be able to import custom and play around with Custom objects.
That wasn’t so hard, was it?
Of course, the current Custom type is pretty uninteresting. It has no data and doesn’t do anything. It can’t even be subclassed.
2.2. Adding data and methods to the Basic example
Let’s extend the basic example to add some data and methods. Let’s also make the type usable as a base class. We’ll create a new module, custom2, that adds these capabilities:
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stddef.h> /* for offsetof() */

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        Py_XSETREF(self->first, Py_NewRef(first));
    }
    if (last) {
        Py_XSETREF(self->last, Py_NewRef(last));
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"first", Py_T_OBJECT_EX, offsetof(CustomObject, first), 0,
     "first name"},
    {"last", Py_T_OBJECT_EX, offsetof(CustomObject, last), 0,
     "last name"},
    {"number", Py_T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    if (self->first == NULL) {
        PyErr_SetString(PyExc_AttributeError, "first");
        return NULL;
    }
    if (self->last == NULL) {
        PyErr_SetString(PyExc_AttributeError, "last");
        return NULL;
    }
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom2.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
};

static PyModuleDef custommodule = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "custom2",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom2(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
This version of the module has a number of changes.
The Custom type now has three data attributes in its C struct, first, last, and number. The first and last variables are Python strings containing first and last names. The number attribute is a C integer.
The object structure is updated accordingly:

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

Because we now have data to manage, we have to be more careful about object allocation and deallocation. At a minimum, we need a deallocation method:

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

which is assigned to the tp_dealloc member:
.tp_dealloc = (destructor) Custom_dealloc,
This method first clears the reference counts of the two Python attributes. Py_XDECREF() correctly handles the case where its argument is NULL (which might happen here if tp_new failed midway). It then calls the tp_free member of the object’s type (computed by Py_TYPE(self)) to free the object’s memory. Note that the object’s type might not be CustomType, because the object may be an instance of a subclass.
Note
The explicit cast to destructor above is needed because we defined Custom_dealloc to take a CustomObject * argument, but the tp_dealloc function pointer expects to receive a PyObject * argument. Otherwise, the compiler will emit a warning. This is object-oriented polymorphism, in C!
We want to make sure that the first and last names are initialized to empty strings, so we provide a tp_new implementation:

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

and install it in the tp_new member:
.tp_new = Custom_new,
The tp_new handler is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the __new__() method. It is not required to define a tp_new member, and indeed many extension types will simply reuse PyType_GenericNew() as done in the first version of the Custom type above. In this case, we use the tp_new handler to initialize the first and last attributes to non-NULL default values.
tp_new is passed the type being instantiated (not necessarily CustomType, if a subclass is instantiated) and any arguments passed when the type was called, and is expected to return the instance created. tp_new handlers always accept positional and keyword arguments, but they often ignore the arguments, leaving the argument handling to initializer (a.k.a. tp_init in C or __init__ in Python) methods.
Note
tp_new shouldn’t call tp_init explicitly, as the interpreter will do it itself.
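The division of labor between tp_new and tp_init mirrors the one between __new__() and __init__() in Python. The following pure-Python sketch is only an analogue of the C code, not a translation of it; the calls list is a hypothetical helper for recording the order of events. It shows that calling the class runs both steps in turn, and that __new__() never has to invoke __init__() itself.

```python
calls = []  # hypothetical log, used only to record the call order


class Custom:
    def __new__(cls, *args, **kwargs):
        calls.append("new")
        self = super().__new__(cls)
        # Like Custom_new: give the attributes safe default values.
        self.first = ""
        self.last = ""
        return self          # note: no explicit call to __init__ here

    def __init__(self, first=None, last=None):
        calls.append("init")
        # Like Custom_init: only overwrite what the caller supplied.
        if first is not None:
            self.first = first
        if last is not None:
            self.last = last


c = Custom("Ada")
print(calls, repr(c.first), repr(c.last))
```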
The tp_new implementation calls the tp_alloc slot to allocate memory:
self = (CustomObject *) type->tp_alloc(type, 0);
Since memory allocation may fail, we must check the tp_alloc result against NULL before proceeding.
Note
We didn’t fill the tp_alloc slot ourselves. Rather PyType_Ready() fills it for us by inheriting it from our base class, which is object by default. Most types use the default allocation strategy.
Note
If you are creating a co-operative tp_new (one that calls a base type’s tp_new or __new__()), you must not try to determine what method to call using method resolution order at runtime. Always statically determine what type you are going to call, and call its tp_new directly, or via type->tp_base->tp_new. If you do not do this, Python subclasses of your type that also inherit from other Python-defined classes may not work correctly. (Specifically, you may not be able to create instances of such subclasses without getting a TypeError.)
We also define an initialization function which accepts arguments to provide initial values for our instance:

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_XDECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_XDECREF(tmp);
    }
    return 0;
}

by filling the tp_init slot.
.tp_init = (initproc) Custom_init,
The tp_init slot is exposed in Python as the __init__() method. It is used to initialize an object after it’s created. Initializers always accept positional and keyword arguments, and they should return either 0 on success or -1 on error.
Unlike the tp_new handler, there is no guarantee that tp_init is called at all (for example, the pickle module by default doesn’t call __init__() on unpickled instances). It can also be called multiple times. Anyone can call the __init__() method on our objects. For this reason, we have to be extra careful when assigning the new attribute values. We might be tempted, for example, to assign the first member like this:

if (first) {
    Py_XDECREF(self->first);
    Py_INCREF(first);
    self->first = first;
}

But this would be risky. Our type doesn’t restrict the type of the first member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the first member; or that destructor could release the global interpreter lock and let arbitrary code run in other threads that access and modify our object.
To be paranoid and protect ourselves against this possibility, we almost always reassign members before decrementing their reference counts. When don’t we have to do this?
when we absolutely know that the reference count is greater than 1;
when we know that deallocation of the object[1] will neither release the GIL nor cause any calls back into our type’s code;
when decrementing a reference count in a tp_dealloc handler on a type which doesn’t support cyclic garbage collection[2].
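The earlier point that __init__() may run more than once, or not at all, can be checked from Python. The sketch below uses a plain Python class as a stand-in for the extension type. It uses copy.copy() in place of pickle (an assumption made to keep the example self-contained; the copy module relies on the same reduction protocol as pickle and likewise rebuilds the object without calling __init__()).

```python
import copy


class Custom:
    init_calls = 0  # counts how often __init__ has run

    def __init__(self, first="", last=""):
        Custom.init_calls += 1
        self.first = first
        self.last = last


c = Custom("Ada", "Lovelace")      # __init__ runs once
c.__init__("Grace", "Hopper")      # anyone may call it again

# The copy is rebuilt from the object's state; __init__ is not called.
clone = copy.copy(c)

print(Custom.init_calls, clone.first, clone.last)
```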
We want to expose our instance variables as attributes. There are a number of ways to do that. The simplest way is to define member definitions:

static PyMemberDef Custom_members[] = {
    {"first", Py_T_OBJECT_EX, offsetof(CustomObject, first), 0,
     "first name"},
    {"last", Py_T_OBJECT_EX, offsetof(CustomObject, last), 0,
     "last name"},
    {"number", Py_T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

and put the definitions in the tp_members slot:
.tp_members = Custom_members,
Each member definition has a member name, type, offset, access flags and documentation string. See the Generic Attribute Management section below for details.
A disadvantage of this approach is that it doesn’t provide a way to restrict the types of objects that can be assigned to the Python attributes. We expect the first and last names to be strings, but any Python objects can be assigned. Further, the attributes can be deleted, setting the C pointers to NULL. Even though we can make sure the members are initialized to non-NULL values, the members can be set to NULL if the attributes are deleted.
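Entries in __slots__ behave much like these object members, so the two drawbacks can be tried out in pure Python. The sketch below is an analogue under that assumption, not the extension type itself. It shows that any object can be assigned and that a deleted attribute raises AttributeError on the next read.

```python
class Custom:
    # Each slot is backed by a member descriptor on the class.
    __slots__ = ("first", "last")

    def __init__(self):
        self.first = ""
        self.last = ""


c = Custom()
c.first = 42        # no type restriction: any object is accepted
del c.last          # deletion is allowed and empties the slot

try:
    c.last
    deleted_raises = False
except AttributeError:
    deleted_raises = True

print(c.first, deleted_raises)
```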
We define a single method, Custom.name(), that outputs the object’s name as the concatenation of the first and last names.

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    if (self->first == NULL) {
        PyErr_SetString(PyExc_AttributeError, "first");
        return NULL;
    }
    if (self->last == NULL) {
        PyErr_SetString(PyExc_AttributeError, "last");
        return NULL;
    }
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

The method is implemented as a C function that takes a Custom (or Custom subclass) instance as the first argument. Methods always take an instance as the first argument. Methods often take positional and keyword arguments as well, but in this case we don’t take any and don’t need to accept a positional argument tuple or keyword argument dictionary. This method is equivalent to the Python method:

def name(self):
    return "%s %s" % (self.first, self.last)

Note that we have to check for the possibility that our first and last members are NULL. This is because they can be deleted, in which case they are set to NULL. It would be better to prevent deletion of these attributes and to restrict the attribute values to be strings. We’ll see how to do that in the next section.
Now that we’ve defined the method, we need to create an array of method definitions:

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};

(note that we used the METH_NOARGS flag to indicate that the method is expecting no arguments other than self)
and assign it to the tp_methods slot:
.tp_methods = Custom_methods,
Finally, we’ll make our type usable as a base class for subclassing. We’ve written our methods carefully so far so that they don’t make any assumptions about the type of the object being created or used, so all we need to do is to add the Py_TPFLAGS_BASETYPE to our class flag definition:
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
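The effect of this flag can be observed with built-in types: int allows subclassing, while bool does not. The sketch below probes this from Python; the helper can_subclass is a hypothetical name introduced only for this illustration.

```python
def can_subclass(base):
    """Return True if a new class can be derived from base."""
    try:
        type("Sub", (base,), {})
    except TypeError:
        # Raised for types that are not acceptable base types.
        return False
    return True


int_ok = can_subclass(int)     # int can be used as a base class
bool_ok = can_subclass(bool)   # bool cannot

print(int_ok, bool_ok)
```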
We rename PyInit_custom() to PyInit_custom2(), update the module name in the PyModuleDef struct, and update the full class name in the PyTypeObject struct.
Finally, we update our setup.py file to include the new module,

from setuptools import Extension, setup
setup(ext_modules=[
    Extension("custom", ["custom.c"]),
    Extension("custom2", ["custom2.c"]),
])

and then we re-install so that we can import custom2:

$ python -m pip install .
2.3. Providing finer control over data attributes
In this section, we’ll provide finer control over how the first and last attributes are set in the Custom example. In the previous version of our module, the instance variables first and last could be set to non-string values or even deleted. We want to make sure that these attributes always contain strings.
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stddef.h> /* for offsetof() */

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        Py_SETREF(self->first, Py_NewRef(first));
    }
    if (last) {
        Py_SETREF(self->last, Py_NewRef(last));
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"number", Py_T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    return Py_NewRef(self->first);
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    Py_SETREF(self->first, Py_NewRef(value));
    return 0;
}

static PyObject *
Custom_getlast(CustomObject *self, void *closure)
{
    return Py_NewRef(self->last);
}

static int
Custom_setlast(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The last attribute value must be a string");
        return -1;
    }
    Py_SETREF(self->last, Py_NewRef(value));
    return 0;
}

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom3.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
    .tp_getset = Custom_getsetters,
};

static PyModuleDef custommodule = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "custom3",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom3(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
To provide greater control over the first and last attributes, we’ll use custom getter and setter functions. Here are the functions for getting and setting the first attribute:

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    PyObject *tmp;
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    tmp = self->first;
    Py_INCREF(value);
    self->first = value;
    Py_DECREF(tmp);
    return 0;
}

The getter function is passed a Custom object and a “closure”, which is a void pointer. In this case, the closure is ignored. (The closure supports an advanced usage in which definition data is passed to the getter and setter. This could, for example, be used to allow a single set of getter and setter functions that decide the attribute to get or set based on data in the closure.)
The setter function is passed the Custom object, the new value, and the closure. The new value may be NULL, in which case the attribute is being deleted. In our setter, we raise an error if the attribute is deleted or if its new value is not a string.
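For readers more at home in Python, the getter and setter pair corresponds roughly to a property. The sketch below is a pure-Python analogue, not the extension type; the _first storage attribute and the raises_type_error helper are hypothetical names. One difference is worth noting: in C, a single setter handles both assignment and deletion (value is NULL), whereas a property splits them into a setter and a deleter.

```python
class Custom:
    def __init__(self, first=""):
        self._first = ""
        self.first = first          # goes through the setter below

    @property
    def first(self):
        return self._first

    @first.setter
    def first(self, value):
        if not isinstance(value, str):
            raise TypeError("The first attribute value must be a string")
        self._first = value

    @first.deleter
    def first(self):
        raise TypeError("Cannot delete the first attribute")


def raises_type_error(action):
    """Hypothetical helper: report whether action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


c = Custom("Ada")
bad_assign = raises_type_error(lambda: setattr(c, "first", 42))
bad_delete = raises_type_error(lambda: delattr(c, "first"))
print(c.first, bad_assign, bad_delete)
```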
We create an array of PyGetSetDef structures:

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

and register it in the tp_getset slot:
.tp_getset = Custom_getsetters,
The last item in a PyGetSetDef structure is the “closure” mentioned above. In this case, we aren’t using a closure, so we just pass NULL.
We also remove the member definitions for these attributes:

static PyMemberDef Custom_members[] = {
    {"number", Py_T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

We also need to update the tp_init handler to only allow strings[3] to be passed:

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_DECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_DECREF(tmp);
    }
    return 0;
}
With these changes, we can ensure that the first and last members are never NULL so we can remove checks for NULL values in almost all cases. This means that most of the Py_XDECREF() calls can be converted to Py_DECREF() calls. The only place we can’t change these calls is in the tp_dealloc implementation, where there is the possibility that the initialization of these members failed in tp_new.
We also rename the module initialization function and module name in the initialization function, as we did before, and we add an extra definition to the setup.py file.
2.4. Supporting cyclic garbage collection
Python has a cyclic garbage collector (GC) that can identify unneeded objects even when their reference counts are not zero. This can happen when objects are involved in cycles. For example, consider:

>>> l = []
>>> l.append(l)
>>> del l

In this example, we create a list that contains itself. When we delete it, it still has a reference from itself. Its reference count doesn’t drop to zero. Fortunately, Python’s cyclic garbage collector will eventually figure out that the list is garbage and free it.
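The same behavior can be made visible with a weak reference. The sketch below assumes CPython and uses a hypothetical Node class instead of a list, because lists cannot be weakly referenced. Automatic collection is switched off for the duration so that the only collection is the explicit one.

```python
import gc
import weakref


class Node:
    """Hypothetical class; instances can hold a reference to themselves."""


gc.disable()                      # keep automatic collection out of the way
try:
    n = Node()
    n.self_ref = n                # the object now refers to itself
    probe = weakref.ref(n)        # lets us observe whether it still exists

    del n                         # reference count stays above zero
    alive_after_del = probe() is not None

    gc.collect()                  # the cyclic GC finds and frees the cycle
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```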
In the second version of the Custom example, we allowed any kind of object to be stored in the first or last attributes[4]. Besides, in the second and third versions, we allowed subclassing Custom, and subclasses may add arbitrary attributes. For either of those two reasons, Custom objects can participate in cycles:

>>> import custom3
>>> class Derived(custom3.Custom): pass
...
>>> n = Derived()
>>> n.some_attribute = n

To allow a Custom instance participating in a reference cycle to be properly detected and collected by the cyclic GC, our Custom type needs to fill two additional slots and to enable a flag that enables these slots:
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stddef.h> /* for offsetof() */

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->first);
    Py_VISIT(self->last);
    return 0;
}

static int
Custom_clear(CustomObject *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}

static void
Custom_dealloc(CustomObject *self)
{
    PyObject_GC_UnTrack(self);
    Custom_clear(self);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        Py_SETREF(self->first, Py_NewRef(first));
    }
    if (last) {
        Py_SETREF(self->last, Py_NewRef(last));
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"number", Py_T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    return Py_NewRef(self->first);
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    Py_XSETREF(self->first, Py_NewRef(value));
    return 0;
}

static PyObject *
Custom_getlast(CustomObject *self, void *closure)
{
    return Py_NewRef(self->last);
}

static int
Custom_setlast(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The last attribute value must be a string");
        return -1;
    }
    Py_XSETREF(self->last, Py_NewRef(value));
    return 0;
}

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    .ob_base = PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom4.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_traverse = (traverseproc) Custom_traverse,
    .tp_clear = (inquiry) Custom_clear,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
    .tp_getset = Custom_getsetters,
};

static PyModuleDef custommodule = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "custom4",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom4(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
First, the traversal method lets the cyclic GC know about subobjects that couldparticipate in cycles:
staticintCustom_traverse(CustomObject*self,visitprocvisit,void*arg){intvret;if(self->first){vret=visit(self->first,arg);if(vret!=0)returnvret;}if(self->last){vret=visit(self->last,arg);if(vret!=0)returnvret;}return0;}
For each subobject that can participate in cycles, we need to call thevisit()
function, which is passed to the traversal method. Thevisit()
function takes as arguments the subobject and the extra argumentarg passed to the traversal method. It returns an integer value that must bereturned if it is non-zero.
Python provides aPy_VISIT()
macro that automates calling visitfunctions. WithPy_VISIT()
, we can minimize the amount of boilerplateinCustom_traverse
:
staticintCustom_traverse(CustomObject*self,visitprocvisit,void*arg){Py_VISIT(self->first);Py_VISIT(self->last);return0;}
Note
The tp_traverse implementation must name its arguments exactly visit and arg in order to use Py_VISIT().
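To make the naming requirement concrete, here is a minimal standalone sketch that does not use Python.h at all. ToyObject, toy_visitproc, TOY_VISIT, ToyCustom and the two visitor functions are hypothetical names invented for this illustration. The macro has the same shape as Py_VISIT(): it refers to the enclosing function's visit and arg parameters by name and returns early when the visitor reports a non-zero value.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the CPython definitions. */
typedef struct ToyObject { int id; } ToyObject;
typedef int (*toy_visitproc)(ToyObject *, void *);

/* Same shape as Py_VISIT(): the macro body mentions `visit` and `arg`,
   so the enclosing function must use exactly those parameter names. */
#define TOY_VISIT(op)                          \
    do {                                       \
        if (op) {                              \
            int vret = visit((op), arg);       \
            if (vret)                          \
                return vret;                   \
        }                                      \
    } while (0)

typedef struct {
    ToyObject *first;
    ToyObject *last;
} ToyCustom;

static int
ToyCustom_traverse(ToyCustom *self, toy_visitproc visit, void *arg)
{
    TOY_VISIT(self->first);   /* NULL members are skipped */
    TOY_VISIT(self->last);
    return 0;
}

/* Visitor that counts every subobject it is shown and never aborts. */
static int
count_visit(ToyObject *op, void *arg)
{
    int *count = arg;
    (void)op;
    assert(count != NULL);
    ++*count;
    return 0;
}

/* Visitor that counts, then aborts the traversal with a non-zero value. */
static int
stop_visit(ToyObject *op, void *arg)
{
    int *count = arg;
    (void)op;
    assert(count != NULL);
    ++*count;
    return 7;
}
```

Passing count_visit visits both members and the traversal returns 0. Passing stop_visit ends the traversal after the first member, and the visitor's return value (7 here) is propagated to the caller unchanged.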
Second, we need to provide a method for clearing any subobjects that can participate in cycles:
static int
Custom_clear(CustomObject *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}
Notice the use of the Py_CLEAR() macro. It is the recommended and safe way to clear data attributes of arbitrary types while decrementing their reference counts. If you were to call Py_XDECREF() instead on the attribute before setting it to NULL, there is a possibility that the attribute’s destructor would call back into code that reads the attribute again (especially if there is a reference cycle).
Note
You could emulate Py_CLEAR() by writing:
PyObject *tmp;
tmp = self->first;
self->first = NULL;
Py_XDECREF(tmp);
Nevertheless, it is much easier and less error-prone to always use Py_CLEAR() when deleting an attribute. Don’t try to micro-optimize at the expense of robustness!
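The ordering hazard can be reproduced with a small standalone model that does not depend on Python.h. Everything in it (ToyObject, toy_decref, ToyOwner, peek_at_owner, clear_unsafe, clear_safe, run_demo) is a hypothetical name made up for this sketch. The on_dealloc callback plays the role of a destructor that calls back into code reading the owner's attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object; NOT the CPython object layout. */
typedef struct ToyObject ToyObject;
struct ToyObject {
    int refcnt;
    void (*on_dealloc)(ToyObject *self, void *ctx);  /* stand-in destructor */
    void *ctx;
};

static void
toy_decref(ToyObject *op)
{
    if (op == NULL)
        return;
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        if (op->on_dealloc != NULL)
            op->on_dealloc(op, op->ctx);   /* arbitrary code runs here */
        free(op);
    }
}

typedef struct {
    ToyObject *first;
    int saw_dying_object;   /* set when the destructor finds `first`
                               still pointing at the dying object */
} ToyOwner;

/* Destructor callback that "reads the attribute again". */
static void
peek_at_owner(ToyObject *self, void *ctx)
{
    ToyOwner *owner = ctx;
    if (owner->first == self)
        owner->saw_dying_object = 1;
}

/* Decrement first, reset the pointer afterwards: the hazardous order. */
static void
clear_unsafe(ToyOwner *owner)
{
    toy_decref(owner->first);
    owner->first = NULL;
}

/* Reset the pointer first, then decrement: the order Py_CLEAR() uses. */
static void
clear_safe(ToyOwner *owner)
{
    ToyObject *tmp = owner->first;
    owner->first = NULL;
    toy_decref(tmp);
}

/* Returns 1 if the destructor reached the dying object through the
   owner, 0 if it did not, and -1 if allocation failed. */
static int
run_demo(void (*clear)(ToyOwner *))
{
    ToyOwner owner = {NULL, 0};
    ToyObject *op = malloc(sizeof(*op));
    if (op == NULL)
        return -1;
    op->refcnt = 1;
    op->on_dealloc = peek_at_owner;
    op->ctx = &owner;
    owner.first = op;
    clear(&owner);
    return owner.saw_dying_object;
}
```

In this model run_demo(clear_unsafe) returns 1, because the callback still finds the half-destroyed object through owner.first, while run_demo(clear_safe) returns 0, because the pointer was already reset before any destructor code ran.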
The deallocator Custom_dealloc may call arbitrary code when clearing attributes, which means the cyclic GC can be triggered inside the function. Since the GC assumes that the reference count of a tracked object is not zero, we need to untrack the object from the GC by calling PyObject_GC_UnTrack() before clearing members. Here is our reimplemented deallocator using PyObject_GC_UnTrack() and Custom_clear:
static void
Custom_dealloc(CustomObject *self)
{
    PyObject_GC_UnTrack(self);
    Custom_clear(self);
    Py_TYPE(self)->tp_free((PyObject *) self);
}
Finally, we add the Py_TPFLAGS_HAVE_GC flag to the class flags:
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,
That’s pretty much it. If we had written custom tp_alloc or tp_free handlers, we’d need to modify them for cyclic garbage collection. Most extensions will use the versions automatically provided.
2.5. Subclassing other types
It is possible to create new extension types that are derived from existing types. It is easiest to inherit from the built-in types, since an extension can easily use the PyTypeObject it needs. It can be difficult to share these PyTypeObject structures between extension modules.
In this example we will create a SubList type that inherits from the built-in list type. The new type will be completely compatible with regular lists, but will have an additional increment() method that increases an internal counter:
>>> import sublist
>>> s = sublist.SubList(range(3))
>>> s.extend(s)
>>> print(len(s))
6
>>> print(s.increment())
1
>>> print(s.increment())
2
#define PY_SSIZE_T_CLEAN
#include <Python.h>

typedef struct {
    PyListObject list;
    int state;
} SubListObject;

static PyObject *
SubList_increment(SubListObject *self, PyObject *unused)
{
    self->state++;
    return PyLong_FromLong(self->state);
}

static PyMethodDef SubList_methods[] = {
    {"increment", (PyCFunction) SubList_increment, METH_NOARGS,
     PyDoc_STR("increment state counter")},
    {NULL},
};

static int
SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
{
    if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}

static PyTypeObject SubListType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "sublist.SubList",
    .tp_doc = PyDoc_STR("SubList objects"),
    .tp_basicsize = sizeof(SubListObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_init = (initproc) SubList_init,
    .tp_methods = SubList_methods,
};

static PyModuleDef sublistmodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "sublist",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_sublist(void)
{
    PyObject *m;
    SubListType.tp_base = &PyList_Type;
    if (PyType_Ready(&SubListType) < 0)
        return NULL;

    m = PyModule_Create(&sublistmodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "SubList", (PyObject *) &SubListType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
As you can see, the source code closely resembles the Custom examples in previous sections. We will break down the main differences between them.
typedef struct {
    PyListObject list;
    int state;
} SubListObject;
The primary difference for derived type objects is that the base type’s object structure must be the first member. The base type will already include the PyObject_HEAD at the beginning of its structure.
When a Python object is a SubList instance, its PyObject * pointer can be safely cast to both PyListObject * and SubListObject *:
static int
SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
{
    if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}
We see above how to call through to the __init__() method of the base type.
This pattern is important when writing a type with custom tp_new and tp_dealloc members. The tp_new handler should not actually create the memory for the object with its tp_alloc, but let the base class handle it by calling its own tp_new.
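As a rough standalone model of that division of labour (again without Python.h), the sketch below lets the base constructor allocate according to the size recorded in the type it is handed, so the derived constructor only has to delegate and then fill in its own fields. ToyType, ToyBase, ToyDerived and the two _new functions are hypothetical names chosen for this illustration and do not correspond to real C API signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy "type object" that records only the instance size. */
typedef struct ToyType {
    size_t basicsize;
} ToyType;

typedef struct {
    const ToyType *type;
    int length;
} ToyBase;

typedef struct {
    ToyBase base;   /* base structure first, as in SubListObject */
    int state;
} ToyDerived;

static const ToyType ToyBase_Type = { sizeof(ToyBase) };
static const ToyType ToyDerived_Type = { sizeof(ToyDerived) };

/* Base constructor: allocates type->basicsize bytes, so it creates
   correctly sized instances for subtypes as well. */
static ToyBase *
ToyBase_new(const ToyType *type)
{
    ToyBase *self;
    assert(type != NULL && type->basicsize >= sizeof(ToyBase));
    self = calloc(1, type->basicsize);
    if (self == NULL)
        return NULL;
    self->type = type;
    self->length = 0;
    return self;
}

/* Derived constructor: does not allocate on its own; it lets the base
   constructor create the object and then sets the derived fields. */
static ToyDerived *
ToyDerived_new(const ToyType *type)
{
    ToyDerived *self = (ToyDerived *) ToyBase_new(type);
    if (self == NULL)
        return NULL;
    self->state = 42;
    return self;
}
```

The base part of the object is initialized in exactly one place, and the derived constructor cannot get the allocation size wrong because it never allocates.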
The PyTypeObject struct supports a tp_base specifying the type’s concrete base class. Due to cross-platform compiler issues, you can’t fill that field directly with a reference to PyList_Type; it should be done later in the module initialization function:
PyMODINIT_FUNC
PyInit_sublist(void)
{
    PyObject *m;
    SubListType.tp_base = &PyList_Type;
    if (PyType_Ready(&SubListType) < 0)
        return NULL;

    m = PyModule_Create(&sublistmodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddObjectRef(m, "SubList", (PyObject *) &SubListType) < 0) {
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
Before calling PyType_Ready(), the type structure must have the tp_base slot filled in. When we are deriving an existing type, it is not necessary to fill out the tp_new slot with PyType_GenericNew(); the tp_new and tp_alloc handlers of the base type will be inherited.
After that, calling PyType_Ready() and adding the type object to the module is the same as with the basic Custom examples.
Footnotes
[1] This is true when we know that the object is a basic type, like a string or a float.
[2] We relied on this in the tp_dealloc handler in this example, because our type doesn’t support garbage collection.
[3] We now know that the first and last members are strings, so perhaps we could be less careful about decrementing their reference counts. However, we accept instances of string subclasses. Even though deallocating normal strings won’t call back into our objects, we can’t guarantee that deallocating an instance of a string subclass won’t call back into our objects.
[4] Also, even with our attributes restricted to string instances, the user could pass arbitrary str subclasses and therefore still create reference cycles.