1. Extending Python with C or C++

It is quite easy to add new built-in modules to Python, if you know how to program in C. Such extension modules can do two things that can’t be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header "Python.h".

The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters.

Note

The C extension interface is specific to CPython, and extension modules do not work on other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the ctypes module or the cffi library rather than writing custom C code. These modules let you write Python code to interface with C code and are more portable between implementations of Python than writing and compiling a C extension module.

1.1. A Simple Example

Let’s create an extension module called spam (the favorite food of Monty Python fans…) and let’s say we want to create a Python interface to the C library function system() [1]. This function takes a null-terminated character string as argument and returns an integer. We want this function to be callable from Python as follows:

    >>> import spam
    >>> status = spam.system("ls -l")

Begin by creating a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the file name can be just spammify.c.)

The first two lines of our file can be:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like).

Note

Since Python may define some pre-processor definitions which affect the standard headers on some systems, you must include Python.h before any standard headers are included.

#define PY_SSIZE_T_CLEAN was used to indicate that Py_ssize_t should be used in some APIs instead of int. It is not necessary since Python 3.13, but we keep it here for backward compatibility. See Strings and buffers for a description of this macro.

All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, "Python.h" includes a few standard header files: <stdio.h>, <string.h>, <errno.h>, and <stdlib.h>. If the latter header file does not exist on your system, it declares the functions malloc(), free() and realloc() directly.

The next thing we add to our module file is the C function that will be called when the Python expression spam.system(string) is evaluated (we’ll see shortly how it ends up being called):

    static PyObject *
    spam_system(PyObject *self, PyObject *args)
    {
        const char *command;
        int sts;

        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
        sts = system(command);
        return PyLong_FromLong(sts);
    }

There is a straightforward translation from the argument list in Python (for example, the single expression "ls -l") to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.

The self argument points to the module object for module-level functions; for a method it would point to the object instance.

The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects; in order to do anything with them in our C function we have to convert them to C values. The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.

PyArg_ParseTuple() returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return NULL immediately (as we saw in the example).

1.2. Intermezzo: Errors and Exceptions

An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually -1 or a NULL pointer). Exception information is stored in three members of the interpreter’s thread state. These are NULL if there is no exception. Otherwise they are the C equivalents of the members of the Python tuple returned by sys.exc_info(). These are the exception type, exception instance, and a traceback object. It is important to know about them to understand how errors are passed around.

The Python API defines a number of functions to set various types of exceptions.

The most common one is PyErr_SetString(). Its arguments are an exception object and a C string. The exception object is usually a predefined object like PyExc_ZeroDivisionError. The C string indicates the cause of the error and is converted to a Python string object and stored as the “associated value” of the exception.

Another useful function is PyErr_SetFromErrno(), which only takes an exception argument and constructs the associated value by inspection of the global variable errno. The most general function is PyErr_SetObject(), which takes two object arguments, the exception and its associated value. You don’t need to Py_INCREF() the objects passed to any of these functions.

You can test non-destructively whether an exception has been set with PyErr_Occurred(). This returns the current exception object, or NULL if no exception has occurred. You normally don’t need to call PyErr_Occurred() to see whether an error occurred in a function call, since you should be able to tell from the return value.

When a function f that calls another function g detects that the latter fails, f should itself return an error value (usually NULL or -1). It should not call one of the PyErr_* functions, since one has already been called by g. f’s caller is then supposed to also return an error indication to its caller, again without calling PyErr_*, and so on; the most detailed cause of the error was already reported by the function that first detected it. Once the error reaches the Python interpreter’s main loop, this aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer.

(There are situations where a module can actually give a more detailed error message by calling another PyErr_* function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.)

To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling PyErr_Clear(). The only time C code should call PyErr_Clear() is if it doesn’t want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong).
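The set, propagate and clear rules above can be tried out without the interpreter. The following is a toy model in plain C, not the CPython API: err_message, err_set() and err_clear() are hypothetical stand-ins for the thread-state error indicator, PyErr_SetString() and PyErr_Clear(), and g, f and f_or_default are made-up functions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the interpreter's error indicator (hypothetical). */
static const char *err_message = NULL;

static void err_set(const char *msg) { err_message = msg; }
static void err_clear(void)          { err_message = NULL; }

/* g detects the failure first, so g is the one that sets the error. */
static int g(int x)
{
    if (x < 0) {
        err_set("negative input");
        return -1;
    }
    return x * 2;
}

/* f only propagates: it returns -1 without touching the indicator. */
static int f(int x)
{
    int r = g(x);
    if (r < 0)
        return -1;
    return r + 1;
}

/* A caller that handles the failure itself must clear the indicator. */
static int f_or_default(int x, int fallback)
{
    int r = f(x);
    if (r < 0) {
        assert(err_message != NULL);  /* g already reported the cause */
        err_clear();
        return fallback;
    }
    return r;
}
```

In this sketch only g ever sets the error and only f_or_default clears it; f sits in between and merely passes the failure along, which is the usual role of intermediate functions in an extension module.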

Every failing malloc() call must be turned into an exception: the direct caller of malloc() (or realloc()) must call PyErr_NoMemory() and return a failure indicator itself. All the object-creating functions (for example, PyLong_FromLong()) already do this, so this note is only relevant to those who call malloc() directly.

Also note that, with the important exception of PyArg_ParseTuple() and friends, functions that return an integer status usually return a positive value or zero for success and -1 for failure, like Unix system calls.

Finally, be careful to clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator!

The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as PyExc_ZeroDivisionError, which you can use directly. Of course, you should choose exceptions wisely: don’t use PyExc_TypeError to mean that a file couldn’t be opened (that should probably be PyExc_OSError). If something’s wrong with the argument list, the PyArg_ParseTuple() function usually raises PyExc_TypeError. If you have an argument whose value must be in a particular range or must satisfy other conditions, PyExc_ValueError is appropriate.

You can also define a new exception that is unique to your module. For this, you usually declare a static object variable at the beginning of your file:

    static PyObject *SpamError;

and initialize it in your module’s initialization function (PyInit_spam()) with an exception object:

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        PyObject *m;

        m = PyModule_Create(&spammodule);
        if (m == NULL)
            return NULL;

        SpamError = PyErr_NewException("spam.error", NULL, NULL);
        if (PyModule_AddObjectRef(m, "error", SpamError) < 0) {
            Py_CLEAR(SpamError);
            Py_DECREF(m);
            return NULL;
        }

        return m;
    }

Note that the Python name for the exception object is spam.error. The PyErr_NewException() function may create a class with the base class being Exception (unless another class is passed in instead of NULL), described in Built-in Exceptions.

Note also that the SpamError variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing SpamError to become a dangling pointer. Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects.

We discuss the use of PyMODINIT_FUNC as a function return type later in this sample.

The spam.error exception can be raised in your extension module using a call to PyErr_SetString() as shown below:

    static PyObject *
    spam_system(PyObject *self, PyObject *args)
    {
        const char *command;
        int sts;

        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
        sts = system(command);
        if (sts < 0) {
            PyErr_SetString(SpamError, "System command failed");
            return NULL;
        }
        return PyLong_FromLong(sts);
    }

1.3. Back to the Example

Going back to our example function, you should now be able to understand this statement:

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;

It returns NULL (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by PyArg_ParseTuple(). Otherwise the string value of the argument has been copied to the local variable command. This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable command should properly be declared as const char *command).

The next statement is a call to the Unix function system(), passing it the string we just got from PyArg_ParseTuple():

    sts = system(command);

Our spam.system() function must return the value of sts as a Python object. This is done using the function PyLong_FromLong().

    return PyLong_FromLong(sts);

In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!)

If you have a C function that returns no useful value (a function returning void), the corresponding Python function must return None. You need this idiom to do so (which is implemented by the Py_RETURN_NONE macro):

    Py_INCREF(Py_None);
    return Py_None;

Py_None is the C name for the special Python object None. It is a genuine Python object rather than a NULL pointer, which means “error” in most contexts, as we have seen.

1.4. The Module’s Method Table and Initialization Function

I promised to show how spam_system() is called from Python programs. First, we need to list its name and address in a “method table”:

    static PyMethodDef SpamMethods[] = {
        ...
        {"system",  spam_system, METH_VARARGS,
         "Execute a shell command."},
        ...
        {NULL, NULL, 0, NULL}        /* Sentinel */
    };

Note the third entry (METH_VARARGS). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be METH_VARARGS or METH_VARARGS | METH_KEYWORDS; a value of 0 means that an obsolete variant of PyArg_ParseTuple() is used.

When using only METH_VARARGS, the function should expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple(); more information on this function is provided below.

The METH_KEYWORDS bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a function.
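To see why the method table ends with a sentinel entry, it can help to look at how such a table is searched. The sketch below is a simplified model in plain C and makes no claim about how the interpreter consumes the table: ToyMethodDef, toy_methods and find_method are hypothetical names, and the real PyMethodDef also carries the calling-convention flags and a docstring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_func)(int);

/* Simplified model of a method table entry (hypothetical). */
typedef struct {
    const char *name;
    toy_func func;
} ToyMethodDef;

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

static ToyMethodDef toy_methods[] = {
    {"twice", twice},
    {"negate", negate},
    {NULL, NULL}   /* Sentinel */
};

/* Walk the table until the sentinel; return NULL for an unknown name. */
static toy_func find_method(const ToyMethodDef *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->func;
    }
    return NULL;
}
```

Because the table carries no explicit length, the all-NULL entry is the only thing that stops the walk; leaving it out would let a lookup read past the end of the array.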

The method table must be referenced in the module definition structure:

    static struct PyModuleDef spammodule = {
        PyModuleDef_HEAD_INIT,
        "spam",   /* name of module */
        spam_doc, /* module documentation, may be NULL */
        -1,       /* size of per-interpreter state of the module,
                     or -1 if the module keeps state in global variables. */
        SpamMethods
    };

This structure, in turn, must be passed to the interpreter in the module’s initialization function. The initialization function must be named PyInit_name(), where name is the name of the module, and should be the only non-static item defined in the module file:

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        return PyModule_Create(&spammodule);
    }

Note that PyMODINIT_FUNC declares the function as PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C".

When the Python program imports module spam for the first time, PyInit_spam() is called. (See below for comments about embedding Python.) It calls PyModule_Create(), which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition. PyModule_Create() returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into sys.modules.

When embedding Python, the PyInit_spam() function is not called automatically unless there’s an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    int
    main(int argc, char *argv[])
    {
        PyStatus status;
        PyConfig config;
        PyConfig_InitPythonConfig(&config);

        /* Add a built-in module, before Py_Initialize */
        if (PyImport_AppendInittab("spam", PyInit_spam) == -1) {
            fprintf(stderr, "Error: could not extend in-built modules table\n");
            exit(1);
        }

        /* Pass argv[0] to the Python interpreter */
        status = PyConfig_SetBytesString(&config, &config.program_name, argv[0]);
        if (PyStatus_Exception(status)) {
            goto exception;
        }

        /* Initialize the Python interpreter.  Required.
           If this step fails, it will be a fatal error. */
        status = Py_InitializeFromConfig(&config);
        if (PyStatus_Exception(status)) {
            goto exception;
        }
        PyConfig_Clear(&config);

        /* Optionally import the module; alternatively,
           import can be deferred until the embedded script
           imports it. */
        PyObject *pmodule = PyImport_ImportModule("spam");
        if (!pmodule) {
            PyErr_Print();
            fprintf(stderr, "Error: could not import module 'spam'\n");
        }

        // ... use Python C API here ...

        return 0;

      exception:
         PyConfig_Clear(&config);
         Py_ExitStatusException(status);
    }

Note

Removing entries from sys.modules or importing compiled modules into multiple interpreters within a process (or following a fork() without an intervening exec()) can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures.

A more substantial example module is included in the Python source distribution as Modules/xxmodule.c. This file may be used as a template or simply read as an example.

Note

Unlike our spam example, xxmodule uses multi-phase initialization (new in Python 3.5), where a PyModuleDef structure is returned from PyInit_spam, and creation of the module is left to the import machinery. For details on multi-phase initialization, see PEP 489.

1.5. Compilation and Linkage

There are two more things to do before you can use your new extension: compiling and linking it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; see the chapters about building extension modules (chapter Building C and C++ Extensions) and additional information that pertains only to building on Windows (chapter Building C and C++ Extensions on Windows) for more information about this.

If you can’t use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file (spammodule.c for example) in the Modules/ directory of an unpacked source distribution, add a line to the file Modules/Setup.local describing your file:

    spam spammodule.o

and rebuild the interpreter by running make in the toplevel directory. You can also run make in the Modules/ subdirectory, but then you must first rebuild Makefile there by running ‘make Makefile’. (This is necessary each time you change the Setup file.)

If your module requires additional libraries to link with, these can be listed on the line in the configuration file as well, for instance:

    spam spammodule.o -lX11

1.6. Calling Python Functions from C

So far we have concentrated on making C functions callable from Python. The reverse is also useful: calling Python functions from C. This is especially the case for libraries that support so-called “callback” functions. If a C interface makes use of callbacks, the equivalent Python often needs to provide a callback mechanism to the Python programmer; the implementation will require calling the Python callback functions from a C callback. Other uses are also imaginable.

Fortunately, the Python interpreter is easily called recursively, and there is a standard interface to call a Python function. (I won’t dwell on how to call the Python parser with a particular string as input; if you’re interested, have a look at the implementation of the -c command line option in Modules/main.c from the Python source code.)

Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to Py_INCREF() it!) in a global variable, or wherever you see fit. For example, the following function might be part of a module definition:

    static PyObject *my_callback = NULL;

    static PyObject *
    my_set_callback(PyObject *dummy, PyObject *args)
    {
        PyObject *result = NULL;
        PyObject *temp;

        if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
            if (!PyCallable_Check(temp)) {
                PyErr_SetString(PyExc_TypeError, "parameter must be callable");
                return NULL;
            }
            Py_XINCREF(temp);         /* Add a reference to new callback */
            Py_XDECREF(my_callback);  /* Dispose of previous callback */
            my_callback = temp;       /* Remember new callback */
            /* Boilerplate to return "None" */
            Py_INCREF(Py_None);
            result = Py_None;
        }
        return result;
    }

This function must be registered with the interpreter using the METH_VARARGS flag; this is described in section The Module’s Method Table and Initialization Function. The PyArg_ParseTuple() function and its arguments are documented in section Extracting Parameters in Extension Functions.

The macros Py_XINCREF() and Py_XDECREF() increment/decrement the reference count of an object and are safe in the presence of NULL pointers (but note that temp will not be NULL in this context). More info on them in section Reference Counts.
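The order of the two macro calls in my_set_callback() matters when the same object is registered twice. The following plain C sketch models this with a toy reference-counted object; Cb, cb_xincref(), cb_xdecref() and set_cb() are hypothetical names chosen for illustration and are not part of the Python API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object (hypothetical; stands in for a Python callable). */
typedef struct {
    long refcnt;
} Cb;

static int cbs_freed = 0;        /* instrumentation for the example */
static Cb *current_cb = NULL;    /* owned reference, or NULL */

static Cb *cb_new(void)
{
    Cb *c = malloc(sizeof(*c));
    if (c != NULL)
        c->refcnt = 1;
    return c;
}

/* NULL-tolerant variants, in the spirit of Py_XINCREF/Py_XDECREF. */
static void cb_xincref(Cb *c)
{
    if (c != NULL)
        c->refcnt++;
}

static void cb_xdecref(Cb *c)
{
    if (c == NULL)
        return;
    assert(c->refcnt > 0);
    if (--c->refcnt == 0) {
        cbs_freed++;
        free(c);
    }
}

/* Take the new reference before dropping the old one. */
static void set_cb(Cb *new_cb)
{
    cb_xincref(new_cb);
    cb_xdecref(current_cb);
    current_cb = new_cb;
}
```

If set_cb() dropped the old reference first, re-registering an object whose only remaining reference is the one held in current_cb would free it before the new reference was taken.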

Later, when it is time to call the function, you call the C function PyObject_CallObject(). This function has two arguments, both pointers to arbitrary Python objects: the Python function, and the argument list. The argument list must always be a tuple object, whose length is the number of arguments. To call the Python function with no arguments, pass in NULL, or an empty tuple; to call it with one argument, pass a singleton tuple. Py_BuildValue() returns a tuple when its format string consists of zero or more format codes between parentheses. For example:

    int arg;
    PyObject *arglist;
    PyObject *result;
    ...
    arg = 123;
    ...
    /* Time to call the callback */
    arglist = Py_BuildValue("(i)", arg);
    result = PyObject_CallObject(my_callback, arglist);
    Py_DECREF(arglist);

PyObject_CallObject() returns a Python object pointer: this is the return value of the Python function. PyObject_CallObject() is “reference-count-neutral” with respect to its arguments. In the example a new tuple was created to serve as the argument list, which is Py_DECREF()-ed immediately after the PyObject_CallObject() call.

The return value of PyObject_CallObject() is “new”: either it is a brand new object, or it is an existing object whose reference count has been incremented. So, unless you want to save it in a global variable, you should somehow Py_DECREF() the result, even (especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return value isn’t NULL. If it is, the Python function terminated by raising an exception. If the C code that called PyObject_CallObject() is called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace, or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling PyErr_Clear(). For example:

    if (result == NULL)
        return NULL; /* Pass error back */
    ...use result...
    Py_DECREF(result);

Depending on the desired interface to the Python callback function, you may also have to provide an argument list to PyObject_CallObject(). In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call Py_BuildValue(). For example, if you want to pass an integral event code, you might use the following code:

    PyObject *arglist;
    ...
    arglist = Py_BuildValue("(l)", eventcode);
    result = PyObject_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    Py_DECREF(result);

Note the placement of Py_DECREF(arglist) immediately after the call, before the error check! Also note that strictly speaking this code is not complete: Py_BuildValue() may run out of memory, and this should be checked.

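You may also call a function with keyword arguments by using PyObject_Call(), which supports arguments and keyword arguments. As in the above example, we use Py_BuildValue() to construct the dictionary. Unlike PyObject_CallObject(), PyObject_Call() does not accept NULL for the positional arguments, so an empty tuple is passed instead. As before, the allocation checks are omitted for brevity.

    PyObject *dict;
    PyObject *noargs;
    ...
    dict = Py_BuildValue("{s:i}", "name", val);
    noargs = PyTuple_New(0);  /* positional arguments must be a tuple, not NULL */
    result = PyObject_Call(my_callback, noargs, dict);
    Py_DECREF(noargs);
    Py_DECREF(dict);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    Py_DECREF(result);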

1.7. Extracting Parameters in Extension Functions

The PyArg_ParseTuple() function is declared as follows:

    int PyArg_ParseTuple(PyObject *arg, const char *format, ...);

The arg argument must be a tuple object containing an argument list passed from Python to a C function. The format argument must be a format string, whose syntax is explained in Parsing arguments and building values in the Python/C API Reference Manual. The remaining arguments must be addresses of variables whose type is determined by the format string.

Note that while PyArg_ParseTuple() checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful!

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!
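Why the addresses cannot be checked becomes clearer from a small model of a format-driven parser. The sketch below is plain C built on <stdarg.h> and is not how PyArg_ParseTuple() is implemented: Value, ValKind and mini_parse() are hypothetical, and only two made-up format codes ('l' for a long, 's' for a string) are supported. The function trusts the format string to tell it what pointer type each variadic argument has.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Toy tagged value standing in for a Python object (hypothetical). */
typedef enum { VAL_INT, VAL_STR } ValKind;

typedef struct {
    ValKind kind;
    long i;
    const char *s;
} Value;

/* Returns 1 on success and 0 on failure, the same convention as
   PyArg_ParseTuple().  The types of the incoming values are checked,
   but the output pointers are taken on trust from the format string. */
static int mini_parse(const Value *argv, size_t argc, const char *format, ...)
{
    va_list ap;
    size_t n = 0;
    int ok = 1;

    assert(format != NULL);
    va_start(ap, format);
    for (const char *f = format; *f != '\0'; f++, n++) {
        if (n >= argc) {                      /* too few arguments */
            ok = 0;
            break;
        }
        if (*f == 'l' && argv[n].kind == VAL_INT) {
            *va_arg(ap, long *) = argv[n].i;
        } else if (*f == 's' && argv[n].kind == VAL_STR) {
            /* Pointer assignment only: the string is not copied. */
            *va_arg(ap, const char **) = argv[n].s;
        } else {                              /* wrong type or bad code */
            ok = 0;
            break;
        }
    }
    if (ok && n != argc)                      /* too many arguments */
        ok = 0;
    va_end(ap);
    return ok;
}
```

Passing the address of an int where the format says 'l' would compile without complaint and then write a long through that pointer, which is the kind of mistake the warning above is about.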

Some example calls:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    int ok;
    int i, j;
    long k, l;
    const char *s;
    Py_ssize_t size;

    ok = PyArg_ParseTuple(args, ""); /* No arguments */
        /* Python call: f() */

    ok = PyArg_ParseTuple(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        const char *file;
        const char *mode = "r";
        int bufsize = 0;
        ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
        /* A string, and optionally another string and an integer */
        /* Possible Python calls:
           f('spam')
           f('spam', 'w')
           f('spam', 'wb', 100000) */
    }

    {
        int left, top, right, bottom, h, v;
        ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
                 &left, &top, &right, &bottom, &h, &v);
        /* A rectangle and a point */
        /* Possible Python call:
           f(((0, 0), (400, 300)), (10, 10)) */
    }

    {
        Py_complex c;
        ok = PyArg_ParseTuple(args, "D:myfunction", &c);
        /* a complex, also providing a function name for errors */
        /* Possible Python call: myfunction(1+2j) */
    }

1.8. Keyword Parameters for Extension Functions

The PyArg_ParseTupleAndKeywords() function is declared as follows:

    int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                    const char *format, char * const *kwlist, ...);

The arg and format parameters are identical to those of the PyArg_ParseTuple() function. The kwdict parameter is the dictionary of keywords received as the third parameter from the Python runtime. The kwlist parameter is a NULL-terminated list of strings which identify the parameters; the names are matched with the type information from format from left to right. On success, PyArg_ParseTupleAndKeywords() returns true, otherwise it returns false and raises an appropriate exception.

Note

Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in which are not present in the kwlist will cause TypeError to be raised.

Here is an example module which uses keywords, based on an example by Geoff Philbrick (philbrick@hks.com):

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    static PyObject *
    keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
    {
        int voltage;
        const char *state = "a stiff";
        const char *action = "voom";
        const char *type = "Norwegian Blue";

        static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

        if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                         &voltage, &state, &action, &type))
            return NULL;

        printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
               action, voltage);
        printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

        Py_RETURN_NONE;
    }

    static PyMethodDef keywdarg_methods[] = {
        /* The cast of the function is necessary since PyCFunction values
         * only take two PyObject* parameters, and keywdarg_parrot() takes
         * three.
         */
        {"parrot", (PyCFunction)(void(*)(void))keywdarg_parrot,
         METH_VARARGS | METH_KEYWORDS,
         "Print a lovely skit to standard output."},
        {NULL, NULL, 0, NULL}   /* sentinel */
    };

    static struct PyModuleDef keywdargmodule = {
        PyModuleDef_HEAD_INIT,
        "keywdarg",
        NULL,
        -1,
        keywdarg_methods
    };

    PyMODINIT_FUNC
    PyInit_keywdarg(void)
    {
        return PyModule_Create(&keywdargmodule);
    }

1.9. Building Arbitrary Values

Py_BuildValue() is the counterpart to PyArg_ParseTuple(). It is declared as follows:

    PyObject *Py_BuildValue(const char *format, ...);

It recognizes a set of format units similar to the ones recognized by PyArg_ParseTuple(), but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python.

One difference with PyArg_ParseTuple(): while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally), Py_BuildValue() does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns None; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string.

Examples (to the left the call, to the right the resulting Python value):

    Py_BuildValue("")                        None
    Py_BuildValue("i", 123)                  123
    Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
    Py_BuildValue("s", "hello")              'hello'
    Py_BuildValue("y", "hello")              b'hello'
    Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
    Py_BuildValue("s#", "hello", 4)          'hell'
    Py_BuildValue("y#", "hello", 4)          b'hell'
    Py_BuildValue("()")                      ()
    Py_BuildValue("(i)", 123)                (123,)
    Py_BuildValue("(ii)", 123, 456)          (123, 456)
    Py_BuildValue("(i,i)", 123, 456)         (123, 456)
    Py_BuildValue("[i,i]", 123, 456)         [123, 456]
    Py_BuildValue("{s:i,s:i}",
                  "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
    Py_BuildValue("((ii)(ii)) (ii)",
                  1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))

1.10. Reference Counts

In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. In C, this is done using the functions malloc() and free(). In C++, the operators new and delete are used with essentially the same meaning and we’ll restrict the following discussion to the C case.

Every block of memory allocated with malloc() should eventually be returned to the pool of available memory by exactly one call to free(). It is important to call free() at the right time. If a block’s address is forgotten but free() is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a memory leak. On the other hand, if a program calls free() for a block and then continues to use the block, it creates a conflict with reuse of the block through another malloc() call. This is called using freed memory. It has the same bad consequences as referencing uninitialized data: core dumps, wrong results, mysterious crashes.

Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It’s easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it’s important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of error.
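One such coding convention, shown here only as an illustration and not as something this document prescribes, is to route every exit of a function through a single cleanup label. In the plain C sketch below, count_upper(), tracked_malloc(), tracked_free() and the live_blocks counter are hypothetical names; the counter exists only so that a leak would be visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks currently allocated (instrumentation for the example). */
static int live_blocks = 0;

static void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Returns the number of uppercase ASCII letters in text, or -1 on error.
   Every exit, including the early error exit, goes through "done". */
static int count_upper(const char *text)
{
    int result = -1;
    size_t len;
    char *copy;

    assert(text != NULL);
    len = strlen(text);
    copy = tracked_malloc(len + 1);
    if (copy == NULL)
        goto done;
    memcpy(copy, text, len + 1);

    if (len == 0)               /* error condition added "later" */
        goto done;

    result = 0;
    for (size_t i = 0; i < len; i++) {
        if (copy[i] >= 'A' && copy[i] <= 'Z')
            result++;
    }

done:
    tracked_free(copy);
    return result;
}
```

Because the new error test jumps to the same label as the normal path, adding it cannot skip the call that releases the block.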

Since Python makes heavy use of malloc() and free(), it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called reference counting. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.
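The principle fits in a few lines of plain C. This is a minimal sketch with hypothetical names (Counted, counted_new(), counted_incref(), counted_decref()); it does not reproduce CPython's object layout, and the freed_objects counter is only there to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object with an embedded reference counter (hypothetical). */
typedef struct {
    long refcnt;
    int value;
} Counted;

static int freed_objects = 0;   /* instrumentation for the example */

static Counted *counted_new(int value)
{
    Counted *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    o->refcnt = 1;              /* the creator holds the first reference */
    o->value = value;
    return o;
}

static void counted_incref(Counted *o)
{
    o->refcnt++;
}

static void counted_decref(Counted *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {     /* last reference gone: free the object */
        freed_objects++;
        free(o);
    }
}
```

Nothing in this scheme calls free() directly from user code; the object is released as a side effect of the last counted_decref().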

An alternative strategy is calledautomatic garbage collection.(Sometimes, reference counting is also referred to as a garbage collectionstrategy, hence my use of “automatic” to distinguish the two.) The bigadvantage of automatic garbage collection is that the user doesn’t need to callfree() explicitly. (Another claimed advantage is an improvement in speedor memory usage — this is no hard fact however.) The disadvantage is that forC, there is no truly portable automatic garbage collector, while referencecounting can be implemented portably (as long as the functionsmalloc()andfree() are available — which the C Standard guarantees). Maybe someday a sufficiently portable automatic garbage collector will be available for C.Until then, we’ll have to live with reference counts.

While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
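
The weakness can be reproduced with a toy model in plain C. The Node type and its helper functions below are hypothetical and are not part of the Python API; the sketch only shows that two objects referring to each other keep a non-zero count after all outside references are gone, and that the counts reach zero once one edge of the cycle is broken.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: each node owns one optional reference to another node. */
typedef struct Node {
    long refcnt;
    struct Node *next;          /* owned reference, or NULL */
} Node;

static int live_nodes = 0;      /* instrumentation for the example */

static Node *node_new(void)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->refcnt = 1;
    n->next = NULL;
    live_nodes++;
    return n;
}

static void node_decref(Node *n)
{
    /* Freeing a node also releases the reference that node held. */
    while (n != NULL && --n->refcnt == 0) {
        Node *next = n->next;
        live_nodes--;
        free(n);
        n = next;
    }
}

static void node_set_next(Node *n, Node *target)
{
    Node *old = n->next;
    if (target != NULL)
        target->refcnt++;       /* the node stores a new owned reference */
    n->next = target;
    node_decref(old);
}

/* Builds a two-node cycle and drops both outside references.  Stores the
 * number of stranded nodes in *stranded, then breaks the cycle by hand and
 * returns the number of nodes still allocated (-1 if allocation failed). */
static int demo_cycle(int *stranded)
{
    Node *a = node_new();
    Node *b = node_new();
    if (a == NULL || b == NULL) {
        node_decref(a);
        node_decref(b);
        return -1;
    }
    node_set_next(a, b);        /* a -> b */
    node_set_next(b, a);        /* b -> a closes the cycle */
    node_decref(a);             /* both counts drop to 1, never to 0 */
    node_decref(b);
    *stranded = live_nodes;

    /* `a` is only still valid because the cycle kept it alive.  Clearing
     * one edge lets plain reference counting reclaim both nodes. */
    node_set_next(a, NULL);
    return live_nodes;
}
```

Without the final node_set_next(a, NULL) both nodes would stay allocated forever; finding such unreachable cycles automatically is the job of a cycle detector.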

The cycle detector is able to detect garbage cycles and can reclaim them. The gc module exposes a way to run the detector (the collect() function), as well as configuration interfaces and the ability to disable the detector at runtime.

1.10.1. Reference Counting in Python

There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the incrementing and decrementing of the reference count. Py_DECREF() also frees the object when the count reaches zero. For flexibility, it doesn’t call free() directly; rather, it makes a call through a function pointer in the object’s type object. For this purpose (and others), every object also contains a pointer to its type object.
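
The indirection through the type object can be modeled in plain C as follows. This is a simplified sketch, not CPython's actual structures: Type, Object, Buffer, buffer_new() and object_decref() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Object Object;

/* Toy "type object": knows how to release instances of the type. */
typedef struct {
    const char *name;
    void (*dealloc)(Object *);
} Type;

/* Toy object header: a count plus a pointer to the type object. */
struct Object {
    long refcnt;
    const Type *type;
};

/* A concrete type that owns extra memory besides its own block. */
typedef struct {
    Object base;    /* header first, so a Buffer * can be used as Object * */
    char *data;
} Buffer;

static int buffers_freed = 0;   /* instrumentation for the example */

static void buffer_dealloc(Object *o)
{
    Buffer *b = (Buffer *)o;
    free(b->data);              /* type-specific cleanup */
    free(b);
    buffers_freed++;
}

static const Type Buffer_Type = { "buffer", buffer_dealloc };

static Object *buffer_new(const char *text)
{
    Buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(strlen(text) + 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    strcpy(b->data, text);
    b->base.refcnt = 1;
    b->base.type = &Buffer_Type;
    return &b->base;
}

static void object_decref(Object *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0)
        o->type->dealloc(o);    /* generic code never calls free() itself */
}
```

Because object_decref() only goes through the function pointer, a type that owns additional resources can release them without the generic code knowing about it.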

The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)? Let’s first introduce some terms. Nobody “owns” an object; however, you can own a reference to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call Py_DECREF(). Forgetting to dispose of an owned reference creates a memory leak.

It is also possible to borrow [2] a reference to an object. The borrower of a reference should not call Py_DECREF(). The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided completely [3].

The advantage of borrowing over owning a reference is that you don’t need to take care of disposing of the reference on all possible paths through the code. In other words, with a borrowed reference you don’t run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where in seemingly correct code a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it.

A borrowed reference can be changed into an owned reference by calling Py_INCREF(). This does not affect the status of the owner from which the reference was borrowed: it creates a new owned reference, and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner).

1.10.2. Ownership Rules

Whenever an object reference is passed into or out of a function, it is part of the function’s interface specification whether ownership is transferred with the reference or not.

Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose purpose is to create a new object, such as PyLong_FromLong() and Py_BuildValue(), pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance, PyLong_FromLong() maintains a cache of popular values and can return a reference to a cached item.

Many functions that extract objects from other objects also transfer ownership with the reference, for instance PyObject_GetAttrString(). The picture is less clear here, however, since a few common routines are exceptions: PyTuple_GetItem(), PyList_GetItem(), PyDict_GetItem(), and PyDict_GetItemString() all return references that you borrow from the tuple, list or dictionary.

The function PyImport_AddModule() also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in sys.modules.

When you pass an object reference into another function, in general, the function borrows the reference from you. If it needs to store it, it will use Py_INCREF() to become an independent owner. There are exactly two important exceptions to this rule: PyTuple_SetItem() and PyList_SetItem(). These functions take over ownership of the item passed to them, even if they fail! (Note that PyDict_SetItem() and friends don’t take over ownership; they are “normal.”)

When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on does it need to be turned into an owned reference by calling Py_INCREF().

The object reference returned from a C function that is called from Python must be an owned reference: ownership is transferred from the function to its caller.

1.10.3. Thin Ice

There are a few situations where seemingly harmless use of a borrowed reference can lead to problems. These all have to do with implicit invocations of the interpreter, which can cause the owner of a reference to dispose of it.

The first and most important case to know about is using Py_DECREF() on an unrelated object while borrowing a reference to a list item. For instance:

    void
    bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        PyList_SetItem(list, 1, PyLong_FromLong(0L));
        PyObject_Print(item, stdout, 0); /* BUG! */
    }

This function first borrows a reference to list[0], then replaces list[1] with the value 0, and finally prints the borrowed reference. Looks harmless, right? But it’s not!

Let’s follow the control flow into PyList_SetItem(). The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let’s suppose the original item 1 was an instance of a user-defined class, and let’s further suppose that the class defined a __del__() method. If this class instance has a reference count of 1, disposing of it will call its __del__() method.

Since it is written in Python, the __del__() method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to item in bug()? You bet! Assuming that the list passed into bug() is accessible to the __del__() method, it could execute a statement to the effect of del list[0], and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating item.

The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads:

    void
    no_bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        Py_INCREF(item);
        PyList_SetItem(list, 1, PyLong_FromLong(0L));
        PyObject_Print(item, stdout, 0);
        Py_DECREF(item);
    }

This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail…

The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can’t get in each other’s way, because there is a global lock protecting Python’s entire object space. However, it is possible to temporarily release this lock using the macro Py_BEGIN_ALLOW_THREADS, and to re-acquire it using Py_END_ALLOW_THREADS. This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one:

    void
    bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);
        Py_BEGIN_ALLOW_THREADS
        ...some blocking I/O call...
        Py_END_ALLOW_THREADS
        PyObject_Print(item, stdout, 0); /* BUG! */
    }

1.10.4. NULL Pointers

In general, functions that take object references as arguments do not expect you to pass them NULL pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return NULL only to indicate that an exception occurred. The reason for not testing for NULL arguments is that functions often pass the objects they receive on to other functions. If each function were to test for NULL, there would be a lot of redundant tests and the code would run more slowly.

It is better to test for NULL only at the “source”: when a pointer that may be NULL is received, for example, from malloc() or from a function that may raise an exception.

The macros Py_INCREF() and Py_DECREF() do not check for NULL pointers. However, their variants Py_XINCREF() and Py_XDECREF() do.

The macros for checking for a particular object type (Pytype_Check(), where type stands for the name of the type) don’t check for NULL pointers. Again, there is much code that calls several of these in a row to test an object against various different expected types, and this would generate redundant tests. There are no variants with NULL checking.

The C function calling mechanism guarantees that the argument list passed to C functions (args in the examples) is never NULL. In fact it guarantees that it is always a tuple [4].

It is a severe error to ever let a NULL pointer “escape” to the Python user.

1.11. Writing Extensions in C++

It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using extern "C". It is unnecessary to enclose the Python header files in extern "C" {...}; they use this form already if the symbol __cplusplus is defined (all recent C++ compilers define this symbol).

1.12. Providing a C API for an Extension Module

Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a type “collection” which works like lists without order. Just like the standard Python list type has a C API which permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules.

At first sight this seems easy: just write the functions (without declaring them static, of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unices). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet!

Portability therefore requires making no assumptions about symbol visibility. This means that all symbols in extension modules should be declared static, except for the module’s initialization function, in order to avoid name clashes with other extension modules (as discussed in section The Module’s Method Table and Initialization Function). And it means that symbols that should be accessible from other extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type which stores a pointer (void *). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module’s namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule.

There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules.

Whichever method you choose, it’s important to name your Capsules properly. The function PyCapsule_New() takes a name parameter (const char *); you’re permitted to pass in a NULL name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type-safety; there is no feasible way to tell one unnamed Capsule from another.

In particular, Capsules used to expose C APIs should be given a name following this convention:

    modulename.attributename

The convenience function PyCapsule_Import() makes it easy to load a C API provided via a Capsule, but only if the Capsule’s name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API.

The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of void pointers which becomes the value of a Capsule. The header file corresponding to the module provides a helper function, import_spam(), that takes care of importing the module and retrieving its C API pointers; client modules only have to call this function before accessing the C API.

The exporting module is a modification of the spam module from section A Simple Example. The function spam.system() does not call the C library function system() directly, but a function PySpam_System(), which would of course do something more complicated in reality (such as adding “spam” to every command). This function PySpam_System() is also exported to other extension modules.

The function PySpam_System() is a plain C function, declared static like everything else:

    static int
    PySpam_System(const char *command)
    {
        return system(command);
    }

The function spam_system() is modified in a trivial way:

    static PyObject *
    spam_system(PyObject *self, PyObject *args)
    {
        const char *command;
        int sts;

        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
        sts = PySpam_System(command);
        return PyLong_FromLong(sts);
    }

In the beginning of the module, right after the line

    #include <Python.h>

two more lines must be added:

    #define SPAM_MODULE
    #include "spammodule.h"

The #define is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module’s initialization function must take care of initializing the C API pointer array:

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        PyObject *m;
        static void *PySpam_API[PySpam_API_pointers];
        PyObject *c_api_object;

        m = PyModule_Create(&spammodule);
        if (m == NULL)
            return NULL;

        /* Initialize the C API pointer array */
        PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

        /* Create a Capsule containing the API pointer array's address */
        c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

        if (PyModule_Add(m, "_C_API", c_api_object) < 0) {
            Py_DECREF(m);
            return NULL;
        }

        return m;
    }

Note that PySpam_API is declared static; otherwise the pointer array would disappear when PyInit_spam() terminates!

The bulk of the work is in the header file spammodule.h, which looks like this:

    #ifndef Py_SPAMMODULE_H
    #define Py_SPAMMODULE_H
    #ifdef __cplusplus
    extern "C" {
    #endif

    /* Header file for spammodule */

    /* C API functions */
    #define PySpam_System_NUM 0
    #define PySpam_System_RETURN int
    #define PySpam_System_PROTO (const char *command)

    /* Total number of C API pointers */
    #define PySpam_API_pointers 1


    #ifdef SPAM_MODULE
    /* This section is used when compiling spammodule.c */

    static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

    #else
    /* This section is used in modules that use spammodule's API */

    static void **PySpam_API;

    #define PySpam_System \
     (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

    /* Return -1 on error, 0 on success.
     * PyCapsule_Import will set an exception if there's an error.
     */
    static int
    import_spam(void)
    {
        PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
        return (PySpam_API != NULL) ? 0 : -1;
    }

    #endif

    #ifdef __cplusplus
    }
    #endif

    #endif /* !defined(Py_SPAMMODULE_H) */

All that a client module must do in order to have access to the function PySpam_System() is to call the function import_spam() in its initialization function:

    PyMODINIT_FUNC
    PyInit_client(void)
    {
        PyObject *m;

        m = PyModule_Create(&clientmodule);
        if (m == NULL)
            return NULL;
        if (import_spam() < 0) {
            Py_DECREF(m);   /* don't leak the module on this error exit */
            return NULL;
        }
        /* additional initialization can happen here */
        return m;
    }

The main disadvantage of this approach is that the file spammodule.h is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once.
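
To make the repeating structure visible, the sketch below applies the same three-macro pattern (_NUM, _RETURN, _PROTO) to a table with two entries. It is plain C with no Python involved: the Toy_ names are hypothetical, and the function export_toy_api() merely stands in for publishing the table through a Capsule. Like the header above, it relies on converting function pointers to void * and back, which common platforms support even though ISO C does not guarantee it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shared description of the API: one block of three macros per function. */
#define Toy_Add_NUM 0
#define Toy_Add_RETURN int
#define Toy_Add_PROTO (int a, int b)

#define Toy_Length_NUM 1
#define Toy_Length_RETURN size_t
#define Toy_Length_PROTO (const char *s)

/* Total number of API pointers */
#define Toy_API_pointers 2

/* "Exporting" side: the real functions and the pointer table. */
static int toy_add(int a, int b) { return a + b; }
static size_t toy_length(const char *s) { return strlen(s); }

static void **export_toy_api(void)
{
    static void *table[Toy_API_pointers];   /* static: outlives the call */

    table[Toy_Add_NUM] = (void *)toy_add;
    table[Toy_Length_NUM] = (void *)toy_length;
    return table;
}

/* "Client" side: each macro casts its table slot back to the right type. */
static void **Toy_API;

#define Toy_Add \
 (*(Toy_Add_RETURN (*)Toy_Add_PROTO) Toy_API[Toy_Add_NUM])
#define Toy_Length \
 (*(Toy_Length_RETURN (*)Toy_Length_PROTO) Toy_API[Toy_Length_NUM])

/* Returns 0 on success, -1 if the table could not be obtained. */
static int import_toy(void)
{
    Toy_API = export_toy_api();
    assert(Toy_API != NULL);
    return (Toy_API != NULL) ? 0 : -1;
}
```

After import_toy() succeeds, client code writes Toy_Add(2, 3) or Toy_Length("spam") as if these were ordinary functions; adding a third function means adding one more block of three macros, one table entry, and one accessor macro.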

Finally it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Python/C API Reference Manual in the section Capsules and in the implementation of Capsules (files Include/pycapsule.h and Objects/pycapsule.c in the Python source code distribution).

Footnotes

[1]

An interface for this function already exists in the standard module os; it was chosen as a simple and straightforward example.

[2]

The metaphor of “borrowing” a reference is not completely correct: the owner still has a copy of the reference.

[3]

Checking that the reference count is at least 1 does not work: the reference count itself could be in freed memory and may thus be reused for another object!

[4]

These guarantees don’t hold when you use the “old” style calling convention, which is still found in much existing code.