Many people like to say that Python is a fantastic glue language. Hopefully, this chapter will convince you that this is true. The first adopters of Python for science were typically people who used it to glue together large application codes running on supercomputers. Not only was it much nicer to code in Python than in a shell script or Perl, but the ability to easily extend Python also made it relatively easy to create new classes and types specifically adapted to the problems being solved. From the interactions of these early contributors, Numeric emerged as an array-like object that could be used to pass data between these applications.
As Numeric has matured and developed into NumPy, people have been able to write more code directly in NumPy. Often this code is fast enough for production use, but there are still times when compiled code is needed, either to get that last bit of efficiency out of the algorithm or to make it easier to access widely available codes written in C/C++ or Fortran.
This chapter will review many of the tools that are available for the purpose of accessing code written in other compiled languages. There are many resources available for learning to call other compiled libraries from Python and the purpose of this chapter is not to make you an expert. The main goal is to make you aware of some of the possibilities so that you will know what to “Google” in order to learn more.
While Python is a great language and a pleasure to code in, its dynamic nature results in overhead that can cause some code (e.g. raw computations inside of for loops) to be up to 10-100 times slower than equivalent code written in a static compiled language. In addition, it can cause memory usage to be larger than necessary as temporary arrays are created and destroyed during computation. For many types of computing needs, the extra slow-down and memory consumption often cannot be afforded (at least for time- or memory-critical portions of your code). Therefore one of the most common needs is to call out from Python code to a fast, machine-code routine (e.g. compiled using C/C++ or Fortran). The fact that this is relatively easy to do is a big reason why Python is such an excellent high-level language for scientific and engineering programming.
There are two basic approaches to calling compiled code: writing an extension module that is then imported to Python using the import command, or calling a shared-library subroutine directly from Python using the ctypes module. Writing an extension module is the most common method.
Warning
Calling C-code from Python can result in Python crashes if you are not careful. None of the approaches in this chapter are immune. You have to know something about the way data is handled by both NumPy and by the third-party library being used.
Extension modules were discussed in Writing an extension module. The most basic way to interface with compiled code is to write an extension module and construct a module method that calls the compiled code. For improved readability, your method should take advantage of the PyArg_ParseTuple call to convert between Python objects and C data-types. For standard C data-types there is probably already a built-in converter. For others you may need to write your own converter and use the "O&" format string which allows you to specify a function that will be used to perform the conversion from the Python object to whatever C-structures are needed.
Once the conversions to the appropriate C-structures and C data-types have been performed, the next step in the wrapper is to call the underlying function. This is straightforward if the underlying function is in C or C++. However, in order to call Fortran code you must be familiar with how Fortran subroutines are called from C/C++ using your compiler and platform. This can vary somewhat across platforms and compilers (which is another reason f2py makes life much simpler for interfacing Fortran code) but generally involves underscore mangling of the name and the fact that all variables are passed by reference (i.e. all arguments are pointers).
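The two conventions named above (name mangling and pass-by-reference) can be sketched from Python with ctypes. This is only an illustration: it assumes a compiler that lowercases the routine name and appends a single underscore, which is common but not universal, and the helper names fortran_symbol and by_reference are hypothetical.

```python
import ctypes


def fortran_symbol(name, suffix="_"):
    """Guess the linker symbol for a Fortran 77 routine.

    Assumes lowercase-plus-one-underscore mangling (an assumption; check
    your compiler's documentation).
    """
    return name.lower() + suffix


def by_reference(*values):
    """Wrap ctypes values so that each one is passed as a pointer."""
    return [ctypes.byref(v) for v in values]


n = ctypes.c_int(3)
args = by_reference(n)          # what a Fortran routine expecting N would receive
print(fortran_symbol("ZADD"))   # zadd_
```

With a real shared library loaded as lib, the routine would then be looked up with getattr(lib, fortran_symbol("ZADD")) and called with the by-reference arguments.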
The advantage of the hand-generated wrapper is that you have complete control over how the C-library gets used and called, which can lead to a lean and tight interface with minimal overhead. The disadvantage is that you have to write, debug, and maintain C-code, although most of it can be adapted using the time-honored technique of “cutting-pasting-and-modifying” from other extension modules. Because the procedure of calling out to additional C-code is fairly regimented, code-generation procedures have been developed to make this process easier. One of these code-generation techniques is distributed with NumPy and allows easy integration with Fortran and (simple) C code. This package, f2py, will be covered briefly in the next section.
F2py allows you to automatically construct an extension module that interfaces to routines in Fortran 77/90/95 code. It has the ability to parse Fortran 77/90/95 code and automatically generate Python signatures for the subroutines it encounters, or you can guide how the subroutine interfaces with Python by constructing an interface definition file (or modifying the f2py-produced one).
Probably the easiest way to introduce f2py is to offer a simple example. Here is one of the subroutines contained in a file named add.f:
C
      SUBROUTINE ZADD(A,B,C,N)
C
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J)+B(J)
 20   CONTINUE
      END
This routine simply adds the elements in two contiguous arrays and places the result in a third. The memory for all three arrays must be provided by the calling routine. A very basic interface to this routine can be automatically generated by f2py:
f2py -m add add.f
You should be able to run this command assuming your search-path is set up properly. This command will produce an extension module named addmodule.c in the current directory. This extension module can now be compiled and used from Python just like any other extension module.
You can also get f2py to compile add.f and also compile its produced extension module, leaving only a shared-library extension file that can be imported from Python:
f2py -c -m add add.f
This command leaves a file named add.{ext} in the current directory (where {ext} is the appropriate extension for a Python extension module on your platform: so, pyd, etc.). This module may then be imported from Python. It will contain a method for each subroutine in add (zadd, cadd, dadd, sadd). The docstring of each method contains information about how the module method may be called:
>>> import add
>>> print(add.zadd.__doc__)
zadd - Function signature:
  zadd(a,b,c,n)
Required arguments:
  a : input rank-1 array('D') with bounds (*)
  b : input rank-1 array('D') with bounds (*)
  c : input rank-1 array('D') with bounds (*)
  n : input int
The default interface is a very literal translation of the Fortran code into Python. The Fortran array arguments must now be NumPy arrays and the integer argument should be an integer. The interface will attempt to convert all arguments to their required types (and shapes) and issue an error if unsuccessful. However, because it knows nothing about the semantics of the arguments (such as that c is an output and n should really match the array sizes), it is possible to abuse this function in ways that can cause Python to crash. For example:
>>> add.zadd([1, 2, 3], [1, 2], [3, 4], 1000)
will cause a program crash on most systems. Under the covers, the lists are being converted to proper arrays but then the underlying add loop is told to cycle way beyond the borders of the allocated memory.
In order to improve the interface, directives should be provided. This is accomplished by constructing an interface definition file. It is usually best to start from the interface file that f2py can produce (where it gets its default behavior from). To get f2py to generate the interface file use the -h option:
f2py -h add.pyf -m add add.f
This command leaves the file add.pyf in the current directory. The section of this file corresponding to zadd is:
subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(*) :: a
   double complex dimension(*) :: b
   double complex dimension(*) :: c
   integer :: n
end subroutine zadd
By placing intent directives and checking code, the interface can be cleaned up quite a bit until the Python module method is both easier to use and more robust.
subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(n) :: a
   double complex dimension(n) :: b
   double complex intent(out),dimension(n) :: c
   integer intent(hide),depend(a) :: n=len(a)
end subroutine zadd
The intent directive, intent(out), is used to tell f2py that c is an output variable and should be created by the interface before being passed to the underlying code. The intent(hide) directive tells f2py to not allow the user to specify the variable n, but instead to get it from the size of a. The depend(a) directive is necessary to tell f2py that the value of n depends on the input a (so that it won’t try to create the variable n until the variable a is created).
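To make the effect of these directives concrete, here is a rough pure-NumPy model of what the improved wrapper does on each call. It is a sketch of the semantics only, not the code f2py generates; the function name zadd_like is hypothetical and the exact checks in the generated wrapper may differ.

```python
import numpy as np


def zadd_like(a, b):
    """Pure-NumPy stand-in for the wrapped zadd (hypothetical helper)."""
    a = np.asarray(a, dtype=np.complex128)   # automatic argument conversion
    if a.ndim != 1:
        raise ValueError("a must be a rank-1 array")
    n = len(a)                               # intent(hide): n = len(a)
    b = np.asarray(b, dtype=np.complex128)
    if b.shape != (n,):                      # dimension(n) constraint on b
        raise ValueError("b must have the same length as a")
    c = np.empty(n, dtype=np.complex128)     # intent(out): created by the wrapper
    np.add(a, b, out=c)                      # stands in for the Fortran loop
    return c


print(zadd_like([1, 2, 3], [4, 5, 6]).tolist())   # [(5+0j), (7+0j), (9+0j)]
```

The caller never supplies c or n, which is what removes the opportunity to pass an inconsistent length.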
After modifying add.pyf, the new Python module file can be generated by compiling both add.f and add.pyf:
f2py -c add.pyf add.f
The new interface has docstring:
>>> import add
>>> print(add.zadd.__doc__)
zadd - Function signature:
  c = zadd(a,b)
Required arguments:
  a : input rank-1 array('D') with bounds (n)
  b : input rank-1 array('D') with bounds (n)
Return objects:
  c : rank-1 array('D') with bounds (n)
Now, the function can be called in a much more robust way:
>>> add.zadd([1, 2, 3], [4, 5, 6])
array([ 5.+0.j,  7.+0.j,  9.+0.j])
Notice the automatic conversion to the correct format that occurred.
The nice interface can also be generated automatically by placing the variable directives as special comments in the original Fortran code. Thus, if I modify the source code to contain:
C
      SUBROUTINE ZADD(A,B,C,N)
C
CF2PY INTENT(OUT) :: C
CF2PY INTENT(HIDE) :: N
CF2PY DOUBLE COMPLEX :: A(N)
CF2PY DOUBLE COMPLEX :: B(N)
CF2PY DOUBLE COMPLEX :: C(N)
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J) + B(J)
 20   CONTINUE
      END
Then, I can compile the extension module using:
f2py -c -m add add.f
The resulting signature for the function add.zadd is exactly the same one that was created previously. If the original source code had contained A(N) instead of A(*) and so forth with B and C, then I could obtain (nearly) the same interface simply by placing the INTENT(OUT) :: C comment line in the source code. The only difference is that N would be an optional input that would default to the length of A.
For comparison with the other methods to be discussed, here is another example of a function that filters a two-dimensional array of double-precision floating-point numbers using a fixed averaging filter. The advantage of using Fortran to index into multi-dimensional arrays should be clear from this example.
      SUBROUTINE DFILTER2D(A,B,M,N)
C
      DOUBLE PRECISION A(M,N)
      DOUBLE PRECISION B(M,N)
      INTEGER N, M
CF2PY INTENT(OUT) :: B
CF2PY INTENT(HIDE) :: N
CF2PY INTENT(HIDE) :: M
      DO 20 I = 2,M-1
         DO 40 J=2,N-1
            B(I,J) = A(I,J) +
     $           (A(I-1,J)+A(I+1,J) +
     $            A(I,J-1)+A(I,J+1) )*0.5D0 +
     $           (A(I-1,J-1) + A(I-1,J+1) +
     $            A(I+1,J-1) + A(I+1,J+1))*0.25D0
 40      CONTINUE
 20   CONTINUE
      END
This code can be compiled and linked into an extension module named filter using:
f2py -c -m filter filter.f
This will produce an extension module named filter.so in the current directory with a method named dfilter2d that returns a filtered version of the input.
The f2py program is written in Python and can be run from inside your code to compile Fortran code at runtime, as follows:
from numpy import f2py
with open("add.f") as sourcefile:
    sourcecode = sourcefile.read()
f2py.compile(sourcecode, modulename='add')
import add
The source string can be any valid Fortran code. If you want to save the extension-module source code then a suitable file-name can be provided by the source_fn keyword to the compile function.
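An alternative that does not depend on the compile function is to run the f2py command line in a child process. The sketch below assumes NumPy is installed for the running interpreter and that a Fortran compiler is available when build_extension is actually called; the helper names f2py_command and build_extension are hypothetical.

```python
import subprocess
import sys


def f2py_command(module_name, *sources):
    """Build the argument list for `python -m numpy.f2py -c -m <module> <sources>`."""
    return [sys.executable, "-m", "numpy.f2py", "-c", "-m", module_name, *sources]


def build_extension(module_name, *sources):
    """Run f2py in a child process; raises CalledProcessError on failure."""
    subprocess.run(f2py_command(module_name, *sources), check=True)


cmd = f2py_command("add", "add.f")
print(cmd[1:])   # ['-m', 'numpy.f2py', '-c', '-m', 'add', 'add.f']
```

Calling build_extension("add", "add.f") would leave the compiled module in the current directory, after which it can be imported as usual.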
If you want to distribute your f2py extension module, then you only need to include the .pyf file and the Fortran code. The distutils extensions in NumPy allow you to define an extension module entirely in terms of this interface file. A valid setup.py file allowing distribution of the add.f module (as part of the package f2py_examples so that it would be loaded as f2py_examples.add) is:
def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('f2py_examples', parent_package, top_path)
    config.add_extension('add', sources=['add.pyf', 'add.f'])
    return config

if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(**configuration(top_path='').todict())
Installation of the new package is easy using:
python setup.py install
assuming you have the proper permissions to write to the main site-packages directory for the version of Python you are using. For the resulting package to work, you need to create a file named __init__.py (in the same directory as add.pyf). Notice the extension module is defined entirely in terms of the add.pyf and add.f files. The conversion of the .pyf file to a .c file is handled by numpy.distutils.
The interface definition file (.pyf) is how you can fine-tune the interface between Python and Fortran. There is decent documentation for f2py found in the numpy/f2py/docs directory wherever NumPy is installed on your system (usually under site-packages). There is also more information on using f2py (including how to use it to wrap C codes) at http://www.scipy.org/Cookbook under the “Using NumPy with Other Languages” heading.
The f2py method of linking compiled code is currently the most sophisticated and integrated approach. It allows clean separation of Python with compiled code while still allowing for separate distribution of the extension module. The only drawback is that it requires the existence of a Fortran compiler in order for a user to install the code. However, with the existence of the free compilers g77, gfortran, and g95, as well as high-quality commercial compilers, this restriction is not particularly onerous. In my opinion, Fortran is still the easiest way to write fast and clear code for scientific computing. It handles complex numbers and multi-dimensional indexing in the most straightforward way. Be aware, however, that some Fortran compilers will not be able to optimize code as well as good hand-written C-code.
Cython is a compiler for a Python dialect that adds (optional) static typing for speed, and allows mixing C or C++ code into your modules. It produces C or C++ extensions that can be compiled and imported in Python code.
If you are writing an extension module that will include quite a bit of your own algorithmic code as well, then Cython is a good match. Among its features is the ability to easily and quickly work with multidimensional arrays.
Notice that Cython is an extension-module generator only. Unlike f2py, it includes no automatic facility for compiling and linking the extension module (which must be done in the usual fashion). It does provide a modified distutils class called build_ext which lets you build an extension module from a .pyx source. Thus, you could write in a setup.py file:
from Cython.Distutils import build_ext
from distutils.extension import Extension
from distutils.core import setup
import numpy

setup(name='mine', description='Nothing',
      ext_modules=[Extension('filter', ['filter.pyx'],
                             include_dirs=[numpy.get_include()])],
      cmdclass={'build_ext': build_ext})
Adding the NumPy include directory is, of course, only necessary if you are using NumPy arrays in the extension module (which is what we assume you are using Cython for). The distutils extensions in NumPy also include support for automatically producing the extension module and linking it from a .pyx file. It works so that if the user does not have Cython installed, then it looks for a file with the same file-name but a .c extension which it then uses instead of trying to produce the .c file again.
If you just use Cython to compile a standard Python module, then you will get a C extension module that typically runs a bit faster than the equivalent Python module. Further speed increases can be gained by using the cdef keyword to statically define C variables.
Let’s look at two examples we’ve seen before to see how they might be implemented using Cython. These examples were compiled into extension modules using Cython 0.21.1.
Here is part of a Cython module named add.pyx which implements the complex addition functions we previously implemented using f2py:
cimport cython
cimport numpy as np
import numpy as np

# We need to initialize NumPy.
np.import_array()

#@cython.boundscheck(False)
def zadd(in1, in2):
    cdef double complex[:] a = in1.ravel()
    cdef double complex[:] b = in2.ravel()

    # complex128 matches the "double complex" element type of the view c
    out = np.empty(a.shape[0], np.complex128)
    cdef double complex[:] c = out.ravel()

    for i in range(c.shape[0]):
        c[i].real = a[i].real + b[i].real
        c[i].imag = a[i].imag + b[i].imag

    return out
This module shows use of the cimport statement to load the definitions from the numpy.pxd header that ships with Cython. It looks like NumPy is imported twice; cimport only makes the NumPy C-API available, while the regular import causes a Python-style import at runtime and makes it possible to call into the familiar NumPy Python API.
The example also demonstrates Cython’s “typed memoryviews”, which are like NumPy arrays at the C level, in the sense that they are shaped and strided arrays that know their own extent (unlike a C array addressed through a bare pointer). The syntax double complex[:] denotes a one-dimensional array (vector) of double-precision complex numbers, with arbitrary strides. A contiguous array of ints would be int[::1], while a matrix of floats would be float[:, :].
Shown commented is the cython.boundscheck decorator, which turns bounds-checking for memory view accesses on or off on a per-function basis. We can use this to further speed up our code, at the expense of safety (or a manual check prior to entering the loop).
Other than the view syntax, the function is immediately readable to a Python programmer. Static typing of the variable i is implicit. Instead of the view syntax, we could also have used Cython’s special NumPy array syntax, but the view syntax is preferred.
The two-dimensional example we created using Fortran is just as easy to write in Cython:
cimport numpy as np
import numpy as np

np.import_array()

def filter(img):
    cdef double[:, :] a = np.asarray(img, dtype=np.double)
    out = np.zeros(img.shape, dtype=np.double)
    cdef double[:, ::1] b = out

    cdef np.npy_intp i, j

    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            b[i, j] = (a[i, j]
                       + .5 * (  a[i-1, j] + a[i+1, j]
                               + a[i, j-1] + a[i, j+1])
                       + .25 * (  a[i-1, j-1] + a[i-1, j+1]
                                + a[i+1, j-1] + a[i+1, j+1]))

    return out
This 2-d averaging filter runs quickly because the loop is in C and the pointer computations are done only as needed. If the code above is compiled as a module image, then a 2-d image, img, can be filtered using this code very quickly using:
import image
out = image.filter(img)
Regarding the code, two things are of note: firstly, it is impossible to return a memory view to Python. Instead, a NumPy array out is first created, and then a view b onto this array is used for the computation. Secondly, the view b is typed double[:, ::1]. This means a 2-d array with contiguous rows, i.e., C matrix order. Specifying the order explicitly can speed up some algorithms since they can skip stride computations.
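Whether an array has the contiguous rows that a view typed double[:, ::1] asks for can be checked from Python before it ever reaches compiled code. The following small sketch uses only NumPy; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.double)   # C order: each row is contiguous
t = a.T                                  # same memory, now column-major

print(a.flags["C_CONTIGUOUS"], a.strides)   # True (32, 8)
print(t.flags["C_CONTIGUOUS"], t.strides)   # False (8, 32)

# Copying into C order produces an array with contiguous rows again.
c = np.ascontiguousarray(t)
print(c.flags["C_CONTIGUOUS"], c.strides)   # True (24, 8)
```

The strides are in bytes, so a last-axis stride equal to the item size (8 for a double) is what marks the rows as contiguous.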
Cython is the extension mechanism of choice for several scientific Python libraries, including Scipy, Pandas, SAGE, scikit-image and scikit-learn, as well as the XML processing library LXML. The language and compiler are well-maintained.
There are several disadvantages of using Cython:
malloc and friends), it’s easy to introduce memory leaks. However, just compiling a Python module renamed to .pyx can already speed it up, and adding a few type declarations can give dramatic speedups in some code.
One big advantage of Cython-generated extension modules is that they are easy to distribute. In summary, Cython is a very capable tool for either gluing C code or generating an extension module quickly and should not be overlooked. It is especially useful for people that can’t or won’t write C or Fortran code.
Ctypes is a Python extension module, included in the stdlib, that allows you to call an arbitrary function in a shared library directly from Python. This approach allows you to interface with C-code directly from Python. This opens up an enormous number of libraries for use from Python. The drawback, however, is that coding mistakes can lead to ugly program crashes very easily (just as can happen in C) because there is little type or bounds checking done on the parameters. This is especially true when array data is passed in as a pointer to a raw memory location. The responsibility is then on you to ensure that the subroutine will not access memory outside the actual array area. But if you don’t mind living a little dangerously, ctypes can be an effective tool for quickly taking advantage of a large shared library (or writing extended functionality in your own shared library).
Because the ctypes approach exposes a raw interface to the compiled code it is not always tolerant of user mistakes. Robust use of the ctypes module typically involves an additional layer of Python code in order to check the data types and array bounds of objects passed to the underlying subroutine. This additional layer of checking (not to mention the conversion from ctypes objects to C data-types that ctypes itself performs) will make the interface slower than a hand-written extension-module interface. However, this overhead should be negligible if the C-routine being called is doing any significant amount of work. If you are a great Python programmer with weak C skills, ctypes is an easy way to write a useful interface to a (shared) library of compiled code.
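To use ctypes you must have a shared library, load it, convert your Python objects to arguments that ctypes understands, and call the function from the library with those arguments.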
There are several requirements for a shared library that can be used with ctypes that are platform specific. This guide assumes you have some familiarity with making a shared library on your system (or simply have a shared library available to you). Items to remember are:
A shared library must be compiled in a special way (e.g. using the -shared flag with gcc).
On some platforms (e.g. Windows), a shared library requires a .def file that specifies the functions to be exported. For example a mylib.def file might contain:
LIBRARY mylib.dll
EXPORTS
cool_function1
cool_function2
Alternatively, you may be able to use the storage-class specifier __declspec(dllexport) in the C-definition of the function to avoid the need for this .def file.
There is no standard way in Python distutils to create a standard shared library (an extension module is a “special” shared library Python understands) in a cross-platform manner. Thus, a big disadvantage of ctypes at the time of writing this book is that it is difficult to distribute in a cross-platform manner a Python extension that uses ctypes and includes your own code which should be compiled as a shared library on the user’s system.
A simple but robust way to load the shared library is to get the absolute path name and load it using the cdll object of ctypes:
lib = ctypes.cdll[<full_path_name>]
However, on Windows accessing an attribute of the cdll object will load the first DLL by that name found in the current directory or on the PATH. Loading the absolute path name requires a little finesse for cross-platform work since the extension of shared libraries varies. There is a ctypes.util.find_library utility available that can simplify the process of finding the library to load but it is not foolproof. Complicating matters, different platforms have different default extensions used by shared libraries (e.g. .dll on Windows, .so on Linux, .dylib on Mac OS X). This must also be taken into account if you are using ctypes to wrap code that needs to work on several platforms.
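One way to handle the varying extensions is to build the absolute path yourself before handing it to ctypes. The helper below is a hypothetical sketch, not part of ctypes or NumPy; it assumes the three suffixes listed above and that the library file sits in a directory you know.

```python
import os
import sys


def shared_library_path(directory, name, platform=None):
    """Return directory/name plus the platform's usual shared-library suffix."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        suffix = ".dll"
    elif platform == "darwin":
        suffix = ".dylib"
    else:
        suffix = ".so"   # Linux and most other Unix-like systems
    return os.path.join(directory, name + suffix)


print(os.path.basename(shared_library_path("/opt/lib", "code", "linux")))   # code.so
```

The result could then be passed to ctypes.CDLL, for example ctypes.CDLL(shared_library_path(here, "code")), where here is the directory that holds the library.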
NumPy provides a convenience function called ctypeslib.load_library(name, path). This function takes the name of the shared library (including any prefix like ‘lib’ but excluding the extension) and a path where the shared library can be located. It returns a ctypes library object or raises an OSError if the library cannot be found or raises an ImportError if the ctypes module is not available. (Windows users: the ctypes library object loaded using load_library is always loaded assuming cdecl calling convention. See the ctypes documentation under ctypes.windll and/or ctypes.oledll for ways to load libraries under other calling conventions).
The functions in the shared library are available as attributes of the ctypes library object (returned from ctypeslib.load_library) or as items using lib['func_name'] syntax. The latter method for retrieving a function name is particularly useful if the function name contains characters that are not allowable in Python variable names.
Python integers, byte strings, and unicode strings are automatically converted as needed to equivalent ctypes arguments. The None object is also converted automatically to a NULL pointer. All other Python objects must be converted to ctypes-specific types. There are two ways around this restriction that allow ctypes to integrate with other objects.
_as_parameter_ method for the object you want to pass in. The _as_parameter_ method must return a Python int which will be passed directly to the function.
_as_parameter_ attribute).
NumPy uses both methods with a preference for the second method because it can be safer. The ctypes attribute of the ndarray returns an object that has an _as_parameter_ attribute which returns an integer representing the address of the ndarray with which it is associated. As a result, one can pass this ctypes attribute object directly to a function expecting a pointer to the data in your ndarray. The caller must be sure that the ndarray object is of the correct type and shape and has the correct flags set, or risk nasty crashes if data pointers to inappropriate arrays are passed in.
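The two conversion hooks, an _as_parameter_ attribute on the object being passed and a from_param classmethod placed in argtypes, can be tried out on the C library's abs and labs functions. This is a sketch assuming a POSIX-like system where the C library can be located; the class names Offset and IntFromAnything are hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))


class Offset:
    """First hook: expose an integer through the _as_parameter_ attribute."""

    def __init__(self, value):
        self._as_parameter_ = int(value)


class IntFromAnything:
    """Second hook: an argtypes entry whose from_param does the conversion."""

    @classmethod
    def from_param(cls, obj):
        return ctypes.c_long(int(obj))


first = libc.abs(Offset(-7))          # no argtypes are set on abs

libc.labs.restype = ctypes.c_long
libc.labs.argtypes = [IntFromAnything]
second = libc.labs("-12")             # from_param turns the string into a C long

print(first, second)   # 7 12
```

The second hook runs on every call, which is why it is a natural place to put the kind of checking that makes the approach safer.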
To implement the second method, NumPy provides the class-factory function ndpointer in the ctypeslib module. This class-factory function produces an appropriate class that can be placed in an argtypes attribute entry of a ctypes function. The class will contain a from_param method which ctypes will use to convert any ndarray passed in to the function to a ctypes-recognized object. In the process, the conversion will perform checking on any properties of the ndarray that were specified by the user in the call to ndpointer. Aspects of the ndarray that can be checked include the data-type, the number of dimensions, the shape, and/or the state of the flags on any array passed. The return value of the from_param method is the ctypes attribute of the array which (because it contains the _as_parameter_ attribute pointing to the array data area) can be used by ctypes directly.
The ctypes attribute of an ndarray is also endowed with additional attributes that may be convenient when passing additional information about the array into a ctypes function. The attributes data, shape, and strides can provide ctypes-compatible types corresponding to the data-area, the shape, and the strides of the array. The data attribute returns a c_void_p representing a pointer to the data area. The shape and strides attributes each return an array of ctypes integers (or None representing a NULL pointer, if a 0-d array). The base ctype of the array is a ctype integer of the same size as a pointer on the platform. There are also methods data_as({ctype}), shape_as(<base ctype>), and strides_as(<base ctype>). These return the data as a ctype object of your choice and the shape/strides arrays using an underlying base type of your choice. For convenience, the ctypeslib module also contains c_intp as a ctypes integer data-type whose size is the same as the size of c_void_p on the platform (its value is None if ctypes is not installed).
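The attributes and methods just described can be explored without any compiled library at all. The following sketch inspects a small array; the variable names are illustrative.

```python
import ctypes
import numpy as np

a = np.arange(6, dtype=np.double).reshape(2, 3)

print(list(a.ctypes.shape))     # [2, 3]
print(list(a.ctypes.strides))   # [24, 8]

# data_as gives a typed pointer onto the same memory as the array.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(ptr[4])                   # 4.0  (element a[1, 1])

# shape_as lets the C side receive the shape in a base type of your choice.
dims = a.ctypes.shape_as(ctypes.c_long)
print(dims[0], dims[1])         # 2 3
```

Note that the pointer carries no bounds information: indexing ptr past the sixth element would read memory that does not belong to the array.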
The function is accessed as an attribute of or an item from the loaded shared library. Thus, if ./mylib.so has a function named cool_function1, I could access this function either as:
lib = numpy.ctypeslib.load_library('mylib', '.')
func1 = lib.cool_function1  # or equivalently
func1 = lib['cool_function1']
In ctypes, the return-value of a function is set to be ‘int’ by default. This behavior can be changed by setting the restype attribute of the function. Use None for the restype if the function has no return value (‘void’):
func1.restype = None
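The effect of restype is easiest to see with a function that returns something other than an int. The sketch below uses sqrt from the C math library; it assumes a POSIX-like system where that library (or a C library containing sqrt) can be found.

```python
import ctypes
import ctypes.util

# Assumption: sqrt lives in libm, or in libc on systems without a separate libm.
path = ctypes.util.find_library("m") or ctypes.util.find_library("c")
libm = ctypes.CDLL(path)

sqrt = libm.sqrt
sqrt.restype = ctypes.c_double      # without this, the result is read as an int
sqrt.argtypes = [ctypes.c_double]   # declares the single double argument

print(sqrt(9.0))    # 3.0
print(sqrt(2.25))   # 1.5
```

Leaving restype at its default here would not raise an error; it would silently return a meaningless integer, which is why setting it is part of every robust ctypes interface.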
As previously discussed, you can also set the argtypes attribute of the function in order to have ctypes check the types of the input arguments when the function is called. Use the ndpointer factory function to generate a ready-made class for data-type, shape, and flags checking on your new function. The ndpointer function has the signature
ndpointer(dtype=None, ndim=None, shape=None, flags=None)
Keyword arguments with the value None are not checked. Specifying a keyword enforces checking of that aspect of the ndarray on conversion to a ctypes-compatible object. The dtype keyword can be any object understood as a data-type object. The ndim keyword should be an integer, and the shape keyword should be an integer or a sequence of integers. The flags keyword specifies the minimal flags that are required on any array passed in. This can be specified as a string of comma-separated requirements, an integer indicating the requirement bits OR’d together, or a flags object returned from the flags attribute of an array with the necessary requirements.
Using an ndpointer class in the argtypes attribute can make it significantly safer to call a C function using ctypes and the data-area of an ndarray. You may still want to wrap the function in an additional Python wrapper to make it user-friendly (hiding some obvious arguments and making some arguments output arguments). In this process, the require function in NumPy may be useful to return the right kind of array from a given input.
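The checking performed by an ndpointer class, and the way require can repair an unsuitable input, can be seen without loading a library by calling from_param directly (ctypes calls it for you when the class sits in argtypes). The class name DoubleVector is illustrative.

```python
import numpy as np
from numpy.ctypeslib import ndpointer

# A class that accepts only 1-d, C-contiguous arrays of doubles.
DoubleVector = ndpointer(dtype=np.double, ndim=1, flags="C_CONTIGUOUS")

good = np.zeros(4, dtype=np.double)
DoubleVector.from_param(good)            # passes every check

wrong_type = np.zeros(4, dtype=np.float32)
try:
    DoubleVector.from_param(wrong_type)
    rejected = False
except TypeError:
    rejected = True
print(rejected)   # True

# require returns an array with the requested dtype and flags,
# copying only when that is needed.
fixed = np.require(wrong_type, np.double, ["C_CONTIGUOUS", "ALIGNED"])
DoubleVector.from_param(fixed)           # now accepted
```

A Python wrapper would typically call require on its inputs first, so that users can pass lists or arrays of other types without triggering the TypeError.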
In this example, I will show how the addition function and the filter function implemented previously using the other approaches can be implemented using ctypes. First, the C code which implements the algorithms contains the functions zadd, dadd, sadd, cadd, and dfilter2d. The zadd function is:
/* Add arrays of contiguous data */
typedef struct {double real; double imag;} cdouble;
typedef struct {float real; float imag;} cfloat;
void zadd(cdouble *a, cdouble *b, cdouble *c, long n)
{
    while (n--) {
        c->real = a->real + b->real;
        c->imag = a->imag + b->imag;
        a++; b++; c++;
    }
}
with similar code for cadd, dadd, and sadd that handles complex float, double, and float data-types, respectively:
void cadd(cfloat *a, cfloat *b, cfloat *c, long n)
{
    while (n--) {
        c->real = a->real + b->real;
        c->imag = a->imag + b->imag;
        a++; b++; c++;
    }
}
void dadd(double *a, double *b, double *c, long n)
{
    while (n--) {
        *c++ = *a++ + *b++;
    }
}
void sadd(float *a, float *b, float *c, long n)
{
    while (n--) {
        *c++ = *a++ + *b++;
    }
}
The code.c file also contains the function dfilter2d:
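#include <sys/types.h>  /* ssize_t */

/*
 * Assumes b is contiguous and that a has strides that are multiples of
 * sizeof(double)
 */
void
dfilter2d(double *a, double *b, ssize_t *astrides, ssize_t *dims)
{
    ssize_t i, j, M, N, S0, S1;
    ssize_t r, c, rm1, rp1, cp1, cm1;

    M = dims[0]; N = dims[1];
    /* Strides of a, converted from bytes to elements */
    S0 = astrides[0]/sizeof(double);
    S1 = astrides[1]/sizeof(double);
    for (i = 1; i < M - 1; i++) {
        r = i*S0;
        rp1 = r + S0;
        rm1 = r - S0;
        for (j = 1; j < N - 1; j++) {
            /* Column offsets are relative to c, the offset of column j */
            c = j*S1;
            cp1 = c + S1;
            cm1 = c - S1;
            b[i*N + j] = a[r + c] +
                (a[rp1 + c] + a[rm1 + c] +
                 a[r + cp1] + a[r + cm1])*0.5 +
                (a[rp1 + cp1] + a[rp1 + cm1] +
                 a[rm1 + cp1] + a[rm1 + cm1])*0.25;
        }
    }
}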
A possible advantage this code has over the Fortran-equivalent code is that it takes arbitrarily strided (i.e. non-contiguous) arrays and may also run faster depending on the optimization capability of your compiler. But it is obviously more complicated than the simple code in filter.f. This code must be compiled into a shared library. On my Linux system this is accomplished using:
gcc -o code.so -shared code.c
This creates a shared library named code.so in the current directory. On Windows, don’t forget to either add __declspec(dllexport) in front of void on the line preceding each function definition, or write a code.def file that lists the names of the functions to be exported.
A suitable Python interface to this shared library should be constructed. To do this, create a file named interface.py with the following lines at the top:
    __all__ = ['add', 'filter2d']

    import ctypes  # needed below for ctypes.POINTER
    import os

    import numpy as N

    # Directory containing this file (__file__ is a name, not a string)
    _path = os.path.dirname(os.path.abspath(__file__))
    lib = N.ctypeslib.load_library('code', _path)
    _typedict = {'zadd': complex, 'sadd': N.single,
                 'cadd': N.csingle, 'dadd': float}
    for name in _typedict.keys():
        val = getattr(lib, name)
        val.restype = None
        _type = _typedict[name]
        val.argtypes = [N.ctypeslib.ndpointer(_type,
                            flags='aligned, contiguous'),
                        N.ctypeslib.ndpointer(_type,
                            flags='aligned, contiguous'),
                        N.ctypeslib.ndpointer(_type,
                            flags='aligned, contiguous,'
                                  'writeable'),
                        N.ctypeslib.c_intp]
This code loads the shared library named code.{ext} located in the same path as this file. It then adds a return type of void to the functions contained in the library. It also adds argument checking to the functions in the library so that ndarrays can be passed as the first three arguments along with an integer (large enough to hold a pointer on the platform) as the fourth argument.
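The argument checking can be tried without the compiled library. The following is a minimal sketch, not part of interface.py: it calls the from_param hook that ctypes invokes on each argument of a function with argtypes set. The helper name accepts is made up for illustration.

```python
import numpy as N

# Same kind of pointer type as used for the input arguments of dadd.
dptr = N.ctypeslib.ndpointer(float, flags='aligned, contiguous')


def accepts(ptr_type, obj):
    """Hypothetical helper: True if ctypes would accept obj for ptr_type."""
    try:
        ptr_type.from_param(obj)
    except TypeError:
        return False
    return True


good = N.zeros(4)                         # float64 and contiguous
wrong_type = N.zeros(4, dtype=N.single)   # contiguous but float32
strided = N.zeros(8)[::2]                 # float64 but not contiguous

print(accepts(dptr, good))
print(accepts(dptr, wrong_type))
print(accepts(dptr, strided))
```

Only the first array passes the check; the other two are rejected with a TypeError before any C code would run.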
Setting up the filtering function is similar and allows the filtering function to be called with ndarray arguments as the first two arguments and with pointers to integers (large enough to handle the strides and shape of an ndarray) as the last two arguments:
    lib.dfilter2d.restype = None
    lib.dfilter2d.argtypes = [N.ctypeslib.ndpointer(float, ndim=2,
                                  flags='aligned'),
                              N.ctypeslib.ndpointer(float, ndim=2,
                                  flags='aligned, contiguous,'
                                        'writeable'),
                              ctypes.POINTER(N.ctypeslib.c_intp),
                              ctypes.POINTER(N.ctypeslib.c_intp)]
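To see what is passed for the last two arguments, the shape and strides of an array can be inspected through its ctypes attribute. This is only an illustrative sketch; the array sizes are arbitrary.

```python
import ctypes

import numpy as N

a = N.zeros((3, 4))          # C-contiguous float64 array
t = a.T                      # transposed view: same data, swapped strides

shape = a.ctypes.shape       # ctypes array of c_intp with a.ndim entries
strides = a.ctypes.strides   # strides are given in bytes, not elements

print(list(shape), list(strides))
print(list(t.ctypes.shape), list(t.ctypes.strides))

# c_intp is the pointer-sized integer type used for shapes and strides.
print(ctypes.sizeof(N.ctypeslib.c_intp) == ctypes.sizeof(ctypes.c_void_p))
```

Because the strides are in bytes, the C routine divides them by sizeof(double) before using them as element offsets.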
Next, define a simple selection function that chooses which addition function to call in the shared library based on the data-type:
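    def select(dtype):
        # dtype.char is a single type character, so test it against
        # the characters of each string.
        if dtype.char in '?bBhHf':
            return lib.sadd, N.single
        elif dtype.char in 'F':
            return lib.cadd, N.csingle
        elif dtype.char in 'DG':
            return lib.zadd, complex
        else:
            return lib.dadd, float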
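The type characters tested by the selection function can be inspected without the compiled library. The sketch below assumes the intent is a membership test of dtype.char against the characters of each string; routine_name is a hypothetical stand-in that returns only the name of the routine that would be chosen.

```python
import numpy as N


def routine_name(dtype):
    """Hypothetical stand-in for select(): return the routine name only."""
    char = N.dtype(dtype).char
    if char in '?bBhHf':      # bool, 8- and 16-bit integers, float32
        return 'sadd'
    elif char == 'F':         # complex64
        return 'cadd'
    elif char in 'DG':        # complex128 and complex long double
        return 'zadd'
    return 'dadd'             # everything else is handled as double


for dt in (N.bool_, N.int16, N.single, N.csingle, complex, N.int64, float):
    print(N.dtype(dt).char, routine_name(dt))
```

Small integer and boolean types fit exactly in single precision, which is why they are routed to sadd; larger integer types fall through to dadd.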
Finally, the two functions to be exported by the interface can be written simply as:
    def add(a, b):
        requires = ['CONTIGUOUS', 'ALIGNED']
        a = N.asanyarray(a)
        func, dtype = select(a.dtype)
        a = N.require(a, dtype, requires)
        b = N.require(b, dtype, requires)
        c = N.empty_like(a)
        func(a, b, c, a.size)
        return c
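The calls to N.require are what make arbitrary inputs acceptable to the C routines. The following sketch, with an arbitrary example array, shows the conversion on its own:

```python
import numpy as N

requires = ['CONTIGUOUS', 'ALIGNED']

a = N.arange(6, dtype=N.int16)[::2]    # non-contiguous int16 view: 0, 2, 4
r = N.require(a, N.single, requires)   # converted copy the C code can use

print(a.flags['C_CONTIGUOUS'], a.dtype)
print(r.flags['C_CONTIGUOUS'], r.dtype, r.tolist())

# An array that already meets the requirements is not copied.
ok = N.zeros(3, dtype=N.single)
print(N.shares_memory(N.require(ok, N.single, requires), ok))
```

A copy is made only when the data-type or memory layout does not already match, so well-formed inputs pay no conversion cost.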
and:
    def filter2d(a):
        a = N.require(a, float, ['ALIGNED'])
        b = N.zeros_like(a)
        lib.dfilter2d(a, b, a.ctypes.strides, a.ctypes.shape)
        return b
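When testing the wrapper, a pure NumPy version of the filter is a convenient cross-check. The sketch below is an assumption about the intended kernel: weight 1 for the center, 0.5 for the four edge neighbors, 0.25 for the four distinct diagonal neighbors, with border elements left at zero. The name filter2d_reference is made up for illustration.

```python
import numpy as N


def filter2d_reference(a):
    """Hypothetical NumPy-only reference for the 3x3 weighted filter."""
    a = N.asarray(a, dtype=float)
    b = N.zeros_like(a)
    b[1:-1, 1:-1] = (
        a[1:-1, 1:-1]
        + 0.5 * (a[2:, 1:-1] + a[:-2, 1:-1] + a[1:-1, 2:] + a[1:-1, :-2])
        + 0.25 * (a[2:, 2:] + a[2:, :-2] + a[:-2, 2:] + a[:-2, :-2])
    )
    return b


x = N.arange(9.0).reshape(3, 3)
print(filter2d_reference(x))
```

Comparing filter2d(x) against filter2d_reference(x) with N.allclose, for both contiguous and strided inputs, is a quick way to catch indexing mistakes in the C loop.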
Using ctypes is a powerful way to connect Python with arbitrary C-code. Its advantages for extending Python include
- clean separation of C code from Python code
  - no need to learn a new syntax except Python and C
  - allows re-use of C code
  - functionality in shared libraries written for other purposes can be obtained with a simple Python wrapper and search for the library.
- easy integration with NumPy through the ctypes attribute
- full argument checking with the ndpointer class factory
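As an illustration of reusing a shared library written for other purposes, the sketch below wraps cos from the system C math library. It assumes a Unix-like system where the math library can be located or its symbols are already loaded into the process; load_libm is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def load_libm():
    """Locate the C math library; fall back to symbols already loaded."""
    name = ctypes.util.find_library('m')
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libm = load_libm()
c_cos = libm.cos
c_cos.restype = ctypes.c_double     # the default restype is int, so set it
c_cos.argtypes = [ctypes.c_double]  # enables conversion and type checking

print(c_cos(0.0))
```

Setting restype and argtypes is the same step that interface.py performs for the functions in code.so.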
Its disadvantages include the difficulty of distributing an extension module made using ctypes.
Because of this difficulty, f2py and Cython are still the easiest ways to extend Python for package creation. However, ctypes is in some cases a useful alternative. Further development should bring more features to ctypes that ease extending Python and distributing the extension using ctypes.
These tools have been found useful by others using Python and so are included here. They are discussed separately because they are either older ways to do things now handled by f2py, Cython, or ctypes (SWIG, PyFort) or because I don’t know much about them (SIP, Boost). I have not added links to these methods because my experience is that you can find the most relevant link faster using Google or some other search engine, and any links provided here would be quickly dated. Do not assume that a package’s inclusion in this list means I think it does not deserve your attention. I’m including information about these packages because many people have found them useful and I’d like to give you as many options as possible for tackling the problem of easily integrating your code.
Simplified Wrapper and Interface Generator (SWIG) is an old and fairly stable method for wrapping C/C++-libraries to a large variety of other languages. It does not specifically understand NumPy arrays but can be made usable with NumPy through the use of typemaps. There are some sample typemaps in the numpy/tools/swig directory under numpy.i together with an example module that makes use of them. SWIG excels at wrapping large C/C++ libraries because it can (almost) parse their headers and auto-produce an interface. Technically, you need to generate a .i file that defines the interface. Often, however, this .i file can be parts of the header itself. The interface usually needs a bit of tweaking to be very useful. This ability to parse C/C++ headers and auto-generate the interface still makes SWIG a useful approach to adding functionality from C/C++ into Python, despite the other methods that have emerged that are more targeted to Python. SWIG can actually target extensions for several languages, but the typemaps usually have to be language-specific. Nonetheless, with modifications to the Python-specific typemaps, SWIG can be used to interface a library with other languages such as Perl, Tcl, and Ruby.
My experience with SWIG has been generally positive in that it is relatively easy to use and quite powerful. I used to use it quite often before becoming more proficient at writing C-extensions. However, I struggled writing custom interfaces with SWIG because it must be done using the concept of typemaps which are not Python specific and are written in a C-like syntax. Therefore, I tend to prefer other gluing strategies and would only attempt to use SWIG to wrap a very-large C/C++ library. Nonetheless, there are others who use SWIG quite happily.
SIP is another tool for wrapping C/C++ libraries that is Python specific and appears to have very good support for C++. Riverbank Computing developed SIP in order to create Python bindings to the Qt library. An interface file must be written to generate the binding, but the interface file looks a lot like a C/C++ header file. While SIP is not a full C++ parser, it understands quite a bit of C++ syntax as well as its own special directives that allow modification of how the Python binding is accomplished. It also allows the user to define mappings between Python types and C/C++ structures and classes.
Boost is a repository of C++ libraries and Boost.Python is one of those libraries which provides a concise interface for binding C++ classes and functions to Python. The amazing part of the Boost.Python approach is that it works entirely in pure C++ without introducing a new syntax. Many users of C++ report that Boost.Python makes it possible to combine the best of both worlds in a seamless fashion. I have not used Boost.Python because I am not a big user of C++ and using Boost to wrap simple C-subroutines is usually over-kill. Its primary purpose is to make C++ classes available in Python. So, if you have a set of C++ classes that need to be integrated cleanly into Python, consider learning about and using Boost.Python.
PyFort is a nice tool for wrapping Fortran and Fortran-like C-code into Python with support for Numeric arrays. It was written by Paul Dubois, a distinguished computer scientist and the very first maintainer of Numeric (now retired). It is worth mentioning in the hope that somebody will update PyFort to work with NumPy arrays as well, which now support either Fortran or C-style contiguous arrays.