Beyond the Basics¶

The voyage of discovery is not in seeking new landscapes but in having new eyes.

—Marcel Proust

Discovery is seeing what everyone else has seen and thinking what no one else has thought.

—Albert Szent-Gyorgi

Iterating over elements in the array¶

Basic Iteration¶

One common algorithmic requirement is to be able to walk over all elements in a multidimensional array. The array iterator object makes this easy to do in a generic way that works for arrays of any dimension. Naturally, if you know the number of dimensions you will be using, then you can always write nested for loops to accomplish the iteration. If, however, you want to write code that works with any number of dimensions, then you can make use of the array iterator. An array iterator object is returned when accessing the .flat attribute of an array.

Basic usage is to call PyArray_IterNew(array), where array is an ndarray object (or one of its sub-classes). The returned object is an array-iterator object (the same object returned by the .flat attribute of the ndarray). This object is usually cast to PyArrayIterObject* so that its members can be accessed. The only members that are needed are iter->size, which contains the total size of the array; iter->index, which contains the current 1-d index into the array; and iter->dataptr, which is a pointer to the data for the current element of the array. Sometimes it is also useful to access iter->ao, which is a pointer to the underlying ndarray object.

After processing data at the current element of the array, the next element of the array can be obtained using the macro PyArray_ITER_NEXT(iter). The iteration always proceeds in a C-style contiguous fashion (last index varying the fastest). The macro PyArray_ITER_GOTO(iter, destination) can be used to jump to a particular point in the array, where destination is an array of npy_intp data-type with space to handle at least the number of dimensions in the underlying array. Occasionally it is useful to use PyArray_ITER_GOTO1D(iter, index), which will jump to the 1-d index given by the value of index. The most common usage, however, is given in the following example.

    PyObject *obj; /* assumed to be some ndarray object */
    PyArrayIterObject *iter;
    ...
    iter = (PyArrayIterObject *)PyArray_IterNew(obj);
    if (iter == NULL) goto fail;   /* Assume fail has clean-up code */
    while (iter->index < iter->size) {
        /* do something with the data at iter->dataptr */
        PyArray_ITER_NEXT(iter);
    }
    ...
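The bookkeeping behind such a loop can be illustrated without the NumPy headers. What follows is a minimal standalone sketch in plain C, not NumPy's implementation: SketchIter and the sketch_* functions are hypothetical names, and ptrdiff_t stands in for npy_intp. It keeps a flat index, a coordinate counter, and a data pointer, and advances them in C order over arbitrary byte strides.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXDIMS 8

/* Hypothetical stand-in for an array iterator; not NumPy's struct. */
typedef struct {
    int nd;                             /* number of dimensions */
    ptrdiff_t index;                    /* current flat (C-order) index */
    ptrdiff_t size;                     /* total number of elements */
    ptrdiff_t coords[SKETCH_MAXDIMS];   /* current N-d coordinates */
    ptrdiff_t dims[SKETCH_MAXDIMS];     /* shape */
    ptrdiff_t strides[SKETCH_MAXDIMS];  /* byte strides */
    char *dataptr;                      /* pointer to the current element */
} SketchIter;

void sketch_iter_init(SketchIter *it, char *data, int nd,
                      const ptrdiff_t *dims, const ptrdiff_t *strides)
{
    assert(nd >= 0 && nd <= SKETCH_MAXDIMS);
    it->nd = nd;
    it->index = 0;
    it->size = 1;
    it->dataptr = data;
    for (int i = 0; i < nd; i++) {
        it->coords[i] = 0;
        it->dims[i] = dims[i];
        it->strides[i] = strides[i];
        it->size *= dims[i];
    }
}

/* Advance one element in C order (last index varies fastest). */
void sketch_iter_next(SketchIter *it)
{
    it->index++;
    for (int i = it->nd - 1; i >= 0; i--) {
        if (it->coords[i] < it->dims[i] - 1) {
            it->coords[i]++;
            it->dataptr += it->strides[i];
            return;
        }
        /* Wrap this axis back to 0 and carry into the next one. */
        it->dataptr -= it->coords[i] * it->strides[i];
        it->coords[i] = 0;
    }
}

/* Sum every double in a strided array using the sketch iterator. */
double sketch_sum(char *data, int nd,
                  const ptrdiff_t *dims, const ptrdiff_t *strides)
{
    SketchIter it;
    double total = 0.0;
    sketch_iter_init(&it, data, nd, dims, strides);
    while (it.index < it.size) {
        total += *(double *)it.dataptr;
        sketch_iter_next(&it);
    }
    return total;
}
```

The while loop in sketch_sum has the same shape as the iterator loop shown earlier: test the index against the size, use the data pointer, then advance.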

You can also use PyArrayIter_Check(obj) to ensure you have an iterator object and PyArray_ITER_RESET(iter) to reset an iterator object back to the beginning of the array.
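Jumping to a 1-d index amounts to converting a C-order flat index into a byte offset. The helper below is a hypothetical plain-C sketch of that arithmetic (sketch_flat_to_offset is not a NumPy function, and ptrdiff_t stands in for npy_intp); the real macro updates the iterator's own fields rather than returning an offset.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a C-order flat index into a byte offset for a strided array. */
ptrdiff_t sketch_flat_to_offset(ptrdiff_t index, int nd,
                                const ptrdiff_t *dims,
                                const ptrdiff_t *strides)
{
    ptrdiff_t offset = 0;
    assert(index >= 0);
    /* Peel coordinates off from the last (fastest) axis to the first. */
    for (int i = nd - 1; i >= 0; i--) {
        assert(dims[i] > 0);
        ptrdiff_t coord = index % dims[i];
        index /= dims[i];
        offset += coord * strides[i];
    }
    return offset;
}
```

For a contiguous array the result is simply the index times the element size; the conversion only matters when the strides are arbitrary.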

It should be emphasized at this point that you may not need the array iterator if your array is already contiguous (using an array iterator will work but will be slower than the fastest code you could write). The major purpose of array iterators is to encapsulate iteration over N-dimensional arrays with arbitrary strides. They are used in many, many places in the NumPy source code itself. If you already know your array is contiguous (Fortran or C), then simply adding the element-size to a running pointer variable will step you through the array very efficiently. In other words, code like this will probably be faster for you in the contiguous case (assuming doubles).

    npy_intp size;
    double *dptr;  /* could make this any variable type */
    size = PyArray_SIZE(obj);
    dptr = PyArray_DATA(obj);
    while (size--) {
        /* do something with the data at dptr */
        dptr++;
    }

Iterating over all but one axis¶

A common algorithm is to loop over all elements of an array and perform some function with each element by issuing a function call. As function calls can be time consuming, one way to speed up this kind of algorithm is to write the function so it takes a vector of data and then write the iteration so the function call is performed for an entire dimension of data at a time. This increases the amount of work done per function call, thereby reducing the function-call overhead to a small(er) fraction of the total time. Even if the interior of the loop is performed without a function call, it can be advantageous to perform the inner loop over the dimension with the highest number of elements to take advantage of speed enhancements available on microprocessors that use pipelining to enhance fundamental operations.

The function PyArray_IterAllButAxis(array, &dim) constructs an iterator object that is modified so that it will not iterate over the dimension indicated by dim. The only restriction on this iterator object is that the PyArray_ITER_GOTO1D(it, ind) macro cannot be used (thus flat indexing won’t work either if you pass this object back to Python, so you shouldn’t do this). Note that the returned object from this routine is still usually cast to PyArrayIterObject *. All that’s been done is to modify the strides and dimensions of the returned iterator to simulate iterating over array[…,0,…] where 0 is placed on the dim-th dimension. If dim is negative, then the axis with the largest number of elements is found and used.
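The idea of holding one axis out of the iteration and handing it to a vector routine can be sketched in plain C. This is an illustration under stated assumptions, not NumPy code: sketch_vector_sum and sketch_sum_over_axis are hypothetical names, ptrdiff_t stands in for npy_intp, and the data are assumed to be doubles.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXDIMS 8

/* Vector kernel: sums n doubles spaced `stride` bytes apart. */
double sketch_vector_sum(const char *p, ptrdiff_t n, ptrdiff_t stride)
{
    double total = 0.0;
    for (ptrdiff_t k = 0; k < n; k++, p += stride) {
        total += *(const double *)p;
    }
    return total;
}

/* Visit every position with coordinate 0 on `axis` and hand the whole
   axis to the vector kernel. Results are written to `out` in C order. */
void sketch_sum_over_axis(const char *data, int nd, const ptrdiff_t *dims,
                          const ptrdiff_t *strides, int axis, double *out)
{
    ptrdiff_t coords[SKETCH_MAXDIMS] = {0};
    ptrdiff_t outer = 1;
    assert(nd >= 1 && nd <= SKETCH_MAXDIMS);
    assert(axis >= 0 && axis < nd);
    for (int i = 0; i < nd; i++) {
        if (i != axis) outer *= dims[i];
    }
    for (ptrdiff_t n = 0; n < outer; n++) {
        /* Byte offset of the current outer position. */
        ptrdiff_t offset = 0;
        for (int i = 0; i < nd; i++) offset += coords[i] * strides[i];
        out[n] = sketch_vector_sum(data + offset, dims[axis], strides[axis]);
        /* Advance coords in C order, skipping the held-out axis. */
        for (int i = nd - 1; i >= 0; i--) {
            if (i == axis) continue;
            if (++coords[i] < dims[i]) break;
            coords[i] = 0;
        }
    }
}
```

One kernel call now covers dims[axis] elements, so the per-call overhead is paid once per outer position instead of once per element.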

Iterating over multiple arrays¶

Very often, it is desirable to iterate over several arrays at the same time. The universal functions are an example of this kind of behavior. If all you want to do is iterate over arrays with the same shape, then simply creating several iterator objects is the standard procedure. For example, the following code iterates over two arrays assumed to be the same shape and size (actually obj1 just has to have at least as many total elements as does obj2):

    /* It is already assumed that obj1 and obj2
       are ndarrays of the same shape and size.
    */
    iter1 = (PyArrayIterObject *)PyArray_IterNew(obj1);
    if (iter1 == NULL) goto fail;
    iter2 = (PyArrayIterObject *)PyArray_IterNew(obj2);
    if (iter2 == NULL) goto fail;  /* assume iter1 is DECREF'd at fail */
    while (iter2->index < iter2->size) {
        /* process with iter1->dataptr and iter2->dataptr */
        PyArray_ITER_NEXT(iter1);
        PyArray_ITER_NEXT(iter2);
    }

Broadcasting over multiple arrays¶

When multiple arrays are involved in an operation, you may want to use the same broadcasting rules that the math operations (i.e. the ufuncs) use. This can be done easily using the PyArrayMultiIterObject. This is the object returned from the Python command numpy.broadcast and it is almost as easy to use from C. The function PyArray_MultiIterNew(n, ...) is used (with n input objects in place of ...). The input objects can be arrays or anything that can be converted into an array. A pointer to a PyArrayMultiIterObject is returned. Broadcasting has already been accomplished, which adjusts the iterators so that all that needs to be done to advance to the next element in each array is for PyArray_ITER_NEXT to be called for each of the inputs. This incrementing is automatically performed by the PyArray_MultiIter_NEXT(obj) macro (which can handle a multiterator obj as either a PyArrayMultiIterObject* or a PyObject*). The data from input number i is available using PyArray_MultiIter_DATA(obj, i) and the total (broadcasted) size as PyArray_MultiIter_SIZE(obj). An example of using this feature follows.

    mobj = PyArray_MultiIterNew(2, obj1, obj2);
    size = PyArray_MultiIter_SIZE(mobj);
    while (size--) {
        ptr1 = PyArray_MultiIter_DATA(mobj, 0);
        ptr2 = PyArray_MultiIter_DATA(mobj, 1);
        /* code using contents of ptr1 and ptr2 */
        PyArray_MultiIter_NEXT(mobj);
    }
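What "broadcasting has already been accomplished" means can be shown with a small standalone sketch in plain C. It is an illustration only: the sketch_* names are hypothetical, ptrdiff_t stands in for npy_intp, both shapes are assumed to have the same rank, and strides are counted in elements rather than bytes for brevity. A broadcast axis is walked with stride 0, so the same input element is reused along it.

```c
#include <assert.h>
#include <stddef.h>

/* Combine two shapes of equal rank `nd` under the broadcasting rule:
   sizes must match or one of them must be 1. Returns 0 on success and
   -1 on a mismatch. */
int sketch_broadcast_shape(int nd, const ptrdiff_t *a, const ptrdiff_t *b,
                           ptrdiff_t *out)
{
    for (int i = 0; i < nd; i++) {
        if (a[i] == b[i] || b[i] == 1) out[i] = a[i];
        else if (a[i] == 1) out[i] = b[i];
        else return -1;
    }
    return 0;
}

/* out[i][j] = x[i][0] + y[0][j], where x has shape (rows, 1) and y has
   shape (1, cols). The stride on each broadcast axis is 0. */
void sketch_broadcast_add(const double *x, const double *y,
                          ptrdiff_t rows, ptrdiff_t cols, double *out)
{
    const ptrdiff_t x_strides[2] = {1, 0};  /* axis 1 is broadcast */
    const ptrdiff_t y_strides[2] = {0, 1};  /* axis 0 is broadcast */
    assert(rows > 0 && cols > 0);
    for (ptrdiff_t i = 0; i < rows; i++) {
        for (ptrdiff_t j = 0; j < cols; j++) {
            out[i * cols + j] = x[i * x_strides[0] + j * x_strides[1]]
                              + y[i * y_strides[0] + j * y_strides[1]];
        }
    }
}
```

Because the adjustment lives entirely in the shapes and strides, the loop body never needs to know which inputs were broadcast.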

The function PyArray_RemoveSmallest(multi) can be used to take a multi-iterator object and adjust all the iterators so that iteration does not take place over the largest dimension (it makes that dimension of size 1). The code being looped over that makes use of the pointers will very likely also need the strides data for each of the iterators. This information is stored in multi->iters[i]->strides.

There are several examples of using the multi-iterator in the NumPy source code, as it makes N-dimensional broadcasting code very simple to write. Browse the source for more examples.

User-defined data-types¶

NumPy comes with 24 builtin data-types. While this covers a large majority of possible use cases, it is conceivable that a user may have a need for an additional data-type. There is some support for adding an additional data-type into the NumPy system. This additional data-type will behave much like a regular data-type except ufuncs must have 1-d loops registered to handle it separately. Also, checking whether other data-types can be cast “safely” to and from this new type will always return “can cast” unless you also register which types your new data-type can be cast to and from. Adding data-types is one of the less well-tested areas for NumPy 1.0, so there may be bugs remaining in the approach. Only add a new data-type if you can’t do what you want to do using the OBJECT or VOID data-types that are already available. One example of what I consider a useful application of the ability to add data-types is the possibility of adding a data-type of arbitrary precision floats to NumPy.

Adding the new data-type¶

To begin to make use of the new data-type, you need to first define a new Python type to hold the scalars of your new data-type. It should be acceptable to inherit from one of the array scalars if your new type has a binary compatible layout. This will allow your new data type to have the methods and attributes of array scalars. New data-types must have a fixed memory size (if you want to define a data-type that needs a flexible representation, like a variable-precision number, then use a pointer to the object as the data-type). The memory layout of the object structure for the new Python type must be PyObject_HEAD followed by the fixed-size memory needed for the data-type. For example, a suitable structure for the new Python type is:

    typedef struct {
        PyObject_HEAD
        some_data_type obval;
        /* the name can be whatever you want */
    } PySomeDataTypeObject;

After you have defined a new Python type object, you must then define a new PyArray_Descr structure whose typeobject member will contain a pointer to the data-type you’ve just defined. In addition, the required functions in the “.f” member must be defined: nonzero, copyswap, copyswapn, setitem, getitem, and cast. The more functions in the “.f” member you define, however, the more useful the new data-type will be. It is very important to initialize unused functions to NULL. This can be achieved using PyArray_InitArrFuncs(f).
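To give a feel for what two of the required functions have to do, here is a standalone sketch in plain C for a hypothetical fixed-size element type. SketchRational and the sketch_* functions are illustrative names, and the signatures are simplified assumptions (the real “.f” slots also receive an array argument). The sketch also assumes that a NULL source means "swap in place".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size element type: two 32-bit fields. */
typedef struct {
    int32_t num;
    int32_t den;
} SketchRational;

/* Reverse the bytes of one n-byte field in place. */
void sketch_byteswap(unsigned char *p, size_t n)
{
    assert(n > 0);
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        unsigned char tmp = p[i];
        p[i] = p[j];
        p[j] = tmp;
    }
}

/* Copy one element from src to dest (src may be NULL for an in-place
   swap), then byte-swap each field separately if `swap` is nonzero. */
void sketch_rational_copyswap(void *dest, const void *src, int swap)
{
    assert(dest != NULL);
    if (src != NULL) {
        memmove(dest, src, sizeof(SketchRational));
    }
    if (swap) {
        unsigned char *p = (unsigned char *)dest;
        sketch_byteswap(p + offsetof(SketchRational, num), sizeof(int32_t));
        sketch_byteswap(p + offsetof(SketchRational, den), sizeof(int32_t));
    }
}

/* Nonzero test: an element is "true" when its numerator is not 0. */
int sketch_rational_nonzero(const void *item)
{
    SketchRational r;
    memcpy(&r, item, sizeof r);  /* item may be unaligned */
    return r.num != 0;
}
```

Swapping field by field, rather than reversing the whole element, is the design choice that matters here: each field keeps its position and only its byte order changes.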

Once a new PyArray_Descr structure is created and filled with the needed information and useful functions, you call PyArray_RegisterDataType(new_descr). The return value from this call is an integer providing you with a unique type_number that specifies your data-type. This type number should be stored and made available by your module so that other modules can use it to recognize your data-type (the other mechanism for finding a user-defined data-type number is to search based on the name of the type-object associated with the data-type using PyArray_TypeNumFromName).

Registering a casting function¶

You may want to allow builtin (and other user-defined) data-types to be cast automatically to your data-type. In order to make this possible, you must register a casting function with the data-type you want to be able to cast from. This requires writing low-level casting functions for each conversion you want to support and then registering these functions with the data-type descriptor. A low-level casting function has the signature:

void castfunc(void* from, void* to, npy_intp n, void* fromarr, void* toarr)¶: Cast n elements from one type to another. The data to cast from is in a contiguous, correctly-swapped and aligned chunk of memory pointed to by from. The buffer to cast to is also contiguous, correctly-swapped and aligned. The fromarr and toarr arguments should only be used for flexible-element-sized arrays (string, unicode, void).

An example castfunc is:

    static void
    double_to_float(double *from, float *to, npy_intp n,
                    void *ig1, void *ig2)
    {
        while (n--) {
            (*to++) = (float)*(from++);
        }
    }

This could then be registered to convert doubles to floats using the code:

    doub = PyArray_DescrFromType(NPY_DOUBLE);
    PyArray_RegisterCastFunc(doub, NPY_FLOAT,
                             (PyArray_VectorUnaryFunc *)double_to_float);
    Py_DECREF(doub);
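A casting function for a user-defined type follows the same pattern. The sketch below is standalone plain C rather than code for a real registered type: SketchRational and sketch_rational_to_double are hypothetical, and ptrdiff_t stands in for npy_intp so the block compiles without the NumPy headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size element type. */
typedef struct {
    int num;
    int den;
} SketchRational;

/* Cast n contiguous rationals to doubles. The two trailing arguments
   mirror the fromarr/toarr slots and are ignored for fixed-size types. */
void sketch_rational_to_double(void *from, void *to, ptrdiff_t n,
                               void *fromarr, void *toarr)
{
    const SketchRational *src = (const SketchRational *)from;
    double *dst = (double *)to;
    (void)fromarr;
    (void)toarr;
    assert(n >= 0);
    while (n--) {
        *dst++ = (double)src->num / (double)src->den;
        src++;
    }
}
```

Taking void pointers and casting inside the function keeps the definition compatible with a generic function-pointer type, so no cast of the function itself is needed at the point of use.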

Registering coercion rules¶

By default, user-defined data-types are not presumed to be safely castable to any builtin data-types. In addition, builtin data-types are not presumed to be safely castable to user-defined data-types. This situation limits the ability of user-defined data-types to participate in the coercion system used by ufuncs and at other times when automatic coercion takes place in NumPy. This can be changed by registering data-types as safely castable from a particular data-type object. The function PyArray_RegisterCanCast(from_descr, totype_number, scalarkind) should be used to specify that the data-type object from_descr can be cast to the data-type with type number totype_number. If you are not trying to alter scalar coercion rules, then use NPY_NOSCALAR for the scalarkind argument.

If you want to allow your new data-type to also be able to share in the scalar coercion rules, then you need to specify the scalarkind function in the data-type object’s “.f” member to return the kind of scalar the new data-type should be seen as (the value of the scalar is available to that function). Then, you can register data-types that can be cast to separately for each scalar kind that may be returned from your user-defined data-type. If you don’t register scalar coercion handling, then all of your user-defined data-types will be seen as NPY_NOSCALAR.

Registering a ufunc loop¶

You may also want to register low-level ufunc loops for your data-type so that an ndarray of your data-type can have math applied to it seamlessly. Registering a new loop with exactly the same arg_types signature silently replaces any previously registered loops for that data-type.

Before you can register a 1-d loop for a ufunc, the ufunc must already have been created. Then you call PyUFunc_RegisterLoopForType(…) with the information needed for the loop. The return value of this function is 0 if the process was successful and -1 with an error condition set if it was not successful.

int PyUFunc_RegisterLoopForType(PyUFuncObject* ufunc, int usertype, PyUFuncGenericFunction function, int* arg_types, void* data)¶

ufunc

The ufunc to attach this loop to.

usertype

The user-defined type this loop should be indexed under. This number must be a user-defined type or an error occurs.

function

The ufunc inner 1-d loop. This function must have the signature as explained in Section 3.

arg_types

(optional) If given, this should contain an array of integers of at least size ufunc.nargs containing the data-types expected by the loop function. The data will be copied into a NumPy-managed structure so the memory for this argument should be deleted after calling this function. If this is NULL, then it will be assumed that all data-types are of type usertype.

data

(optional) Specify any optional data needed by the function which will be passed when the function is called.
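For orientation, a 1-d inner loop in the generic ufunc style receives one byte pointer per operand, an element count, and one byte step per operand. The following standalone sketch uses plain doubles and ptrdiff_t in place of npy_intp so that it compiles without the NumPy headers; sketch_double_add_loop is a hypothetical name, and a loop for a user-defined type would read and write that type's elements instead.

```c
#include <assert.h>
#include <stddef.h>

/* args holds one byte pointer per operand (two inputs, then one output),
   dimensions[0] is the element count, and steps holds the byte stride
   for each operand. */
void sketch_double_add_loop(char **args, const ptrdiff_t *dimensions,
                            const ptrdiff_t *steps, void *data)
{
    char *in1 = args[0], *in2 = args[1], *out = args[2];
    ptrdiff_t n = dimensions[0];
    (void)data;  /* unused extra-data slot */
    assert(n >= 0);
    for (ptrdiff_t i = 0; i < n; i++) {
        *(double *)out = *(double *)in1 + *(double *)in2;
        in1 += steps[0];
        in2 += steps[1];
        out += steps[2];
    }
}
```

Advancing each pointer by its own step is what lets the same loop serve contiguous, strided, and broadcast operands.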

Subtyping the ndarray in C¶

One of the lesser-used features that has been lurking in Python since 2.2 is the ability to sub-class types in C. This facility is one of the important reasons for basing NumPy off of the Numeric code-base which was already in C. A sub-type in C allows much more flexibility with regard to memory management. Sub-typing in C is not difficult even if you have only a rudimentary understanding of how to create new types for Python. While it is easiest to sub-type from a single parent type, sub-typing from multiple parent types is also possible. Multiple inheritance in C is generally less useful than it is in Python because a restriction on Python sub-types is that they have a binary compatible memory layout. Perhaps for this reason, it is somewhat easier to sub-type from a single parent type.

All C-structures corresponding to Python objects must begin with PyObject_HEAD (or PyObject_VAR_HEAD). In the same way, any sub-type must have a C-structure that begins with exactly the same memory layout as the parent type (or all of the parent types in the case of multiple-inheritance). The reason for this is that Python may attempt to access a member of the sub-type structure as if it had the parent structure (i.e. it will cast a given pointer to a pointer to the parent structure and then dereference one of its members). If the memory layouts are not compatible, then this attempt will cause unpredictable behavior (eventually leading to a memory violation and program crash).
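The layout requirement can be demonstrated with ordinary C structures. The sketch below uses hypothetical stand-in types (SketchBase and SketchDerived, not real Python or NumPy structures) to show why placing the parent structure first makes a pointer to the sub-type usable wherever a pointer to the parent is expected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parent instance structure. */
typedef struct {
    long refcount;
    int ndim;
} SketchBase;

/* Hypothetical sub-type instance structure. */
typedef struct {
    SketchBase base;  /* parent layout comes first, unchanged */
    double scale;     /* sub-type specific data follows */
} SketchDerived;

/* Code written for the parent type only knows about SketchBase. */
int sketch_base_ndim(const SketchBase *obj)
{
    assert(obj != NULL);
    return obj->ndim;
}
```

Since the first member of a structure sits at offset 0, casting a SketchDerived pointer to a SketchBase pointer yields a valid view of the parent fields.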

One of the elements in PyObject_HEAD is a pointer to a type-object structure. A new Python type is created by creating a new type-object structure and populating it with functions and pointers to describe the desired behavior of the type. Typically, a new C-structure is also created to contain the instance-specific information needed for each object of the type as well. For example, &PyArray_Type is a pointer to the type-object table for the ndarray while a PyArrayObject* variable is a pointer to a particular instance of an ndarray (one of the members of the ndarray structure is, in turn, a pointer to the type-object table &PyArray_Type). Finally, PyType_Ready(<pointer_to_type_object>) must be called for every new Python type.

Creating sub-types¶

To create a sub-type, a similar procedure must be followed except only behaviors that are different require new entries in the type-object structure. All other entries can be NULL and will be filled in by PyType_Ready with appropriate functions from the parent type(s). In particular, to create a sub-type in C follow these steps:

1. If needed, create a new C-structure to handle each instance of your type. A typical C-structure would be:
```
typedef struct _new_struct {
    PyArrayObject base;
    /* new things here */
} NewArrayObject;
```
   Notice that the full PyArrayObject is used as the first entry in order to ensure that the binary layout of instances of the new type is identical to the PyArrayObject.
2. Fill in a new Python type-object structure with pointers to new functions that will override the default behavior while leaving any function that should remain the same unfilled (or NULL). The tp_name element should be different.
3. Fill in the tp_base member of the new type-object structure with a pointer to the (main) parent type object. For multiple-inheritance, also fill in the tp_bases member with a tuple containing all of the parent objects in the order they should be used to define inheritance. Remember, all parent-types must have the same C-structure for multiple inheritance to work properly.
4. Call PyType_Ready(<pointer_to_new_type>). If this function returns a negative number, a failure occurred and the type is not initialized. Otherwise, the type is ready to be used. It is generally important to place a reference to the new type into the module dictionary so it can be accessed from Python.

More information on creating sub-types in C can be learned by reading PEP 253 (available at http://www.python.org/dev/peps/pep-0253).

Specific features of ndarray sub-typing¶

Some special methods and attributes are used by arrays in order to facilitate the interoperation of sub-types with the base ndarray type.

The __array_finalize__ method¶

ndarray.__array_finalize__¶

Several array-creation functions of the ndarray allow specification of a particular sub-type to be created. This allows sub-types to be handled seamlessly in many routines. When a sub-type is created in such a fashion, however, neither the __new__ method nor the __init__ method gets called. Instead, the sub-type is allocated and the appropriate instance-structure members are filled in. Finally, the __array_finalize__ attribute is looked up in the object dictionary. If it is present and not None, then it can be either a CObject containing a pointer to a PyArray_FinalizeFunc or it can be a method taking a single argument (which could be None).

If the __array_finalize__ attribute is a CObject, then the pointer must be a pointer to a function with the signature:

(int)(PyArrayObject*,PyObject*)

The first argument is the newly created sub-type. The second argument (if not NULL) is the “parent” array (if the array was created using slicing or some other operation where a clearly-distinguishable parent is present). This routine can do anything it wants to. It should return -1 on error and 0 otherwise.

If the __array_finalize__ attribute is neither None nor a CObject, then it must be a Python method that takes the parent array as an argument (which could be None if there is no parent), and returns nothing. Errors in this method will be caught and handled.

The __array_priority__ attribute¶

ndarray.__array_priority__¶: This attribute allows simple but flexible determination of which sub-type should be considered “primary” when an operation involving two or more sub-types arises. In operations where different sub-types are being used, the sub-type with the largest __array_priority__ attribute will determine the sub-type of the output(s). If two sub-types have the same __array_priority__ then the sub-type of the first argument determines the output. The default __array_priority__ attribute returns a value of 0.0 for the base ndarray type and 1.0 for a sub-type. This attribute can also be defined by objects that are not sub-types of the ndarray and can be used to determine which __array_wrap__ method should be called for the return output.
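The selection rule described here (largest priority wins, ties go to the first argument) can be written down in a few lines. This is a hypothetical plain-C sketch of the rule only, not the code NumPy uses; sketch_pick_primary is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the argument whose priority is largest. On a tie
   the earliest argument wins, because only a strictly larger value
   replaces the current best. */
size_t sketch_pick_primary(const double *priorities, size_t n)
{
    size_t best = 0;
    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (priorities[i] > priorities[best]) best = i;
    }
    return best;
}
```

With the default values, an operation mixing a base ndarray (0.0) and a sub-type (1.0) therefore selects the sub-type regardless of argument order.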

The __array_wrap__ method¶

ndarray.__array_wrap__¶: Any class or type can define this method which should take an ndarray argument and return an instance of the type. It can be seen as the opposite of the __array__ method. This method is used by the ufuncs (and other NumPy functions) to allow other objects to pass through. For Python >2.4, it can also be used to write a decorator that converts a function that works only with ndarrays to one that works with any type with __array__ and __array_wrap__ methods.
