Generalized universal function API#

There is a general need for looping over not only functions on scalarsbut also over functions on vectors (or arrays).This concept is realized in NumPy by generalizing the universal functions(ufuncs). In regular ufuncs, the elementary function is limited toelement-by-element operations, whereas the generalized version (gufuncs)supports “sub-array” by “sub-array” operations. The Perl vector library PDLprovides a similar functionality and its terms are re-used in the following.

Each generalized ufunc has information associated with it that stateswhat the “core” dimensionality of the inputs is, as well as thecorresponding dimensionality of the outputs (the element-wise ufuncshave zero core dimensions). The list of the core dimensions for allarguments is called the “signature” of a ufunc. For example, theufuncnumpy.add has signature(),()->() defining two scalar inputsand one scalar output.

Another example is the functioninner1d(a,b) with a signature of(i),(i)->(). This applies the inner product along the last axis ofeach input, but keeps the remaining indices intact.For example, wherea is of shape(3,5,N) andb is of shape(5,N), this will return an output of shape(3,5).The underlying elementary function is called3*5 times. In thesignature, we specify one core dimension(i) for each input and zero coredimensions() for the output, since it takes two 1-d arrays andreturns a scalar. By using the same namei, we specify that the twocorresponding dimensions should be of the same size.

The dimensions beyond the core dimensions are called “loop” dimensions. Inthe above example, this corresponds to(3,5).

The signature determines how the dimensions of each input/output array aresplit into core and loop dimensions:

  1. Each dimension in the signature is matched to a dimension of thecorresponding passed-in array, starting from the end of the shape tuple.These are the core dimensions, and they must be present in the arrays, oran error will be raised.

  2. Core dimensions assigned to the same label in the signature (e.g. thei ininner1d’s(i),(i)->()) must have exactly matching sizes,no broadcasting is performed.

  3. The core dimensions are removed from all inputs and the remainingdimensions are broadcast together, defining the loop dimensions.

  4. The shape of each output is determined from the loop dimensions plus theoutput’s core dimensions

Typically, the size of all core dimensions in an output will be determined bythe size of a core dimension with the same label in an input array. This isnot a requirement, and it is possible to define a signature where a labelcomes up for the first time in an output, although some precautions must betaken when calling such a function. An example would be the functioneuclidean_pdist(a), with signature(n,d)->(p), that given an array ofnd-dimensional vectors, computes all unique pairwise Euclideandistances among them. The output dimensionp must therefore be equal ton*(n-1)/2, but by default, it is the caller’s responsibility to passin an output array of the right size. If the size of a core dimension of an outputcannot be determined from a passed in input or output array, an error will beraised. This can be changed by defining aPyUFunc_ProcessCoreDimsFunc functionand assigning it to theproces_core_dims_func field of thePyUFuncObjectstructure. See below for more details.

Note: Prior to NumPy 1.10.0, less strict checks were in place: missing coredimensions were created by prepending 1’s to the shape as necessary, coredimensions with the same label were broadcast together, and undetermineddimensions were created with size 1.

Definitions#

Elementary Function

Each ufunc consists of an elementary function that performs themost basic operation on the smallest portion of array arguments(e.g. adding two numbers is the most basic operation in adding twoarrays). The ufunc applies the elementary function multiple timeson different parts of the arrays. The input/output of elementaryfunctions can be vectors; e.g., the elementary function ofinner1dtakes two vectors as input.

Signature

A signature is a string describing the input/output dimensions ofthe elementary function of a ufunc. See section below for moredetails.

Core Dimension

The dimensionality of each input/output of an elementary functionis defined by its core dimensions (zero core dimensions correspondto a scalar input/output). The core dimensions are mapped to thelast dimensions of the input/output arrays.

Dimension Name

A dimension name represents a core dimension in the signature.Different dimensions may share a name, indicating that they are ofthe same size.

Dimension Index

A dimension index is an integer representing a dimension name. Itenumerates the dimension names according to the order of the firstoccurrence of each name in the signature.

Details of signature#

The signature defines “core” dimensionality of input and outputvariables, and thereby also defines the contraction of thedimensions. The signature is represented by a string of thefollowing format:

  • Core dimensions of each input or output array are represented by alist of dimension names in parentheses,(i_1,...,i_N); a scalarinput/output is denoted by(). Instead ofi_1,i_2,etc, one can use any valid Python variable name.

  • Dimension lists for different arguments are separated by",".Input/output arguments are separated by"->".

  • If one uses the same dimension name in multiple locations, thisenforces the same size of the corresponding dimensions.

The formal syntax of signatures is as follows:

<Signature>::=<Inputarguments>"->"<Outputarguments><Inputarguments>::=<Argumentlist><Outputarguments>::=<Argumentlist><Argumentlist>::=nil|<Argument>|<Argument>","<Argumentlist><Argument>::="("<Coredimensionlist>")"<Coredimensionlist>::=nil|<Coredimension>|<Coredimension>","<Coredimensionlist><Coredimension>::=<Dimensionname><Dimensionmodifier><Dimensionname>::=validPythonvariablename|validinteger<Dimensionmodifier>::=nil|"?"

Notes:

  1. All quotes are for clarity.

  2. Unmodified core dimensions that share the same name must have the same size.Each dimension name typically corresponds to one level of looping in theelementary function’s implementation.

  3. White spaces are ignored.

  4. An integer as a dimension name freezes that dimension to the value.

  5. If the name is suffixed with the “?” modifier, the dimension is a coredimension only if it exists on all inputs and outputs that share it;otherwise it is ignored (and replaced by a dimension of size 1 for theelementary function).

Here are some examples of signatures:

name

signature

common usage

add

(),()->()

binary ufunc

sum1d

(i)->()

reduction

inner1d

(i),(i)->()

vector-vector multiplication

matmat

(m,n),(n,p)->(m,p)

matrix multiplication

vecmat

(n),(n,p)->(p)

vector-matrix multiplication

matvec

(m,n),(n)->(m)

matrix-vector multiplication

matmul

(m?,n),(n,p?)->(m?,p?)

combination of the four above

outer_inner

(i,t),(j,t)->(i,j)

inner over the last dimension,outer over the second to last,and loop/broadcast over the rest.

cross1d

(3),(3)->(3)

cross product where the lastdimension is frozen and must be 3

The last is an instance of freezing a core dimension and can be used toimprove ufunc performance

C-API for implementing elementary functions#

The current interface remains unchanged, andPyUFunc_FromFuncAndDatacan still be used to implement (specialized) ufuncs, consisting ofscalar elementary functions.

One can usePyUFunc_FromFuncAndDataAndSignature to declare a moregeneral ufunc. The argument list is the same asPyUFunc_FromFuncAndData, with an additional argument specifying thesignature as C string.

Furthermore, the callback function is of the same type as before,void(*foo)(char**args,intp*dimensions,intp*steps,void*func).When invoked,args is a list of lengthnargs containingthe data of all input/output arguments. For a scalar elementaryfunction,steps is also of lengthnargs, denoting the strides usedfor the arguments.dimensions is a pointer to a single integerdefining the size of the axis to be looped over.

For a non-trivial signature,dimensions will also contain the sizesof the core dimensions as well, starting at the second entry. Onlyone size is provided for each unique dimension name and the sizes aregiven according to the first occurrence of a dimension name in thesignature.

The firstnargs elements ofsteps remain the same as for scalarufuncs. The following elements contain the strides of all coredimensions for all arguments in order.

For example, consider a ufunc with signature(i,j),(i)->(). Inthis case,args will contain three pointers to the data of theinput/output arraysa,b,c. Furthermore,dimensions will be[N,I,J] to define the size ofN of the loop and the sizesI andJfor the core dimensionsi andj. Finally,steps will be[a_N,b_N,c_N,a_i,a_j,b_i], containing all necessary strides.

Customizing core dimension size processing#

The optional function of typePyUFunc_ProcessCoreDimsFunc, storedon theprocess_core_dims_func attribute of the ufunc, provides theauthor of the ufunc a “hook” into the processing of the core dimensionsof the arrays that were passed to the ufunc. The two primary uses ofthis “hook” are:

  • Check that constraints on the core dimensions requiredby the ufunc are satisfied (and set an exception if they are not).

  • Compute output shapes for any output core dimensions that were notdetermined by the input arrays.

As an example of the first use, consider the generalized ufuncminmaxwith signature(n)->(2) that simultaneously computes the minimum andmaximum of a sequence. It should require thatn>0, becausethe minimum and maximum of a sequence with length 0 is not meaningful.In this case, the ufunc author might define the function like this:

intminmax_process_core_dims(PyUFuncObjectufunc,npy_intp*core_dim_sizes){npy_intpn=core_dim_sizes[0];if(n==0){PyExc_SetString("minmax requires the core dimension ""to be at least 1.");return-1;}return0;}

In this case, the length of the arraycore_dim_sizes will be 2.The second value in the array will always be 2, so there is no needfor the function to inspect it. The core dimensionn is storedin the first element. The function sets an exception and returns -1if it finds thatn is 0.

The second use for the “hook” is to compute the size of output arrayswhen the output arrays are not provided by the caller and one or morecore dimension of the output is not also an input core dimension.If the ufunc does not have a function defined on theprocess_core_dims_func attribute, an unspecified output coredimension size will result in an exception being raised. With the“hook” provided byprocess_core_dims_func, the author of the ufunccan set the output size to whatever is appropriate for the ufunc.

In the array passed to the “hook” function, core dimensions thatwere not determined by the input are indicating by having the value -1in thecore_dim_sizes array. The function can replace the -1 withwhatever value is appropriate for the ufunc, based on the core dimensionsthat occurred in the input arrays.

Warning

The function must never change a value incore_dim_sizes thatis not -1 on input. Changing a value that was not -1 will generallyresult in incorrect output from the ufunc, and could result in thePython interpreter crashing.

For example, consider the generalized ufuncconv1d for whichthe elementary function computes the “full” convolution of twoone-dimensional arraysx andy with lengthsm andn,respectively. The output of this convolution has lengthm+n-1.To implement this as a generalized ufunc, the signature is set to(m),(n)->(p), and in the “hook” function, if the core dimensionp is found to be -1, it is replaced withm+n-1. Ifpisnot -1, it must be verified that the given value equalsm+n-1.If it does not, the function must set an exception and return -1.For a meaningful result, the operation also requires thatm+nis at least 1, i.e. both inputs can’t have length 0.

Here’s how that might look in code:

intconv1d_process_core_dims(PyUFuncObject*ufunc,npy_intp*core_dim_sizes){// core_dim_sizes will hold the core dimensions [m, n, p].// p will be -1 if the caller did not provide the out argument.npy_intpm=core_dim_sizes[0];npy_intpn=core_dim_sizes[1];npy_intpp=core_dim_sizes[2];npy_intprequired_p=m+n-1;if(m==0&&n==0){// Disallow both inputs having length 0.PyErr_SetString(PyExc_ValueError,"conv1d: both inputs have core dimension 0; the function ""requires that at least one input has size greater than 0.");return-1;}if(p==-1){// Output array was not given in the call of the ufunc.// Set the correct output size here.core_dim_sizes[2]=required_p;return0;}// An output array *was* given.  Validate its core dimension.if(p!=required_p){PyErr_Format(PyExc_ValueError,"conv1d: the core dimension p of the out parameter ""does not equal m + n - 1, where m and n are the ""core dimensions of the inputs x and y; got m=%zd ""and n=%zd so p must be %zd, but got p=%zd.",m,n,required_p,p);return-1;}return0;}