Standard array subclasses#

Note

Subclassing a numpy.ndarray is possible, but if your goal is to create an array with modified behavior, as do dask arrays for distributed computation and cupy arrays for GPU-based computation, subclassing is discouraged. Instead, using numpy’s dispatch mechanism is recommended.

The ndarray can be inherited from (in Python or in C) if desired. Therefore, it can form a foundation for many useful classes. Often whether to sub-class the array object or to simply use the core array component as an internal part of a new class is a difficult decision, and can be simply a matter of choice. NumPy has several tools for simplifying how your new object interacts with other array objects, and so the choice may not be significant in the end. One way to simplify the question is by asking yourself whether the object you are interested in can be replaced by a single array or whether it really requires two or more arrays at its core.

Note that asarray always returns the base-class ndarray. If you are confident that your use of the array object can handle any subclass of an ndarray, then asanyarray can be used to allow subclasses to propagate more cleanly through your subroutine. In principle, a subclass could redefine any aspect of the array and therefore, under strict guidelines, asanyarray would rarely be useful. However, most subclasses of the array object will not redefine certain aspects of the array object such as the buffer interface, or the attributes of the array. One important example, however, of why your subroutine may not be able to handle an arbitrary subclass of an array is that matrices redefine the “*” operator to be matrix-multiplication, rather than element-by-element multiplication.
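The difference between the two conversion functions can be seen with a do-nothing subclass. This is a minimal sketch; the class name TaggedArray is a hypothetical placeholder and not part of NumPy.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical do-nothing subclass, used only to observe returned types."""


t = np.arange(3).view(TaggedArray)

base = np.asarray(t)      # always converted to the base class
same = np.asanyarray(t)   # the subclass is passed through

print(type(base).__name__)
print(type(same).__name__)
print(same is t)          # asanyarray hands back the very same object here
```

A routine that relies on element-wise semantics for every operator is usually safer with asarray; asanyarray fits routines that only use behavior a subclass is unlikely to redefine.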

Special attributes and methods#

NumPy provides several hooks that classes can customize:

class.__array_ufunc__(ufunc,method,*inputs,**kwargs)#

Any class, ndarray subclass or not, can define this method or set it to None in order to override the behavior of NumPy’s ufuncs. This works quite similarly to Python’s __mul__ and other binary operation routines.

  • ufunc is the ufunc object that was called.

  • method is a string indicating which Ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner").

  • inputs is a tuple of the input arguments to the ufunc.

  • kwargs is a dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs. See the discussion in Universal functions (ufunc) for details.

The method should return either the result of the operation, or NotImplemented if the operation requested is not implemented.

If one of the input, output, or where arguments has a __array_ufunc__ method, it is executed instead of the ufunc. If more than one of the arguments implements __array_ufunc__, they are tried in the order: subclasses before superclasses, inputs before outputs, outputs before where, otherwise left to right. The first routine returning something other than NotImplemented determines the result. If all of the __array_ufunc__ operations return NotImplemented, a TypeError is raised.

Note

We intend to re-implement numpy functions as (generalized) Ufuncs, in which case it will become possible for them to be overridden by the __array_ufunc__ method. A prime candidate is matmul, which currently is not a Ufunc, but could relatively easily be rewritten as a (set of) generalized Ufuncs. The same may happen with functions such as median, amin, and argsort.

Like with some other special methods in Python, such as __hash__ and __iter__, it is possible to indicate that your class does not support ufuncs by setting __array_ufunc__ = None. Ufuncs always raise TypeError when called on an object that sets __array_ufunc__ = None.

The presence of __array_ufunc__ also influences how ndarray handles binary operations like arr + obj and arr < obj when arr is an ndarray and obj is an instance of a custom class. There are two possibilities. If obj.__array_ufunc__ is present and not None, then ndarray.__add__ and friends will delegate to the ufunc machinery, meaning that arr + obj becomes np.add(arr, obj), and then add invokes obj.__array_ufunc__. This is useful if you want to define an object that acts like an array.

Alternatively, if obj.__array_ufunc__ is set to None, then as a special case, special methods like ndarray.__add__ will notice this and unconditionally raise TypeError. This is useful if you want to create objects that interact with arrays via binary operations, but are not themselves arrays. For example, a units handling system might have an object m representing the “meters” unit, and want to support the syntax arr * m to represent that the array has units of “meters”, but not want to otherwise interact with arrays via ufuncs or otherwise. This can be done by setting __array_ufunc__ = None and defining __mul__ and __rmul__ methods. (Note that this means that writing an __array_ufunc__ that always returns NotImplemented is not quite the same as setting __array_ufunc__ = None: in the former case, arr + obj will raise TypeError, while in the latter case it is possible to define a __radd__ method to prevent this.)

The above does not hold for in-place operators, for which ndarray never returns NotImplemented. Hence, arr += obj would always lead to a TypeError. This is because for arrays in-place operations cannot generically be replaced by a simple reverse operation. (For instance, by default, arr += obj would be translated to arr = arr + obj, i.e., arr would be replaced, contrary to what is expected for in-place array operations.)
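The units scenario described above might look roughly like the following. This is only a sketch: the class Unit and the choice to return a (values, name) tuple are assumptions made for illustration, not an existing units library.

```python
import numpy as np


class Unit:
    """Hypothetical unit marker; not an array, but usable in ``arr * unit``."""

    # Opt out of the ufunc machinery entirely.
    __array_ufunc__ = None

    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Reached for ``arr * unit`` because ndarray.__mul__ defers to us.
        return (np.asarray(other), self.name)

    __mul__ = __rmul__  # ``unit * arr`` produces the same pairing


m = Unit("m")
arr = np.array([1.0, 2.0])

values, unit = arr * m
print(values, unit)

try:
    np.multiply(arr, m)   # ufuncs refuse objects with __array_ufunc__ = None
except TypeError as exc:
    print("TypeError:", exc)
```

Because only __mul__ and __rmul__ are defined, other operators such as arr + m still fail, which is the intended restriction in this scenario.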

Note

If you define __array_ufunc__:

  • If you are not a subclass of ndarray, we recommend your class define special methods like __add__ and __lt__ that delegate to ufuncs just like ndarray does. An easy way to do this is to subclass from NDArrayOperatorsMixin.

  • If you subclass ndarray, we recommend that you put all your override logic in __array_ufunc__ and not also override special methods. This ensures the class hierarchy is determined in only one place rather than separately by the ufunc machinery and by the binary operation rules (which gives preference to special methods of subclasses; the alternative way to enforce a one-place only hierarchy, of setting __array_ufunc__ to None, would seem very unexpected and thus confusing, as then the subclass would not work at all with ufuncs).

  • ndarray defines its own __array_ufunc__, which evaluates the ufunc if no arguments have overrides, and returns NotImplemented otherwise. This may be useful for subclasses for which __array_ufunc__ converts any instances of its own class to ndarray: it can then pass these on to its superclass using super().__array_ufunc__(*inputs, **kwargs), and finally return the results after possible back-conversion. The advantage of this practice is that it ensures that it is possible to have a hierarchy of subclasses that extend the behaviour. See Subclassing ndarray for details.
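The last recommendation (convert own instances to ndarray, defer to the superclass, then convert back) could be sketched as follows. The class LoggedArray and its call counter are hypothetical; note that the sketch forwards ufunc and method explicitly ahead of the converted inputs.

```python
import numpy as np


class LoggedArray(np.ndarray):
    """Hypothetical subclass that counts ufunc calls and re-wraps results."""

    calls = 0

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Convert our own instances to plain ndarrays so the superclass
        # sees no remaining overrides and simply evaluates the ufunc.
        plain = tuple(
            x.view(np.ndarray) if isinstance(x, LoggedArray) else x
            for x in inputs
        )
        if "out" in kwargs:
            kwargs["out"] = tuple(
                o.view(np.ndarray) if isinstance(o, LoggedArray) else o
                for o in kwargs["out"]
            )
        result = super().__array_ufunc__(ufunc, method, *plain, **kwargs)
        if result is NotImplemented:
            return NotImplemented
        LoggedArray.calls += 1
        # Back-convert array results; scalars and tuples are left untouched.
        if isinstance(result, np.ndarray):
            return result.view(LoggedArray)
        return result


a = np.arange(3).view(LoggedArray)
b = a + 1   # ndarray.__add__ -> np.add -> LoggedArray.__array_ufunc__
print(type(b).__name__, b.tolist(), LoggedArray.calls)
```

Returning NotImplemented when the superclass does so keeps the door open for other overriding types in the same call.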

class.__array_function__(func,types,args,kwargs)#
  • func is an arbitrary callable exposed by NumPy’s public API, which was called in the form func(*args, **kwargs).

  • types is a collection (collections.abc.Collection) of unique argument types from the original NumPy function call that implement __array_function__.

  • The tuple args and dict kwargs are directly passed on from the original call.

As a convenience for __array_function__ implementers, types provides all argument types with an '__array_function__' attribute. This allows implementers to quickly identify cases where they should defer to __array_function__ implementations on other arguments. Implementations should not rely on the iteration order of types.

Most implementations of __array_function__ will start with two checks:

  1. Is the given function something that we know how to overload?

  2. Are all arguments of a type that we know how to handle?

If these conditions hold, __array_function__ should return the result from calling its implementation for func(*args, **kwargs). Otherwise, it should return the sentinel value NotImplemented, indicating that the function is not implemented by these types.

There are no general requirements on the return value from __array_function__, although most sensible implementations should probably return array(s) with the same type as one of the function’s arguments.

It may also be convenient to define a custom decorator (implements below) for registering __array_function__ implementations.

    import numpy as np

    HANDLED_FUNCTIONS = {}

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            # Note: this allows subclasses that don't override
            # __array_function__ to handle MyArray objects
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def implements(numpy_function):
        """Register an __array_function__ implementation for MyArray objects."""
        def decorator(func):
            HANDLED_FUNCTIONS[numpy_function] = func
            return func
        return decorator

    @implements(np.concatenate)
    def concatenate(arrays, axis=0, out=None):
        ...  # implementation of concatenate for MyArray objects

    @implements(np.broadcast_to)
    def broadcast_to(array, shape):
        ...  # implementation of broadcast_to for MyArray objects

Note that it is not required for __array_function__ implementations to include all of the corresponding NumPy function’s optional arguments (e.g., broadcast_to above omits the irrelevant subok argument). Optional arguments are only passed in to __array_function__ if they were explicitly used in the NumPy function call.

Just like the case for builtin special methods like __add__, properly written __array_function__ methods should always return NotImplemented when an unknown type is encountered. Otherwise, it will be impossible to correctly override NumPy functions from another object if the operation also includes one of your objects.

For the most part, the rules for dispatch with __array_function__ match those for __array_ufunc__. In particular:

  • NumPy will gather implementations of __array_function__ from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the current behavior of Python.

  • Implementations of __array_function__ indicate that they can handle the operation by returning any value other than NotImplemented.

  • If all __array_function__ methods return NotImplemented, NumPy will raise TypeError.

If no __array_function__ methods exist, NumPy will default to calling its own implementation, intended for use on NumPy arrays. This case arises, for example, when all array-like arguments are Python numbers or lists. (NumPy arrays do have a __array_function__ method, but it always returns NotImplemented if any argument other than a NumPy array subclass implements __array_function__.)

One deviation from the current behavior of __array_ufunc__ is that NumPy will only call __array_function__ on the first argument of each unique type. This matches Python’s rule for calling reflected methods, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments.
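To see the dispatch rules in action, here is a small self-contained variant of the registry pattern. The names Wrapped and HANDLED are hypothetical, and the concatenate implementation simply unwraps and re-wraps plain arrays; it is a sketch, not a recommended container design.

```python
import numpy as np

HANDLED = {}  # hypothetical registry: NumPy function -> our implementation


def implements(numpy_function):
    """Register an implementation of ``numpy_function`` for Wrapped objects."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


class Wrapped:
    """Hypothetical container around a plain ndarray (not a subclass)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        if not all(issubclass(t, Wrapped) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.concatenate)
def concatenate(arrays, axis=0):
    # Unwrap, call NumPy's own implementation, and wrap the result again.
    return Wrapped(np.concatenate([a.data for a in arrays], axis=axis))


joined = np.concatenate([Wrapped([1, 2]), Wrapped([3])])
print(type(joined).__name__, joined.data)

try:
    np.sum(Wrapped([1, 2]))   # np.sum is not registered, so dispatch fails
except TypeError as exc:
    print("TypeError:", exc)
```

The second call shows the last rule in the list: every __array_function__ returned NotImplemented, so NumPy raises TypeError instead of silently coercing the object.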

class.__array_finalize__(obj)#

This method is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the ndarray. It can be used to change attributes of self after construction (so as to ensure a 2-d matrix for example), or to update meta-information from the “parent.” Subclasses inherit a default implementation of this method that does nothing.
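A common use is to carry an extra attribute over to arrays derived from an instance. The sketch below assumes a hypothetical subclass InfoArray with a single info attribute.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical subclass carrying one extra attribute, ``info``."""

    def __new__(cls, data, info=None):
        obj = np.asarray(data).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # ``obj`` is None for explicit construction, otherwise the array
        # that ``self`` was derived from (by view casting or slicing).
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


a = InfoArray([1, 2, 3], info="metres")
sliced = a[1:]   # slicing allocates a new InfoArray from ``a``
print(type(sliced).__name__, sliced.info)
```

Using getattr with a default keeps the method safe when obj is a plain ndarray that has no info attribute, as happens during the view cast inside __new__.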

class.__array_wrap__(array,context=None,return_scalar=False)#

At the end of every ufunc, this method is called on the input object with the highest array priority, or the output object if one was specified. The ufunc-computed array is passed in and whatever is returned is passed to the user. Subclasses inherit a default implementation of this method, which transforms the array into a new instance of the object’s class. Subclasses may opt to use this method to transform the output array into an instance of the subclass and update metadata before returning the array to the user.

NumPy may also call this function without a context from non-ufuncs to allow preserving subclass information.

Changed in version 2.0: return_scalar is now passed as either False (usually) or True indicating that NumPy would return a scalar. Subclasses may ignore the value, or return array[()] to behave more like NumPy.

Note

It is hoped to eventually deprecate this method in favour of __array_ufunc__ for ufuncs (and __array_function__ for a few other functions like numpy.squeeze).
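As an illustration of updating metadata in __array_wrap__, the following sketch records which ufunc produced a result. The class Stamped and its made_by attribute are hypothetical, and the sketch ignores return_scalar, which the text above says subclasses may do.

```python
import numpy as np


class Stamped(np.ndarray):
    """Hypothetical subclass that stamps results with the producing ufunc."""

    def __array_wrap__(self, array, context=None, return_scalar=False):
        # For ufunc calls, ``context`` is a tuple whose first item is the ufunc.
        out = np.asarray(array).view(type(self))
        out.made_by = context[0].__name__ if context is not None else None
        return out


s = np.arange(3.0).view(Stamped)
r = np.sqrt(s)
print(type(r).__name__, r.made_by)
```

The default values for context and return_scalar let the same method be called with or without those arguments.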

class.__array_priority__#

The value of this attribute is used to determine what type of object to return in situations where there is more than one possibility for the Python type of the returned object. Subclasses inherit a default value of 0.0 for this attribute.

Note

For ufuncs, it is hoped to eventually deprecate this attribute in favour of __array_ufunc__.

class.__array__(dtype=None,copy=None)#

If defined on an object, it must return a NumPy ndarray. This method is called by array-coercion functions like np.array() if an object implementing this interface is passed to those functions.

Third-party implementations of __array__ must take dtype and copy arguments.

Deprecated since version NumPy 2.0: Not implementing copy and dtype is deprecated as of NumPy 2. When adding them, you must ensure correct behavior for copy.

  • dtype is the requested data type of the returned array and is passed by NumPy positionally (only if requested by the user). It is acceptable to ignore the dtype because NumPy will check the result and cast to dtype if necessary. If it is more efficient to coerce the data to the requested dtype without relying on NumPy, you should handle it in your library.

  • copy is a boolean passed by keyword. If copy=True you must return a copy. Returning a view into existing data will lead to incorrect user code. If copy=False the user requested that a copy is never made and you must raise an error unless no copy is made and the returned array is a view into existing data. It is valid to always raise an error for copy=False. The default copy=None (not passed) allows for the result to either be a view or a copy. However, a view return should be preferred when possible.
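One possible way to honor both arguments is sketched below. The class Holder is hypothetical, and raising ValueError for an impossible copy=False request is an assumption of this sketch rather than a requirement stated above.

```python
import numpy as np


class Holder:
    """Hypothetical array-like that stores its values in an ndarray."""

    def __init__(self, values):
        self._data = np.array(values, dtype=float)

    def __array__(self, dtype=None, copy=None):
        if dtype is not None and np.dtype(dtype) != self._data.dtype:
            if copy is False:
                # A dtype change cannot be done without copying.
                raise ValueError("a copy is required to change the dtype")
            return self._data.astype(dtype)   # astype copies by default
        if copy:
            return self._data.copy()
        # copy=None or copy=False: a view of the existing data is allowed.
        return self._data.view()


h = Holder([1, 2, 3])
print(np.asarray(h))
print(np.asarray(h, dtype=np.int64))
```

Returning a view for copy=None follows the stated preference for views when possible, at the cost that callers can then modify the holder's data through the returned array.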

Please refer to Interoperability with NumPy for the protocol hierarchy, of which __array__ is the oldest and least desirable.

Note

If a class (ndarray subclass or not) having the __array__ method is used as the output object of an ufunc, results will not be written to the object returned by __array__. This practice will raise TypeError.

Matrix objects#

Note

It is strongly advised not to use the matrix subclass. As described below, it makes writing functions that deal consistently with matrices and regular arrays very difficult. Currently, they are mainly used for interacting with scipy.sparse. We hope to provide an alternative for this use, however, and eventually remove the matrix subclass.

matrix objects inherit from the ndarray and therefore, they have the same attributes and methods of ndarrays. There are six important differences of matrix objects, however, that may lead to unexpected results when you use matrices but expect them to act like arrays:

  1. Matrix objects can be created using a string notation to allow Matlab-style syntax where spaces separate columns and semicolons (‘;’) separate rows.

  2. Matrix objects are always two-dimensional. This has far-reaching implications, in that m.ravel() is still two-dimensional (with a 1 in the first dimension) and item selection returns two-dimensional objects so that sequence behavior is fundamentally different than arrays.

  3. Matrix objects over-ride multiplication to be matrix-multiplication. Make sure you understand this for functions that you may want to receive matrices, especially in light of the fact that asanyarray(m) returns a matrix when m is a matrix.

  4. Matrix objects over-ride power to be matrix raised to a power. The same warning about using power inside a function that uses asanyarray(…) to get an array object holds for this fact.

  5. The default __array_priority__ of matrix objects is 10.0, and therefore mixed operations with ndarrays always produce matrices.

  6. Matrices have special attributes which make calculations easier. These are

    matrix.T

    Returns the transpose of the matrix.

    matrix.H

    Returns the (complex) conjugate transpose of self.

    matrix.I

    Returns the (multiplicative) inverse of invertible self.

    matrix.A

    Return self as an ndarray object.

Warning

Matrix objects over-ride multiplication, ‘*’, and power, ‘**’, to be matrix-multiplication and matrix power, respectively. If your subroutine can accept sub-classes and you do not convert to base-class arrays, then you must use the ufuncs multiply and power to be sure that you are performing the correct operation for all inputs.
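The warning can be made concrete with a routine that accepts subclasses. In this sketch the helper name elementwise_square is hypothetical.

```python
import numpy as np


def elementwise_square(x):
    """Hypothetical helper that accepts subclasses via asanyarray."""
    x = np.asanyarray(x)
    return np.multiply(x, x)   # element-wise even for matrix inputs


m = np.asmatrix([[1, 2], [3, 4]])

print(m * m)                   # matrix product
print(elementwise_square(m))   # element-wise product, still a matrix
```

Had the helper used x * x instead, a matrix argument would silently produce the matrix product while an ndarray argument would produce the element-wise one.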

The matrix class is a Python subclass of the ndarray and can be used as a reference for how to construct your own subclass of the ndarray. Matrices can be created from other matrices, strings, and anything else that can be converted to an ndarray. The name “mat” is an alias for “matrix” in NumPy.

matrix(data[, dtype, copy])

Returns a matrix from an array-like object, or from a string of data.

asmatrix(data[, dtype])

Interpret the input as a matrix.

bmat(obj[, ldict, gdict])

Build a matrix object from a string, nested sequence, or array.

Example 1: Matrix creation from a string

    >>> import numpy as np
    >>> a = np.asmatrix('1 2 3; 4 5 3')
    >>> print((a*a.T).I)
    [[ 0.29239766 -0.13450292]
     [-0.13450292  0.08187135]]

Example 2: Matrix creation from a nested sequence

    >>> import numpy as np
    >>> np.asmatrix([[1,5,10],[1.0,3,4j]])
    matrix([[  1.+0.j,   5.+0.j,  10.+0.j],
            [  1.+0.j,   3.+0.j,   0.+4.j]])

Example 3: Matrix creation from an array

    >>> import numpy as np
    >>> np.asmatrix(np.random.rand(3,3)).T
    matrix([[4.17022005e-01, 3.02332573e-01, 1.86260211e-01],
            [7.20324493e-01, 1.46755891e-01, 3.45560727e-01],
            [1.14374817e-04, 9.23385948e-02, 3.96767474e-01]])

Memory-mapped file arrays#

Memory-mapped files are useful for reading and/or modifying small segments of a large file with regular layout, without reading the entire file into memory. A simple subclass of the ndarray uses a memory-mapped file for the data buffer of the array. For small files, the overhead of reading the entire file into memory is typically not significant; for large files, however, memory mapping can save considerable resources.

Memory-mapped-file arrays have one additional method (besides those they inherit from the ndarray): .flush(), which must be called manually by the user to ensure that any changes to the array actually get written to disk.

memmap(filename[, dtype, mode, offset, ...])

Create a memory-map to an array stored in a binary file on disk.

memmap.flush()

Write any changes in the array to the file on disk.

Example:

    >>> import numpy as np
    >>> a = np.memmap('newfile.dat', dtype=float, mode='w+', shape=1000)
    >>> a[10] = 10.0
    >>> a[30] = 30.0
    >>> del a
    >>> b = np.fromfile('newfile.dat', dtype=float)
    >>> print(b[10], b[30])
    10.0 30.0
    >>> a = np.memmap('newfile.dat', dtype=float)
    >>> print(a[10], a[30])
    10.0 30.0
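A variation that calls flush() explicitly and then maps only a small segment of the file is sketched below. The scratch file path is a hypothetical location created with tempfile; the offset argument of memmap is given in bytes.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical scratch file

# Create a file of 1000 float64 values and change two of them.
w = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
w[10] = 10.0
w[30] = 30.0
w.flush()   # write the changes to disk while keeping ``w`` alive

# Map only elements 10..39 read-only.
itemsize = np.dtype(np.float64).itemsize
r = np.memmap(path, dtype=np.float64, mode="r", offset=10 * itemsize, shape=(30,))
print(r[0], r[20])
```

Mapping a window with offset and shape keeps the mapped region small even when the file itself is very large.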

Character arrays (numpy.char)#

Note

The chararray class exists for backwards compatibility with Numarray; it is not recommended for new development. Starting from NumPy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, bytes_ or str_, and use the free functions in the numpy.char module for fast vectorized string operations.

These are enhanced arrays of either str_ type or bytes_ type. These arrays inherit from the ndarray, but specially-define the operations +, *, and % on a (broadcasting) element-by-element basis. These operations are not available on the standard ndarray of character type. In addition, the chararray has all of the standard str (and bytes) methods, executing them on an element-by-element basis. Perhaps the easiest way to create a chararray is to use self.view(chararray) where self is an ndarray of str or unicode data-type. However, a chararray can also be created using the chararray constructor, or via the numpy.char.array function:

char.chararray(shape[, itemsize, unicode, ...])

Provides a convenient view on arrays of string and unicode values.

char.array(obj[, itemsize, copy, unicode, order])

Create a chararray.

Another difference with the standard ndarray of str data-type is that the chararray inherits the feature introduced by Numarray that white-space at the end of any element in the array will be ignored on item retrieval and comparison operations.
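Following the recommendation in the note at the top of this section, the same element-wise string operations can be applied to a plain ndarray with the free functions in numpy.char. The variable names below are illustrative only.

```python
import numpy as np

names = np.array(["ada", "grace"])        # plain ndarray of dtype str_

greeting = np.char.add("hello ", names)   # broadcasting concatenation
shouted = np.char.upper(names)            # element-wise str.upper
repeated = np.char.multiply(names, 2)     # element-wise repetition

print(greeting)
print(shouted)
print(repeated)
```

These calls play the role of the +, *, and str-method behavior that chararray defines as operators and methods.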

Record arrays#

NumPy provides the recarray class which allows accessing the fields of a structured array as attributes, and a corresponding scalar data type object record.

recarray(shape[, dtype, buf, offset, ...])

Construct an ndarray that allows field access using attributes.

record

A data-type scalar that allows field access as attribute lookup.
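Attribute access to fields can be tried by viewing an existing structured array as a recarray. The field names x and y below are arbitrary choices for this sketch.

```python
import numpy as np

points = np.array(
    [(1, 2.5), (3, 4.5)],
    dtype=[("x", np.int64), ("y", np.float64)],
)

rec = points.view(np.recarray)   # same data, attribute-style field access

print(rec.x)       # equivalent to points["x"]
print(rec[0].y)    # a single element is a numpy.record scalar
```

Because view does not copy, changes made through rec are visible in points as well.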

Note

The pandas DataFrame is more powerful than a record array. If possible, please use pandas DataFrame instead.

Masked arrays (numpy.ma)#

Standard container class#

For backward compatibility and as a standard “container” class, the UserArray from Numeric has been brought over to NumPy and named numpy.lib.user_array.container. The container class is a Python class whose self.array attribute is an ndarray. Multiple inheritance is probably easier with numpy.lib.user_array.container than with the ndarray itself and so it is included by default. It is not documented here beyond mentioning its existence because you are encouraged to use the ndarray class directly if you can.

numpy.lib.user_array.container(data[, ...])

Standard container-class for easy multiple-inheritance.

Array iterators#

Iterators are a powerful concept for array processing. Essentially, iterators implement a generalized for-loop. If myiter is an iterator object, then the Python code:

    for val in myiter:
        ...
        some code involving val
        ...

calls val = next(myiter) repeatedly until StopIteration is raised by the iterator. There are several ways to iterate over an array that may be useful: default iteration, flat iteration, and \(N\)-dimensional enumeration.
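Written out by hand, the protocol that the for statement follows looks roughly like this. The list collected is only there to make the result visible.

```python
import numpy as np

myiter = iter(np.array([10, 20, 30]))

collected = []
while True:
    try:
        val = next(myiter)     # what the for statement does on each pass
    except StopIteration:
        break                  # the for statement ends silently here
    collected.append(int(val))

print(collected)   # [10, 20, 30]
```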

Default iteration#

The default iterator of an ndarray object is the default Python iterator of a sequence type. Thus, when the array object itself is used as an iterator, the default behavior is equivalent to:

    for i in range(arr.shape[0]):
        val = arr[i]

This default iterator selects a sub-array of dimension \(N-1\) from the array. This can be a useful construct for defining recursive algorithms. To loop over the entire array requires \(N\) for-loops.

    >>> import numpy as np
    >>> a = np.arange(24).reshape(3,2,4) + 10
    >>> for val in a:
    ...     print('item:', val)
    item: [[10 11 12 13]
     [14 15 16 17]]
    item: [[18 19 20 21]
     [22 23 24 25]]
    item: [[26 27 28 29]
     [30 31 32 33]]
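The remark about recursive algorithms can be illustrated with a function that loops over the first axis and recurses into each sub-array until it reaches single elements. The function name count_leaves is hypothetical.

```python
import numpy as np


def count_leaves(arr):
    """Hypothetical recursive walk: loop over axis 0, recurse on sub-arrays."""
    if arr.ndim == 0:
        return 1   # reached a single element
    return sum(count_leaves(sub) for sub in arr)


a = np.arange(24).reshape(3, 2, 4) + 10
print(count_leaves(a))   # 24, one per element
```

Each level of recursion removes one dimension, so the depth equals the number of dimensions of the array.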

Flat iteration#

ndarray.flat

A 1-D iterator over the array.

As mentioned previously, the flat attribute of ndarray objects returns an iterator that will cycle over the entire array in C-style contiguous order.

    >>> import numpy as np
    >>> for i, val in enumerate(a.flat):
    ...     if i%5 == 0: print(i, val)
    0 10
    5 15
    10 20
    15 25
    20 30

Here, I’ve used the built-in enumerate iterator to return the iterator index as well as the value.

N-dimensional enumeration#

ndenumerate(arr)

Multidimensional index iterator.

Sometimes it may be useful to get the N-dimensional index while iterating. The ndenumerate iterator can achieve this.

    >>> import numpy as np
    >>> for i, val in np.ndenumerate(a):
    ...     if sum(i)%5 == 0:
    ...         print(i, val)
    (0, 0, 0) 10
    (1, 1, 3) 25
    (2, 0, 3) 29
    (2, 1, 2) 32

Iterator for broadcasting#

broadcast

Produce an object that mimics broadcasting.

The general concept of broadcasting is also available from Python using the broadcast iterator. This object takes \(N\) objects as inputs and returns an iterator that returns tuples providing each of the input sequence elements in the broadcasted result.

    >>> import numpy as np
    >>> for val in np.broadcast([[1, 0], [2, 3]], [0, 1]):
    ...     print(val)
    (np.int64(1), np.int64(0))
    (np.int64(0), np.int64(1))
    (np.int64(2), np.int64(0))
    (np.int64(3), np.int64(1))
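The broadcast object also exposes the shape of the broadcasted result, which makes it possible to fill an output array by hand. The sketch below reproduces x + y this way purely for illustration; in practice the plain expression x + y is the idiomatic choice.

```python
import numpy as np

x = np.array([[1], [2], [3]])   # shape (3, 1)
y = np.array([10, 20])          # shape (2,)

b = np.broadcast(x, y)
out = np.empty(b.shape, dtype=np.int64)
out.flat = [u + v for u, v in b]   # fill in C order, matching the iterator

print(b.shape)
print(out)
```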