Standard array subclasses#
Note
Subclassing anumpy.ndarray is possible but if your goal is to createan array withmodified behavior, as do dask arrays for distributedcomputation and cupy arrays for GPU-based computation, subclassing isdiscouraged. Instead, using numpy’sdispatch mechanism is recommended.
Thendarray can be inherited from (in Python or in C)if desired. Therefore, it can form a foundation for many usefulclasses. Often whether to sub-class the array object or to simply usethe core array component as an internal part of a new class is adifficult decision, and can be simply a matter of choice. NumPy hasseveral tools for simplifying how your new object interacts with otherarray objects, and so the choice may not be significant in theend. One way to simplify the question is by asking yourself if theobject you are interested in can be replaced as a single array or doesit really require two or more arrays at its core.
Note thatasarray always returns the base-class ndarray. Ifyou are confident that your use of the array object can handle anysubclass of an ndarray, thenasanyarray can be used to allowsubclasses to propagate more cleanly through your subroutine. Inprincipal a subclass could redefine any aspect of the array andtherefore, under strict guidelines,asanyarray would rarely beuseful. However, most subclasses of the array object will notredefine certain aspects of the array object such as the bufferinterface, or the attributes of the array. One important example,however, of why your subroutine may not be able to handle an arbitrarysubclass of an array is that matrices redefine the “*” operator to bematrix-multiplication, rather than element-by-element multiplication.
Special attributes and methods#
See also
NumPy provides several hooks that classes can customize:
- class.__array_ufunc__(ufunc,method,*inputs,**kwargs)#
Any class, ndarray subclass or not, can define this method or set it toNone in order to override the behavior of NumPy’s ufuncs. This worksquite similarly to Python’s
__mul__and other binary operation routines.ufunc is the ufunc object that was called.
method is a string indicating which Ufunc method was called(one of
"__call__","reduce","reduceat","accumulate","outer","inner").inputs is a tuple of the input arguments to the
ufunc.kwargs is a dictionary containing the optional input argumentsof the ufunc. If given, any
outarguments, both positionaland keyword, are passed as atupleinkwargs. See thediscussion inUniversal functions (ufunc) for details.
The method should return either the result of the operation, or
NotImplementedif the operation requested is not implemented.If one of the input, output, or
wherearguments has a__array_ufunc__method, it is executedinstead of the ufunc. If more than one of thearguments implements__array_ufunc__, they are tried in theorder: subclasses before superclasses, inputs before outputs,outputs beforewhere, otherwiseleft to right. The first routine returning something other thanNotImplementeddetermines the result. If all of the__array_ufunc__operations returnNotImplemented, aTypeErroris raised.Note
We intend to re-implement numpy functions as (generalized)Ufunc, in which case it will become possible for them to beoverridden by the
__array_ufunc__method. A prime candidate ismatmul, which currently is not a Ufunc, but could berelatively easily be rewritten as a (set of) generalized Ufuncs. Thesame may happen with functions such asmedian,amin, andargsort.Like with some other special methods in python, such as
__hash__and__iter__, it is possible to indicate that your class doesnotsupport ufuncs by setting__array_ufunc__=None. Ufuncs always raiseTypeErrorwhen called on an object that sets__array_ufunc__=None.The presence of
__array_ufunc__also influences howndarrayhandles binary operations likearr+objandarr<objwhenarris anndarrayandobjis an instanceof a custom class. There are two possibilities. Ifobj.__array_ufunc__is present and not None, thenndarray.__add__and friends will delegate to the ufunc machinery,meaning thatarr+objbecomesnp.add(arr,obj), and thenaddinvokesobj.__array_ufunc__. This is useful if youwant to define an object that acts like an array.Alternatively, if
obj.__array_ufunc__is set to None, then as aspecial case, special methods likendarray.__add__will notice thisandunconditionally raiseTypeError. This is useful if you want tocreate objects that interact with arrays via binary operations, butare not themselves arrays. For example, a units handling system might havean objectmrepresenting the “meters” unit, and want to support thesyntaxarr*mto represent that the array has units of “meters”, butnot want to otherwise interact with arrays via ufuncs or otherwise. Thiscan be done by setting__array_ufunc__=Noneand defining__mul__and__rmul__methods. (Note that this means that writing an__array_ufunc__that always returnsNotImplementedis notquite the same as setting__array_ufunc__=None: in the formercase,arr+objwill raiseTypeError, while in the lattercase it is possible to define a__radd__method to prevent this.)The above does not hold for in-place operators, for which
ndarraynever returnsNotImplemented. Hence,arr+=objwould alwayslead to aTypeError. This is because for arrays in-place operationscannot generically be replaced by a simple reverse operation. (Forinstance, by default,arr+=objwould be translated toarr=arr+obj, i.e.,arrwould be replaced, contrary to what is expectedfor in-place array operations.)Note
If you define
__array_ufunc__:If you are not a subclass of
ndarray, we recommend yourclass define special methods like__add__and__lt__thatdelegate to ufuncs just like ndarray does. An easy way to do thisis to subclass fromNDArrayOperatorsMixin.If you subclass
ndarray, we recommend that you put all youroverride logic in__array_ufunc__and not also override specialmethods. This ensures the class hierarchy is determined in only oneplace rather than separately by the ufunc machinery and by the binaryoperation rules (which gives preference to special methods ofsubclasses; the alternative way to enforce a one-place only hierarchy,of setting__array_ufunc__to None, would seem veryunexpected and thus confusing, as then the subclass would not work atall with ufuncs).ndarraydefines its own__array_ufunc__, which,evaluates the ufunc if no arguments have overrides, and returnsNotImplementedotherwise. This may be useful for subclassesfor which__array_ufunc__converts any instances of its ownclass tondarray: it can then pass these on to itssuperclass usingsuper().__array_ufunc__(*inputs,**kwargs),and finally return the results after possible back-conversion. Theadvantage of this practice is that it ensures that it is possibleto have a hierarchy of subclasses that extend the behaviour. SeeSubclassing ndarray for details.
- class.__array_function__(func,types,args,kwargs)#
funcis an arbitrary callable exposed by NumPy’s public API,which was called in the formfunc(*args,**kwargs).typesis a collectioncollections.abc.Collectionof unique argument types from the original NumPy function call thatimplement__array_function__.The tuple
argsand dictkwargsare directly passed on from theoriginal call.
As a convenience for
__array_function__implementers,typesprovides all argument types with an'__array_function__'attribute.This allows implementers to quickly identify cases where they should deferto__array_function__implementations on other arguments.Implementations should not rely on the iteration order oftypes.Most implementations of
__array_function__will start with twochecks:Is the given function something that we know how to overload?
Are all arguments of a type that we know how to handle?
If these conditions hold,
__array_function__should return the resultfrom calling its implementation forfunc(*args,**kwargs). Otherwise,it should return the sentinel valueNotImplemented, indicating that thefunction is not implemented by these types.There are no general requirements on the return value from
__array_function__, although most sensible implementations shouldprobably return array(s) with the same type as one of the function’sarguments.It may also be convenient to define a custom decorators (
implementsbelow) for registering__array_function__implementations.HANDLED_FUNCTIONS={}classMyArray:def__array_function__(self,func,types,args,kwargs):iffuncnotinHANDLED_FUNCTIONS:returnNotImplemented# Note: this allows subclasses that don't override# __array_function__ to handle MyArray objectsifnotall(issubclass(t,MyArray)fortintypes):returnNotImplementedreturnHANDLED_FUNCTIONS[func](*args,**kwargs)defimplements(numpy_function):"""Register an __array_function__ implementation for MyArray objects."""defdecorator(func):HANDLED_FUNCTIONS[numpy_function]=funcreturnfuncreturndecorator@implements(np.concatenate)defconcatenate(arrays,axis=0,out=None):...# implementation of concatenate for MyArray objects@implements(np.broadcast_to)defbroadcast_to(array,shape):...# implementation of broadcast_to for MyArray objects
Note that it is not required for
__array_function__implementations toincludeall of the corresponding NumPy function’s optional arguments(e.g.,broadcast_toabove omits the irrelevantsubokargument).Optional arguments are only passed in to__array_function__if theywere explicitly used in the NumPy function call.Just like the case for builtin special methods like
__add__, properlywritten__array_function__methods should always returnNotImplementedwhen an unknown type is encountered. Otherwise, it willbe impossible to correctly override NumPy functions from another objectif the operation also includes one of your objects.For the most part, the rules for dispatch with
__array_function__match those for__array_ufunc__. In particular:NumPy will gather implementations of
__array_function__from allspecified inputs and call them in order: subclasses beforesuperclasses, and otherwise left to right. Note that in some edge casesinvolving subclasses, this differs slightly from thecurrent behavior of Python.Implementations of
__array_function__indicate that they canhandle the operation by returning any value other thanNotImplemented.If all
__array_function__methods returnNotImplemented,NumPy will raiseTypeError.
If no
__array_function__methods exists, NumPy will default to callingits own implementation, intended for use on NumPy arrays. This case arises,for example, when all array-like arguments are Python numbers or lists.(NumPy arrays do have a__array_function__method, given below, but italways returnsNotImplementedif any argument other than a NumPy arraysubclass implements__array_function__.)One deviation from the current behavior of
__array_ufunc__is thatNumPy will only call__array_function__on thefirst argument of eachunique type. This matches Python’srule for calling reflected methods, andthis ensures that checking overloads has acceptable performance even whenthere are a large number of overloaded arguments.
- class.__array_finalize__(obj)#
This method is called whenever the system internally allocates anew array fromobj, whereobj is a subclass (subtype) of the
ndarray. It can be used to change attributes ofselfafter construction (so as to ensure a 2-d matrix for example), orto update meta-information from the “parent.” Subclasses inherita default implementation of this method that does nothing.
- class.__array_wrap__(array,context=None,return_scalar=False)#
At the end of everyufunc, this methodis called on the input object with the highest array priority, orthe output object if one was specified. The ufunc-computed arrayis passed in and whatever is returned is passed to the user.Subclasses inherit a default implementation of this method, whichtransforms the array into a new instance of the object’s class.Subclasses may opt to use this method to transform the output arrayinto an instance of the subclass and update metadata beforereturning the array to the user.
NumPy may also call this function without a context from non-ufuncs toallow preserving subclass information.
Changed in version 2.0:
return_scalaris now passed as eitherFalse(usually) orTrueindicating that NumPy would return a scalar.Subclasses may ignore the value, or returnarray[()]to behave morelike NumPy.Note
It is hoped to eventually deprecate this method in favour of
__array_ufunc__for ufuncs (and__array_function__for a few other functions likenumpy.squeeze).
- class.__array_priority__#
The value of this attribute is used to determine what type ofobject to return in situations where there is more than onepossibility for the Python type of the returned object. Subclassesinherit a default value of 0.0 for this attribute.
Note
For ufuncs, it is hoped to eventually deprecate this method infavour of
__array_ufunc__.
- class.__array__(dtype=None,copy=None)#
If defined on an object, it must return a NumPy
ndarray.This method is called by array-coercion functions likenp.array()if an object implementing this interface is passed to those functions.Third-party implementations of
__array__must takedtypeandcopyarguments.Deprecated since version NumPy:2.0Not implementing
copyanddtypeis deprecated as of NumPy 2.When adding them, you must ensure correct behavior forcopy.dtypeis the requested data type of the returned array and is passedby NumPy positionally (only if requested by the user).It is acceptable to ignore thedtypebecause NumPy will check theresult and cast todtypeif necessary. If it is more efficient tocoerce the data to the requested dtype without relying on NumPy,you should handle it in your library.copyis a boolean passed by keyword. Ifcopy=Trueyoumustreturn a copy. Returning a view into existing data will lead to incorrectuser code.Ifcopy=Falsethe user requested that a copy is never made and youmustraise an error unless no copy is made and the returned array is a view intoexisting data. It is valid to always raise an error forcopy=False.The defaultcopy=None(not passed) allows for the result to either be aview or a copy. However, a view return should be preferred when possible.
Please refer toInteroperability with NumPyfor the protocol hierarchy, of which
__array__is the oldest and leastdesirable.
Matrix objects#
Note
It is strongly advisednot to use the matrix subclass. As describedbelow, it makes writing functions that deal consistently with matricesand regular arrays very difficult. Currently, they are mainly used forinteracting withscipy.sparse. We hope to provide an alternativefor this use, however, and eventually remove thematrix subclass.
matrix objects inherit from the ndarray and therefore, theyhave the same attributes and methods of ndarrays. There are siximportant differences of matrix objects, however, that may lead tounexpected results when you use matrices but expect them to act likearrays:
Matrix objects can be created using a string notation to allowMatlab-style syntax where spaces separate columns and semicolons(‘;’) separate rows.
Matrix objects are always two-dimensional. This has far-reachingimplications, in that m.ravel() is still two-dimensional (with a 1in the first dimension) and item selection returns two-dimensionalobjects so that sequence behavior is fundamentally different thanarrays.
Matrix objects over-ride multiplication to bematrix-multiplication.Make sure you understand this forfunctions that you may want to receive matrices. Especially inlight of the fact that asanyarray(m) returns a matrix when m isa matrix.
Matrix objects over-ride power to be matrix raised to a power. Thesame warning about using power inside a function that usesasanyarray(…) to get an array object holds for this fact.
The default __array_priority__ of matrix objects is 10.0, andtherefore mixed operations with ndarrays always produce matrices.
Matrices have special attributes which make calculations easier.These are
Warning
Matrix objects over-ride multiplication, ‘*’, and power, ‘**’, tobe matrix-multiplication and matrix power, respectively. If yoursubroutine can accept sub-classes and you do not convert to base-class arrays, then you must use the ufuncs multiply and power tobe sure that you are performing the correct operation for allinputs.
The matrix class is a Python subclass of the ndarray and can be usedas a reference for how to construct your own subclass of the ndarray.Matrices can be created from other matrices, strings, and anythingelse that can be converted to anndarray . The name “mat “is analias for “matrix “in NumPy.
| Returns a matrix from an array-like object, or from a string of data. |
| Interpret the input as a matrix. |
| Build a matrix object from a string, nested sequence, or array. |
Example 1: Matrix creation from a string
>>>importnumpyasnp>>>a=np.asmatrix('1 2 3; 4 5 3')>>>print((a*a.T).I) [[ 0.29239766 -0.13450292] [-0.13450292 0.08187135]]
Example 2: Matrix creation from a nested sequence
>>>importnumpyasnp>>>np.asmatrix([[1,5,10],[1.0,3,4j]])matrix([[ 1.+0.j, 5.+0.j, 10.+0.j], [ 1.+0.j, 3.+0.j, 0.+4.j]])
Example 3: Matrix creation from an array
>>>importnumpyasnp>>>np.asmatrix(np.random.rand(3,3)).Tmatrix([[4.17022005e-01, 3.02332573e-01, 1.86260211e-01], [7.20324493e-01, 1.46755891e-01, 3.45560727e-01], [1.14374817e-04, 9.23385948e-02, 3.96767474e-01]])
Memory-mapped file arrays#
Memory-mapped files are useful for reading and/or modifying smallsegments of a large file with regular layout, without reading theentire file into memory. A simple subclass of the ndarray uses amemory-mapped file for the data buffer of the array. For small files,the over-head of reading the entire file into memory is typically notsignificant, however for large files using memory mapping can saveconsiderable resources.
Memory-mapped-file arrays have one additional method (besides thosethey inherit from the ndarray):.flush() whichmust be called manually by the user to ensure that any changes to thearray actually get written to disk.
| Create a memory-map to an array stored in abinary file on disk. |
Write any changes in the array to the file on disk. |
Example:
>>>importnumpyasnp
>>>a=np.memmap('newfile.dat',dtype=float,mode='w+',shape=1000)>>>a[10]=10.0>>>a[30]=30.0>>>dela
>>>b=np.fromfile('newfile.dat',dtype=float)>>>print(b[10],b[30])10.0 30.0
>>>a=np.memmap('newfile.dat',dtype=float)>>>print(a[10],a[30])10.0 30.0
Character arrays (numpy.char)#
Note
Thechararray class exists for backwards compatibility withNumarray, it is not recommended for new development. Starting from numpy1.4, if one needs arrays of strings, it is recommended to use arrays ofdtypeobject_,bytes_ orstr_, and use the free functionsin thenumpy.char module for fast vectorized string operations.
These are enhanced arrays of eitherstr_ type orbytes_ type. These arrays inherit from thendarray, but specially-define the operations+,*,and% on a (broadcasting) element-by-element basis. Theseoperations are not available on the standardndarray ofcharacter type. In addition, thechararray has all of thestandardstr (andbytes) methods,executing them on an element-by-element basis. Perhaps the easiestway to create a chararray is to useself.view(chararray) whereself is an ndarray of str or unicodedata-type. However, a chararray can also be created using thechararray constructor, or via thenumpy.char.array function:
| Provides a convenient view on arrays of string and unicode values. |
| Create a |
Another difference with the standard ndarray of str data-type isthat the chararray inherits the feature introduced by Numarray thatwhite-space at the end of any element in the array will be ignoredon item retrieval and comparison operations.
Record arrays#
NumPy provides therecarray class which allows accessing thefields of a structured array as attributes, and a correspondingscalar data type objectrecord.
| Construct an ndarray that allows field access using attributes. |
A data-type scalar that allows field access as attribute lookup. |
Note
The pandas DataFrame is more powerful than record array. If possible,please use pandas DataFrame instead.
Masked arrays (numpy.ma)#
See also
Standard container class#
For backward compatibility and as a standard “container “class, theUserArray from Numeric has been brought over to NumPy and namednumpy.lib.user_array.container The container class is aPython class whose self.array attribute is an ndarray. Multipleinheritance is probably easier with numpy.lib.user_array.containerthan with the ndarray itself and so it is included by default. It isnot documented here beyond mentioning its existence because you areencouraged to use the ndarray class directly if you can.
| Standard container-class for easy multiple-inheritance. |
Array iterators#
Iterators are a powerful concept for array processing. Essentially,iterators implement a generalized for-loop. Ifmyiter is an iteratorobject, then the Python code:
forvalinmyiter:...somecodeinvolvingval...
callsval=next(myiter) repeatedly untilStopIteration israised by the iterator. There are several ways to iterate over anarray that may be useful: default iteration, flat iteration, and\(N\)-dimensional enumeration.
Default iteration#
The default iterator of an ndarray object is the default Pythoniterator of a sequence type. Thus, when the array object itself isused as an iterator. The default behavior is equivalent to:
foriinrange(arr.shape[0]):val=arr[i]
This default iterator selects a sub-array of dimension\(N-1\)from the array. This can be a useful construct for defining recursivealgorithms. To loop over the entire array requires\(N\) for-loops.
>>>importnumpyasnp>>>a=np.arange(24).reshape(3,2,4)+10>>>forvalina:...print('item:',val)item: [[10 11 12 13][14 15 16 17]]item: [[18 19 20 21][22 23 24 25]]item: [[26 27 28 29][30 31 32 33]]
Flat iteration#
A 1-D iterator over the array. |
As mentioned previously, the flat attribute of ndarray objects returnsan iterator that will cycle over the entire array in C-stylecontiguous order.
>>>importnumpyasnp>>>fori,valinenumerate(a.flat):...ifi%5==0:print(i,val)0 105 1510 2015 2520 30
Here, I’ve used the built-in enumerate iterator to return the iteratorindex as well as the value.
N-dimensional enumeration#
| Multidimensional index iterator. |
Sometimes it may be useful to get the N-dimensional index whileiterating. The ndenumerate iterator can achieve this.
>>>importnumpyasnp>>>fori,valinnp.ndenumerate(a):...ifsum(i)%5==0: print(i, val)(0, 0, 0) 10(1, 1, 3) 25(2, 0, 3) 29(2, 1, 2) 32
Iterator for broadcasting#
Produce an object that mimics broadcasting. |
The general concept of broadcasting is also available from Pythonusing thebroadcast iterator. This object takes\(N\)objects as inputs and returns an iterator that returns tuplesproviding each of the input sequence elements in the broadcastedresult.
>>>importnumpyasnp>>>forvalinnp.broadcast([[1,0],[2,3]],[0,1]):...print(val)(np.int64(1), np.int64(0))(np.int64(0), np.int64(1))(np.int64(2), np.int64(0))(np.int64(3), np.int64(1))