NEP 30 — Duck typing for NumPy arrays - implementation#

Author:

Peter Andreas Entschev <pentschev@nvidia.com>

Author:

Stephan Hoyer <shoyer@google.com>

Status:

Superseded

Replaced-By:

NEP 56 — Array API standard support in NumPy’s main namespace

Type:

Standards Track

Created:

2019-07-31

Updated:

2019-07-31

Resolution:

https://mail.python.org/archives/list/numpy-discussion@python.org/message/Z6AA5CL47NHBNEPTFWYOTSUVSRDGHYPN/

Abstract#

We propose the__duckarray__ protocol, following the high-level overviewdescribed in NEP 22, allowing downstream libraries to return arrays of theirdefined types, in contrast tonp.asarray, that coerces thosearray_likeobjects to NumPy arrays.

Detailed description#

NumPy’s API, including array definitions, is implemented and mimicked incountless other projects. By definition, many of those arrays are fairlysimilar in how they operate to the NumPy standard. The introduction of__array_function__ allowed dispatching of functions implemented by severalof these projects directly via NumPy’s API. This introduces a new requirement,returning the NumPy-like array itself, rather than forcing a coercion into apure NumPy array.

For the purpose above, NEP 22 introduced the concept of duck typing to NumPyarrays. The suggested solution described in the NEP allows libraries to avoidcoercion of a NumPy-like array to a pure NumPy array where necessary, whilestill allowing that NumPy-like array libraries that do not wish to implementthe protocol to coerce arrays to a pure NumPy array vianp.asarray.

Usage Guidance#

Code that usesnp.duckarray is meant for supporting other ndarray-like objectsthat “follow the NumPy API”. That is an ill-defined concept at the moment –every known library implements the NumPy API only partly, and many deviateintentionally in at least some minor ways. This cannot be easily remedied, sofor users ofnp.duckarray we recommend the following strategy: check if theNumPy functionality used by the code that follows your use ofnp.duckarrayis present in Dask, CuPy and Sparse. If so, it’s reasonable to expect any duckarray to work here. If not, we suggest you indicate in your docstring what kindsof duck arrays are accepted, or what properties they need to have.

To exemplify the usage of duck arrays, suppose one wants to take themean()of an array-like objectarr. Using NumPy to achieve that, one could writenp.asarray(arr).mean() to achieve the intended result. Ifarr is nota NumPy array, this would create an actual NumPy array in order to call.mean(). However, if the array is an object that is compliant with the NumPyAPI (either in full or partially) such as a CuPy, Sparse or a Dask array, thenthat copy would have been unnecessary. On the other hand, if one were to use the new__duckarray__ protocol:np.duckarray(arr).mean(), andarr is an objectcompliant with the NumPy API, it would simply be returned rather than coercedinto a pure NumPy array, avoiding unnecessary copies and potential loss ofperformance.

Implementation#

The implementation idea is fairly straightforward, requiring a new functionduckarray to be introduced in NumPy, and a new method__duckarray__ inNumPy-like array classes. The new__duckarray__ method shall return thedownstream array-like object itself, such as theself object, while the__array__ method raisesTypeError. Alternatively, the__array__method could create an actual NumPy array and return that.

The new NumPyduckarray function can be implemented as follows:

defduckarray(array_like):ifhasattr(array_like,'__duckarray__'):returnarray_like.__duckarray__()returnnp.asarray(array_like)

Example for a project implementing NumPy-like arrays#

Now consider a library that implements a NumPy-compatible array class calledNumPyLikeArray, this class shall implement the methods described above, anda complete implementation would look like the following:

classNumPyLikeArray:def__duckarray__(self):returnselfdef__array__(self):raiseTypeError("NumPyLikeArray can not be converted to a NumPy ""array. You may want to use np.duckarray() instead.")

The implementation above exemplifies the simplest case, but the overall ideais that libraries will implement a__duckarray__ method that returns theoriginal object, and an__array__ method that either creates and returns anappropriate NumPy array, or raises aTypeError to prevent unintentional useas an object in a NumPy array (ifnp.asarray is called on an arbitraryobject that does not implement__array__, it will create a NumPy arrayscalar).

In case of existing libraries that don’t already implement__array__ butwould like to use duck array typing, it is advised that they introduceboth__array__ and__duckarray__ methods.

Usage#

An example of how the__duckarray__ protocol could be used to write astack function based onconcatenate, and its produced outcome, can beseen below. The example here was chosen not only to demonstrate the usage oftheduckarray function, but also to demonstrate its dependency on the NumPyAPI, demonstrated by checks on the array’sshape attribute. Note that theexample is merely a simplified version of NumPy’s actual implementation ofstack working on the first axis, and it is assumed that Dask has implementedthe__duckarray__ method.

defduckarray_stack(arrays):arrays=[np.duckarray(arr)forarrinarrays]shapes={arr.shapeforarrinarrays}iflen(shapes)!=1:raiseValueError('all input arrays must have the same shape')expanded_arrays=[arr[np.newaxis,...]forarrinarrays]returnnp.concatenate(expanded_arrays,axis=0)dask_arr=dask.array.arange(10)np_arr=np.arange(10)np_like=list(range(10))duckarray_stack((dask_arr,dask_arr))# Returns dask.arrayduckarray_stack((dask_arr,np_arr))# Returns dask.arrayduckarray_stack((dask_arr,np_like))# Returns dask.array

In contrast, using onlynp.asarray (at the time of writing of this NEP, thisis the usual method employed by library developers to ensure arrays areNumPy-like) has a different outcome:

defasarray_stack(arrays):arrays=[np.asanyarray(arr)forarrinarrays]# The remaining implementation is the same as that of# ``duckarray_stack`` aboveasarray_stack((dask_arr,dask_arr))# Returns np.ndarrayasarray_stack((dask_arr,np_arr))# Returns np.ndarrayasarray_stack((dask_arr,np_like))# Returns np.ndarray

Backward compatibility#

This proposal does not raise any backward compatibility issues within NumPy,given that it only introduces a new function. However, downstream librariesthat opt to introduce the__duckarray__ protocol may choose to remove theability of coercing arrays back to a NumPy array vianp.array ornp.asarray functions, preventing unintended effects of coercion of sucharrays back to a pure NumPy array (as some libraries already do, such as CuPyand Sparse), but still leaving libraries not implementing the protocol with thechoice of utilizingnp.duckarray to promotearray_like objects to pureNumPy arrays.

Previous proposals and discussion#

The duck typing protocol proposed here was described in a high level inNEP 22.

Additionally, longer discussions about the protocol and related proposalstook place innumpy/numpy #13831

Copyright#

This document has been placed in the public domain.