The array interface protocol#
Note
This page describes the NumPy-specific API for accessing the contents of a NumPy array from other C extensions. PEP 3118 – The Revised Buffer Protocol introduces a similar, standardized API for any extension module to use. Cython’s buffer array support uses the PEP 3118 API; see the Cython NumPy tutorial.
- version:
3
The array interface (sometimes called array protocol) was created in 2005 as a means for array-like Python objects to reuse each other’s data buffers intelligently whenever possible. The homogeneous N-dimensional array interface is a default mechanism for objects to share N-dimensional array memory and information. The interface consists of a Python-side and a C-side using two attributes. Objects wishing to be considered an N-dimensional array in application code should support at least one of these attributes. Objects wishing to support an N-dimensional array in application code should look for at least one of these attributes and use the information provided appropriately.
This interface describes homogeneous arrays in the sense that each item of the array has the same “type”. This type can be very simple or it can be a quite arbitrary and complicated C-like structure.
There are two ways to use the interface: a Python side and a C-side. Both are separate attributes.
Python side#
This approach to the interface consists of the object having an __array_interface__ attribute.
- object.__array_interface__#
A dictionary of items (3 required and 5 optional). The optional keys in the dictionary have implied defaults if they are not provided.
The keys are:
- shape (required)
Tuple whose elements are the array size in each dimension. Each entry is an integer (a Python int). Note that these integers could be larger than the platform int or long could hold (a Python int has arbitrary precision). It is up to the code using this attribute to handle this appropriately; either by raising an error when overflow is possible, or by using long long as the C type for the shapes.
- typestr (required)
A string providing the basic type of the homogeneous array. The basic string format consists of 3 parts: a character describing the byteorder of the data (<: little-endian, >: big-endian, |: not-relevant), a character code giving the basic type of the array, and an integer providing the number of bytes the type uses.
The basic type character codes are:
t  Bit field (following integer gives the number of bits in the bit field).
b  Boolean (integer type where all values are only True or False)
i  Integer
u  Unsigned integer
f  Floating point
c  Complex floating point
m  Timedelta
M  Datetime
O  Object (i.e. the memory contains a pointer to PyObject)
S  String (fixed-length sequence of char)
U  Unicode (fixed-length sequence of Py_UCS4)
V  Other (void * – each item is a fixed-size chunk of memory)
- descr (optional)
A list of tuples providing a more detailed description of the memory layout for each item in the homogeneous array. Each tuple in the list has two or three elements. Normally, this attribute would be used when typestr is V[0-9]+, but this is not a requirement. The only requirement is that the number of bytes represented in the typestr key is the same as the total number of bytes represented here. The idea is to support descriptions of C-like structs that make up array elements. The elements of each tuple in the list are:
1. A string providing a name associated with this portion of the datatype. This could also be a tuple of ('full name', 'basic_name') where basic name would be a valid Python variable name representing the full name of the field.
2. Either a basic-type description string as in typestr or another list (for nested structured types)
3. An optional shape tuple providing how many times this part of the structure should be repeated. No repeats are assumed if this is not given.
Very complicated structures can be described using this generic interface. Notice, however, that each element of the array is still of the same data-type. Some examples of using this interface are given below.
Default: [('', typestr)]
- data (optional)
A 2-tuple whose first argument is a Python integer that points to the data-area storing the array contents.
Note
When converting from C/C++ via PyLong_From* or high-level bindings such as Cython or pybind11, make sure to use an integer of sufficiently large bitness.
This pointer must point to the first element of data (in other words any offset is always ignored in this case). The second entry in the tuple is a read-only flag (true means the data area is read-only).
This attribute can also be an object exposing the buffer interface which will be used to share the data. If this key is not present (or returns None), then memory sharing will be done through the buffer interface of the object itself. In this case, the offset key can be used to indicate the start of the buffer. A reference to the object exposing the array interface must be stored by the new object if the memory area is to be secured.
Default: None
- strides (optional)
Either None to indicate a C-style contiguous array or a tuple of strides which provides the number of bytes needed to jump to the next array element in the corresponding dimension. Each entry must be an integer (a Python int). As with shape, the values may be larger than can be represented by a C int or long; the calling code should handle this appropriately, either by raising an error, or by using long long in C. The default is None which implies a C-style contiguous memory buffer. In this model, the last dimension of the array varies the fastest. For example, the default strides tuple for an object whose array entries are 8 bytes long and whose shape is (10, 20, 30) would be (4800, 240, 8).
Default: None (C-style contiguous)
- mask (optional)
None or an object exposing the array interface. All elements of the mask array should be interpreted only as true or not true indicating which elements of this array are valid. The shape of this object should be “broadcastable” to the shape of the original array.
Default: None (All array values are valid)
- offset (optional)
An integer offset into the array data region. This can only be used when data is None or returns a memoryview object.
Default: 0.
- version (required)
An integer showing the version of the interface (i.e. 3 for this version). Be careful not to use this to invalidate objects exposing future versions of the interface.
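The keys described above can be put together in a few lines of Python. The following is a minimal sketch, not NumPy's own implementation: the helper names parse_typestr and default_strides and the toy class Int32Block are hypothetical and exist only for illustration. It splits a basic three-part typestr, computes the strides implied by strides being None, and exposes a ctypes buffer through the required keys plus data.

```python
import ctypes
import re
import sys

# Hypothetical helpers for illustration; none of these names are NumPy API.
# Only the basic three-part form (byteorder, type code, integer) is handled.
_TYPESTR = re.compile(r"^([<>|])([tbiufcmMOSUV])(\d+)$")


def parse_typestr(typestr):
    """Split a basic typestr into (byteorder, type code, integer)."""
    match = _TYPESTR.match(typestr)
    if match is None:
        raise ValueError(f"not a basic typestr: {typestr!r}")
    byteorder, kind, size = match.groups()
    return byteorder, kind, int(size)


def default_strides(shape, itemsize):
    """Byte strides of a C-style contiguous array (last dimension fastest)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


class Int32Block:
    """Toy 2-D container of 32-bit integers backed by a ctypes buffer."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self._buf = (ctypes.c_int32 * (rows * cols))()

    @property
    def __array_interface__(self):
        byteorder = "<" if sys.byteorder == "little" else ">"
        return {
            "shape": self.shape,
            "typestr": byteorder + "i4",
            # (address of the first element, read-only flag)
            "data": (ctypes.addressof(self._buf), False),
            "version": 3,
        }


block = Int32Block(2, 3)
interface = block.__array_interface__
print(parse_typestr(interface["typestr"]))
print(default_strides((10, 20, 30), 8))
```

An object such as block could then be handed to a consumer like numpy.asarray, which looks for this attribute. As the data key description notes, the consumer has to keep a reference to the exposing object for as long as it uses the memory.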
C-struct access#
This approach to the array interface allows for faster access to an array using only one attribute lookup and a well-defined C-structure.
- object.__array_struct__#
A PyCapsule whose pointer member contains a pointer to a filled PyArrayInterface structure. Memory for the structure is dynamically created and the PyCapsule is also created with an appropriate destructor so the retriever of this attribute simply has to apply Py_DECREF to the object returned by this attribute when it is finished. Also, either the data needs to be copied out, or a reference to the object exposing this attribute must be held to ensure the data is not freed. Objects exposing the __array_struct__ interface must also not reallocate their memory if other objects are referencing them.
The PyArrayInterface structure is defined in numpy/ndarrayobject.h as:
typedef struct {
    int two;              /* contains the integer 2 -- simple sanity check */
    int nd;               /* number of dimensions */
    char typekind;        /* kind in array --- character code of typestr */
    int itemsize;         /* size of each element */
    int flags;            /* flags indicating how the data should be interpreted */
                          /*   must set ARR_HAS_DESCR bit to validate descr */
    Py_ssize_t *shape;    /* A length-nd array of shape information */
    Py_ssize_t *strides;  /* A length-nd array of stride information */
    void *data;           /* A pointer to the first element of the array */
    PyObject *descr;      /* NULL or data-description (same as descr key
                             of __array_interface__) -- must set ARR_HAS_DESCR
                             flag or this will be ignored. */
} PyArrayInterface;
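For experimentation from Python, the layout of this structure can be mirrored with ctypes. The sketch below is an illustration under stated assumptions, not how NumPy builds or reads the structure: the class only reuses the C name for readability, descr is kept as an opaque pointer, the helper element is hypothetical, and the structure is filled by hand rather than taken from a real capsule (a C consumer would obtain the pointer with PyCapsule_GetPointer).

```python
import ctypes


class PyArrayInterface(ctypes.Structure):
    """ctypes mirror of the C struct; field order and types must match."""

    _fields_ = [
        ("two", ctypes.c_int),
        ("nd", ctypes.c_int),
        ("typekind", ctypes.c_char),
        ("itemsize", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("data", ctypes.c_void_p),
        ("descr", ctypes.c_void_p),  # PyObject *, treated as opaque here
    ]


# A 2 x 3 block of 32-bit integers in C order, described by hand.
data = (ctypes.c_int32 * 6)(0, 1, 2, 3, 4, 5)
shape = (ctypes.c_ssize_t * 2)(2, 3)
strides = (ctypes.c_ssize_t * 2)(12, 4)  # bytes per step in each dimension

iface = PyArrayInterface(
    two=2,                      # sanity-check value
    nd=2,
    typekind=b"i",
    itemsize=4,
    # C-contiguous | aligned | not swapped | writeable
    flags=0x1 | 0x100 | 0x200 | 0x400,
    shape=shape,
    strides=strides,
    data=ctypes.addressof(data),
    descr=None,                 # NULL; the HAS_DESCR bit is not set
)


def element(iface, index):
    """Read one int32 element by applying the byte strides to data."""
    offset = sum(i * iface.strides[dim] for dim, i in enumerate(index))
    return ctypes.c_int32.from_address(iface.data + offset).value


print(element(iface, (1, 2)))
```

The data, shape and strides objects must stay alive for as long as iface is used, which mirrors the requirement that a consumer hold a reference to the exposing object.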
The flags member may consist of 5 bits showing how the data should be interpreted and one bit showing how the interface should be interpreted. The data-bits are NPY_ARRAY_C_CONTIGUOUS (0x1), NPY_ARRAY_F_CONTIGUOUS (0x2), NPY_ARRAY_ALIGNED (0x100), NPY_ARRAY_NOTSWAPPED (0x200), and NPY_ARRAY_WRITEABLE (0x400). A final flag NPY_ARR_HAS_DESCR (0x800) indicates whether or not this structure has the descr field. The field should not be accessed unless this flag is present.
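The flag test described above amounts to simple bit masking. A minimal sketch, using the bit values listed in the text; the dictionary FLAG_BITS and the helpers decode_flags and descr_is_valid are hypothetical names introduced here for illustration:

```python
# Bit values as listed in the text above.
FLAG_BITS = {
    "NPY_ARRAY_C_CONTIGUOUS": 0x1,
    "NPY_ARRAY_F_CONTIGUOUS": 0x2,
    "NPY_ARRAY_ALIGNED": 0x100,
    "NPY_ARRAY_NOTSWAPPED": 0x200,
    "NPY_ARRAY_WRITEABLE": 0x400,
    "NPY_ARR_HAS_DESCR": 0x800,
}


def decode_flags(flags):
    """Return the names of all bits set in a flags value."""
    return [name for name, bit in FLAG_BITS.items() if flags & bit]


def descr_is_valid(flags):
    """The descr member may only be read when this bit is set."""
    return bool(flags & FLAG_BITS["NPY_ARR_HAS_DESCR"])


print(decode_flags(0x501))
print(descr_is_valid(0x501))
```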
- NPY_ARR_HAS_DESCR#
New since June 16, 2006:
In the past most implementations used the desc member of the PyCObject (now PyCapsule) itself (do not confuse this with the “descr” member of the PyArrayInterface structure above; they are two separate things) to hold the pointer to the object exposing the interface. This is now an explicit part of the interface. Be sure to take a reference to the object and call PyCapsule_SetContext before returning the PyCapsule, and configure a destructor to decref this reference.
Note
__array_struct__ is considered legacy and should not be used for new code. Use the buffer protocol or the DLPack protocol (numpy.from_dlpack) instead.
Type description examples#
For clarity it is useful to provide some examples of the type description and corresponding __array_interface__ ‘descr’ entries. Thanks to Scott Gilbert for these examples:
In every case, the ‘descr’ key is optional, but of course provides more information which may be important for various applications:
* Float data
    typestr == '>f4'
    descr == [('','>f4')]
* Complex (single precision)
    typestr == '>c8'
    descr == [('real','>f4'), ('imag','>f4')]
* RGB Pixel data
    typestr == '|V3'
    descr == [('r','|u1'), ('g','|u1'), ('b','|u1')]
* Mixed endian (weird but could happen).
    typestr == '|V8' (or '>u8')
    descr == [('big','>i4'), ('little','<i4')]
* Nested structure
    struct {
        int ival;
        struct {
            unsigned short sval;
            unsigned char bval;
            unsigned char cval;
        } sub;
    }
    typestr == '|V8' (or '<u8' if you want)
    descr == [('ival','<i4'), ('sub', [('sval','<u2'), ('bval','|u1'), ('cval','|u1')])]
* Nested array
    struct {
        int ival;
        double data[16*4];
    }
    typestr == '|V516'
    descr == [('ival','>i4'), ('data','>f8',(16,4))]
* Padded structure
    struct {
        int ival;
        double dval;
    }
    typestr == '|V16'
    descr == [('ival','>i4'), ('','|V4'), ('dval','>f8')]
It should be clear that any structured type could be described using this interface.
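The one stated requirement on descr, that its total byte count equal the one in typestr, can be checked mechanically. The following is a minimal sketch with hypothetical helper names (typestr_nbytes, descr_nbytes). It assumes the integer in each typestr is a byte count, as the text describes, and it rejects bit fields, whose integer counts bits.

```python
import math
import re


def typestr_nbytes(typestr):
    """Byte size encoded in a basic typestr such as '>f4' or '|V16'."""
    match = re.fullmatch(r"[<>|]([A-Za-z])(\d+)", typestr)
    if match is None:
        raise ValueError(f"not a basic typestr: {typestr!r}")
    if match.group(1) == "t":
        raise ValueError("bit fields are sized in bits, not bytes")
    return int(match.group(2))


def descr_nbytes(descr):
    """Total bytes described by a descr list, including nested lists."""
    total = 0
    for field in descr:
        spec = field[1]
        # Optional third element: shape giving the number of repeats.
        repeat = math.prod(field[2]) if len(field) == 3 else 1
        if isinstance(spec, list):
            size = descr_nbytes(spec)   # nested structured type
        else:
            size = typestr_nbytes(spec)
        total += size * repeat
    return total


nested_array = [("ival", ">i4"), ("data", ">f8", (16, 4))]
print(descr_nbytes(nested_array) == typestr_nbytes("|V516"))
```

Applied to the examples above, the same check confirms the other sizes, for instance 3 bytes for the RGB pixel and 16 bytes for the padded structure.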
Differences with array interface (version 2)#
The version 2 interface was very similar. The differences were largely aesthetic. In particular:
1. The PyArrayInterface structure had no descr member at the end (and therefore no flag ARR_HAS_DESCR)
2. The context member of the PyCapsule (formerly the desc member of the PyCObject) returned from __array_struct__ was not specified. Usually, it was the object exposing the array (so that a reference to it could be kept and destroyed when the C-object was destroyed). It is now an explicit requirement that this field be used in some way to hold a reference to the owning object.
Note
Until August 2020, this said:
Now it must be a tuple whose first element is a string with “PyArrayInterface Version #” and whose second element is the object exposing the array.
This design was retracted almost immediately after it was proposed, in <https://mail.python.org/pipermail/numpy-discussion/2006-June/020995.html>. Despite 14 years of documentation to the contrary, at no point was it valid to assume that __array_interface__ capsules held this tuple content.
3. The tuple returned from __array_interface__['data'] used to be a hex-string (now it is an integer or a long integer).
4. There was no __array_interface__ attribute; instead, all of the keys (except for version) in the __array_interface__ dictionary were their own attribute. Thus, to obtain the Python-side information you had to access separately the attributes:
- __array_data__
- __array_shape__
- __array_strides__
- __array_typestr__
- __array_descr__
- __array_offset__
- __array_mask__