Python Specification for DLPack
The Python specification for DLPack is part of the Python array API standard. More details about the spec can be found under the Data interchange mechanisms page.
Syntax for data interchange with DLPack
The array API will offer the following syntax for data interchange:
- A from_dlpack() function, which accepts any (array) object with the two DLPack methods implemented (see below) and uses them to construct a new array containing the data from the input array.
- __dlpack__() and __dlpack_device__() methods on the array object, which will be called from within from_dlpack(), to query what device the array is on (may be needed to pass in the correct stream, e.g. in the case of multiple GPUs) and to access the data.
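As a concrete illustration, NumPy implements both sides of this protocol, so a minimal round trip can be sketched as follows (the variable names are just for the example):

```python
import numpy as np

x = np.arange(3)

# from_dlpack() first queries where the data lives; (1, 0) means
# device type kDLCPU with device id 0.
device_type, device_id = x.__dlpack_device__()

# It then calls __dlpack__() on the producer and wraps the resulting
# capsule in a new array object:
y = np.from_dlpack(x)
```

Here NumPy plays both the producer and the consumer role; in practice the two methods are implemented by one library and from_dlpack() by another.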
Semantics
DLPack describes the memory layout of dense, strided, n-dimensional arrays. When a user calls y = from_dlpack(x), the library implementing x (the "producer") will provide access to the data from x to the library containing from_dlpack (the "consumer"). If possible, this must be zero-copy (i.e. y will be a view on x). If not possible, that library may flag this and make a copy of the data. In both cases:

- The producer keeps owning the memory of x (and y if a copy is made).
- y may or may not be a view, therefore the user must keep the recommendation to avoid mutating y in mind - see Copy-view behavior and mutability.
- Both x and y may continue to be used just like arrays created in other ways.
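The view semantics above can be observed directly with NumPy, which exports zero-copy whenever possible:

```python
import numpy as np

x = np.arange(6.0)
y = np.from_dlpack(x)

# Zero-copy was possible here, so y is a view on x's memory ...
assert np.shares_memory(x, y)

# ... which is why a mutation of x is visible through y, and why the
# standard recommends avoiding in-place mutation of y.
x[0] = 42.0
```

This is exactly the situation the Copy-view behavior and mutability recommendation is about: code that mutates y would also silently change x.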
If an array that is accessed via the interchange protocol lives on a device that the requesting (consumer) library does not support, it is recommended to raise a BufferError, unless an explicit copy is requested (see below) and the producer can support the request.
Stream handling through the stream keyword applies to CUDA and ROCm (perhaps to other devices that have a stream concept as well, however those haven't been considered in detail). The consumer must pass the stream it will use to the producer; the producer must synchronize or wait on the stream when necessary. In the common case of the default stream being used, synchronization will be unnecessary, so asynchronous execution is enabled.
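The calling convention can be sketched with a toy producer. The class and its attribute names are hypothetical; only the method signatures and the kDLCUDA enum value (2) come from the protocol, and no real GPU work happens here:

```python
class ToyCudaProducer:
    """Hypothetical producer on a CUDA-like device; it illustrates the
    stream keyword only and carries no real GPU data."""

    def __dlpack_device__(self):
        return (2, 0)  # (kDLCUDA, device_id 0)

    def __dlpack__(self, stream=None):
        # `stream` is the handle of the stream the consumer will use.
        # A real producer would record an event on its own stream and
        # make `stream` wait on it before handing over the data; here
        # we only remember that the request was made.
        self.waited_on_stream = stream
        return None  # a real producer returns a "dltensor" PyCapsule

producer = ToyCudaProducer()
device = producer.__dlpack_device__()  # consumer queries the device first,
producer.__dlpack__(stream=7)          # then passes the stream it will use
```

The consumer obtains the stream argument from its own execution context; it is the producer's job to make its pending writes ordered before any work the consumer later enqueues on that stream.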
Starting with the Python array API standard v2023, a copy can be explicitly requested (or disabled) through the new copy argument of from_dlpack(). When a copy is made, the producer must set the DLPACK_FLAG_BITMASK_IS_COPIED bit flag. It is also possible to request cross-device copies through the new device argument, though the v2023 standard only mandates the support of kDLCPU.
Implementation
Note that while this API standard largely tries to avoid discussing implementation details, some discussion and requirements are needed here because data interchange requires coordination between implementers on, e.g., memory management.

[Figure: DLPack diagram. Dark blue are the structs it defines, light blue struct members, gray text enum values of supported devices and data types.]
Starting with the Python array API standard v2023, a new max_version argument is added to __dlpack__ for the consumer to signal to the producer the maximal DLPack version it supports. Starting with DLPack 1.0, the DLManagedTensorVersioned struct should be used and the existing DLManagedTensor struct is considered deprecated, though a library should try to support both during the transition period if possible.
Note
In the rest of this document, DLManagedTensorVersioned and DLManagedTensor are treated as synonyms, assuming a proper handling of max_version has been done to choose the right struct. As far as the capsule name is concerned, when DLManagedTensorVersioned is in use the capsule names dltensor and used_dltensor will need a _versioned suffix.
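On the consumer side, this negotiation is commonly written as a try/except fallback; the helper name below is hypothetical, but the pattern works against both old and new producers (NumPy accepts max_version from 2.1 onwards, and older versions simply raise TypeError):

```python
import numpy as np

def request_capsule(x):
    """Consumer-side sketch: prefer the DLPack >= 1.0 versioned struct."""
    try:
        # A producer aware of max_version can return a capsule named
        # "dltensor_versioned" wrapping a DLManagedTensorVersioned.
        return x.__dlpack__(max_version=(1, 0))
    except TypeError:
        # An older producer does not accept the keyword and returns a
        # legacy "dltensor" capsule wrapping a DLManagedTensor.
        return x.__dlpack__()

capsule = request_capsule(np.arange(3))
```

The consumer must then check the capsule's name to know which of the two structs it actually received.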
The __dlpack__ method will produce a PyCapsule containing a DLManagedTensor, which will be consumed immediately within from_dlpack - therefore it is consumed exactly once, and it will not be visible to users of the Python API.
The producer must set the PyCapsule name to "dltensor" so that it can be inspected by name, and set a PyCapsule_Destructor that calls the deleter of the DLManagedTensor when the "dltensor"-named capsule is no longer needed.
The consumer must transfer ownership of the DLManagedTensor from the capsule to its own object. It does so by renaming the capsule to "used_dltensor" to ensure that PyCapsule_Destructor will not get called (ensured if PyCapsule_Destructor calls deleter only for capsules whose name is "dltensor"), but the deleter of the DLManagedTensor will be called by the destructor of the consumer library object created to own the DLManagedTensor obtained from the capsule. Below is an example of the capsule deleter written in the Python C API which is called either when the refcount on the capsule named "dltensor" reaches zero or the consumer decides to deallocate its array:
static void dlpack_capsule_deleter(PyObject *self) {
   if (PyCapsule_IsValid(self, "used_dltensor")) {
      return; /* Do nothing if the capsule has been consumed. */
   }
   DLManagedTensor *managed =
      (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
   if (managed == NULL) {
      PyErr_WriteUnraisable(self);
      return;
   }
   /* the spec says the deleter can be NULL if there is no way for the caller
    * to provide a reasonable destructor. */
   if (managed->deleter) {
      managed->deleter(managed);
   }
}
Note: the capsule names "dltensor" and "used_dltensor" must be statically allocated.
The DLManagedTensor deleter must ensure that sharing beyond Python boundaries is possible; this means that the GIL must be acquired explicitly if it uses Python objects or API. In Python, the deleter usually needs to Py_DECREF() the original owner and free the DLManagedTensor allocation. For example, NumPy uses the following code to ensure sharing with arbitrary non-Python code is safe:
static void array_dlpack_deleter(DLManagedTensor *self) {
   /*
    * Leak the Python object if the Python runtime is not available.
    * This can happen if the DLPack consumer destroys the tensor late
    * after Python runtime finalization (for example in case the tensor
    * was indirectly kept alive by a C++ static variable).
    */
   if (!Py_IsInitialized()) {
      return;
   }
   PyGILState_STATE state = PyGILState_Ensure();
   PyObject *array = (PyObject *)self->manager_ctx;
   // This will also free the shape and strides as it's one allocation.
   PyMem_Free(self);
   Py_XDECREF(array);
   PyGILState_Release(state);
}
When the strides field in the DLTensor struct is NULL, it indicates a row-major compact array. If the array is of size zero, the data pointer in DLTensor should be set to either NULL or 0.
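A NULL strides field is therefore equivalent to the strides of a C-contiguous array, which a consumer can reconstruct from the shape and element size. A small sketch (the helper name is ours, not part of the spec):

```python
def c_contiguous_strides(shape, itemsize):
    """Strides in bytes of a row-major compact array - what a NULL
    strides field in DLTensor is equivalent to."""
    strides = []
    acc = itemsize
    # Walk the dimensions from innermost to outermost, accumulating
    # the byte size of each sub-array.
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# A (2, 3, 4) array of 8-byte elements has strides (96, 32, 8).
print(c_contiguous_strides((2, 3, 4), 8))
```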
For further details on DLPack design and how to implement support for it, refer to github.com/dmlc/dlpack.
Warning
DLPack contains a device_id, which will be the device ID (an integer, 0, 1, ...) which the producer library uses. In practice this will likely be the same numbering as that of the consumer, however that is not guaranteed. Depending on the hardware type, it may be possible for the consumer library implementation to look up the actual device from the pointer to the data - this is possible for example for CUDA device pointers.
It is recommended that implementers of this array API consider and document whether the device attribute of the array returned from from_dlpack is guaranteed to be in a certain order or not.
Reference Implementations
Several Python libraries have adopted this standard using the Python C API, C++, Cython, ctypes, cffi, etc.:

- NumPy: Python C API
- CuPy: Cython
- TensorFlow: C++, Python wrapper using the Python C API, XLA
- PyTorch: C++, Python wrapper using the Python C API
- MXNet: ctypes
- mpi4py: Cython
- Hidet: ctypes