The DLPack Protocol
The DLPack Protocol is a stable in-memory data structure that allows data exchange between major frameworks working with multidimensional arrays or tensors. It is designed for cross-hardware support, meaning it allows the exchange of data that lives on devices other than the CPU (e.g. GPU).
The DLPack protocol has been selected as the data interchange protocol of the Python array API standard by the Consortium for Python Data API Standards, in order to enable device-aware data interchange between array/tensor libraries in the Python ecosystem. See more about the standard in the protocol documentation and more about DLPack in the Python Specification for DLPack.
Implementation of DLPack in PyArrow
The producing side of the DLPack Protocol is implemented for pa.Array and can be used to interchange data between PyArrow and other tensor libraries. Supported data types are integer, unsigned integer and float. The protocol has no support for missing data, meaning PyArrow arrays with missing values cannot be transferred through the DLPack protocol. Currently, the Arrow implementation of the protocol only supports data on a CPU device.
Data interchange syntax of the protocol includes:
- from_dlpack(x): consuming an array object that implements a __dlpack__ method and creating a new array while sharing the memory.
- __dlpack__(self, stream=None) and __dlpack_device__: producing a PyCapsule with the DLPack struct; these methods are called from within from_dlpack(x).
PyArrow implements the second part of the protocol (__dlpack__(self, stream=None) and __dlpack_device__) and can thus be consumed by libraries implementing from_dlpack.
Examples
Convert a PyArrow CPU array into a NumPy array:
>>> import pyarrow as pa
>>> array = pa.array([2, 0, 2, 4])
>>> array
<pyarrow.lib.Int64Array object at 0x121fd4880>
[
  2,
  0,
  2,
  4
]
>>> import numpy as np
>>> np.from_dlpack(array)
array([2, 0, 2, 4])
Convert a PyArrow CPU array into a PyTorch tensor:
>>> import torch
>>> torch.from_dlpack(array)
tensor([2, 0, 2, 4])
Convert a PyArrow CPU array into a JAX array:
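>>> import jax
>>> jax.numpy.from_dlpack(array)
Array([2, 0, 2, 4], dtype=int32)
>>> jax.dlpack.from_dlpack(array)
Array([2, 0, 2, 4], dtype=int32)
The result has dtype int32 rather than int64 because JAX disables 64-bit types by default (controlled by the jax_enable_x64 configuration option).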

