The Arrow C Device data interface#

Warning

The Arrow C Device Data Interface should be considered experimental

Rationale#

The currentC Data Interface, and mostimplementations of it, make the assumption that all data buffers providedare CPU buffers. Since Apache Arrow is designed to be a universal in-memoryformat for representing tabular (“columnar”) data, there will be the desireto leverage this data on non-CPU hardware such as GPUs. One example of sucha case is theRAPIDS cuDF library which uses the Arrow memory format withCUDA for NVIDIA GPUs. Since copying data from host to device and back isexpensive, the ideal would be to be able to leave the data on the devicefor as long as possible, even when passing it between runtimes andlibraries.

The Arrow C Device data interface builds on the existing C data interfaceby adding a very small, stable set of C definitions to it. These definitionsare equivalents to theArrowArray andArrowArrayStream structuresfrom the C Data Interface which add members to allow specifying the devicetype and pass necessary information to synchronize with the producer.For non-C/C++ languages and runtimes, translating the C definitions tocorresponding C FFI declarations should be just as simple as with thecurrent C data interface.

Applications and libraries can then use Arrow schemas and Arrow formattedmemory on non-CPU devices to exchange data just as easily as they donow with CPU data. This will enable leaving data on those devices longerand avoiding costly copies back and forth between the host and devicejust to leverage new libraries and runtimes.

Goals#

  • Expose an ABI-stable interface built on the existing C data interface.

  • Make it easy for third-party projects to implement support with littleinitial investment.

  • Allow zero-copy sharing of Arrow formatted device memory betweenindependent runtimes and components running in the same process.

  • Avoid the need for one-to-one adaptation layers such as theCUDA Array Interface for Python processes to pass CUDA data.

  • Enable integration without explicit dependencies (either at compile-timeor runtime) on the Arrow software project itself.

The intent is for the Arrow C Device data interface to expand the reachof the current C data interface, allowing it to also become the standardlow-level building block for columnar processing on devices like GPUs orFPGAs.

Structure definitions#

Because this is built on the C data interface, the C Device data interfaceuses theArrowSchema andArrowArray structures as defined in theC data interface spec. It then adds thefollowing free-standing definitions. Like the rest of the Arrow project,they are available under the Apache License 2.0.

#ifndef ARROW_C_DEVICE_DATA_INTERFACE#define ARROW_C_DEVICE_DATA_INTERFACE// Device type for the allocated memorytypedefint32_tArrowDeviceType;// CPU device, same as using ArrowArray directly#define ARROW_DEVICE_CPU 1// CUDA GPU Device#define ARROW_DEVICE_CUDA 2// Pinned CUDA CPU memory by cudaMallocHost#define ARROW_DEVICE_CUDA_HOST 3// OpenCL Device#define ARROW_DEVICE_OPENCL 4// Vulkan buffer for next-gen graphics#define ARROW_DEVICE_VULKAN 7// Metal for Apple GPU#define ARROW_DEVICE_METAL 8// Verilog simulator buffer#define ARROW_DEVICE_VPI 9// ROCm GPUs for AMD GPUs#define ARROW_DEVICE_ROCM 10// Pinned ROCm CPU memory allocated by hipMallocHost#define ARROW_DEVICE_ROCM_HOST 11// Reserved for extension//// used to quickly test extension devices, semantics// can differ based on implementation#define ARROW_DEVICE_EXT_DEV 12// CUDA managed/unified memory allocated by cudaMallocManaged#define ARROW_DEVICE_CUDA_MANAGED 13// Unified shared memory allocated on a oneAPI// non-partitioned device.//// A call to the oneAPI runtime is required to determine the// device type, the USM allocation type and the sycl context// that it is bound to.#define ARROW_DEVICE_ONEAPI 14// GPU support for next-gen WebGPU standard#define ARROW_DEVICE_WEBGPU 15// Qualcomm Hexagon DSP#define ARROW_DEVICE_HEXAGON 16structArrowDeviceArray{structArrowArrayarray;int64_tdevice_id;ArrowDeviceTypedevice_type;void*sync_event;// reserved bytes for future expansionint64_treserved[3];};#endif// ARROW_C_DEVICE_DATA_INTERFACE

Note

The canonical guardARROW_C_DEVICE_DATA_INTERFACE is meant to avoidduplicate definitions if two projects copy the definitions in their ownheaders, and a third-party project includes from these two projects. Itis therefore important that this guard is kept exactly as-is when thesedefinitions are copied.

ArrowDeviceType#

TheArrowDeviceType typedef is used to indicate what type of device theprovided memory buffers were allocated on. This, in conjunction with thedevice_id, should be sufficient to reference the correct data buffers.

We then use macros to define values for different device types. The providedmacro values are compatible with the widely useddlpackDLDeviceTypedefinition values, using the same value for each as the equivalentkDL<type> enum fromdlpack.h. The list will be kept in sync with thoseequivalent enum values over time to ensure compatibility, rather thanpotentially diverging. To avoid the Arrow project having to be in theposition of vetting new hardware devices, new additions should first beadded to dlpack before we add a corresponding macro here.

To ensure predictability with the ABI, we use macros instead of anenumso the storage type is not compiler dependent.

ARROW_DEVICE_CPU#

CPU Device, equivalent to just usingArrowArray directly instead ofusingArrowDeviceArray.

ARROW_DEVICE_CUDA#

ACUDA GPU Device. This could represent data allocated either with theruntime library (cudaMalloc) or the device driver (cuMemAlloc).

ARROW_DEVICE_CUDA_HOST#

CPU memory that was pinned and page-locked by CUDA by usingcudaMallocHost orcuMemAllocHost.

ARROW_DEVICE_OPENCL#

Data allocated on the device by using theOpenCL (Open Computing Language)framework.

ARROW_DEVICE_VULKAN#

Data allocated by theVulkan framework and libraries.

ARROW_DEVICE_METAL#

Data on Apple GPU devices using theMetal framework and libraries.

ARROW_DEVICE_VPI#

Indicates usage of a Verilog simulator buffer.

ARROW_DEVICE_ROCM#

An AMD device using theROCm stack.

ARROW_DEVICE_ROCM_HOST#

CPU memory that was pinned and page-locked by ROCm by usinghipMallocHost.

ARROW_DEVICE_EXT_DEV#

This value is an escape-hatch for devices to extend which aren’tcurrently represented otherwise. Producers would need to provideadditional information/context specific to the device if usingthis device type. This is used to quickly test extension devicesand semantics can differ based on the implementation.

ARROW_DEVICE_CUDA_MANAGED#

CUDA managed/unified memory which is allocated bycudaMallocManaged.

ARROW_DEVICE_ONEAPI#

Unified shared memory allocated on an InteloneAPI non-partitioneddevice. A call to theoneAPI runtime is required to determinethe specific device type, the USM allocation type and the sycl contextthat it is bound to.

ARROW_DEVICE_WEBGPU#

GPU support for next-gen WebGPU standards

ARROW_DEVICE_HEXAGON#

Data allocated on a Qualcomm Hexagon DSP device.

The ArrowDeviceArray structure#

TheArrowDeviceArray structure embeds the C dataArrowArray structureand adds additional information necessary for consumers to use the data. Ithas the following fields:

structArrowArrayArrowDeviceArray.array#

Mandatory. The allocated array data. The values in thevoid** buffers (alongwith the buffers of any children) are what is allocated on the device.The buffer values should be device pointers. The rest of the structureshould be accessible to the CPU.

Theprivate_data andrelease callback of this structure shouldcontain any necessary information and structures related to freeingthe array according to the device it is allocated on, rather thanhaving a separate release callback andprivate_data pointer here.

int64_tArrowDeviceArray.device_id#

Mandatory. The device id to identify a specific device if multiple devices of thistype are on the system. The semantics of the id will be hardware dependent,but we use anint64_t to future-proof the id as devices change over time.

For device types that do not have an intrinsic notion of a device identifier (e.g.,ARROW_DEVICE_CPU), it is recommended to use adevice_id of -1 as aconvention.

ArrowDeviceTypeArrowDeviceArray.device_type#

Mandatory. The type of the device which can access the buffers in the array.

void*ArrowDeviceArray.sync_event#

Optional. An event-like object to synchronize on if needed.

Many devices, like GPUs, are primarily asynchronous with respect toCPU processing. As such, in order to safely access device memory, it is oftennecessary to have an object to synchronize processing with. Sincedifferent devices will use different types to specify this, we use avoid* which can be coerced into a pointer to whatever the deviceappropriate type is.

If synchronization is not needed, this can be null. If this is non-nullthen it MUST be used to call the appropriate sync method for the device(e.g.cudaStreamWaitEvent orhipStreamWaitEvent) before attemptingto access the memory in the buffers.

If an event is provided, then the producer MUST ensure that the exporteddata is available on the device before the event is triggered. Theconsumer SHOULD wait on the event before trying to access the exporteddata.

See also

Thesynchronization event typessection below.

int64_tArrowDeviceArray.reserved[3]#

As non-CPU development expands, there may be a need to expand thisstructure. In order to do so without potentially breaking ABI changes,we reserve 24 bytes at the end of the object. These bytes MUST be zero’dout after initialization by the producer in order to ensure safeevolution of the ABI in the future.

Synchronization event types#

The table below lists the expected event types for each device type.If no event type is supported (“N/A”), then thesync_event membershould always be null.

Remember that the eventCAN be null if synchronization is not neededto access the data.

Device Type

Actual Event Type

Notes

ARROW_DEVICE_CPU

N/A

ARROW_DEVICE_CUDA

cudaEvent_t*

ARROW_DEVICE_CUDA_HOST

cudaEvent_t*

ARROW_DEVICE_OPENCL

cl_event*

ARROW_DEVICE_VULKAN

VkEvent*

ARROW_DEVICE_METAL

MTLEvent*

ARROW_DEVICE_VPI

N/A

ARROW_DEVICE_ROCM

hipEvent_t*

ARROW_DEVICE_ROCM_HOST

hipEvent_t*

ARROW_DEVICE_EXT_DEV

ARROW_DEVICE_CUDA_MANAGED

cudaEvent_t*

ARROW_DEVICE_ONEAPI

sycl::event*

ARROW_DEVICE_WEBGPU

N/A

ARROW_DEVICE_HEXAGON

N/A

Notes:

  • (1) Currently unknown if framework has an event type to support.

  • (2) Extension Device has producer defined semantics and thus ifsynchronization is needed for an extension device, the producershould document the type.

Semantics#

Memory management#

First and foremost: Out of everything in this interface, it isonly thedata buffers themselves which reside in device memory (i.e. thebuffersmember of theArrowArray struct). Everything else should be in CPUmemory.

TheArrowDeviceArray structure contains anArrowArray object whichitself hasspecific semantics for releasingmemory. The term“base structure” below refers to theArrowDeviceArrayobject that is passed directly between the producer and consumer – not anychild structure thereof.

It is intended for the base structure to be stack- or heap-allocated by theconsumer. In this case, the producer API should take a pointer to theconsumer-allocated structure.

However, any data pointed to by the struct MUST be allocated and maintainedby the producer. This includes thesync_event member if it is not null,along with any pointers in theArrowArray object as usual. Data lifetimeis managed through therelease callback of theArrowArray member.

For anArrowDeviceArray, the semantics of a released structure and thecallback semantics are identical to those forArrowArray itself. Any producer specific contextinformation necessary for releasing the device data buffers, in addition toany allocated event, should be stored in theprivate_data member oftheArrowArray and managed by therelease callback.

Moving an array#

The consumer canmove theArrowDeviceArray structure by bitwise copyingor shallow member-wise copying. Then it MUST mark the source structure releasedby setting therelease member of the embeddedArrowArray structure toNULL, butwithout calling that release callback. This ensures that onlyone live copy of the struct is active at any given time and that lifetime iscorrectly communicated to the producer.

As usual, the release callback will be called on the destination structurewhen it is not needed anymore.

Record batches#

As with the C data interface itself, a record batch can be trivially consideredas an equivalent struct array. In this case the metadata of the top-levelArrowSchema can be used for schema-level metadata of the record batch.

Mutability#

Both the producer and the consumer SHOULD consider the exported data (thatis, the data reachable on the device through thebuffers member ofthe embeddedArrowArray) to be immutable, as either party could otherwisesee inconsistent data while the other is mutating it.

Synchronization#

If thesync_event member is non-NULL, the consumer should not attemptto access or read the data until they have synchronized on that event. Ifthesync_event member is NULL, then it MUST be safe to access the datawithout any synchronization necessary on the part of the consumer.

C producer example#

Exporting a simpleint32 device array#

Export a non-nullableint32 type with empty metadata. An example of thiscan be seen in theC data interface docs directly.

To export the data itself, we transfer ownership to the consumer throughthe release callback. This example will use CUDA, but the equivalent callscould be used for any device:

staticvoidrelease_int32_device_array(structArrowArray*array){assert(array->n_buffers==2);// destroy the eventcudaEvent_t*ev_ptr=(cudaEvent_t*)(array->private_data);cudaError_tstatus=cudaEventDestroy(*ev_ptr);assert(status==cudaSuccess);free(ev_ptr);// free the buffers and the buffers arraystatus=cudaFree(array->buffers[1]);assert(status==cudaSuccess);free(array->buffers);// mark releasedarray->release=NULL;}voidexport_int32_device_array(void*cudaAllocedPtr,cudaStream_tstream,int64_tlength,structArrowDeviceArray*array){// get device idintdevice;cudaError_tstatus;status=cudaGetDevice(&device);assert(status==cudaSuccess);cudaEvent_t*ev_ptr=(cudaEvent_t*)malloc(sizeof(cudaEvent_t));assert(ev_ptr!=NULL);status=cudaEventCreate(ev_ptr);assert(status==cudaSuccess);// record event on the stream, assuming that the passed in// stream is where the work to produce the data will be processing.status=cudaEventRecord(*ev_ptr,stream);assert(status==cudaSuccess);memset(array,0,sizeof(structArrowDeviceArray));// initialize fields*array=(structArrowDeviceArray){.array=(structArrowArray){.length=length,.null_count=0,.offset=0,.n_buffers=2,.n_children=0,.children=NULL,.dictionary=NULL,// bookkeeping.release=&release_int32_device_array,// store the event pointer as private data in the array// so that we can access it in the release callback..private_data=(void*)(ev_ptr),},.device_id=(int64_t)(device),.device_type=ARROW_DEVICE_CUDA,// pass the event pointer to the consumer.sync_event=(void*)(ev_ptr),};// allocate list of buffersarray->array.buffers=(constvoid**)malloc(sizeof(void*)*array->array.n_buffers);assert(array->array.buffers!=NULL);array->array.buffers[0]=NULL;array->array.buffers[1]=cudaAllocedPtr;}// calling the release callback should be done using the array member// of the device array.staticvoidrelease_device_array_helper(structArrowDeviceArray*arr){arr->array.release(&arr->array);}

Device Stream Interface#

Like theC stream interface, the C Device datainterface also specifies a higher-level structure for easing communicationof streaming data within a single process.

Semantics#

An Arrow C device stream exposes a streaming source of data chunks, each withthe same schema. Chunks are obtained by calling a blocking pull-style iterationfunction. It is expected that all chunks should be providing data on the samedevice type (but not necessarily the same device id). If it is necessaryto provide a stream of data on multiple device types, a producer shouldprovide a separate stream object for each device type.

Structure definition#

The C device stream interface is defined by a singlestruct definition:

#ifndef ARROW_C_DEVICE_STREAM_INTERFACE#define ARROW_C_DEVICE_STREAM_INTERFACEstructArrowDeviceArrayStream{// device type that all arrays will be accessible fromArrowDeviceTypedevice_type;// callbacksint(*get_schema)(structArrowDeviceArrayStream*,structArrowSchema*);int(*get_next)(structArrowDeviceArrayStream*,structArrowDeviceArray*);constchar*(*get_last_error)(structArrowDeviceArrayStream*);// release callbackvoid(*release)(structArrowDeviceArrayStream*);// opaque producer-specific datavoid*private_data;};#endif// ARROW_C_DEVICE_STREAM_INTERFACE

Note

The canonical guardARROW_C_DEVICE_STREAM_INTERFACE is meant to avoidduplicate definitions if two projects copy the C device stream interfacedefinitions into their own headers, and a third-party project includesfrom these two projects. It is therefore important that this guard iskept exactly as-is when these definitions are copied.

The ArrowDeviceArrayStream structure#

TheArrowDeviceArrayStream provides a device type that can access theresulting data along with the required callbacks to interact with astreaming source of Arrow arrays. It has the following fields:

ArrowDeviceTypedevice_type#

Mandatory. The device type that this stream produces data on. AllArrowDeviceArray s that are produced by this stream should have thesame device type as is set here. This is a convenience for the consumerto not have to check every array that is retrieved and instead allowshigher-level coding constructs for streams.

int(*ArrowDeviceArrayStream.get_schema)(structArrowDeviceArrayStream*,structArrowSchema*out)#

Mandatory. This callback allows the consumer to query the schema ofthe chunks of data in the stream. The schema is the same for all datachunks.

This callback must NOT be called on a releasedArrowDeviceArrayStream.

Return value: 0 on success, a non-zeroerror code otherwise.

int(*ArrowDeviceArrayStream.get_next)(structArrowDeviceArrayStream*,structArrowDeviceArray*out)#

Mandatory. This callback allows the consumer to get the next chunk ofdata in the stream.

This callback must NOT be called on a releasedArrowDeviceArrayStream.

The next chunk of data MUST be accessible from a device type matching theArrowDeviceArrayStream.device_type.

Return value: 0 on success, a non-zeroerror code otherwise.

On success, the consumer must check whether theArrowDeviceArray’sembeddedArrowArray is markedreleased.If the embeddedArrowDeviceArray.array is released, then the end of thestream has been reached. Otherwise, theArrowDeviceArray contains avalid data chunk.

constchar*(*ArrowDeviceArrayStream.get_last_error)(structArrowDeviceArrayStream*)#

Mandatory. This callback allows the consumer to get a textual descriptionof the last error.

This callback must ONLY be called if the last operation on theArrowDeviceArrayStream returned an error. It must NOT be called on areleasedArrowDeviceArrayStream.

Return value: a pointer to a NULL-terminated character string(UTF8-encoded). NULL can also be returned if no detailed description isavailable.

The returned pointer is only guaranteed to be valid until the next callof one of the stream’s callbacks. The character string it points to shouldbe copied to consumer-managed storage if it is intended to survive longer.

void(*ArrowDeviceArrayStream.release)(structArrowDeviceArrayStream*)#

Mandatory. A pointer to a producer-provided release callback.

void*ArrowDeviceArrayStream.private_data#

Optional. An opaque pointer to producer-provided private data.

Consumers MUST NOT process this member. Lifetime of this member ishandled by the producer, and especially by the release callback.

Result lifetimes#

The data returned by theget_schema andget_next callbacks must bereleased independently. Their lifetimes are not tied to that ofArrowDeviceArrayStream.

Stream lifetime#

Lifetime of the C stream is managed using a release callback with similarusage as inC data interface.

Thread safety#

The stream source is not assumed to be thread-safe. Consumers wanting tocallget_next from several threads should ensure those calls areserialized.

Async Device Stream Interface#

Warning

Experimental: The Async C Device Stream interface is experimental in its currentform. Based on feedback and usage the protocol definition may change untilit is fully standardized.

TheC stream interface provides a synchronousAPI centered around the consumer calling the producer functions to retrievethe next record batch. For concurrent communication between producer and consumer,theArrowAsyncDeviceStreamHandler can be used. This interface is non-opinionatedand may fit into different concurrent communication models.

Semantics#

Rather than the producer providing a structure of callbacks for a consumer tocall and retrieve records, the Async interface is a structure allocated and populated by the consumer.The consumer allocated struct provides handler callbacks for the producer to callwhen the schema and chunks of data are available.

In addition to theArrowAsyncDeviceStreamHandler, there are also two additionalstructs used for the full data flow:ArrowAsyncTask andArrowAsyncProducer.

Structure Definition#

The C device async stream interface consists of threestruct definitions:

#ifndef ARROW_C_ASYNC_STREAM_INTERFACE#define ARROW_C_ASYNC_STREAM_INTERFACEstructArrowAsyncTask{int(*extract_data)(structArrowArrayTask*self,structArrowDeviceArray*out);void*private_data;};structArrowAsyncProducer{ArrowDeviceTypedevice_type;void(*request)(structArrowAsyncProducer*self,int64_tn);void(*cancel)(structArrowAsyncProducer*self);void(*release)(structArrowAsyncProducer*self);constchar*additional_metadata;void*private_data;};structArrowAsyncDeviceStreamHandler{// consumer-specific handlersint(*on_schema)(structArrowAsyncDeviceStreamHandler*self,structArrowSchema*stream_schema);int(*on_next_task)(structArrowAsyncDeviceStreamHandler*self,structArrowAsyncTask*task,constchar*metadata);void(*on_error)(structArrowAsyncDeviceStreamHandler*self,intcode,constchar*message,constchar*metadata);// release callbackvoid(*release)(structArrowAsyncDeviceStreamHandler*self);// must be populated before calling any callbacksstructArrowAsyncProducer*producer;// opaque handler-specific datavoid*private_data;};#endif// ARROW_C_ASYNC_STREAM_INTERFACE

Note

The canonical guardARROW_C_ASYNC_STREAM_INTERFACE is meant to avoidduplicate definitions if two projects copy the C async stream interfacedefinitions into their own headers, and a third-party project includesfrom these two projects. It is therefore important that this guard is keptexactly as-is when these definitions are copied.

The ArrowAsyncDeviceStreamHandler structure#

The structure has the following fields:

int(*ArrowAsyncDeviceStreamHandler.on_schema)(structArrowAsyncDeviceStreamHandler*,structArrowSchema*)#

Mandatory. Handler for receiving the schema of the stream. All incoming records shouldmatch the provided schema. If successful, the function should return 0, otherwiseit should return anerrno-compatible error code.

If there is any extra contextual information that the producer wants to provide, it can setArrowAsyncProducer.additional_metadata to a non-NULL value. This is encoded in thesame format asArrowSchema.metadata. The lifetime of this metadata, if notNULL,should be tied to the lifetime of theArrowAsyncProducer object.

Unless theon_error handler is called, this will always get called exactly once and will bethe first method called on this object. As such the producerMUST populate theArrowAsyncProducermember before calling this function to allow the consumer to apply back-pressure and control the flow of data.The producer maintains ownership of theArrowAsyncProducer and must clean it upaftercalling the release callback on theArrowAsyncDeviceStreamHandler.

A producer that receives a non-zero result here must not subsequently call anything other thanthe release callback on this object.

int(*ArrowAsyncDeviceStreamHandler.on_next_task)(structArrowAsyncDeviceStreamHandler*,structArrowAsyncTask*,constchar*)#

Mandatory. Handler to be called when a new record is available for processing. Theschema for each record should be the same as the schema thaton_schema was called with.If successfully handled, the function should return 0, otherwise it should return anerrno-compatible error code.

Rather than passing the record itself it receives anArrowAsyncTask instead to facilitatebetter consumer-focused thread control as far as receiving the data. A call to this functionsimply indicates that data is available via the provided task.

The producer signals the end of the stream by passingNULL for theArrowAsyncTaskpointer instead of a valid address. This task object is only valid during the lifetime ofthis function call. If the consumer wants to use the task beyond the scope of this method, itmust copy or move its contents to a new ArrowAsyncTask object.

Theconstchar* parameter exists for producers to provide any extra contextual informationthey want. This is encoded in the same format asArrowSchema.metadata. If notNULL,the lifetime is only the scope of the call to this function. A consumer who wants to maintainthe additional metadata beyond the lifetime of this callMUST copy the value themselves.

A producerMUST NOT call this concurrently from multiple threads.

TheArrowAsyncProducer.request callback must be called to start receiving calls to thishandler.

void(*ArrowAsyncDeviceStreamHandler.on_error)(structArrowAsyncDeviceStreamHandler,int,constchar*,constchar*)#

Mandatory. Handler to be called when an error is encountered by the producer. After callingthis, therelease callback will be called as the last call on this struct. The parametersare anerrno-compatible error code and an optional error message and metadata.

If the message and metadata are notNULL, their lifetime is only valid during the scopeof this call. A consumer who wants to maintain these values past the return of this functionMUST copy the values themselves.

If the metadata parameter is notNULL, to provide key-value error metadata, then it shouldbe encoded identically to the way that metadata is encoded inArrowSchema.metadata.

It is valid for this to be called by a producer with or without a preceding call toArrowAsyncProducer.request. This callbackMUST NOT call any methods of anArrowAsyncProducer object.

void(*ArrowAsyncDeviceStreamHandler.release)(structArrowAsyncDeviceStreamHandler*)#

Mandatory. A pointer to a consumer-provided release callback for the handler.

It is valid for this to be called by a producer with or without a preceding call toArrowAsyncProducer.request. This must not call any methods of anArrowAsyncProducerobject.

structArrowAsyncProducerArrowAsyncDeviceStreamHandler.producer#

Mandatory. The producer object that the consumer will use to request additional data or cancel.

This objectMUST be populated by the producer before calling theArrowAsyncDeviceStreamHandler.on_schemacallback. The producer maintains ownership of this object and must clean it upafter callingthe release callback on theArrowAsyncDeviceStreamHandler.

The consumerCANNOT assume that this is valid until theon_schema callback is called.

void*ArrowAsyncDeviceStreamHandler.private_data#

Optional. An opaque pointer to consumer-provided private data.

ProducersMUST NOT process this member. Lifetime of this member is handled bythe consumer, and especially by the release callback.

The ArrowAsyncTask structure#

The purpose of using a Task object rather than passing the array directly to theon_nextcallback is to allow for more complex and efficient thread handling. Utilizing a Taskobject allows for a producer to separate the “decoding” logic from the I/O, enabling aconsumer to avoid transferring data between CPU cores (e.g. from one L1/L2 cache to another).

This producer-provided structure has the following fields:

int(*ArrowArrayTask.extract_data)(structArrowArrayTask*,structArrowDeviceArray*)#

Mandatory. A callback to populate the providedArrowDeviceArray with the available data.The order ofArrowAsyncTasks provided by the producer enables a consumer to know the order ofthe data to process. If the consumer does not care about the data that is owned by this task,it must still callextract_data so that the producer can perform any required cleanup.NULLshould be passed as the device array pointer to indicate that the consumer doesn’t want theactual data, letting the task perform necessary cleanup.

If a non-zero value is returned from this, it should be followed only by the producer callingtheon_error callback of theArrowAsyncDeviceStreamHandler. Because calling this methodis likely to be separate from the current control flow, returning a non-zero value to signalan error occurring allows the current thread to decide handle the case accordingly, while stillallowing all error logging and handling to be centralized in theArrowAsyncDeviceStreamHandler.on_error callback.

Rather than having a separate release callback, any required cleanup should be performed as partof the invocation of this callback. Ownership of the Array is given to the pointer passed in asa parameter, and this array must be released separately.

It is only valid to call this method exactly once.

void*ArrowArrayTask.private_data#

Optional. An opaque pointer to producer-provided private data.

ConsumersMUST NOT process this member. Lifetime of this member is handled bythe producer who created this object, and should be cleaned up if necessary duringthe call toArrowArrayTask.extract_data.

The ArrowAsyncProducer structure#

This producer-provided and managed object has the following fields:

ArrowDeviceTypeArrowAsyncProducer.device_type#

Mandatory. The device type that this producer will provide data on. AllArrowDeviceArray structs that are produced by this producer should have thesame device type as is set here.

void(*ArrowAsyncProducer.request)(structArrowAsyncProducer*,uint64_t)#

Mandatory. This function must be called by a consumer to start receiving calls toArrowAsyncDeviceStreamHandler.on_next_task. ItMUST be valid to callthis synchronously from withinArrowAsyncDeviceStreamHandler.on_next_taskorArrowAsyncDeviceStreamHandler.on_schema. As a result, this functionMUST NOT synchronously callon_next_task oron_error to avoid recursiveand reentrant callbacks.

Aftercancel is called, additional calls to this function must be a NOP, but allowed.

While not cancelled, calling this function registers the given number of additionalarrays/batches to be produced by the producer. A producer should only callthe appropriateon_next_task callback up to a maximum of the total sum of calls tothis method before propagating back-pressure / waiting.

Any error encountered by calling request must be propagated by calling theon_errorcallback of theArrowAsyncDeviceStreamHandler.

It is invalid to call this function with a value ofn that is<=0. Producers shoulderror (e.g. callon_error) if receiving such a value forn.

void(*ArrowAsyncProducer.cancel)(structArrowAsyncProducer*)#

Mandatory. This function signals to the producer that it musteventually stop callingon_next_task. Calls tocancel must be idempotent and thread-safe. After callingit once, subsequent callsMUST be a NOP. ThisMUST NOT call any consumer-side handlersother thanon_error.

It is not required that callingcancel affect the producerimmediately, only that itmust eventually stop callingon_next_task and then subsequently callreleaseon the async handler object. As such, a consumerMUST be prepared to receive one or morecalls toon_next_task oron_error even after callingcancel if there are stillrequested arrays pending.

Successful cancellingMUST NOT result in a producer callingArrowAsyncDeviceStreamHandler.on_error, instead it should finish out any remainingtasks (callingon_next_task accordingly) and eventually just callrelease.

Any error encountered during handling a call to cancel must be reported via theon_errorcallback on the async stream handler.

constchar*ArrowAsyncProducer.additional_metadata#

Optional. An additional metadata string to provide any extra context to the consumer. ThisMUSTeither beNULL or a valid string that is encoded in the same way asArrowSchema.metadata.As an example, a producer could utilize this metadata to provide the total number of rows and/or batchesin the stream if known.

If notNULL itMUST be valid for at least the lifetime of this object.

void*ArrowAsyncProducer.private_data#

Optional. An opaque pointer to producer-provided specific data.

ConsumersMUST NOT process this member, the lifetime is owned by the producerthat constructed this object.

Error Handling#

Unlike the regular C Stream interface, the Async interface allows for errors to flow inboth directions. As a result, error handling can be slightly more complex. Thus this specdesignates the following rules:

  • If the producer encounters an error during processing, it should call theon_errorcallback, and then callrelease after it returns.

  • Ifon_schema oron_next_task returns a non-zero integer value, the producershould notcall theon_error callback, but instead should eventually callrelease at some pointbefore or after any logging or processing of the error code.

Result lifetimes#

TheArrowSchema passed to theon_schema callback must be released independently,with the object itself needing to be moved to a consumer ownedArrowSchema object. TheArrowSchema* passed as a parameter to the callbackMUST NOT be stored and kept.

TheArrowAsyncTask object provided toon_next_task is owned by the producer andwill be cleaned up during the invocation of callingextract_data on it. If the consumerdoesn’t care about the data, it should passNULL instead of a validArrowDeviceArray*.

Theconstchar* errormessage andmetadata which are passed toon_errorare only valid within the scope of theon_error function itself. They must be copiedif it is necessary for them to exist after it returns.

Stream Handler Lifetime#

Lifetime of the async stream handler is managed using a release callback with similarusage as inC data interface.

ArrowAsyncProducer Lifetime#

The lifetime of theArrowAsyncProducer is owned by the producer itself and shouldbe managed by it. ItMUST be populated before calling any methods other thanreleaseandMUST remain valid at least until just before callingrelease on the stream handler object.

Thread safety#

All handler functions on theArrowAsyncDeviceStreamHandler should only be called in aserialized manner, but are not guaranteed to be called from the same thread every time. Aproducer should wait for handler callbacks to return before calling the next handler callback,and before calling therelease callback.

Back-pressure is managed by the consumer making calls toArrowAsyncProducer.requestto indicate how many arrays it is ready to receive.

TheArrowAsyncDeviceStreamHandler object should be able to handle callbacks as soon asit is passed to the producer, any initialization should be performed before it is provided.

Possible Sequence Diagram#

sequenceDiagram Consumer->>+Producer: ArrowAsyncDeviceStreamHandler* Producer-->>+Consumer: on_schema(ArrowAsyncProducer*, ArrowSchema*) Consumer->>Producer: ArrowAsyncProducer->request(n) par loop up to n times Producer-->>Consumer: on_next_task(ArrowAsyncTask*) end and for each task Consumer-->>Producer: ArrowAsyncTask.extract_data(...) Consumer-->>Producer: ArrowAsyncProducer->request(1) end break Optionally Consumer->>-Producer: ArrowAsyncProducer->cancel() end loop possible remaining Producer-->>Consumer: on_next_task(ArrowAsyncTask*) end Producer->>-Consumer: ArrowAsyncDeviceStreamHandler->release()

Interoperability with other interchange formats#

Other interchange APIs, such as theCUDA Array Interface, includemembers to pass the shape and the data types of the data buffers beingexported. This information is necessary to interpret the raw bytes in thedevice data buffers that are being shared. Rather than store theshape / types of the data alongside theArrowDeviceArray, usersshould utilize the existingArrowSchema structure to pass any datatype and shape information.

Updating this specification#

Note

Since this specification is still considered experimental, there is the(still very low) possibility it might change slightly. The reason fortagging this as “experimental” is because we don’t know what we don’t know.Work and research was done to ensure a generic ABI compatible with manydifferent frameworks, but it is always possible something was missed.Once this is supported in an official Arrow release and usage is observedto confirm there aren’t any modifications necessary, the “experimental”tag will be removed and the ABI frozen.

Once this specification is supported in an official Arrow release, the C ABIis frozen. This means that theArrowDeviceArray structure definitionshould not change in any way – including adding new members.

Backwards-compatible changes are allowed, for example new macro values forArrowDeviceType or converting the reserved 24 bytes into adifferent type/member without changing the size of the structure.

Any incompatible changes should be part of a new specification, for exampleArrowDeviceArrayV2.