The Arrow C stream interface#
The C stream interface builds on the structures defined in the C data interface and combines them into a higher-level specification so as to ease the communication of streaming data within a single process.
Semantics#
An Arrow C stream exposes a streaming source of data chunks, each with the same schema. Chunks are obtained by calling a blocking pull-style iteration function.
Structure definition#
The C stream interface is defined by a single struct definition:

    #ifndef ARROW_C_STREAM_INTERFACE
    #define ARROW_C_STREAM_INTERFACE

    struct ArrowArrayStream {
      // Callbacks providing stream functionality
      int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
      int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
      const char* (*get_last_error)(struct ArrowArrayStream*);

      // Release callback
      void (*release)(struct ArrowArrayStream*);

      // Opaque producer-specific data
      void* private_data;
    };

    #endif  // ARROW_C_STREAM_INTERFACE
Note
The canonical guard ARROW_C_STREAM_INTERFACE is meant to avoid duplicate definitions if two projects copy the C data interface definitions in their own headers, and a third-party project includes from these two projects. It is therefore important that this guard is kept exactly as-is when these definitions are copied.
The ArrowArrayStream structure#
The ArrowArrayStream provides the required callbacks to interact with a streaming source of Arrow arrays. It has the following fields:
- int (*ArrowArrayStream.get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out)#
Mandatory. This callback allows the consumer to query the schema of the chunks of data in the stream. The schema is the same for all data chunks.
This callback must NOT be called on a released ArrowArrayStream.
Return value: 0 on success, a non-zero error code otherwise.
- int (*ArrowArrayStream.get_next)(struct ArrowArrayStream*, struct ArrowArray* out)#
Mandatory. This callback allows the consumer to get the next chunk of data in the stream.
This callback must NOT be called on a released ArrowArrayStream.
Return value: 0 on success, a non-zero error code otherwise.
On success, the consumer must check whether the ArrowArray is marked released. If the ArrowArray is released, then the end of stream has been reached. Otherwise, the ArrowArray contains a valid data chunk.
- const char* (*ArrowArrayStream.get_last_error)(struct ArrowArrayStream*)#
Mandatory. This callback allows the consumer to get a textual description of the last error.
This callback must ONLY be called if the last operation on the ArrowArrayStream returned an error. It must NOT be called on a released ArrowArrayStream.
Return value: a pointer to a NULL-terminated character string (UTF8-encoded). NULL can also be returned if no detailed description is available.
The returned pointer is only guaranteed to be valid until the next call of one of the stream’s callbacks. The character string it points to should be copied to consumer-managed storage if it is intended to survive longer.
- void (*ArrowArrayStream.release)(struct ArrowArrayStream*)#
Mandatory. A pointer to a producer-provided release callback.
- void* ArrowArrayStream.private_data#
Optional. An opaque pointer to producer-provided private data.
Consumers MUST NOT process this member. The lifetime of this member is handled by the producer, and especially by the release callback.
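To make the field descriptions concrete, here is a minimal producer sketch. It is an illustration under stated assumptions, not a reference implementation: `make_null_stream`, `NullStreamState` and the `null_*` callbacks are hypothetical names, the chunks use the Arrow null type (format string "n") only because that type needs no buffers, and the `ArrowSchema` / `ArrowArray` definitions are copied from the C data interface so that the block compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif  // ARROW_C_DATA_INTERFACE

#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};
#endif  // ARROW_C_STREAM_INTERFACE

// Hypothetical producer state: number of chunks left to emit.
struct NullStreamState { int remaining; };

// Nothing is heap-allocated per schema or chunk, so releasing only marks them.
static void null_schema_release(struct ArrowSchema* s) { s->release = NULL; }
static void null_array_release(struct ArrowArray* a) { a->release = NULL; }

static int null_get_schema(struct ArrowArrayStream* stream, struct ArrowSchema* out) {
  (void)stream;
  memset(out, 0, sizeof(*out));
  out->format = "n";  // Arrow null type
  out->name = "";
  out->flags = 2;     // ARROW_FLAG_NULLABLE in the C data interface
  out->release = null_schema_release;
  return 0;
}

static int null_get_next(struct ArrowArrayStream* stream, struct ArrowArray* out) {
  struct NullStreamState* state = (struct NullStreamState*)stream->private_data;
  memset(out, 0, sizeof(*out));  // release == NULL signals end of stream
  if (state->remaining == 0) return 0;
  state->remaining--;
  out->length = 3;      // three null values per chunk
  out->null_count = 3;
  out->release = null_array_release;
  return 0;
}

static const char* null_get_last_error(struct ArrowArrayStream* stream) {
  (void)stream;
  return NULL;  // no detailed description available
}

static void null_release(struct ArrowArrayStream* stream) {
  free(stream->private_data);  // private_data is owned by the producer
  stream->release = NULL;      // mark the stream as released
}

// Hypothetical constructor: fills *out with a stream of num_chunks chunks.
int make_null_stream(int num_chunks, struct ArrowArrayStream* out) {
  assert(out != NULL);
  if (num_chunks < 0) return EINVAL;
  struct NullStreamState* state = (struct NullStreamState*)malloc(sizeof(*state));
  if (state == NULL) return ENOMEM;
  state->remaining = num_chunks;
  out->get_schema = null_get_schema;
  out->get_next = null_get_next;
  out->get_last_error = null_get_last_error;
  out->release = null_release;
  out->private_data = state;
  return 0;
}
```

A consumer would call `make_null_stream(2, &stream)`, then use only the callbacks; it never touches `private_data`, which is freed by the producer's release callback.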
Error codes#
The get_schema and get_next callbacks may return an error in the form of a non-zero integer code. Such error codes should be interpreted like errno numbers (as defined by the local platform). Note that the symbolic forms of these constants are stable from platform to platform, but their numeric values are platform-specific.
In particular, it is recommended to recognize the following values:
- EINVAL: for a parameter or input validation error
- ENOMEM: for a memory allocation failure (out of memory)
- EIO: for a generic input/output error
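Because only the symbolic names are portable, a consumer should compare against the `errno.h` constants rather than hard-coded numbers. The sketch below shows one way to do that; `describe_stream_error` and its message strings are hypothetical and not part of the interface.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

// Hypothetical helper: map a non-zero stream error code to a short message.
// Codes are matched by symbolic name, never by numeric value.
const char* describe_stream_error(int errcode) {
  assert(errcode != 0);  // 0 means success, there is nothing to describe
  switch (errcode) {
    case EINVAL:
      return "invalid parameter or input";
    case ENOMEM:
      return "out of memory";
    case EIO:
      return "input/output error";
    default:
      // Any other code is interpreted like a regular errno value.
      return strerror(errcode);
  }
}
```

A consumer might fall back to this helper only when `get_last_error` returns NULL, since the producer's own description is usually more specific.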
Result lifetimes#
The data returned by the get_schema and get_next callbacks must be released independently. Their lifetimes are not tied to that of the ArrowArrayStream.
Stream lifetime#
The lifetime of the C stream is managed using a release callback, used in a similar way as in the C data interface.
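As a sketch of what "similar usage" can look like on the consumer side, the helper below releases a stream at most once. It assumes the convention carried over from the C data interface, where the release callback marks the structure as released by setting its `release` member to NULL. `stream_release_if_live` is a hypothetical name, and `demo_release` is a stand-in producer callback included only so the helper can be exercised.

```c
#include <assert.h>
#include <stddef.h>

// Only pointers to these types are needed here; the full definitions
// belong to the C data interface.
struct ArrowSchema;
struct ArrowArray;

#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};
#endif  // ARROW_C_STREAM_INTERFACE

// Hypothetical helper: release the stream unless it is already released.
void stream_release_if_live(struct ArrowArrayStream* stream) {
  if (stream->release != NULL) {
    stream->release(stream);
    // Assumed convention: the producer marks the stream as released.
    assert(stream->release == NULL);
  }
}

// Stand-in producer callback (not a real producer), counting its invocations.
static int demo_release_calls = 0;
static void demo_release(struct ArrowArrayStream* stream) {
  demo_release_calls++;
  stream->release = NULL;
}
```

Calling `stream_release_if_live` twice on the same stream invokes the producer callback only once, which keeps cleanup paths simple when several error branches converge.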
Thread safety#
The stream source is not assumed to be thread-safe. Consumers wanting to call get_next from several threads should ensure those calls are serialized.
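One possible way to serialize calls is to pair the stream with a mutex. The sketch below assumes a POSIX threads environment; `LockedStream`, `locked_stream_init` and `locked_stream_get_next` are hypothetical names, and `demo_get_next` is a stand-in producer callback used only to exercise the wrapper.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Only pointers to these types are needed here; the full definitions
// belong to the C data interface.
struct ArrowSchema;
struct ArrowArray;

#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};
#endif  // ARROW_C_STREAM_INTERFACE

// Hypothetical wrapper: a stream plus the mutex guarding it.
struct LockedStream {
  struct ArrowArrayStream* stream;
  pthread_mutex_t mutex;
};

// Returns 0 on success, or the error code from pthread_mutex_init.
int locked_stream_init(struct LockedStream* ls, struct ArrowArrayStream* stream) {
  assert(ls != NULL && stream != NULL);
  ls->stream = stream;
  return pthread_mutex_init(&ls->mutex, NULL);
}

// Calls get_next while holding the mutex, so concurrent callers are serialized.
int locked_stream_get_next(struct LockedStream* ls, struct ArrowArray* out) {
  int errcode = pthread_mutex_lock(&ls->mutex);
  if (errcode != 0) return errcode;
  errcode = ls->stream->get_next(ls->stream, out);
  pthread_mutex_unlock(&ls->mutex);
  return errcode;
}

// Stand-in producer callback (not a real producer), counting its invocations.
static int demo_get_next_calls = 0;
static int demo_get_next(struct ArrowArrayStream* stream, struct ArrowArray* out) {
  (void)stream;
  (void)out;
  demo_get_next_calls++;
  return 0;
}
```

Since the pointer returned by get_last_error is only valid until the next callback invocation, a multi-threaded consumer would also want to copy the error text while still holding the mutex; that step is left out here to keep the sketch short.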
C consumer example#
Let’s say a particular database provides the following C API to execute a SQL query and return the result set as an Arrow C stream:

    void MyDB_Query(const char* query, struct ArrowArrayStream* result_set);
Then a consumer could use the following code to iterate over the results:
    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void handle_error(int errcode, struct ArrowArrayStream* stream) {
      // Print stream error
      const char* errdesc = stream->get_last_error(stream);
      if (errdesc != NULL) {
        fputs(errdesc, stderr);
      } else {
        fputs(strerror(errcode), stderr);
      }
      // Release stream and abort
      stream->release(stream);
      exit(1);
    }

    void run_query(void) {
      struct ArrowArrayStream stream;
      struct ArrowSchema schema;
      struct ArrowArray chunk;
      int errcode;

      MyDB_Query("SELECT * FROM my_table", &stream);

      // Query result set schema
      errcode = stream.get_schema(&stream, &schema);
      if (errcode != 0) {
        handle_error(errcode, &stream);
      }

      int64_t num_rows = 0;

      // Iterate over results: loop until error or end of stream.
      // The assignment is parenthesized so that errcode holds the return
      // value of get_next rather than the result of the comparison.
      while ((errcode = stream.get_next(&stream, &chunk)) == 0 &&
             chunk.release != NULL) {
        // Do something with chunk...
        fprintf(stderr, "Result chunk: got %" PRId64 " rows\n", chunk.length);
        num_rows += chunk.length;

        // Release chunk
        chunk.release(&chunk);
      }

      // Was it an error?
      if (errcode != 0) {
        handle_error(errcode, &stream);
      }

      fprintf(stderr, "Result stream ended: total %" PRId64 " rows\n", num_rows);

      // Release schema and stream
      schema.release(&schema);
      stream.release(&stream);
    }

