Tabular Data
While arrays and chunked arrays represent a one-dimensional sequence of homogeneous values, data often comes in the form of two-dimensional sets of heterogeneous data (such as database tables, CSV files…). Arrow provides several abstractions to handle such data conveniently and efficiently.
Fields
Fields are used to denote the particular columns of a table (and also the particular members of a nested data type such as arrow::StructType). A field, i.e. an instance of arrow::Field, holds together a data type, a field name and some optional metadata.
The recommended way to create a field is to call the arrow::field() factory function.
Schemas
A schema describes the overall structure of a two-dimensional dataset such as a table. It holds a sequence of fields together with some optional schema-wide metadata (in addition to per-field metadata). The recommended way to create a schema is to call one of the arrow::schema() factory function overloads:
    // Create a schema describing datasets with two columns:
    // an int32 column "A" and a utf8-encoded string column "B"
    std::shared_ptr<arrow::Field> field_a, field_b;
    std::shared_ptr<arrow::Schema> schema;

    field_a = arrow::field("A", arrow::int32());
    field_b = arrow::field("B", arrow::utf8());
    schema = arrow::schema({field_a, field_b});
Tables
An arrow::Table is a two-dimensional dataset with chunked arrays for columns, together with a schema providing field names. Also, each chunked column must have the same logical length in number of elements (although each column can be chunked in a different way).
Record Batches
An arrow::RecordBatch is a two-dimensional dataset of a number of contiguous arrays, each the same length. Like a table, a record batch also has a schema which must match its arrays’ data types.
Record batches are a convenient unit of work for various serialization and computation functions, possibly incremental.
Record batches can be sent between implementations, such as via IPC or via the C Data Interface. Tables and chunked arrays, on the other hand, are concepts in the C++ implementation, not in the Arrow format itself, so they aren’t directly portable.
However, a table can be converted to and built from a sequence of record batches easily without needing to copy the underlying array buffers. A table can be streamed as an arbitrary number of record batches using an arrow::TableBatchReader. Conversely, a logical sequence of record batches can be assembled to form a table using one of the arrow::Table::FromRecordBatches() factory function overloads.

