High-Level Overview#
The Arrow C++ library is composed of several parts, each of which serves a specific purpose.
The physical layer#
Memory management abstractions provide a uniform API over memory that may be allocated through various means, such as heap allocation, the memory mapping of a file or a static memory area. In particular, the buffer abstraction represents a contiguous area of physical data.
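To make the idea of a uniform API over differently allocated memory concrete, here is a minimal standard-library sketch. `ToyBuffer` and `SumBytes` are hypothetical names invented for illustration; this is not `arrow::Buffer` and it does not use the Arrow library. It only shows how a consumer can read contiguous bytes without knowing whether they live on the heap or in a static area.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy type (not arrow::Buffer): a read-only view of contiguous
// bytes, plus an optional owner that keeps the memory alive.
class ToyBuffer {
 public:
  // Wrap memory that outlives the buffer, for example a static area.
  ToyBuffer(const std::uint8_t* data, std::size_t size)
      : data_(data), size_(size) {
    assert(data != nullptr || size == 0);
  }

  // Share ownership of heap-allocated bytes.
  // Members are initialized in declaration order, so data_ and size_ are
  // read from `bytes` before it is moved into owner_.
  explicit ToyBuffer(std::shared_ptr<std::vector<std::uint8_t>> bytes)
      : data_(bytes->data()), size_(bytes->size()), owner_(std::move(bytes)) {}

  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const std::uint8_t* data_;
  std::size_t size_;
  std::shared_ptr<void> owner_;  // empty when the memory is not owned
};

// A consumer sees the same interface regardless of where the bytes live.
inline std::uint64_t SumBytes(const ToyBuffer& buf) {
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < buf.size(); ++i) total += buf.data()[i];
  return total;
}
```

A memory-mapped file would fit the same shape: the owner would hold the mapping, and `data()` would point into it.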
The one-dimensional layer#
Data types govern the logical interpretation of physical data. Many operations in Arrow are parameterized, at compile-time or at runtime, by a data type.
Arrays assemble one or several buffers with a data type, allowing them to be viewed as a logical contiguous sequence of values (possibly nested).
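As a rough illustration of how buffers plus a data type yield a logical sequence of values, the sketch below models a nullable 32-bit integer array with two byte buffers: a validity bitmap and a values buffer. `ToyInt32Array` is a hypothetical type for this example only, not `arrow::Array`. The bitmap convention (one bit per value, least-significant bit first, a set bit meaning "valid") follows the Arrow columnar format; everything else is simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy type (not arrow::Array): two raw byte buffers that are
// interpreted as a sequence of nullable int32 values.
struct ToyInt32Array {
  std::vector<std::uint8_t> validity;  // 1 bit per value, LSB first; 1 = valid
  std::vector<std::uint8_t> values;    // raw bytes, 4 per value
  std::size_t length = 0;

  bool IsValid(std::size_t i) const {
    assert(i < length);
    return ((validity[i / 8] >> (i % 8)) & 1) != 0;
  }

  // The data type (int32 here) decides how the raw bytes are read.
  std::int32_t Value(std::size_t i) const {
    assert(i < length);
    std::int32_t v;
    std::memcpy(&v, values.data() + i * sizeof(v), sizeof(v));
    return v;
  }

  void Append(std::int32_t v, bool valid = true) {
    if (length % 8 == 0) validity.push_back(0);  // start a new bitmap byte
    if (valid) {
      validity[length / 8] |= static_cast<std::uint8_t>(1u << (length % 8));
    }
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&v);
    values.insert(values.end(), p, p + sizeof(v));
    ++length;
  }
};
```

Note that a null slot still occupies four bytes in the values buffer; only the bitmap marks it as missing.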
Chunked arrays are a generalization of arrays, combining several same-type arrays into a longer logical sequence of values.
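The chunked-array idea can be sketched with plain standard-library containers. `ToyChunkedArray` below is a hypothetical stand-in, not `arrow::ChunkedArray`; it shows how a logical index over the whole sequence might be resolved to a chunk and an offset inside that chunk. A linear scan is used for brevity, which is an assumption of this sketch rather than a description of how Arrow does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy type (not arrow::ChunkedArray): several same-type arrays
// presented as one longer logical sequence.
struct ToyChunkedArray {
  std::vector<std::vector<std::int64_t>> chunks;

  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& c : chunks) n += c.size();
    return n;
  }

  // Map a logical index to (chunk index, index within that chunk).
  // Empty chunks are skipped naturally by the loop.
  std::pair<std::size_t, std::size_t> Resolve(std::size_t i) const {
    assert(i < length());
    std::size_t c = 0;
    while (i >= chunks[c].size()) {
      i -= chunks[c].size();
      ++c;
    }
    return std::make_pair(c, i);
  }

  std::int64_t Value(std::size_t i) const {
    const std::pair<std::size_t, std::size_t> loc = Resolve(i);
    return chunks[loc.first][loc.second];
  }
};
```

The values are never copied into one contiguous area; only the indexing makes the chunks look like a single sequence.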
The two-dimensional layer#
Schemas describe a logical collection of several pieces of data, each with a distinct name and type, and optional metadata.
Tables are collections of chunked arrays in accordance with a schema. They are the most capable dataset-providing abstraction in Arrow.
Record batches are collections of contiguous arrays, described by a schema. They allow incremental construction or serialization of tables.
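The relationship between schemas, record batches and tables can be sketched as follows. All names here (`ToyField`, `ToySchema`, `ToyRecordBatch`, `ToyTable`, `AppendBatch`) are hypothetical and do not come from the Arrow API, and every column is an `int64` vector to keep the example short. The point is only to show incremental construction: each batch contributes one contiguous chunk per column, so each table column ends up as a list of chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types; none of these are Arrow classes.
struct ToyField {
  std::string name;
  std::string type;  // simplified: a type name instead of a type object
};
using ToySchema = std::vector<ToyField>;
using ToyColumn = std::vector<std::int64_t>;  // assumption: int64 columns only

// One contiguous array per schema field, all of the same length.
struct ToyRecordBatch {
  std::vector<ToyColumn> columns;
  std::size_t num_rows() const {
    return columns.empty() ? 0 : columns[0].size();
  }
};

// Each table column is a list of chunks, one chunk per appended batch.
struct ToyTable {
  ToySchema schema;
  std::vector<std::vector<ToyColumn>> columns;
  std::size_t num_rows = 0;

  explicit ToyTable(ToySchema s)
      : schema(std::move(s)), columns(schema.size()) {}

  void AppendBatch(const ToyRecordBatch& batch) {
    assert(batch.columns.size() == schema.size());
    for (std::size_t f = 0; f < schema.size(); ++f) {
      assert(batch.columns[f].size() == batch.num_rows());
      columns[f].push_back(batch.columns[f]);
    }
    num_rows += batch.num_rows();
  }
};
```

Serialization can follow the same pattern in reverse: a table can be emitted one batch at a time instead of being flattened first.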
The compute layer#
Datums are flexible dataset references, able to hold, for example, an array or table reference.
Kernels are specialized computation functions running in a loop over a given set of datums representing input and output parameters to the functions.
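A toy model may help picture how a kernel loops over datums. In the sketch below, `ToyDatum` and `AddKernel` are hypothetical names and not part of the Arrow compute API. The toy datum holds either a single value or an array of `int64` values (a simplification chosen for brevity), and the kernel runs one loop over its inputs, repeating a scalar input for every element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy type (not arrow::Datum): holds a scalar or an array.
struct ToyDatum {
  enum Kind { kScalar, kArray };
  Kind kind = kScalar;
  std::int64_t scalar = 0;
  std::vector<std::int64_t> array;

  static ToyDatum Scalar(std::int64_t v) {
    ToyDatum d;
    d.kind = kScalar;
    d.scalar = v;
    return d;
  }
  static ToyDatum Array(std::vector<std::int64_t> v) {
    ToyDatum d;
    d.kind = kArray;
    d.array = std::move(v);
    return d;
  }
  // A scalar yields the same value at every position.
  std::int64_t at(std::size_t i) const {
    return kind == kArray ? array[i] : scalar;
  }
};

// Toy "add" kernel: a loop over the input datums producing an output datum.
inline ToyDatum AddKernel(const ToyDatum& a, const ToyDatum& b) {
  if (a.kind == ToyDatum::kScalar && b.kind == ToyDatum::kScalar) {
    return ToyDatum::Scalar(a.scalar + b.scalar);
  }
  if (a.kind == ToyDatum::kArray && b.kind == ToyDatum::kArray) {
    assert(a.array.size() == b.array.size());
  }
  const std::size_t n =
      a.kind == ToyDatum::kArray ? a.array.size() : b.array.size();
  std::vector<std::int64_t> out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = a.at(i) + b.at(i);
  return ToyDatum::Array(std::move(out));
}
```

Because the kernel takes datums rather than a fixed container type, the same function signature covers scalar and array inputs.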
Acero (pronounced [aˈsɜɹo] / ah-SERR-oh) is a streaming execution engine that allows computation to be expressed as a graph of operators which can transform streams of data.
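The streaming model can be pictured with a small standard-library sketch. Nothing below is Acero's API: `ToyBatch`, `ToyOperator`, `Filter`, `Project` and `RunPipeline` are hypothetical names, and the "graph" is reduced to a linear chain of operators for brevity. What it illustrates is that data flows through the operators one batch at a time, so the whole input never has to be materialized at once.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical toy types; this is not the Acero API.
using ToyBatch = std::vector<std::int64_t>;
using ToyOperator = std::function<ToyBatch(const ToyBatch&)>;

// Operator that keeps only the values accepted by `pred`.
inline ToyOperator Filter(std::function<bool(std::int64_t)> pred) {
  return [pred](const ToyBatch& in) -> ToyBatch {
    ToyBatch out;
    for (std::int64_t v : in) {
      if (pred(v)) out.push_back(v);
    }
    return out;
  };
}

// Operator that maps every value through `fn`.
inline ToyOperator Project(std::function<std::int64_t(std::int64_t)> fn) {
  return [fn](const ToyBatch& in) -> ToyBatch {
    ToyBatch out;
    for (std::int64_t v : in) out.push_back(fn(v));
    return out;
  };
}

// Push each source batch through the operators in order; the sink here
// simply sums whatever reaches the end of the chain.
inline std::int64_t RunPipeline(const std::vector<ToyBatch>& source,
                                const std::vector<ToyOperator>& ops) {
  std::int64_t total = 0;
  for (const ToyBatch& batch : source) {
    ToyBatch current = batch;
    for (const ToyOperator& op : ops) {
      assert(static_cast<bool>(op));  // operator must be callable
      current = op(current);
    }
    for (std::int64_t v : current) total += v;
  }
  return total;
}
```

A real operator graph can branch and join (for example for joins or unions); the chain above is only the simplest case.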
The IO layer#
Streams allow untyped sequential or seekable access over external data of various kinds (for example compressed or memory-mapped).
The Inter-Process Communication (IPC) layer#
A messaging format allows interchange of Arrow data between processes, using as few copies as possible.
The file formats layer#
Reading and writing Arrow data from/to various file formats is possible, for example Parquet, CSV, ORC or the Arrow-specific Feather format.
The devices layer#
Basic CUDA integration is provided, making it possible to describe Arrow data backed by GPU-allocated memory.
The filesystem layer#
A filesystem abstraction allows reading and writing data from different storage backends, such as the local filesystem or an S3 bucket.

