Data Types
Data types govern how physical data is interpreted. Their specification allows binary interoperability between different Arrow implementations, including from different programming languages and runtimes (for example, it is possible to access the same data, without copying, from both Python and Java using the pyarrow.jvm bridge module).
Information about a data type in C++ can be represented in three ways:

1. Using an arrow::DataType instance (e.g. as a function argument)
2. Using an arrow::DataType concrete subclass (e.g. as a template parameter)
3. Using an arrow::Type::type enum value (e.g. as the condition of a switch statement)
The first form (using an arrow::DataType instance) is the most idiomatic and flexible. Runtime-parametric types can only be fully represented with a DataType instance. For example, an arrow::TimestampType needs to be constructed at runtime with an arrow::TimeUnit::type parameter; an arrow::Decimal128Type with scale and precision parameters; an arrow::ListType with a full child type (itself an arrow::DataType instance).
The two other forms can be used where performance is critical, in order to avoid paying the price of dynamic typing and polymorphism. However, some amount of runtime switching can still be required for parametric types. It is not possible to reify all possible types at compile time, since Arrow data types allow arbitrary nesting.
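To make the relationship between the three forms concrete, here is a minimal, standard-library-only sketch. Every name in it (the `toy_forms` namespace, `TypeId`, `StaticWidth`, `RuntimeWidth`, and the simplified type classes) is a hypothetical stand-in invented for illustration and is not part of Arrow. It shows an enum-based switch bridging from a runtime instance to statically typed template code, and a parametric type whose parameter exists only on the instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT Arrow's classes.
namespace toy_forms {

// Form 3: an enum value identifying the kind of type.
enum class TypeId { INT32, DOUBLE, TIMESTAMP };

// Form 1: a polymorphic instance, able to carry runtime parameters.
class DataType {
 public:
  explicit DataType(TypeId id) : id_(id) {}
  virtual ~DataType() = default;
  TypeId id() const { return id_; }
  virtual std::string ToString() const = 0;

 private:
  TypeId id_;
};

// Form 2: concrete subclasses, usable as template parameters.
class Int32Type : public DataType {
 public:
  using c_type = std::int32_t;
  Int32Type() : DataType(TypeId::INT32) {}
  std::string ToString() const override { return "int32"; }
};

class DoubleType : public DataType {
 public:
  using c_type = double;
  DoubleType() : DataType(TypeId::DOUBLE) {}
  std::string ToString() const override { return "double"; }
};

// A parametric type: the unit is only known on the instance, at runtime.
class TimestampType : public DataType {
 public:
  using c_type = std::int64_t;
  explicit TimestampType(std::string unit)
      : DataType(TypeId::TIMESTAMP), unit_(std::move(unit)) {}
  std::string ToString() const override { return "timestamp[" + unit_ + "]"; }

 private:
  std::string unit_;
};

// Statically typed code: no virtual calls, the type is a template parameter.
template <typename T>
constexpr std::size_t StaticWidth() {
  return sizeof(typename T::c_type);
}

// Runtime switch that bridges from an instance to the static code above.
inline std::size_t RuntimeWidth(const DataType& type) {
  switch (type.id()) {
    case TypeId::INT32:
      return StaticWidth<Int32Type>();
    case TypeId::DOUBLE:
      return StaticWidth<DoubleType>();
    case TypeId::TIMESTAMP:
      return StaticWidth<TimestampType>();
  }
  return 0;  // not reached for the enum values listed above
}

// Usage: the runtime path agrees with the static one.
inline void CheckWidths() {
  assert(RuntimeWidth(Int32Type()) == StaticWidth<Int32Type>());
  assert(RuntimeWidth(TimestampType("us")) == sizeof(std::int64_t));
  assert(TimestampType("us").ToString() == "timestamp[us]");
}

}  // namespace toy_forms
```

In this sketch the switch selects among a fixed set of concrete classes, while the timestamp unit never appears in the template code at all. This mirrors why parametric and nested types still need some information from the runtime instance.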
Creating data types
To instantiate data types, it is recommended to call the provided factory functions:

```cpp
std::shared_ptr<arrow::DataType> type;

// A 16-bit integer type
type = arrow::int16();
// A 64-bit timestamp type (with microsecond granularity)
type = arrow::timestamp(arrow::TimeUnit::MICRO);
// A list type of single-precision floating-point values
type = arrow::list(arrow::float32());
```
Type Traits
Writing code that can handle concrete arrow::DataType subclasses would be verbose, if it weren't for type traits. Arrow's type traits map the Arrow data types to the specialized array, scalar, builder, and other associated types. For example, the Boolean type has traits:

```cpp
template <>
struct TypeTraits<BooleanType> {
  using ArrayType = BooleanArray;
  using BuilderType = BooleanBuilder;
  using ScalarType = BooleanScalar;
  using CType = bool;

  static constexpr int64_t bytes_required(int64_t elements) {
    return bit_util::BytesForBits(elements);
  }
  constexpr static bool is_parameter_free = true;
  static inline std::shared_ptr<DataType> type_singleton() { return boolean(); }
};
```

See Type Traits for an explanation of each of these fields.
Using type traits, one can write template functions that can handle a variety of Arrow types. For example, to write a function that creates an array of Fibonacci values for any Arrow numeric type:

```cpp
template <typename DataType,
          typename BuilderType = typename arrow::TypeTraits<DataType>::BuilderType,
          typename ArrayType = typename arrow::TypeTraits<DataType>::ArrayType,
          typename CType = typename arrow::TypeTraits<DataType>::CType>
arrow::Result<std::shared_ptr<ArrayType>> MakeFibonacci(int32_t n) {
  BuilderType builder;
  CType val = 0;
  CType next_val = 1;
  for (int32_t i = 0; i < n; ++i) {
    // Append() returns a Status, so propagate any failure to the caller
    ARROW_RETURN_NOT_OK(builder.Append(val));
    CType temp = val + next_val;
    val = next_val;
    next_val = temp;
  }
  std::shared_ptr<ArrayType> out;
  ARROW_RETURN_NOT_OK(builder.Finish(&out));
  return out;
}
```
For some common cases, there are type associations on the classes themselves. Use:
- Scalar::TypeClass to get the data type class of a scalar
- Array::TypeClass to get the data type class of an array
- DataType::c_type to get the associated C type of an Arrow data type
Similar to the type traits provided in the standard library's <type_traits> header, Arrow provides type predicates such as is_number_type as well as corresponding templates that wrap std::enable_if_t, such as enable_if_number. These can constrain template functions to only compile for relevant types, which is useful if other overloads need to be implemented. For example, to write a sum function for any numeric (integer or float) array:

```cpp
template <typename ArrayType, typename DataType = typename ArrayType::TypeClass,
          typename CType = typename DataType::c_type>
arrow::enable_if_number<DataType, CType> SumArray(const ArrayType& array) {
  CType sum = 0;
  // Iterating an array yields std::optional values; nulls are empty optionals
  for (std::optional<CType> value : array) {
    if (value.has_value()) {
      sum += value.value();
    }
  }
  return sum;
}
```

See Type Predicates for a list of these.
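The mechanism behind such predicates can be sketched with the standard library alone. The code below is a toy model under stated assumptions: the `toy_predicates` namespace, its simplified type classes, and the `enable_if_not_number` and `Describe` helpers are hypothetical names invented for illustration. The predicate and wrapper reuse the names `is_number_type` and `enable_if_number` only to show the pattern and should not be read as Arrow's actual definitions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for illustration only; NOT Arrow's definitions.
namespace toy_predicates {

struct NumberType {};
struct Int32Type : NumberType {};
struct DoubleType : NumberType {};
struct StringType {};

// A type predicate: true for any class derived from NumberType.
template <typename T>
using is_number_type = std::is_base_of<NumberType, T>;

// Wrappers around std::enable_if_t; R becomes the function's return type.
template <typename T, typename R = void>
using enable_if_number = std::enable_if_t<is_number_type<T>::value, R>;

template <typename T, typename R = void>
using enable_if_not_number = std::enable_if_t<!is_number_type<T>::value, R>;

// This overload takes part in overload resolution only when T is numeric.
template <typename T>
enable_if_number<T, std::string> Describe() {
  return "numeric";
}

// This overload takes part in overload resolution only when T is not numeric.
template <typename T>
enable_if_not_number<T, std::string> Describe() {
  return "non-numeric";
}

// Usage: each call resolves to exactly one overload.
inline void CheckDescribe() {
  assert(Describe<Int32Type>() == "numeric");
  assert(Describe<DoubleType>() == "numeric");
  assert(Describe<StringType>() == "non-numeric");
}

}  // namespace toy_predicates
```

Because the failed substitution silently removes an overload instead of producing an error, the two `Describe` templates can coexist. The same property lets a constrained function sit alongside other overloads for non-numeric types.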
Visitor Pattern
In order to process arrow::DataType, arrow::Scalar, or arrow::Array, you may need to write logic that specializes based on the particular Arrow type. In these cases, use the visitor pattern. Arrow provides the template functions:

- arrow::VisitTypeInline()
- arrow::VisitScalarInline()
- arrow::VisitArrayInline()
To use these, implement Status Visit() methods for each specialized type, then pass the class instance to the inline visit function. To avoid repetitive code, use type traits as documented in the previous section. As a brief example, here is how one might sum across columns of arbitrary numeric types:

```cpp
class TableSummation {
  double partial = 0.0;

 public:
  arrow::Result<double> Compute(std::shared_ptr<arrow::RecordBatch> batch) {
    for (std::shared_ptr<arrow::Array> array : batch->columns()) {
      ARROW_RETURN_NOT_OK(arrow::VisitArrayInline(*array, this));
    }
    return partial;
  }

  // Default implementation
  arrow::Status Visit(const arrow::Array& array) {
    return arrow::Status::NotImplemented("Cannot compute sum for array of type ",
                                         array.type()->ToString());
  }

  template <typename ArrayType, typename T = typename ArrayType::TypeClass>
  arrow::enable_if_number<T, arrow::Status> Visit(const ArrayType& array) {
    for (std::optional<typename T::c_type> value : array) {
      if (value.has_value()) {
        partial += static_cast<double>(value.value());
      }
    }
    return arrow::Status::OK();
  }
};
```
Arrow also provides abstract visitor classes (arrow::TypeVisitor, arrow::ScalarVisitor, arrow::ArrayVisitor) and an Accept() method on each of the corresponding base types (e.g. arrow::Array::Accept()). However, because their Visit() methods are virtual, they cannot be implemented using template functions; you will typically prefer the inline type visitors.
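The trade-off can be illustrated with a small, standard-library-only sketch of the classic Accept()-based visitor. All names here (the `toy_visitor` namespace, its simplified array classes, `SumVisitor`, `SumAll`) are hypothetical and do not reproduce Arrow's classes; in particular, this toy `Visit()` returns `void` rather than a status. It shows that each virtual overload has to be written out by hand, even when the bodies are identical and can only delegate to a shared template helper.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Arrow's classes.
namespace toy_visitor {

class Int32Array;
class DoubleArray;

// Abstract visitor: one virtual overload per concrete array class.
// Virtual member functions cannot be templates, so each one is spelled out.
class ArrayVisitor {
 public:
  virtual ~ArrayVisitor() = default;
  virtual void Visit(const Int32Array& array) = 0;
  virtual void Visit(const DoubleArray& array) = 0;
};

class Array {
 public:
  virtual ~Array() = default;
  virtual void Accept(ArrayVisitor* visitor) const = 0;
};

class Int32Array : public Array {
 public:
  explicit Int32Array(std::vector<std::int32_t> values) : values_(std::move(values)) {}
  const std::vector<std::int32_t>& values() const { return values_; }
  // Double dispatch: the static type of *this selects the Visit overload.
  void Accept(ArrayVisitor* visitor) const override { visitor->Visit(*this); }

 private:
  std::vector<std::int32_t> values_;
};

class DoubleArray : public Array {
 public:
  explicit DoubleArray(std::vector<double> values) : values_(std::move(values)) {}
  const std::vector<double>& values() const { return values_; }
  void Accept(ArrayVisitor* visitor) const override { visitor->Visit(*this); }

 private:
  std::vector<double> values_;
};

class SumVisitor : public ArrayVisitor {
 public:
  // Identical bodies, but both overrides must still be declared explicitly.
  void Visit(const Int32Array& array) override { Add(array.values()); }
  void Visit(const DoubleArray& array) override { Add(array.values()); }
  double total() const { return total_; }

 private:
  // Only a non-virtual helper can be a template.
  template <typename T>
  void Add(const std::vector<T>& values) {
    for (const T& v : values) total_ += static_cast<double>(v);
  }
  double total_ = 0.0;
};

// Sums every array through the abstract interface.
inline double SumAll(const std::vector<const Array*>& arrays) {
  SumVisitor visitor;
  for (const Array* array : arrays) array->Accept(&visitor);
  return visitor.total();
}

// Usage: 1 + 2 + 3 + 0.5
inline void CheckSum() {
  Int32Array ints(std::vector<std::int32_t>{1, 2, 3});
  DoubleArray doubles(std::vector<double>{0.5});
  assert(SumAll({&ints, &doubles}) == 6.5);
}

}  // namespace toy_visitor
```

With only two array classes the duplication is minor, but it grows with every additional type the visitor must handle. A single constrained template `Visit()` avoids that repetition.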

