Arrays#
Base classes#
- class ArrayStatistics#
Statistics for an Array.
The Apache Arrow format itself does not carry statistics, but a data source such as Apache Parquet may. Statistics associated with a data source can be read through a unified API via this class.
Public Types
- using ValueType = std::variant<bool, int64_t, uint64_t, double, std::string>#
The type for maximum and minimum values.
If the target value exists, one of these alternative types holds it; std::nullopt is used otherwise.
Public Functions
- inline const std::shared_ptr<DataType>& MinArrowType(const std::shared_ptr<DataType>& array_type)#
Compute Arrow type of the minimum value.
If ValueType is std::string, array_type may be used. If array_type is a binary-like type such as arrow::binary and arrow::large_utf8, array_type is returned. arrow::utf8 is returned otherwise.
If ValueType isn’t std::string, array_type isn’t used.
- Parameters:
array_type – The Arrow type of the associated array.
- Returns:
arrow::null if the minimum value is std::nullopt, Arrow type based on ValueType of the min otherwise.
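The selection rule described above can be sketched without Arrow itself. The following is a minimal illustration, assuming type names are plain strings rather than arrow::DataType objects; StatisticTypeName and IsBinaryLikeName are hypothetical helpers, and both the returned names and the set of names treated as binary-like are assumptions made for this sketch, not the library's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Same alternatives as the ValueType alias documented above.
using ValueType = std::variant<bool, int64_t, uint64_t, double, std::string>;

// Assumption: in this sketch only these four names count as binary-like.
inline bool IsBinaryLikeName(const std::string& type_name) {
  return type_name == "binary" || type_name == "large_binary" ||
         type_name == "utf8" || type_name == "large_utf8";
}

// Hypothetical helper: maps an optional statistic value to a type name.
inline std::string StatisticTypeName(const std::optional<ValueType>& value,
                                     const std::string& array_type) {
  assert(!array_type.empty());            // the associated array has a type
  if (!value.has_value()) return "null";  // no min/max recorded
  struct Visitor {
    const std::string& array_type;
    std::string operator()(bool) const { return "bool"; }
    std::string operator()(int64_t) const { return "int64"; }
    std::string operator()(uint64_t) const { return "uint64"; }
    std::string operator()(double) const { return "float64"; }
    // Strings keep a binary-like array type, otherwise fall back to utf8.
    std::string operator()(const std::string&) const {
      return IsBinaryLikeName(array_type) ? array_type : "utf8";
    }
  };
  return std::visit(Visitor{array_type}, *value);
}
```

Only the std::string alternative consults array_type; every other alternative maps to a fixed type regardless of the array it belongs to.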
- inline const std::shared_ptr<DataType>& MaxArrowType(const std::shared_ptr<DataType>& array_type)#
Compute Arrow type of the maximum value.
If ValueType is std::string, array_type may be used. If array_type is a binary-like type such as arrow::binary and arrow::large_utf8, array_type is returned. arrow::utf8 is returned otherwise.
If ValueType isn’t std::string, array_type isn’t used.
- Parameters:
array_type – The Arrow type of the associated array.
- Returns:
arrow::null if the maximum value is std::nullopt, Arrow type based on ValueType of the max otherwise.
- inline bool Equals(const ArrayStatistics& other, const EqualOptions& equal_options = EqualOptions::Defaults()) const#
Check two arrow::ArrayStatistics for equality.
- Parameters:
other – The arrow::ArrayStatistics instance to compare against.
equal_options – Options used to compare double values for equality.
- Returns:
True if the two arrow::ArrayStatistics instances are equal; otherwise, false.
- inline bool operator==(const ArrayStatistics& other) const#
Check two statistics for equality.
- inline bool operator!=(const ArrayStatistics& other) const#
Check two statistics for inequality.
Public Members
- std::optional<int64_t> null_count = std::nullopt#
The number of null values; may not be set.
- std::optional<CountType> distinct_count = std::nullopt#
The number of distinct values; may not be set. Note: when the value is an int64_t, it represents exact_distinct_count, and when it is a double, it represents approximate_distinct_count.
- std::optional<SizeType> max_byte_width = std::nullopt#
The maximum length in bytes of the rows in an array; may not be set. Note: when the type is int64_t, it represents max_byte_width_exact, and when the type is double, it represents max_byte_width_approximate.
- std::optional<double> average_byte_width = std::nullopt#
The average size in bytes of a row in an array; may not be set.
- bool is_average_byte_width_exact = false#
Whether the average size in bytes is exact or not.
- bool is_min_exact = false#
Whether the minimum value is exact or not.
- bool is_max_exact = false#
Whether the maximum value is exact or not.
- class ArrayData#
Mutable container for generic Arrow array data.
This data structure is a self-contained representation of the memory and metadata inside an Arrow array data structure (called vectors in Java). The Array class and its concrete subclasses provide strongly-typed accessors with support for the visitor pattern and other affordances.
This class is designed for easy internal data manipulation, analytical data processing, and data transport to and from IPC messages.
This class is also useful in an analytics setting where memory may be efficiently reused. For example, computing the Abs of a numeric array should return null iff the input is null: therefore, an Abs function can reuse the validity bitmap (a Buffer) of its input as the validity bitmap of its output.
This class is meant mostly for immutable data access. Any mutable access (either to ArrayData members or to the contents of its Buffers) should take into account the fact that ArrayData instances are typically wrapped in a shared_ptr and can therefore have multiple owners at any given time. Therefore, mutable access is discouraged except when initially populating the ArrayData.
Public Functions
- Result<std::shared_ptr<ArrayData>> CopyTo(const std::shared_ptr<MemoryManager>& to) const#
Deep copy this ArrayData to destination memory manager.
Returns a new ArrayData object with buffers and all child buffers copied to the destination memory manager. This includes dictionaries if applicable.
- Result<std::shared_ptr<ArrayData>> ViewOrCopyTo(const std::shared_ptr<MemoryManager>& to) const#
View or copy this ArrayData to destination memory manager.
Tries to view the buffer contents on the given memory manager’s device if possible (to avoid a copy) but falls back to copying if a no-copy view isn’t supported.
- inline bool IsNull(int64_t i) const#
Return the null-ness of a given array element.
Calling IsNull(i) is the same as !IsValid(i).
- inline bool IsValid(int64_t i) const#
Return the validity of a given array element.
For most data types, this will simply query the validity bitmap. For union and run-end-encoded arrays, the underlying child data is queried instead. For dictionary arrays, this reflects the validity of the dictionary index, but the corresponding dictionary value might still be null. For null arrays, this always returns false.
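For the common case of a type with a validity bitmap, the lookup can be pictured with a small standard-library sketch. BitmapArraySketch is a hypothetical stand-in, not Arrow's ArrayData, and it deliberately ignores the union, run-end-encoded, dictionary and null-type cases mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array with an optional validity bitmap.
struct BitmapArraySketch {
  std::vector<uint8_t> validity;  // empty means no bitmap: every slot is valid
  int64_t offset = 0;             // position of slot 0 inside the bitmap, in bits
  int64_t length = 0;             // number of slots

  bool IsValid(int64_t i) const {
    assert(i >= 0 && i < length);
    if (validity.empty()) return true;
    const int64_t bit = offset + i;
    // Arrow bitmaps are least-significant-bit first within each byte.
    return ((validity[bit / 8] >> (bit % 8)) & 1) != 0;
  }
  bool IsNull(int64_t i) const { return !IsValid(i); }
};
```

With a bitmap byte of 0x05 and offset 0, slots 0 and 2 are valid and slot 1 is null; with offset 1 the same byte describes a two-slot array whose first slot is null.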
- template<typename T>
inline const T* GetValues(int i, int64_t absolute_offset) const#
Access a buffer’s data as a typed C pointer.
If absolute_offset is non-zero, the type T must match the layout of buffer number i for the array’s data type; otherwise offset computation would be incorrect.
If the given buffer is bit-packed (such as a validity bitmap, or the data buffer of a boolean array), then absolute_offset must be zero for correct results, and any bit offset must be applied manually by the caller.
- Parameters:
i – the buffer index
absolute_offset – the offset into the buffer
- template<typename T>
inline const T* GetValues(int i) const#
Access a buffer’s data as a typed C pointer.
This method uses the array’s offset to index into buffer number i.
Calling this method on a bit-packed buffer (such as a validity bitmap, or the data buffer of a boolean array) will lead to incorrect results. You should instead call GetValues(i, 0) and apply the bit offset manually.
- Parameters:
i – the buffer index
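The reason T has to match the buffer layout is that the offset is counted in elements of T, so the byte position scales with sizeof(T). A minimal sketch under that reading, using only the standard library; PackBytes and ValueAt are hypothetical helpers, and the sketch copies values out with memcpy instead of returning a typed pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy fixed-width values into a raw byte buffer.
template <typename T>
std::vector<uint8_t> PackBytes(const std::vector<T>& values) {
  std::vector<uint8_t> bytes(values.size() * sizeof(T));
  if (!values.empty()) std::memcpy(bytes.data(), values.data(), bytes.size());
  return bytes;
}

// Hypothetical helper: read logical element i of an array that starts
// `array_offset` elements into the buffer. The byte position scales with
// sizeof(T), so a mismatched T would land on the wrong bytes.
template <typename T>
T ValueAt(const std::vector<uint8_t>& bytes, int64_t array_offset, int64_t i) {
  const std::size_t pos = static_cast<std::size_t>(array_offset + i) * sizeof(T);
  assert(pos + sizeof(T) <= bytes.size());
  T out;
  std::memcpy(&out, bytes.data() + pos, sizeof(T));
  return out;
}
```

For a buffer holding the int32 values 10, 20, 30, 40 and an array offset of 1, logical element 0 is 20. A bit-packed buffer has no whole-byte element size, which is why its bit offset has to be applied by hand.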
- template<typename T>
inline const T* GetValuesSafe(int i, int64_t absolute_offset) const#
Access a buffer’s data as a typed C pointer.
Like GetValues(i, absolute_offset), but returns nullptr if the given buffer is not a CPU buffer.
- Parameters:
i – the buffer index
absolute_offset – the offset into the buffer
- template<typename T>
inline const T* GetValuesSafe(int i) const#
Access a buffer’s data as a typed C pointer.
Like GetValues(i), but returns nullptr if the given buffer is not a CPU buffer.
- Parameters:
i – the buffer index
- template<typename T>
inline T* GetMutableValues(int i, int64_t absolute_offset)#
Access a buffer’s data as a mutable typed C pointer.
Like GetValues(i, absolute_offset), but allows mutating buffer contents. This should only be used when initially populating the ArrayData, before it is attached to an Array instance.
- Parameters:
i – the buffer index
absolute_offset – the offset into the buffer
- template<typename T>
inline T* GetMutableValues(int i)#
Access a buffer’s data as a mutable typed C pointer.
Like GetValues(i), but allows mutating buffer contents. This should only be used when initially populating the ArrayData, before it is attached to an Array instance.
- Parameters:
i – the buffer index
- std::shared_ptr<ArrayData> Slice(int64_t offset, int64_t length) const#
Construct a zero-copy slice of the data with the given offset and length.
This method applies the given slice to this ArrayData, taking into account its existing offset and length. If the given length is too large, the slice length is clamped so that the slice does not extend past the end of this ArrayData. If the given offset is too large, or if either offset or length is negative, behavior is undefined.
The associated ArrayStatistics is always discarded in a sliced ArrayData, even if the slice is trivially equal to the original ArrayData. If you want to reuse the statistics from the original ArrayData, you must explicitly reattach them.
- Result<std::shared_ptr<ArrayData>> SliceSafe(int64_t offset, int64_t length) const#
Construct a zero-copy slice of the data with the given offset and length.
Like Slice(offset, length), but returns an error if the requested slice falls out of bounds. Unlike Slice, length isn’t clamped to the available buffer size.
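The difference between the clamping and the checked variant can be sketched on offsets and lengths alone. WindowSketch is a hypothetical type that models only those two fields, and std::nullopt stands in for an error Result; this illustrates the described behavior and is not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical window over some parent buffers; only offset and length are modeled.
struct WindowSketch {
  int64_t offset = 0;
  int64_t length = 0;

  // Clamping slice. Inputs outside the precondition are undefined behavior
  // in the documented API; this sketch asserts on them instead.
  WindowSketch Slice(int64_t off, int64_t len) const {
    assert(off >= 0 && off <= length && len >= 0);
    return WindowSketch{offset + off, std::min(len, length - off)};
  }

  // Checked slice: rejects out-of-bounds requests instead of clamping.
  std::optional<WindowSketch> SliceSafe(int64_t off, int64_t len) const {
    if (off < 0 || len < 0 || off > length || len > length - off) {
      return std::nullopt;
    }
    return WindowSketch{offset + off, len};
  }
};
```

Slicing a window with offset 2 and length 10 at (4, 100) yields offset 6 and a clamped length of 6, while the checked variant rejects the same request.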
- inline void SetNullCount(int64_t v)#
Set the cached physical null count.
This should only be used when initially populating the ArrayData, if it is possible to compute the null count without visiting the entire validity bitmap. In most cases, relying on GetNullCount is sufficient.
- Parameters:
v – the number of nulls in the ArrayData
- int64_t GetNullCount() const#
Return the physical null count.
This method returns the number of array elements for which IsValid would return false.
A cached value is returned if already available, otherwise it is first computed and stored. How it is computed depends on the data type, see IsValid for details.
Note that this method is typically much faster than calling IsValid for all elements. Therefore, it helps avoid per-element validity bitmap lookups in the common cases where the array contains zero or only nulls.
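The compute-once-then-cache behavior can be sketched as follows. NullCountSketch is a hypothetical stand-in that uses -1 as the "not yet known" marker and counts unset bits in a validity bitmap; the computations field exists only to make the caching visible, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in; -1 marks "null count not computed yet".
struct NullCountSketch {
  std::vector<uint8_t> validity;  // LSB-first bitmap, one bit per slot
  int64_t length = 0;
  mutable int64_t cached_null_count = -1;
  mutable int computations = 0;  // number of bitmap scans, for illustration

  int64_t GetNullCount() const {
    assert(static_cast<int64_t>(validity.size()) * 8 >= length);
    if (cached_null_count < 0) {
      int64_t nulls = 0;
      for (int64_t i = 0; i < length; ++i) {
        if (((validity[i / 8] >> (i % 8)) & 1) == 0) ++nulls;
      }
      cached_null_count = nulls;  // later calls reuse this value
      ++computations;
    }
    return cached_null_count;
  }
};
```

Calling GetNullCount twice on the same object scans the bitmap only once.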
- inline bool MayHaveNulls() const#
Return true if the array may have nulls in its validity bitmap.
This method returns true if the data has a validity bitmap, and the physical null count is either known to be non-zero or not yet known.
Unlike MayHaveLogicalNulls, this does not check for the presence of nulls in child data for data types such as unions and run-end encoded types.
- inline bool HasValidityBitmap() const#
Return true if the array has a validity bitmap.
- inline bool MayHaveLogicalNulls() const#
Return true if the array may have logical nulls.
Unlike MayHaveNulls, this method checks for null child values for types without a validity bitmap, such as unions and run-end encoded types, and for null dictionary values for dictionary types.
This implies that MayHaveLogicalNulls may return true for arrays that don’t have a top-level validity bitmap. It is therefore necessary to call HasValidityBitmap before accessing a top-level validity bitmap.
Code that previously used MayHaveNulls and then dealt with the validity bitmap directly can be fixed to handle all types correctly without performance degradation when handling most types by adopting HasValidityBitmap and MayHaveLogicalNulls.
Before:
    uint8_t* validity = array.MayHaveNulls() ? array.buffers[0].data : NULLPTR;
    for (int64_t i = 0; i < array.length; ++i) {
      if (validity && !bit_util::GetBit(validity, i)) {
        continue;  // skip a NULL
      }
      ...
    }
After:
    bool all_valid = !array.MayHaveLogicalNulls();
    uint8_t* validity = array.HasValidityBitmap() ? array.buffers[0].data : NULLPTR;
    for (int64_t i = 0; i < array.length; ++i) {
      bool is_valid = all_valid ||
                      (validity && bit_util::GetBit(validity, i)) ||
                      array.IsValid(i);
      if (!is_valid) {
        continue;  // skip a NULL
      }
      ...
    }
- int64_t ComputeLogicalNullCount() const#
Compute the logical null count for arrays of all types.
If the array has a validity bitmap, this function behaves the same as GetNullCount. For arrays that have no validity bitmap but whose values may be logically null (such as union arrays and run-end encoded arrays), this function recomputes the null count every time it is called.
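For a type without a validity bitmap, the logical null count has to be derived from child data. A minimal sketch for a run-end encoded layout, assuming an unsliced array; RunEndEncodedSketch and LogicalNullCount are hypothetical names, and the validity of each run's value is modeled as a plain bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical run-end encoded array: run i covers logical slots
// [previous run end, run_ends[i]) and repeats one value.
struct RunEndEncodedSketch {
  std::vector<int32_t> run_ends;   // strictly increasing; last entry is the logical length
  std::vector<bool> values_valid;  // validity of each run's value
};

// Sums the lengths of all runs whose value is null.
inline int64_t LogicalNullCount(const RunEndEncodedSketch& ree) {
  assert(ree.run_ends.size() == ree.values_valid.size());
  int64_t nulls = 0;
  int32_t run_start = 0;
  for (std::size_t i = 0; i < ree.run_ends.size(); ++i) {
    if (!ree.values_valid[i]) nulls += ree.run_ends[i] - run_start;
    run_start = ree.run_ends[i];
  }
  return nulls;
}
```

Nothing is cached here, which mirrors the note above that the count is recomputed on every call for such types.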
- DeviceAllocationType device_type() const#
Return the device_type of the underlying buffers and children.
If there are no buffers in this ArrayData object, it just returns DeviceAllocationType::kCPU as a default. We also assume that all buffers should be allocated on the same device type and perform DCHECKs to confirm this in debug mode.
- Returns:
DeviceAllocationType
- class Array#
Array base type. Immutable data array with some logical type and some length.
Any memory is owned by the respective Buffer instance (or its parents).
The base class is only required to have a null bitmap buffer if the null count is greater than 0.
If known, the null count can be provided in the base Array constructor. If the null count is not known, pass -1 to indicate that the null count is to be computed on the first call to null_count().
Subclassed by arrow::VarLengthListLikeArray< LargeListType >, arrow::VarLengthListLikeArray< LargeListViewType >, arrow::VarLengthListLikeArray< ListType >, arrow::VarLengthListLikeArray< ListViewType >, arrow::DictionaryArray, arrow::ExtensionArray, arrow::FixedSizeListArray, arrow::FlatArray, arrow::RunEndEncodedArray, arrow::StructArray, arrow::UnionArray, arrow::VarLengthListLikeArray< TYPE >
Public Functions
- inline bool IsNull(int64_t i) const#
Return true if the value at index i is null. Does not perform bounds checking.
- inline bool IsValid(int64_t i) const#
Return true if the value at index i is valid (not null). Does not perform bounds checking.
- Result<std::shared_ptr<Scalar>> GetScalar(int64_t i) const#
Return a Scalar containing the value of this array at i.
- inline int64_t length() const#
Size in the number of elements this array contains.
- inline int64_t offset() const#
A relative position into another array’s data, to enable zero-copy slicing.
This value defaults to zero.
- int64_t null_count() const#
The number of null entries in the array.
If the null count was not known at time of construction (and set to a negative value), then the null count will be computed and cached on the first invocation of this function.
- int64_t ComputeLogicalNullCount() const#
Computes the logical null count for arrays of all types including those that do not have a validity bitmap like union and run-end encoded arrays.
If the array has a validity bitmap, this function behaves the same as null_count(). For types that have no validity bitmap, this function will recompute the null count every time it is called.
See also
GetNullCount
- inline const std::shared_ptr<Buffer>& null_bitmap() const#
Buffer for the validity (null) bitmap, if any.
Note that Union types never have a null bitmap.
Note that for null_count == 0 or for null type, this will be null. This buffer does not account for any slice offset.
- inline const uint8_t* null_bitmap_data() const#
Raw pointer to the null bitmap.
Note that for null_count == 0 or for null type, this will be null. This buffer does not account for any slice offset.
- bool Equals(const Array& arr, const EqualOptions& = EqualOptions::Defaults()) const#
Equality comparison with another array.
Note that arrow::ArrayStatistics is not included in the comparison.
- std::string Diff(const Array& other) const#
Return the formatted unified diff of arrow::Diff between this Array and another Array.
- bool ApproxEquals(const std::shared_ptr<Array>& arr, const EqualOptions& = EqualOptions::Defaults()) const#
Approximate equality comparison with another array.
epsilon is only used if this is FloatArray or DoubleArray.
Note that arrow::ArrayStatistics is not included in the comparison.
- bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, const Array& other, const EqualOptions& = EqualOptions::Defaults()) const#
Check whether the specified range of slots is equal between the given array and this array.
end_idx is exclusive. This method does not perform bounds checking.
Note that arrow::ArrayStatistics is not included in the comparison.
- Status Accept(ArrayVisitor* visitor) const#
Apply the ArrayVisitor::Visit() method specialized to the array type.
- Result<std::shared_ptr<Array>> View(const std::shared_ptr<DataType>& type) const#
Construct a zero-copy view of this array with the given type.
This method checks if the types are layout-compatible. Nested types are traversed in depth-first order. Data buffers must have the same item sizes, even though the logical types may be different. An error is returned if the types are not layout-compatible.
- Result<std::shared_ptr<Array>> CopyTo(const std::shared_ptr<MemoryManager>& to) const#
Construct a copy of the array with all buffers on destination Memory Manager.
This method recursively copies the array’s buffers and those of its children onto the destination MemoryManager device and returns the new Array.
- Result<std::shared_ptr<Array>> ViewOrCopyTo(const std::shared_ptr<MemoryManager>& to) const#
Construct a new array attempting to zero-copy view if possible.
Like CopyTo, this method recursively goes through all of the array’s buffers and those of its children and first attempts to create zero-copy views on the destination MemoryManager device. If it can’t, it falls back to performing a copy. See Buffer::ViewOrCopy.
- std::shared_ptr<Array> Slice(int64_t offset, int64_t length) const#
Construct a zero-copy slice of the array with the indicated offset and length.
- Parameters:
offset – [in] the position of the first element in the constructed slice
length – [in] the length of the slice. If there are not enough elements in the array, the length will be adjusted accordingly
- Returns:
a new object wrapped in std::shared_ptr<Array>
- Result<std::shared_ptr<Array>> SliceSafe(int64_t offset, int64_t length) const#
Input-checking variant of Array::Slice.
- Result<std::shared_ptr<Array>> SliceSafe(int64_t offset) const#
Input-checking variant of Array::Slice.
- std::string ToString() const#
- Returns:
PrettyPrint representation of array suitable for debugging
- Status Validate() const#
Perform cheap validation checks to determine obvious inconsistencies within the array’s internal data.
This is O(k) where k is the number of descendants.
- Status ValidateFull() const#
Perform extensive validation checks to determine inconsistencies within the array’s internal data.
This is potentially O(k*n) where k is the number of descendants and n is the array length.
- inline DeviceAllocationType device_type() const#
Return the device_type that this array’s data is allocated on.
This just delegates to calling device_type on the underlying ArrayData object which backs this Array.
- Returns:
DeviceAllocationType
- inline const std::shared_ptr<ArrayStatistics>& statistics() const#
Return the statistics of this Array.
This just delegates to calling statistics on the underlying ArrayData object which backs this Array.
- Returns:
const std::shared_ptr<ArrayStatistics>&
- class FlatArray : public arrow::Array#
Base class for non-nested arrays.
Subclassed by arrow::BaseBinaryArray< BinaryType >, arrow::BaseBinaryArray< LargeBinaryType >, arrow::BaseBinaryArray< TYPE >, arrow::BinaryViewArray, arrow::NullArray, arrow::PrimitiveArray
- class PrimitiveArray : public arrow::FlatArray#
Base class for arrays of fixed-size logical types.
Subclassed by arrow::BooleanArray, arrow::DayTimeIntervalArray, arrow::FixedSizeBinaryArray, arrow::MonthDayNanoIntervalArray, arrow::NumericArray< TYPE >
Factory functions#
- std::shared_ptr<Array> MakeArray(const std::shared_ptr<ArrayData>& data)#
Create a strongly-typed Array instance from generic ArrayData.
- Parameters:
data – [in] the array contents
- Returns:
the resulting Array instance
- Result<std::shared_ptr<Array>> MakeArrayOfNull(const std::shared_ptr<DataType>& type, int64_t length, MemoryPool* pool = default_memory_pool())#
Create a strongly-typed Array instance with all elements null.
- Parameters:
type – [in] the array type
length – [in] the array length
pool – [in] the memory pool to allocate memory from
- Result<std::shared_ptr<Array>> MakeArrayFromScalar(const Scalar& scalar, int64_t length, MemoryPool* pool = default_memory_pool())#
Create an Array instance whose slots are the given scalar.
- Parameters:
scalar – [in] the value with which to fill the array
length – [in] the array length
pool – [in] the memory pool to allocate memory from
Concrete array subclasses#
Primitive and temporal#
- class BooleanArray : public arrow::PrimitiveArray#
Concrete Array class for boolean data.
- using DecimalArray = Decimal128Array#
- class Decimal32Array : public arrow::FixedSizeBinaryArray#
- #include <arrow/array/array_decimal.h>
Concrete Array class for 32-bit decimal data.
Public Functions
- explicit Decimal32Array(const std::shared_ptr<ArrayData>& data)#
Construct Decimal32Array from ArrayData instance.
- class Decimal64Array : public arrow::FixedSizeBinaryArray#
- #include <arrow/array/array_decimal.h>
Concrete Array class for 64-bit decimal data.
Public Functions
- explicit Decimal64Array(const std::shared_ptr<ArrayData>& data)#
Construct Decimal64Array from ArrayData instance.
- class Decimal128Array : public arrow::FixedSizeBinaryArray#
- #include <arrow/array/array_decimal.h>
Concrete Array class for 128-bit decimal data.
Public Functions
- explicit Decimal128Array(const std::shared_ptr<ArrayData>& data)#
Construct Decimal128Array from ArrayData instance.
- class Decimal256Array : public arrow::FixedSizeBinaryArray#
- #include <arrow/array/array_decimal.h>
Concrete Array class for 256-bit decimal data.
Public Functions
- explicit Decimal256Array(const std::shared_ptr<ArrayData>& data)#
Construct Decimal256Array from ArrayData instance.
- template<typename TYPE>
class NumericArray : public arrow::PrimitiveArray#
- #include <arrow/array/array_primitive.h>
Concrete Array class for numeric data with a corresponding C type.
This class is templated on the corresponding DataType subclass for the given data, for example NumericArray<Int8Type> or NumericArray<Date32Type>.
Note that convenience aliases are available for all accepted types (for example Int8Array for NumericArray<Int8Type>).
- class DayTimeIntervalArray : public arrow::PrimitiveArray#
- #include <arrow/array/array_primitive.h>
Array of Day and Millisecond values.
- class MonthDayNanoIntervalArray : public arrow::PrimitiveArray#
- #include <arrow/array/array_primitive.h>
Array of Month, Day and nanosecond values.
Binary-like#
- template<typename TYPE>
class BaseBinaryArray : public arrow::FlatArray#
- #include <arrow/array/array_binary.h>
Base class for variable-sized binary arrays, regardless of offset size and logical interpretation.
Public Functions
- inline const uint8_t* GetValue(int64_t i, offset_type* out_length) const#
Return the pointer to the given element’s bytes.
- inline std::string_view GetView(int64_t i) const#
Get binary value as a string_view.
- Parameters:
i – the value index
- Returns:
the view over the selected value
- inline std::string_view Value(int64_t i) const#
Get binary value as a string_view. Provided for consistency with other arrays.
- Parameters:
i – the value index
- Returns:
the view over the selected value
- inline std::string GetString(int64_t i) const#
Get binary value as a std::string.
- Parameters:
i – the value index
- Returns:
the value copied into a std::string
- inline std::shared_ptr<Buffer> value_offsets() const#
Note that this buffer does not account for any slice offset.
- inline std::shared_ptr<Buffer> value_data() const#
Note that this buffer does not account for any slice offset.
- inline offset_type value_offset(int64_t i) const#
Return the data buffer absolute offset of the data for the value at the passed index.
Does not perform bounds checking.
- inline offset_type value_length(int64_t i) const#
Return the length of the data for the value at the passed index.
Does not perform bounds checking.
- inline offset_type total_values_length() const#
Return the total length of the memory in the data buffer referenced by this array.
If the array has been sliced then this may be less than the size of the data buffer (data_->buffers[2]).
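How the offsets buffer and the data buffer relate can be sketched with standard containers. BinarySketch is a hypothetical stand-in with 32-bit offsets; it reuses the member names documented above only to make the correspondence easy to follow.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a variable-size binary array with 32-bit offsets.
struct BinarySketch {
  std::vector<int32_t> offsets;  // one more entry than the unsliced element count
  std::string data;              // concatenated value bytes
  int64_t offset = 0;            // slice offset, in elements
  int64_t length = 0;            // slice length, in elements

  // Absolute start of value i inside the data buffer.
  int32_t value_offset(int64_t i) const {
    assert(i >= 0 && i < length);
    return offsets[offset + i];
  }
  // Difference between two consecutive offsets.
  int32_t value_length(int64_t i) const {
    assert(i >= 0 && i < length);
    return offsets[offset + i + 1] - offsets[offset + i];
  }
  std::string_view GetView(int64_t i) const {
    return std::string_view(data).substr(value_offset(i), value_length(i));
  }
  // Bytes referenced by the slice, which can be fewer than data.size().
  int32_t total_values_length() const {
    return length > 0 ? offsets[offset + length] - offsets[offset] : 0;
  }
};
```

With offsets {0, 2, 5, 9} over the bytes "hiheyfour" and a slice starting at element 1 with length 2, the views are "hey" and "four" and the referenced length is 7 of the 9 bytes in the buffer.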
- class BinaryArray : public arrow::BaseBinaryArray<BinaryType>#
- #include <arrow/array/array_binary.h>
Concrete Array class for variable-size binary data.
Subclassed by arrow::StringArray
- class StringArray : public arrow::BinaryArray#
- #include <arrow/array/array_binary.h>
Concrete Array class for variable-size string (utf-8) data.
Public Functions
- Status ValidateUTF8() const#
Validate that this array contains only valid UTF8 entries.
This check is also implied by ValidateFull()
- class LargeBinaryArray : public arrow::BaseBinaryArray<LargeBinaryType>#
- #include <arrow/array/array_binary.h>
Concrete Array class for large variable-size binary data.
Subclassed by arrow::LargeStringArray
- class LargeStringArray : public arrow::LargeBinaryArray#
- #include <arrow/array/array_binary.h>
Concrete Array class for large variable-size string (utf-8) data.
Public Functions
- Status ValidateUTF8() const#
Validate that this array contains only valid UTF8 entries.
This check is also implied by ValidateFull()
- class BinaryViewArray : public arrow::FlatArray#
- #include <arrow/array/array_binary.h>
Concrete Array class for variable-size binary view data using the BinaryViewType::c_type struct to reference in-line or out-of-line string values.
Subclassed by arrow::StringViewArray
- class StringViewArray : public arrow::BinaryViewArray#
- #include <arrow/array/array_binary.h>
Concrete Array class for variable-size string view (utf-8) data using BinaryViewType::c_type to reference in-line or out-of-line string values.
Public Functions
- Status ValidateUTF8() const#
Validate that this array contains only valid UTF8 entries.
This check is also implied by ValidateFull()
- class FixedSizeBinaryArray : public arrow::PrimitiveArray#
- #include <arrow/array/array_binary.h>
Concrete Array class for fixed-size binary data.
Subclassed by arrow::Decimal128Array, arrow::Decimal256Array, arrow::Decimal32Array, arrow::Decimal64Array
Nested#
- template<typename TYPE>
class VarLengthListLikeArray : public arrow::Array#
- #include <arrow/array/array_nested.h>
Base class for variable-sized list and list-view arrays, regardless of offset size.
Subclassed by arrow::BaseListArray< TYPE >, arrow::BaseListViewArray< TYPE >
Public Functions
- inline const std::shared_ptr<Array>& values() const#
Return array object containing the list’s values.
Note that this buffer does not account for any slice offset or length.
- inline const std::shared_ptr<Buffer>& value_offsets() const#
Note that this buffer does not account for any slice offset or length.
- inline const offset_type* raw_value_offsets() const#
Return pointer to raw value offsets accounting for any slice offset.
- virtual offset_type value_length(int64_t i) const = 0#
Return the size of the value at a particular index.
Since non-empty null lists and list-views are possible, avoid calling this function when the list at slot i is null.
- Pre:
IsValid(i)
- inline Result<std::shared_ptr<Array>> FlattenRecursively(MemoryPool* memory_pool = default_memory_pool()) const#
Flatten all levels recursively until reaching a non-list type, and return the resulting non-list Array.
See also
internal::FlattenLogicalListRecursively
- template<typename TYPE>
class BaseListArray : public arrow::VarLengthListLikeArray<TYPE>#
- #include <arrow/array/array_nested.h>
Public Functions
- inline virtual offset_type value_length(int64_t i) const final#
Return the size of the value at a particular index.
Since non-empty null lists are possible, avoid calling this function when the list at slot i is null.
- Pre:
IsValid(i)
- class ListArray : public arrow::BaseListArray<ListType>#
- #include <arrow/array/array_nested.h>
Concrete Array class for list data.
Subclassed by arrow::MapArray
Public Functions
- Result<std::shared_ptr<Array>> Flatten(MemoryPool* memory_pool = default_memory_pool()) const#
Return an Array that is a concatenation of the lists in this array.
Note that it’s different from values() in that it takes into consideration this array’s offsets as well as null elements backed by non-empty lists (they are skipped, thus copying may be needed).
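The difference from values() can be illustrated with a small sketch in which a null list still covers a non-empty range of the child values. ListSketch and FlattenSketch are hypothetical names; child values are plain ints and validity is a vector of bools for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical list array: slot i spans values[offsets[i], offsets[i + 1]).
struct ListSketch {
  std::vector<int32_t> offsets;  // n + 1 entries
  std::vector<bool> valid;       // n entries; false marks a null list
  std::vector<int> values;       // child values
};

// Concatenate only the child ranges referenced by non-null lists.
inline std::vector<int> FlattenSketch(const ListSketch& list) {
  assert(list.offsets.size() == list.valid.size() + 1);
  std::vector<int> out;
  for (std::size_t i = 0; i < list.valid.size(); ++i) {
    if (!list.valid[i]) continue;  // skipped even when the range is non-empty
    out.insert(out.end(), list.values.begin() + list.offsets[i],
               list.values.begin() + list.offsets[i + 1]);
  }
  return out;
}
```

With offsets {0, 2, 4, 5}, a null middle list and child values {1, 2, 3, 4, 5}, the flattened result is {1, 2, 5}, whereas the child values still contain 3 and 4.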
- std::shared_ptr<Array> offsets() const#
Return list offsets as an Int32Array.
The returned array will not have a validity bitmap, so you cannot expect to pass it to ListArray::FromArrays() and get back the same list array if the original one has nulls.
Public Static Functions
- static Result<std::shared_ptr<ListArray>> FromArrays(const Array& offsets, const Array& values, MemoryPool* pool = default_memory_pool(), std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount)#
Construct ListArray from array of offsets and child value array.
This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.
If a null_bitmap is not provided, the nulls will be inferred from the offsets’ null bitmap. But if a null_bitmap is provided, the offsets array can’t have nulls.
And when a null_bitmap is provided, the offsets array cannot be a slice (i.e. an array with offset() > 0).
- Parameters:
offsets – [in] Array containing n + 1 offsets encoding length and size. Must be of int32 type
values – [in] Array containing list values
pool – [in] MemoryPool in case new offsets array needs to be allocated because of null values
null_bitmap – [in] Optional validity bitmap
null_count – [in] Optional null count in null_bitmap
- static Result<std::shared_ptr<ListArray>> FromListView(const ListViewArray& source, MemoryPool* pool)#
Build a ListArray from a ListViewArray.
- class LargeListArray : public arrow::BaseListArray<LargeListType>#
- #include <arrow/array/array_nested.h>
Concrete Array class for large list data (with 64-bit offsets).
Public Functions
- Result<std::shared_ptr<Array>> Flatten(MemoryPool* memory_pool = default_memory_pool()) const#
Return an Array that is a concatenation of the lists in this array.
Note that it’s different from values() in that it takes into consideration this array’s offsets as well as null elements backed by non-empty lists (they are skipped, thus copying may be needed).
Public Static Functions
- static Result<std::shared_ptr<LargeListArray>> FromArrays(const Array& offsets, const Array& values, MemoryPool* pool = default_memory_pool(), std::shared_ptr<Buffer> null_bitmap = NULLPTR, int64_t null_count = kUnknownNullCount)#
Construct LargeListArray from array of offsets and child value array.
This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.
If a null_bitmap is not provided, the nulls will be inferred from the offsets’ null bitmap. But if a null_bitmap is provided, the offsets array can’t have nulls.
And when a null_bitmap is provided, the offsets array cannot be a slice (i.e. an array with offset() > 0).
- Parameters:
offsets – [in] Array containing n + 1 offsets encoding length and size. Must be of int64 type
values – [in] Array containing list values
pool – [in] MemoryPool in case new offsets array needs to be allocated because of null values
null_bitmap – [in] Optional validity bitmap
null_count – [in] Optional null count in null_bitmap
- static Result<std::shared_ptr<LargeListArray>> FromListView(const LargeListViewArray& source, MemoryPool* pool)#
Build a LargeListArray from a LargeListViewArray.
- template<typenameTYPE>
classBaseListViewArray:publicarrow::VarLengthListLikeArray<TYPE># - #include <arrow/array/array_nested.h>
Public Functions
- inlineconststd::shared_ptr<Buffer>&value_sizes()const#
Note that this buffer does not account for any slice offset or length.
- inlineconstoffset_type*raw_value_sizes()const#
Return pointer to raw value offsets accounting for any slice offset.
- inlinevirtualoffset_typevalue_length(int64_ti)constfinal#
Return the size of the value at a particular index.
This should not be called if the list-view at slot i is null. The returned size in those cases could be any value from 0 to the length of the child values array.
- Pre:
IsValid(i)
- classListViewArray:publicarrow::BaseListViewArray<ListViewType>#
- #include <arrow/array/array_nested.h>
Concrete Array class for list-view data.
Public Functions
- Result<std::shared_ptr<Array>>Flatten(MemoryPool*memory_pool=default_memory_pool())const#
Return an Array that is a concatenation of the list-views in this array.
Note that it’s different from values() in that it takes into consideration this array’s offsets (which can be in any order) and sizes. Nulls are skipped.
This function invokes Concatenate() if list-views are non-contiguous. It will try to minimize the number of array slices passed to Concatenate() by maximizing the size of each slice (containing as many contiguous list-views as possible).
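As a rough illustration of these semantics, the sketch below concatenates the viewed ranges in slot order and skips null slots. It uses plain `std::vector` members, and both `ListViewSketch` and `FlattenSketch` are hypothetical names. Unlike the real function, it always copies and makes no attempt to merge contiguous views.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a list-view array: slot i views
// values[offsets[i] .. offsets[i] + sizes[i]) and may be null.
struct ListViewSketch {
  std::vector<int32_t> offsets;
  std::vector<int32_t> sizes;
  std::vector<bool> valid;
  std::vector<int> values;
};

// Concatenate the viewed ranges in slot order, skipping null slots.
std::vector<int> FlattenSketch(const ListViewSketch& lv) {
  assert(lv.offsets.size() == lv.sizes.size());
  assert(lv.offsets.size() == lv.valid.size());
  std::vector<int> out;
  for (std::size_t i = 0; i < lv.offsets.size(); ++i) {
    if (!lv.valid[i]) continue;  // null slots contribute nothing
    auto first = lv.values.begin() + lv.offsets[i];
    out.insert(out.end(), first, first + lv.sizes[i]);
  }
  return out;
}
```

Because offsets may appear in any order, the flattened result can differ in order from the child values array, which is why it is not the same as values().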
- std::shared_ptr<Array>offsets()const#
Return list-view offsets as an Int32Array.
The returned array will not have a validity bitmap, so you cannot expect to pass it to ListViewArray::FromArrays() and get back the same list-view array if the original one has nulls.
- std::shared_ptr<Array>sizes()const#
Return list-view sizes as an Int32Array.
The returned array will not have a validity bitmap, so you cannot expect to pass it to ListViewArray::FromArrays() and get back the same list-view array if the original one has nulls.
Public Static Functions
- staticResult<std::shared_ptr<ListViewArray>>FromArrays(constArray&offsets,constArray&sizes,constArray&values,MemoryPool*pool=default_memory_pool(),std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount)#
Construct a ListViewArray from arrays of offsets and sizes, and a child value array.
Construct a ListViewArray using buffers from offsets and sizes arrays that project views into the child values array.
This function does the bare minimum of validation of the offsets/sizes and input types. The offset and length of the offsets and sizes arrays must match and that will be checked, but their contents will be assumed to be well-formed.
If a null_bitmap is not provided, the nulls will be inferred from the offsets’ null bitmap. But if a null_bitmap is provided, the offsets array can’t have nulls.
And when a null_bitmap is provided, neither the offsets nor the sizes array can be a slice (i.e. an array with offset() > 0).
- Parameters:
offsets –[in] An array of int32 offsets into the values array. NULL values are supported if the corresponding value in sizes is NULL or 0.
sizes –[in] An array containing the int32 sizes of every view. NULL values are taken to represent a NULL list-view in the array being created.
values –[in] Array containing list values
pool –[in] MemoryPool
null_bitmap –[in] Optional validity bitmap
null_count –[in] Optional null count in null_bitmap
- staticResult<std::shared_ptr<ListViewArray>>FromList(constListArray&list_array,MemoryPool*pool)#
Build a ListViewArray from a ListArray.
- classLargeListViewArray:publicarrow::BaseListViewArray<LargeListViewType>#
- #include <arrow/array/array_nested.h>
Concrete Array class for large list-view data (with 64-bit offsets and sizes).
Public Functions
- Result<std::shared_ptr<Array>>Flatten(MemoryPool*memory_pool=default_memory_pool())const#
Return an Array that is a concatenation of the large list-views in this array.
Note that it’s different from values() in that it takes into consideration this array’s offsets (which can be in any order) and sizes. Nulls are skipped.
- std::shared_ptr<Array>offsets()const#
Return list-view offsets as an Int64Array.
The returned array will not have a validity bitmap, so you cannot expect to pass it to LargeListViewArray::FromArrays() and get back the same list-view array if the original one has nulls.
- std::shared_ptr<Array>sizes()const#
Return list-view sizes as an Int64Array.
The returned array will not have a validity bitmap, so you cannot expect to pass it to LargeListViewArray::FromArrays() and get back the same list-view array if the original one has nulls.
Public Static Functions
- staticResult<std::shared_ptr<LargeListViewArray>>FromArrays(constArray&offsets,constArray&sizes,constArray&values,MemoryPool*pool=default_memory_pool(),std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount)#
Construct a LargeListViewArray from arrays of offsets and sizes, and a child value array.
Construct a LargeListViewArray using buffers from offsets and sizes arrays that project views into the values array.
This function does the bare minimum of validation of the offsets/sizes and input types. The offset and length of the offsets and sizes arrays must match and that will be checked, but their contents will be assumed to be well-formed.
If a null_bitmap is not provided, the nulls will be inferred from the offsets’ or sizes’ null bitmap. Only one of these two is allowed to have a null bitmap. But if a null_bitmap is provided, the offsets array and the sizes array can’t have nulls.
And when a null_bitmap is provided, neither the offsets nor the sizes array can be a slice (i.e. an array with offset() > 0).
- Parameters:
offsets –[in] An array of int64 offsets into the values array. NULL values are supported if the corresponding value in sizes is NULL or 0.
sizes –[in] An array containing the int64 sizes of every view. NULL values are taken to represent a NULL list-view in the array being created.
values –[in] Array containing list values
pool –[in] MemoryPool
null_bitmap –[in] Optional validity bitmap
null_count –[in] Optional null count in null_bitmap
- staticResult<std::shared_ptr<LargeListViewArray>>FromList(constLargeListArray&list_array,MemoryPool*pool)#
Build a LargeListViewArray from a LargeListArray.
- classMapArray:publicarrow::ListArray#
- #include <arrow/array/array_nested.h>
Concrete Array class for map data.
NB: “value” in this context refers to a pair of a key and the corresponding item.
Public Functions
Public Static Functions
- staticResult<std::shared_ptr<Array>>FromArrays(conststd::shared_ptr<Array>&offsets,conststd::shared_ptr<Array>&keys,conststd::shared_ptr<Array>&items,MemoryPool*pool=default_memory_pool(),std::shared_ptr<Buffer>null_bitmap=NULLPTR)#
Construct a MapArray from an array of offsets and child key and item arrays.
This function does the bare minimum of validation of the offsets and input types, and will allocate a new offsets array if necessary (i.e. if the offsets contain any nulls). If the offsets do not have nulls, they are assumed to be well-formed.
- Parameters:
offsets –[in] Array containing n + 1 offsets encoding length and size. Must be of int32 type
keys –[in] Array containing key values
items –[in] Array containing item values
pool –[in] MemoryPool in case new offsets array needs to be allocated because of null values
null_bitmap –[in] Optional validity bitmap
- classFixedSizeListArray:publicarrow::Array#
- #include <arrow/array/array_nested.h>
Concrete Array class for fixed-size list data.
Public Functions
- inlineint32_tvalue_length(int64_ti=0)const#
Return the fixed size of the values.
No matter the value of the index parameter, the result is the same. So even when the value at slot i is null, this function will return a non-zero size.
- Pre:
IsValid(i)
- Result<std::shared_ptr<Array>>Flatten(MemoryPool*memory_pool=default_memory_pool())const#
Return an Array that is a concatenation of the lists in this array.
Note that it’s different from values() in that it takes into consideration null elements (they are skipped, thus copying may be needed).
- inlineResult<std::shared_ptr<Array>>FlattenRecursively(MemoryPool*memory_pool=default_memory_pool())const#
Flatten all levels recursively until a non-list type is reached, and return an Array of that non-list type.
See also
internal::FlattenLogicalListRecursively
Public Static Functions
- staticResult<std::shared_ptr<Array>>FromArrays(conststd::shared_ptr<Array>&values,int32_tlist_size,std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount)#
Construct a FixedSizeListArray from child value array and value_length.
- Parameters:
values –[in] Array containing list values
list_size –[in] The fixed length of each list
null_bitmap –[in] Optional validity bitmap
null_count –[in] Optional null count in null_bitmap
- Returns:
Will have length equal to values.length() / list_size
- staticResult<std::shared_ptr<Array>>FromArrays(conststd::shared_ptr<Array>&values,std::shared_ptr<DataType>type,std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount)#
Construct a FixedSizeListArray from child value array and type.
- Parameters:
values –[in] Array containing list values
type –[in] The fixed sized list type
null_bitmap –[in] Optional validity bitmap
null_count –[in] Optional null count in null_bitmap
- Returns:
Will have length equal to values.length() / type.list_size()
- classStructArray:publicarrow::Array#
- #include <arrow/array/array_nested.h>
Concrete Array class for struct data.
Public Functions
- StatusCanReferenceFieldByName(conststd::string&name)const#
Indicate if the field named name can be found unambiguously in the struct.
- StatusCanReferenceFieldsByNames(conststd::vector<std::string>&names)const#
Indicate if the fields named names can be found unambiguously in the struct.
- Result<ArrayVector>Flatten(MemoryPool*pool=default_memory_pool())const#
Flatten this array as a vector of arrays, one for each field.
- Parameters:
pool –[in] The pool to allocate null bitmaps from, if necessary
- Result<std::shared_ptr<Array>>GetFlattenedField(intindex,MemoryPool*pool=default_memory_pool())const#
Get one of the child arrays, combining its null bitmap with the parent struct array’s bitmap.
- Parameters:
index –[in] Which child array to get
pool –[in] The pool to allocate null bitmaps from, if necessary
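Combining the two bitmaps amounts to a slot-wise logical AND of validity: a flattened child slot is valid only if both the struct slot and the child slot are valid. Below is a minimal sketch of that rule, assuming validity is modelled as `std::vector<bool>` rather than Arrow's packed bitmaps; `FlattenedValidity` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: validity of a flattened child is the logical AND of
// the parent struct's validity and the child's own validity.
std::vector<bool> FlattenedValidity(const std::vector<bool>& parent_valid,
                                    const std::vector<bool>& child_valid) {
  assert(parent_valid.size() == child_valid.size());
  std::vector<bool> out(parent_valid.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = parent_valid[i] && child_valid[i];
  }
  return out;
}
```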
Public Static Functions
- staticResult<std::shared_ptr<StructArray>>Make(constArrayVector&children,conststd::vector<std::string>&field_names,std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount,int64_toffset=0)#
Return a StructArray from child arrays and field names.
The length and data type are automatically inferred from the arguments. There should be at least one child array.
- staticResult<std::shared_ptr<StructArray>>Make(constArrayVector&children,constFieldVector&fields,std::shared_ptr<Buffer>null_bitmap=NULLPTR,int64_tnull_count=kUnknownNullCount,int64_toffset=0)#
Return a StructArray from child arrays and fields.
The length is automatically inferred from the arguments. There should be at least one child array. This method does not check that field types and child array types are consistent.
- classUnionArray:publicarrow::Array#
- #include <arrow/array/array_nested.h>
Base class for SparseUnionArray and DenseUnionArray.
Subclassed by arrow::DenseUnionArray, arrow::SparseUnionArray
- classSparseUnionArray:publicarrow::UnionArray#
- #include <arrow/array/array_nested.h>
Concrete Array class for sparse union data.
Public Functions
- Result<std::shared_ptr<Array>>GetFlattenedField(intindex,MemoryPool*pool=default_memory_pool())const#
Get one of the child arrays, adjusting its null bitmap where the union array type code does not match.
- Parameters:
index –[in] Which child array to get (i.e. the physical index, not the type code)
pool –[in] The pool to allocate null bitmaps from, if necessary
Public Static Functions
- staticinlineResult<std::shared_ptr<Array>>Make(constArray&type_ids,ArrayVectorchildren,std::vector<type_code_t>type_codes)#
Construct a SparseUnionArray from type_ids and children.
This function does the bare minimum of validation of the input types.
- Parameters:
type_ids –[in] An array of logical type ids for the union type
children –[in] Vector of children Arrays containing the data for each type.
type_codes –[in] Vector of type codes.
- staticResult<std::shared_ptr<Array>>Make(constArray&type_ids,ArrayVectorchildren,std::vector<std::string>field_names={},std::vector<type_code_t>type_codes={})#
Construct a SparseUnionArray with custom field names from type_ids and children.
This function does the bare minimum of validation of the input types.
- Parameters:
type_ids –[in] An array of logical type ids for the union type
children –[in] Vector of children Arrays containing the data for each type.
field_names –[in] Vector of strings containing the name of each field.
type_codes –[in] Vector of type codes.
- classDenseUnionArray:publicarrow::UnionArray#
- #include <arrow/array/array_nested.h>
Concrete Array class for dense union data.
Note that union types do not have a validity bitmap.
Public Functions
Public Static Functions
- staticinlineResult<std::shared_ptr<Array>>Make(constArray&type_ids,constArray&value_offsets,ArrayVectorchildren,std::vector<type_code_t>type_codes)#
Construct a DenseUnionArray from type_ids, value_offsets, and children.
This function does the bare minimum of validation of the offsets and input types.
- Parameters:
type_ids –[in] An array of logical type ids for the union type
value_offsets –[in] An array of signed int32 values indicating the relative offset into the respective child array for the type in a given slot. The respective offsets for each child value array must be in order / increasing.
children –[in] Vector of children Arrays containing the data for each type.
type_codes –[in] Vector of type codes.
- staticResult<std::shared_ptr<Array>>Make(constArray&type_ids,constArray&value_offsets,ArrayVectorchildren,std::vector<std::string>field_names={},std::vector<type_code_t>type_codes={})#
Construct a DenseUnionArray with custom field names from type_ids, value_offsets, and children.
This function does the bare minimum of validation of the offsets and input types.
- Parameters:
type_ids –[in] An array of logical type ids for the union type
value_offsets –[in] An array of signed int32 values indicating the relative offset into the respective child array for the type in a given slot. The respective offsets for each child value array must be in order / increasing.
children –[in] Vector of children Arrays containing the data for each type.
field_names –[in] Vector of strings containing the name of each field.
type_codes –[in] Vector of type codes.
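To make the roles of type_ids and value_offsets concrete, the sketch below looks up the value in a logical slot. It assumes the simplest case, where type codes equal the child indices and every child holds strings. `DenseUnionSketch` and `ValueAt` are hypothetical names, not Arrow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dense union whose type codes equal the child
// indices (0, 1, ...). All children hold strings to keep the sketch small.
struct DenseUnionSketch {
  std::vector<int8_t> type_ids;        // which child each slot uses
  std::vector<int32_t> value_offsets;  // position inside that child
  std::vector<std::vector<std::string>> children;
};

// Look up the value stored in logical slot i.
const std::string& ValueAt(const DenseUnionSketch& u, std::size_t i) {
  assert(i < u.type_ids.size());
  const auto& child = u.children[static_cast<std::size_t>(u.type_ids[i])];
  return child[static_cast<std::size_t>(u.value_offsets[i])];
}
```

In this example the offsets for each child increase in slot order (0 then 1 for both children), matching the ordering requirement stated for value_offsets.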
Dictionary-encoded#
- classDictionaryArray:publicarrow::Array#
Array type for dictionary-encoded data with a data-dependent dictionary.
A dictionary array contains an array of non-negative integers (the “dictionary indices”) along with a data type containing a “dictionary” corresponding to the distinct values represented in the data.
For example, the array
["foo", "bar", "foo", "bar", "foo", "bar"]
with dictionary ["bar", "foo"], would have dictionary array representation
indices: [1, 0, 1, 0, 1, 0]
dictionary: ["bar", "foo"]
The indices in principle may be any integer type.
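The relationship between indices and dictionary can be sketched as a simple lookup. The code below is an illustration with plain `std::vector` and int32 indices; `DecodeSketch` is a hypothetical helper, not an Arrow function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: expand dictionary indices back into plain values.
std::vector<std::string> DecodeSketch(const std::vector<int32_t>& indices,
                                      const std::vector<std::string>& dictionary) {
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (int32_t idx : indices) {
    // Indices must be non-negative and smaller than the dictionary size.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < dictionary.size());
    out.push_back(dictionary[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

Decoding the indices `{1, 0, 1, 0, 1, 0}` against the dictionary `{"bar", "foo"}` reproduces the original array from the example above.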
Public Functions
- Result<std::shared_ptr<Array>>Transpose(conststd::shared_ptr<DataType>&type,conststd::shared_ptr<Array>&dictionary,constint32_t*transpose_map,MemoryPool*pool=default_memory_pool())const#
Transpose this DictionaryArray.
This method constructs a new dictionary array with the given dictionary type, transposing indices using the transpose map. The type and the transpose map are typically computed using DictionaryUnifier.
- Parameters:
type –[in] the new type object
dictionary –[in] the new dictionary
transpose_map –[in] transposition array of this array’s indices into the target array’s indices
pool –[in] a pool to allocate the array data from
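The effect of a transpose map on the indices can be sketched as one lookup per element. This is only an illustration of the remapping, assuming int32 indices held in `std::vector`; `TransposeSketch` is a hypothetical name and the sketch ignores nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: remap indices from an old dictionary to a new one.
// transpose_map[k] is the position in the new dictionary of old entry k.
std::vector<int32_t> TransposeSketch(const std::vector<int32_t>& indices,
                                     const std::vector<int32_t>& transpose_map) {
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int32_t idx : indices) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < transpose_map.size());
    out.push_back(transpose_map[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

For instance, moving from an old dictionary ["bar", "foo"] to a new dictionary ["foo", "baz", "bar"] corresponds to the map `{2, 0}`, so the old indices `{1, 0, 1}` become `{0, 2, 0}`.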
- boolCanCompareIndices(constDictionaryArray&other)const#
Determine whether dictionary arrays may be compared without unification.
- conststd::shared_ptr<Array>&dictionary()const#
Return the dictionary for this array, which is stored as a member of the ArrayData internal structure.
- int64_tGetValueIndex(int64_ti)const#
Return the ith value of indices, cast to int64_t.
Not recommended for use in performance-sensitive code. Does not validate whether the value is null or out-of-bounds.
Public Static Functions
- staticResult<std::shared_ptr<Array>>FromArrays(conststd::shared_ptr<DataType>&type,conststd::shared_ptr<Array>&indices,conststd::shared_ptr<Array>&dictionary)#
Construct a DictionaryArray from dictionary and indices arrays and validate.
This function does the validation of the indices and input type. It checks if all indices are non-negative and smaller than the size of the dictionary.
- Parameters:
type –[in] a dictionary type
dictionary –[in] the dictionary with same value type as the type object
indices –[in] an array of non-negative integers smaller than the size of the dictionary
Extension arrays#
- classExtensionArray:publicarrow::Array#
Base array class for user-defined extension types.
Subclassed by arrow::extension::Bool8Array, arrow::extension::FixedShapeTensorArray, arrow::extension::OpaqueArray, arrow::extension::UuidArray
Public Functions
- explicitExtensionArray(conststd::shared_ptr<ArrayData>&data)#
Construct an ExtensionArray from an ArrayData.
The ArrayData must have the right ExtensionType.
- ExtensionArray(conststd::shared_ptr<DataType>&type,conststd::shared_ptr<Array>&storage)#
Construct an ExtensionArray from a type and the underlying storage.
Run-End Encoded Array#
- classRunEndEncodedArray:publicarrow::Array#
Array type for run-end encoded data.
Public Functions
- RunEndEncodedArray(conststd::shared_ptr<DataType>&type,int64_tlength,conststd::shared_ptr<Array>&run_ends,conststd::shared_ptr<Array>&values,int64_toffset=0)#
Construct a RunEndEncodedArray from all parameters.
The length and offset parameters refer to the dimensions of the logical array which is the array we would get after expanding all the runs into repeated values. As such, length can be much greater than the length of the child run_ends and values arrays.
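The relationship between the logical array and its run_ends and values children can be sketched by expanding the runs explicitly. The code below is an illustration with plain `std::vector`; `ExpandSketch` is a hypothetical helper, and real code would normally avoid materializing the expanded data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand run-end encoded data into its logical values.
// run_ends[k] is the logical index one past the end of run k.
std::vector<char> ExpandSketch(const std::vector<int32_t>& run_ends,
                               const std::vector<char>& values,
                               int64_t offset, int64_t length) {
  assert(run_ends.size() == values.size());
  std::vector<char> out;
  std::size_t run = 0;
  for (int64_t i = offset; i < offset + length; ++i) {
    // Advance to the run that contains logical index i.
    while (run < run_ends.size() && run_ends[run] <= i) ++run;
    assert(run < run_ends.size());
    out.push_back(values[run]);
  }
  return out;
}
```

Three runs with run ends `{3, 5, 9}` describe nine logical values, which is how the logical length can exceed the length of the child arrays.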
- inlineconststd::shared_ptr<Array>&run_ends()const#
Returns an array holding the logical indexes of each run-end.
The physical offset to the array is applied.
- inlineconststd::shared_ptr<Array>&values()const#
Returns an array holding the values of each run.
The physical offset to the array is applied.
- Result<std::shared_ptr<Array>>LogicalRunEnds(MemoryPool*pool)const#
Returns an array holding the logical indexes of each run end.
If a non-zero logical offset is set, this function allocates a new array and rewrites all the run end values to be relative to the logical offset and cuts the end of the array to the logical length.
- std::shared_ptr<Array>LogicalValues()const#
Returns an array holding the values of each run.
If a non-zero logical offset is set, this function allocates a new array containing only the values within the logical range.
- int64_tFindPhysicalOffset()const#
Find the physical offset of this REE array.
This function uses binary search, so it has an O(log N) cost.
- int64_tFindPhysicalLength()const#
Find the physical length of this REE array.
The physical length of an REE is the number of physical values (and run-ends) necessary to represent the logical range of values from offset to length.
Avoid calling this function if the physical length can be established in some other way (e.g. when iterating over the runs sequentially until the end). This function uses binary search, so it has an O(log N) cost.
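One way to picture the binary search is with `std::upper_bound` over the run ends. The sketch below is an assumption about how such a lookup can be written, not a copy of Arrow's implementation. It uses int64 run ends in a `std::vector`, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: index of the run that contains logical index i,
// i.e. the first run whose run-end is greater than i.
int64_t FindPhysicalIndexSketch(const std::vector<int64_t>& run_ends, int64_t i) {
  auto it = std::upper_bound(run_ends.begin(), run_ends.end(), i);
  return it - run_ends.begin();
}

// Number of runs needed to cover the logical range [offset, offset + length).
int64_t FindPhysicalLengthSketch(const std::vector<int64_t>& run_ends,
                                 int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0);
  if (length == 0) return 0;
  const int64_t first = FindPhysicalIndexSketch(run_ends, offset);
  const int64_t last = FindPhysicalIndexSketch(run_ends, offset + length - 1);
  return last - first + 1;
}
```

With run ends `{3, 5, 9}`, a logical offset of 4 and a logical length of 3, the range starts in run 1 and ends in run 2, so the physical offset is 1 and the physical length is 2.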
Public Static Functions
- staticResult<std::shared_ptr<RunEndEncodedArray>>Make(conststd::shared_ptr<DataType>&type,int64_tlogical_length,conststd::shared_ptr<Array>&run_ends,conststd::shared_ptr<Array>&values,int64_tlogical_offset=0)#
Construct a RunEndEncodedArray from all parameters.
The length and offset parameters refer to the dimensions of the logical array which is the array we would get after expanding all the runs into repeated values. As such, length can be much greater than the length of the child run_ends and values arrays.
- staticResult<std::shared_ptr<RunEndEncodedArray>>Make(int64_tlogical_length,conststd::shared_ptr<Array>&run_ends,conststd::shared_ptr<Array>&values,int64_tlogical_offset=0)#
Construct a RunEndEncodedArray from values and run ends arrays.
The data type is automatically inferred from the arguments. The run_ends and values arrays must have the same length.
Chunked Arrays#
- classChunkedArray#
A data structure managing a list of primitive Arrow arrays logically as one large array.
Data chunking is treated throughout this project largely as an implementation detail for performance and memory use optimization. ChunkedArray allows Array objects to be collected and interpreted as a single logical array without requiring an expensive concatenation step.
In some cases, data produced by a function may exceed the capacity of an Array (like BinaryArray or StringArray) and so returning multiple Arrays is the only possibility. In these cases, we recommend returning a ChunkedArray instead of a vector of Arrays or some alternative.
When data is processed in parallel, it may not be practical or possible to create large contiguous memory allocations and write output into them. With some data types, like binary and string types, it is not possible at all to produce non-chunked array outputs without requiring a concatenation step at the end of processing.
Application developers may tune chunk sizes based on analysis of performance profiles but many developer-users will not need to be especially concerned with the chunking details.
Preserving the chunk layout/sizes in processing steps is generally not considered to be a contract in APIs. A function may decide to alter the chunking of its result. Similarly, APIs accepting multiple ChunkedArray inputs should not expect the chunk layout to be the same in each input.
Public Functions
- inlineexplicitChunkedArray(std::shared_ptr<Array>chunk)#
Construct a chunked array from a single Array.
- explicitChunkedArray(ArrayVectorchunks,std::shared_ptr<DataType>type=NULLPTR)#
Construct a chunked array from a vector of arrays and an optional data type.
The vector elements must have the same data type. If the data type is passed explicitly, the vector may be empty. If the data type is omitted, the vector must be non-empty.
- inlineint64_tlength()const#
- Returns:
the total length of the chunked array; computed on construction
- inlineint64_tnull_count()const#
- Returns:
the total number of nulls among all chunks
- inlineintnum_chunks()const#
- Returns:
the total number of chunks in the chunked array
- inlineconststd::shared_ptr<Array>&chunk(inti)const#
- Returns:
a particular chunk from the chunked array
- inlineconstArrayVector&chunks()const#
- Returns:
an ArrayVector of chunks
- DeviceAllocationTypeSetdevice_types()const#
- Returns:
The set of device allocation types used by the chunks in this chunked array.
- inlineboolis_cpu()const#
- Returns:
true if all chunks are allocated on CPU-accessible memory.
- std::shared_ptr<ChunkedArray>Slice(int64_toffset,int64_tlength)const#
Construct a zero-copy slice of the chunked array with the indicated offset and length.
- Parameters:
offset –[in] the position of the first element in the constructed slice
length –[in] the length of the slice. If there are not enough elements in the chunked array, the length will be adjusted accordingly
- Returns:
a new object wrapped in std::shared_ptr<ChunkedArray>
- std::shared_ptr<ChunkedArray>Slice(int64_toffset)const#
Slice from offset until end of the chunked array.
- Result<std::vector<std::shared_ptr<ChunkedArray>>>Flatten(MemoryPool*pool=default_memory_pool())const#
Flatten this chunked array as a vector of chunked arrays, one for each struct field.
- Parameters:
pool –[in] The pool for buffer allocations, if any
- Result<std::shared_ptr<ChunkedArray>>View(conststd::shared_ptr<DataType>&type)const#
Construct a zero-copy view of this chunked array with the given type.
Calls Array::View on each constituent chunk. Always succeeds if there are zero chunks.
- Result<std::shared_ptr<Scalar>>GetScalar(int64_tindex)const#
Return a Scalar containing the value of this array at index.
- boolEquals(constChunkedArray&other,constEqualOptions&opts=EqualOptions::Defaults())const#
Determine if two chunked arrays are equal.
Two chunked arrays can be equal only if they have equal datatypes. However, they may be equal even if they have different chunkings.
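Equality across different chunkings can be pictured as walking both sequences element by element while ignoring chunk boundaries. The sketch below assumes chunks are `std::vector<int>` with no nulls, and the names `Chunks` and `ChunkedEqualsSketch` are hypothetical. It illustrates the idea only and does not describe how Arrow compares chunks internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Chunks = std::vector<std::vector<int>>;

// Hypothetical helper: compare two chunked sequences element by element,
// ignoring where the chunk boundaries fall.
bool ChunkedEqualsSketch(const Chunks& a, const Chunks& b) {
  std::size_t ca = 0, ia = 0, cb = 0, ib = 0;
  while (true) {
    // Skip exhausted (or empty) chunks on both sides.
    while (ca < a.size() && ia == a[ca].size()) { ++ca; ia = 0; }
    while (cb < b.size() && ib == b[cb].size()) { ++cb; ib = 0; }
    const bool end_a = ca == a.size();
    const bool end_b = cb == b.size();
    if (end_a || end_b) return end_a && end_b;
    if (a[ca][ia++] != b[cb][ib++]) return false;
  }
}
```

Under this sketch, the chunkings `{{1, 2}, {3}}` and `{{1}, {2, 3}}` compare equal because they hold the same logical sequence.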
- boolEquals(conststd::shared_ptr<ChunkedArray>&other,constEqualOptions&opts=EqualOptions::Defaults())const#
Determine if two chunked arrays are equal.
- boolApproxEquals(constChunkedArray&other,constEqualOptions&=EqualOptions::Defaults())const#
Determine if two chunked arrays are approximately equal.
- std::stringToString()const#
- Returns:
PrettyPrint representation suitable for debugging
Public Static Functions
- staticResult<std::shared_ptr<ChunkedArray>>MakeEmpty(std::shared_ptr<DataType>type,MemoryPool*pool=default_memory_pool())#
Create an empty ChunkedArray of a given type.
The output ChunkedArray will have one chunk with an empty array of the given type.
- Parameters:
type –[in] the data type of the empty ChunkedArray
pool –[in] the memory pool to allocate memory from
- Returns:
the resulting ChunkedArray
- usingarrow::ChunkLocation=TypedChunkLocation<int64_t>#
- template<typenameIndexType>
structTypedChunkLocation#
- classChunkResolver#
A utility that incrementally resolves logical indices into physical indices in a chunked array.
Public Functions
- inlineexplicitChunkResolver(std::vector<int64_t>offsets)noexcept#
Construct a ChunkResolver from a vector of chunks.size() + 1 offsets.
The first offset must be 0 and the last offset must be the logical length of the chunked array. Each offset before the last represents the starting logical index of the corresponding chunk.
- inlineChunkLocationResolve(int64_tindex)const#
Resolve a logical index to a ChunkLocation.
The returned ChunkLocation contains the chunk index and the within-chunk index equivalent to the logical index.
- Parameters:
index – The logical index to resolve
- Pre:
index >= 0
- Post:
location.chunk_index in [0, chunks.size()]
- Returns:
ChunkLocation with a valid chunk_index if index is within bounds, or with chunk_index == chunks.size() if logical index is >= chunked_array.length().
- inlineChunkLocationResolveWithHint(int64_tindex,ChunkLocationhint)const#
Resolve a logical index to a ChunkLocation.
The returned ChunkLocation contains the chunk index and the within-chunk index equivalent to the logical index.
- Parameters:
index – The logical index to resolve
hint – ChunkLocation{} or the last ChunkLocation returned by this ChunkResolver.
- Pre:
index >= 0
- Post:
location.chunk_index in [0, chunks.size()]
- Returns:
ChunkLocation with a valid chunk_index if index is within bounds, or with chunk_index == chunks.size() if logical index is >= chunked_array.length().
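The resolution step can be sketched with a vector of chunks.size() + 1 offsets: try the hinted chunk first, then fall back to a binary search. This is a minimal illustration under the assumptions above, not Arrow's implementation. `LocationSketch` and `ResolveSketch` are hypothetical names, and the fallback uses `std::upper_bound`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a chunk location: which chunk, and where inside it.
struct LocationSketch {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// offsets has chunks.size() + 1 entries: offsets[c] is the logical index at
// which chunk c starts, and offsets.back() is the total logical length.
LocationSketch ResolveSketch(const std::vector<int64_t>& offsets, int64_t index,
                             LocationSketch hint = {}) {
  assert(index >= 0 && !offsets.empty() && offsets.front() == 0);
  const int64_t num_chunks = static_cast<int64_t>(offsets.size()) - 1;
  // Fast path: the index falls inside the hinted chunk.
  const int64_t h = hint.chunk_index;
  if (h >= 0 && h < num_chunks && index >= offsets[h] && index < offsets[h + 1]) {
    return {h, index - offsets[h]};
  }
  // Slow path: binary search for the last offset that is <= index.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
  const int64_t c = (it - offsets.begin()) - 1;
  return {c, index - offsets[c]};
}
```

Passing the previously returned location as the hint pays off when indices are resolved in roughly increasing order, since consecutive lookups tend to land in the same chunk. An out-of-bounds index yields a chunk_index equal to the number of chunks, matching the postcondition above.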
- template<typenameIndexType>
inlineboolResolveMany(int64_tn_indices,constIndexType*logical_index_vec,TypedChunkLocation<IndexType>*out_chunk_location_vec,IndexTypechunk_hint=0)const#
Resolve n_indices logical indices to chunk indices.
- Parameters:
n_indices – The number of logical indices to resolve
logical_index_vec – The logical indices to resolve
out_chunk_location_vec – The output array where the locations will be written
chunk_hint – 0 or the last chunk_index produced by ResolveMany
- Pre:
0 <= logical_index_vec[i] < logical_array_length() (for well-defined and valid chunk index results)
- Pre:
out_chunk_location_vec has space for n_indices locations
- Pre:
chunk_hint in [0, chunks.size()]
- Post:
out_chunk_location_vec[i].chunk_index in [0, chunks.size()] for i in [0, n)
- Post:
if logical_index_vec[i] >= chunked_array.length(), then out_chunk_location_vec[i].chunk_index == chunks.size() and out_chunk_location_vec[i].index_in_chunk is UNDEFINED (can be out-of-bounds)
- Post:
if logical_index_vec[i] < 0, then both values in out_chunk_location_vec[i] are UNDEFINED
- Returns:
false iff chunks.size() > std::numeric_limits<IndexType>::max()
Public Static Functions
- staticinlineint32_tBisect(int64_tindex,constint64_t*offsets,int32_tlo,int32_thi)#
Find the index of the chunk that contains the logical index.
Any non-negative index is accepted. When hi = num_offsets, the largest possible return value is num_offsets - 1, which is equal to chunks.size(). That value is returned when the logical index is greater than or equal to the logical length of the chunked array.
- Pre:
index >= 0 (otherwise, when index is negative, hi-1 is returned)
- Pre:
lo < hi
- Pre:
lo >= 0 && hi <= offsets_.size()
Utilities#
- classArrayVisitor#
Abstract array visitor class.
Subclass this to create a visitor that can be used with the Array::Accept() method.
Public Functions
- virtual~ArrayVisitor()=default#
- virtualStatusVisit(constBooleanArray&array)#
- virtualStatusVisit(constStringArray&array)#
- virtualStatusVisit(constStringViewArray&array)#
- virtualStatusVisit(constBinaryArray&array)#
- virtualStatusVisit(constBinaryViewArray&array)#
- virtualStatusVisit(constLargeStringArray&array)#
- virtualStatusVisit(constLargeBinaryArray&array)#
- virtualStatusVisit(constFixedSizeBinaryArray&array)#
- virtualStatusVisit(constDayTimeIntervalArray&array)#
- virtualStatusVisit(constMonthDayNanoIntervalArray&array)#
- virtualStatusVisit(constDecimal32Array&array)#
- virtualStatusVisit(constDecimal64Array&array)#
- virtualStatusVisit(constDecimal128Array&array)#
- virtualStatusVisit(constDecimal256Array&array)#
- virtualStatusVisit(constLargeListArray&array)#
- virtualStatusVisit(constListViewArray&array)#
- virtualStatusVisit(constLargeListViewArray&array)#
- virtualStatusVisit(constFixedSizeListArray&array)#
- virtualStatusVisit(constStructArray&array)#
- virtualStatusVisit(constSparseUnionArray&array)#
- virtualStatusVisit(constDenseUnionArray&array)#
- virtualStatusVisit(constDictionaryArray&array)#
- virtualStatusVisit(constRunEndEncodedArray&array)#
- virtualStatusVisit(constExtensionArray&array)#
FromJSONString Helpers#
- groupFromJSONStringHelpers
These helpers are intended to be used in examples, tests, or for quick prototyping and are not intended to be used where performance matters.
See the User Guide for more information.
Functions
- Result<std::shared_ptr<Array>>ArrayFromJSONString(conststd::shared_ptr<DataType>&,conststd::string&json)#
Create an Array from a JSON string.
Result<std::shared_ptr<Array>> maybe_array = ArrayFromJSONString(int64(), "[2, 3, null, 7, 11]");
- Result<std::shared_ptr<Array>>ArrayFromJSONString(conststd::shared_ptr<DataType>&,std::string_viewjson)#
Create an Array from a JSON string.
Result<std::shared_ptr<Array>> maybe_array = ArrayFromJSONString(int64(), "[2, 3, null, 7, 11]");
- Result<std::shared_ptr<Array>>ArrayFromJSONString(conststd::shared_ptr<DataType>&,constchar*json)#
Create an Array from a JSON string.
Result<std::shared_ptr<Array>> maybe_array = ArrayFromJSONString(int64(), "[2, 3, null, 7, 11]");
- Result<std::shared_ptr<ChunkedArray>>ChunkedArrayFromJSONString(conststd::shared_ptr<DataType>&type,conststd::vector<std::string>&json_strings)#
Create a ChunkedArray from JSON strings, one string per chunk.
Result<std::shared_ptr<ChunkedArray>> maybe_chunked_array = ChunkedArrayFromJSONString(int64(), {R"([5, 10])", R"([null])", R"([16])"});
- Result<std::shared_ptr<Array>>DictArrayFromJSONString(conststd::shared_ptr<DataType>&,std::string_viewindices_json,std::string_viewdictionary_json)#
Create a DictionaryArray from a JSON string.
Result<std::shared_ptr<Array>> maybe_dict_array = DictArrayFromJSONString(dictionary(int32(), utf8()), "[0, 1, 0, 2, 0, 3]", R"(["k1", "k2", "k3", "k4"])");
- Result<std::shared_ptr<Scalar>>ScalarFromJSONString(conststd::shared_ptr<DataType>&,std::string_viewjson)#
Create a Scalar from a JSON string.
Result<std::shared_ptr<Scalar>> maybe_scalar = ScalarFromJSONString(float64(), "42");
- Result<std::shared_ptr<Scalar>>DictScalarFromJSONString(conststd::shared_ptr<DataType>&,std::string_viewindex_json,std::string_viewdictionary_json)#
Create a DictionaryScalar from a JSON string.
Result<std::shared_ptr<Scalar>> maybe_dict_scalar = DictScalarFromJSONString(dictionary(int32(), utf8()), "3", R"(["k1", "k2", "k3", "k4"])");

