Compute Functions#
Datum class#
- classDatum#
Variant type for various Arrow C++ data structures.
Public Types
Public Functions
- Datum(std::shared_ptr<ChunkedArray>value)#
Construct from a ChunkedArray.
- Datum(std::shared_ptr<RecordBatch>value)#
Construct from a RecordBatch.
- explicitDatum(constChunkedArray&value)#
Construct from a ChunkedArray.
This can be expensive; prefer the shared_ptr<ChunkedArray> constructor.
- explicitDatum(constRecordBatch&value)#
Construct from a RecordBatch.
This can be expensive; prefer the shared_ptr<RecordBatch> constructor.
- explicitDatum(constTable&value)#
Construct from a Table.
This can be expensive; prefer the shared_ptr<Table> constructor.
- template<typenameT,boolIsArray=std::is_base_of_v<Array,T>,boolIsScalar=std::is_base_of_v<Scalar,T>,typename=enable_if_t<IsArray||IsScalar>>
inlineDatum(std::shared_ptr<T>value)#
- template<typenameT,typenameTV=typenamestd::remove_reference_t<T>,boolIsArray=std::is_base_of_v<Array,T>,boolIsScalar=std::is_base_of_v<Scalar,T>,typename=enable_if_t<IsArray||IsScalar>>
inlineDatum(T&&value)#
- template<typenameT,typename=enable_if_t<std::is_base_of_v<Scalar,T>>>
inlineDatum(constT&value)#
Copy from concrete subtypes of Scalar.
The concrete scalar type must be copyable (not all of them are).
- explicitDatum(boolvalue)#
Convenience constructor storing a bool scalar.
- explicitDatum(int8_tvalue)#
Convenience constructor storing an int8 scalar.
- explicitDatum(uint8_tvalue)#
Convenience constructor storing a uint8 scalar.
- explicitDatum(int16_tvalue)#
Convenience constructor storing an int16 scalar.
- explicitDatum(uint16_tvalue)#
Convenience constructor storing a uint16 scalar.
- explicitDatum(int32_tvalue)#
Convenience constructor storing an int32 scalar.
- explicitDatum(uint32_tvalue)#
Convenience constructor storing a uint32 scalar.
- explicitDatum(int64_tvalue)#
Convenience constructor storing an int64 scalar.
- explicitDatum(uint64_tvalue)#
Convenience constructor storing a uint64 scalar.
- explicitDatum(floatvalue)#
Convenience constructor storing a float scalar.
- explicitDatum(doublevalue)#
Convenience constructor storing a double scalar.
- explicitDatum(std::stringvalue)#
Convenience constructor storing a string scalar.
- explicitDatum(constchar*value)#
Convenience constructor storing a string scalar.
- template<template<typename,typename>classStdDuration,typenameRep,typenamePeriod,typename=decltype(DurationScalar{StdDuration<Rep,Period>{}})>
inlineexplicitDatum(StdDuration<Rep,Period>d)#
Convenience constructor for a DurationScalar from std::chrono::duration.
- inlineconststd::shared_ptr<ArrayData>&array()const#
Retrieve the stored array as ArrayData.
Use make_array() if an Array is desired (which is more expensive).
- Throws:
std::bad_variant_access – if the datum is not an array
- int64_tTotalBufferSize()const#
The sum of bytes in each buffer referenced by the datum.
Note: Scalars report a size of 0.
See also
arrow::util::TotalBufferSize for caveats
- inlineArrayData*mutable_array()const#
Get the stored ArrayData in mutable form.
For internal use primarily. Keep in mind a shared_ptr<Datum> may have multiple owners.
- std::shared_ptr<Array>make_array()const#
Retrieve the stored array as Array.
- Throws:
std::bad_variant_access – if the datum is not an array
- inlineconststd::shared_ptr<ChunkedArray>&chunked_array()const#
Retrieve the chunked array stored.
- Throws:
std::bad_variant_access – if the datum is not a chunked array
- inlineconststd::shared_ptr<RecordBatch>&record_batch()const#
Retrieve the record batch stored.
- Throws:
std::bad_variant_access – if the datum is not a record batch
- inlineconststd::shared_ptr<Table>&table()const#
Retrieve the table stored.
- Throws:
std::bad_variant_access – if the datum is not a table
- inlineconststd::shared_ptr<Scalar>&scalar()const#
Retrieve the scalar stored.
- Throws:
std::bad_variant_access – if the datum is not a scalar
- template<typenameExactType>
inlinestd::shared_ptr<ExactType>array_as()const#
Retrieve the datum as its concrete array type.
- Throws:
std::bad_variant_access – if the datum is not an array
- Template Parameters:
ExactType – the expected array type; behavior may be undefined if it is not the type of the stored array
- template<typenameExactType>
inlineconstExactType&scalar_as()const#
Retrieve the datum as its concrete scalar type.
- Throws:
std::bad_variant_access – if the datum is not a scalar
- Template Parameters:
ExactType – the expected scalar type; behavior may be undefined if it is not the type of the stored scalar
- int64_tnull_count()const#
Return the null count.
Only valid for scalar and array-like data.
- conststd::shared_ptr<DataType>&type()const#
The value type of the variant, if any.
- Returns:
nullptr if no type
- conststd::shared_ptr<Schema>&schema()const#
The schema of the variant, if any.
- Returns:
nullptr if no schema
- int64_tlength()const#
The value length of the variant, if any.
- Returns:
kUnknownLength if no type
- ArrayVectorchunks()const#
The array chunks of the variant, if any.
- Returns:
empty if not arraylike
Public Members
- std::variant<Empty,std::shared_ptr<Scalar>,std::shared_ptr<ArrayData>,std::shared_ptr<ChunkedArray>,std::shared_ptr<RecordBatch>,std::shared_ptr<Table>>value#
Storage of the actual datum.
Note: For arrays, ArrayData is stored instead of Array for easier processing.
Public Static Attributes
- staticconstexprint64_tkUnknownLength=-1#
Datum variants may have a length.
This special value indicates that the current variant does not have a length.
- structEmpty#
A placeholder type to represent empty datum.
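The accessor behavior described above (a std::variant with an Empty alternative, accessors that throw std::bad_variant_access on a kind mismatch, and kUnknownLength for variants without a length) can be sketched without Arrow. ToyDatum, ToyScalar and ToyArray below are hypothetical stand-ins rather than Arrow types, and the choice that a scalar reports length 1 is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Arrow types.
struct ToyScalar { double value; };
struct ToyArray { std::vector<double> values; };

class ToyDatum {
 public:
  struct Empty {};
  static constexpr std::int64_t kUnknownLength = -1;

  ToyDatum() = default;  // a default-constructed datum holds Empty
  ToyDatum(std::shared_ptr<ToyScalar> v) : value(std::move(v)) {
    assert(scalar() != nullptr);
  }
  ToyDatum(std::shared_ptr<ToyArray> v) : value(std::move(v)) {
    assert(array() != nullptr);
  }

  // std::get throws std::bad_variant_access when another alternative is stored.
  const std::shared_ptr<ToyScalar>& scalar() const {
    return std::get<std::shared_ptr<ToyScalar>>(value);
  }
  const std::shared_ptr<ToyArray>& array() const {
    return std::get<std::shared_ptr<ToyArray>>(value);
  }

  // Only some alternatives have a length; the others report kUnknownLength.
  std::int64_t length() const {
    if (std::holds_alternative<std::shared_ptr<ToyScalar>>(value)) return 1;
    if (const auto* arr = std::get_if<std::shared_ptr<ToyArray>>(&value)) {
      return static_cast<std::int64_t>((*arr)->values.size());
    }
    return kUnknownLength;
  }

  std::variant<Empty, std::shared_ptr<ToyScalar>, std::shared_ptr<ToyArray>> value;
};
```

Callers that are not sure which kind is stored can test with std::holds_alternative (or std::get_if) before calling a throwing accessor.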
Abstract Function classes#
- voidPrintTo(constFunctionOptions&,std::ostream*)#
- structArity#
- #include <arrow/compute/function.h>
Contains the number of required arguments for the function.
Naming conventions taken from https://en.wikipedia.org/wiki/Arity.
Public Members
- intnum_args#
The number of required arguments (or the minimum number for varargs functions).
- boolis_varargs=false#
If true, then the num_args is the minimum number of required arguments.
Public Static Functions
- structFunctionDoc#
- #include <arrow/compute/function.h>
Public Members
- std::stringsummary#
A one-line summary of the function, using a verb.
For example, “Add two numeric arrays or scalars”.
- std::stringdescription#
A detailed description of the function, meant to follow the summary.
- std::vector<std::string>arg_names#
Symbolic names (identifiers) for the function arguments.
Some bindings may use this to generate nicer function signatures.
- std::stringoptions_class#
Name of the options class, if any.
- booloptions_required#
Whether options are required for function execution.
If false, then either the function does not have an options class or there is a usable default options value.
- classFunctionExecutor#
- #include <arrow/compute/function.h>
An executor of a function with a preconfigured kernel.
Public Functions
- virtualStatusInit(constFunctionOptions*options=NULLPTR,ExecContext*exec_ctx=NULLPTR)=0#
Initialize or re-initialize the preconfigured kernel.
This method may be called zero or more times. Depending on how the FunctionExecutor was obtained, it may already have been initialized.
- virtualResult<Datum>Execute(conststd::vector<Datum>&args,int64_tlength=-1)=0#
Execute the preconfigured kernel with arguments that must fit it.
The method requires the arguments be castable to the preconfigured types.
- Parameters:
args – [in] Arguments to execute the function on
length – [in] Length of the argument batch, or -1 to use the default. If the function has no parameters, this determines the batch length, defaulting to 0. Otherwise, if the function is scalar, this must equal the argument batch’s inferred length or be -1 to default to it. This is ignored for vector functions.
- classFunction#
- #include <arrow/compute/function.h>
Base class for compute functions.
Function implementations contain a collection of “kernels” which are implementations of the function for specific argument types. Selecting a viable kernel for executing a function is referred to as “dispatching”.
Subclassed by arrow::compute::detail::FunctionImpl< HashAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarAggregateKernel >, arrow::compute::detail::FunctionImpl< ScalarKernel >, arrow::compute::detail::FunctionImpl< VectorKernel >, arrow::compute::MetaFunction, arrow::compute::detail::FunctionImpl< KernelType >
Public Types
- enumKind#
The kind of function, which indicates in what contexts it is valid for use.
Values:
- enumeratorSCALAR#
A function that performs scalar data operations on whole arrays of data.
Can generally process Array or Scalar values. The size of the output will be the same as the size (or broadcasted size, in the case of mixing Array and Scalar inputs) of the input.
- enumeratorVECTOR#
A function with array input and output whose behavior depends on the values of the entire arrays passed, rather than the value of each scalar value.
- enumeratorSCALAR_AGGREGATE#
A function that computes scalar summary statistics from array input.
- enumeratorHASH_AGGREGATE#
A function that computes grouped summary statistics from array input and an array of group identifiers.
- enumeratorMETA#
A function that dispatches to other functions and does not contain its own kernels.
Public Functions
- inlineconststd::string&name()const#
The name of the function. The registry enforces uniqueness of names.
- inlineFunction::Kindkind()const#
The kind of function, which indicates in what contexts it is valid for use.
- inlineconstArity&arity()const#
Contains the number of arguments the function requires, or if the function accepts variable numbers of arguments.
- inlineconstFunctionDoc&doc()const#
Return the function documentation.
- virtualintnum_kernels()const=0#
Returns the number of registered kernels for this function.
- virtualResult<constKernel*>DispatchExact(conststd::vector<TypeHolder>&types)const#
Return a kernel that can execute the function given the exact argument types (without implicit type casts).
NB: This function is overridden in CastFunction.
- virtualResult<constKernel*>DispatchBest(std::vector<TypeHolder>*values)const#
Return a best-match kernel that can execute the function given the argument types, after implicit casts are applied.
- Parameters:
values – [inout] Argument types. An element may be modified to indicate that the returned kernel only approximately matches the input value descriptors; callers are responsible for casting inputs to the type required by the kernel.
- virtualResult<std::shared_ptr<FunctionExecutor>>GetBestExecutor(std::vector<TypeHolder>inputs)const#
Get a function executor with a best-matching kernel.
The returned executor will by default work with the default FunctionOptions and KernelContext. If you want to change that, call FunctionExecutor::Init.
- virtualResult<Datum>Execute(conststd::vector<Datum>&args,constFunctionOptions*options,ExecContext*ctx)const#
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
- inlineconstFunctionOptions*default_options()const#
Returns the default options for this function.
Whatever option semantics a Function has, implementations must guarantee that default_options() is valid to pass to Execute as options.
- inlinevirtualboolis_pure()const#
Returns the pure property for this function.
Impure functions are those that may return different results for the same input arguments. For example, a function that returns a random number is not pure. An expression containing only pure functions can be simplified by pre-evaluating any sub-expressions that have constant arguments.
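The difference between DispatchExact and DispatchBest described above can be illustrated with a toy model. Everything in this sketch (ToyType, ToyKernel, ToyFunction, and the single "widen INT32 to INT64" cast rule) is hypothetical and far simpler than Arrow's real dispatch logic; it only shows the contract that the best-match path may rewrite the argument types so the caller knows which casts to insert.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; the names mirror the concepts above but are not Arrow's API.
enum class ToyType { INT32, INT64, FLOAT64 };

struct ToyKernel {
  std::vector<ToyType> in_types;
  std::string label;
};

struct ToyFunction {
  std::vector<ToyKernel> kernels;

  // Exact dispatch: the argument types must match a kernel signature as-is.
  const ToyKernel* DispatchExact(const std::vector<ToyType>& types) const {
    for (const auto& kernel : kernels) {
      if (kernel.in_types == types) return &kernel;
    }
    return nullptr;
  }

  // Best dispatch: try an exact match first, then widen every INT32 to INT64
  // and retry. On success `types` is rewritten to the kernel's input types,
  // so the caller can cast its inputs accordingly.
  const ToyKernel* DispatchBest(std::vector<ToyType>* types) const {
    assert(types != nullptr);
    if (const ToyKernel* exact = DispatchExact(*types)) return exact;
    std::vector<ToyType> widened = *types;
    for (auto& type : widened) {
      if (type == ToyType::INT32) type = ToyType::INT64;
    }
    const ToyKernel* best = DispatchExact(widened);
    if (best != nullptr) *types = widened;
    return best;
  }
};
```

With a single (INT64, INT64) kernel registered, an (INT32, INT64) call fails exact dispatch but succeeds through the best-match path, leaving the type vector as (INT64, INT64).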
- classScalarFunction:publicarrow::compute::detail::FunctionImpl<ScalarKernel>#
- #include <arrow/compute/function.h>
A function that executes elementwise operations on arrays or scalars, and therefore whose results generally do not depend on the order of the values in the arguments.
Accepts and returns arrays that are all of the same size. These functions roughly correspond to the functions used in SQL expressions.
Public Functions
- StatusAddKernel(std::vector<InputType>in_types,OutputTypeout_type,ArrayKernelExecexec,KernelInitinit=NULLPTR,std::shared_ptr<MatchConstraint>constraint=NULLPTR)#
Add a kernel with given input/output types, no required state initialization, preallocation for fixed-width types, and default null handling (intersect validity bitmaps of inputs).
- StatusAddKernel(ScalarKernelkernel)#
Add a kernel (function implementation).
Returns error if the kernel’s signature does not match the function’s arity.
- inlinevirtualboolis_pure()constoverride#
Returns the pure property for this function.
- classVectorFunction:publicarrow::compute::detail::FunctionImpl<VectorKernel>#
- #include <arrow/compute/function.h>
A function that executes general array operations that may yield outputs of different sizes or have results that depend on the whole array contents.
These functions roughly correspond to the functions found in non-SQL array languages like APL and its derivatives.
- classScalarAggregateFunction:publicarrow::compute::detail::FunctionImpl<ScalarAggregateKernel>#
- #include <arrow/compute/function.h>
- classHashAggregateFunction:publicarrow::compute::detail::FunctionImpl<HashAggregateKernel>#
- #include <arrow/compute/function.h>
- classMetaFunction:publicarrow::compute::Function#
- #include <arrow/compute/function.h>
A function that dispatches to other functions.
Must implement MetaFunction::ExecuteImpl.
For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution of concrete Function types, but must handle other Datum kinds on its own.
Public Functions
- inlinevirtualintnum_kernels()constoverride#
Returns the number of registered kernels for this function.
- virtualResult<Datum>Execute(conststd::vector<Datum>&args,constFunctionOptions*options,ExecContext*ctx)constoverride#
Execute the function eagerly with the passed input arguments with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used.
This function can be overridden in subclasses.
- classFunctionOptionsType#
- #include <arrow/compute/function_options.h>
Extension point for defining options outside libarrow (but still within this project).
- classFunctionOptions:publicarrow::util::EqualityComparable<FunctionOptions>#
- #include <arrow/compute/function_options.h>
Base class for specifying options configuring a function’s behavior, such as error handling.
Subclassed by arrow::compute::ArithmeticOptions, arrow::compute::ArraySortOptions, arrow::compute::AssumeTimezoneOptions, arrow::compute::CastOptions, arrow::compute::CountOptions, arrow::compute::CumulativeOptions, arrow::compute::DayOfWeekOptions, arrow::compute::DictionaryEncodeOptions, arrow::compute::ElementWiseAggregateOptions, arrow::compute::ExtractRegexOptions, arrow::compute::ExtractRegexSpanOptions, arrow::compute::FilterOptions, arrow::compute::IndexOptions, arrow::compute::InversePermutationOptions, arrow::compute::JoinOptions, arrow::compute::ListFlattenOptions, arrow::compute::ListSliceOptions, arrow::compute::MakeStructOptions, arrow::compute::MapLookupOptions, arrow::compute::MatchSubstringOptions, arrow::compute::ModeOptions, arrow::compute::NullOptions, arrow::compute::PadOptions, arrow::compute::PairwiseOptions, arrow::compute::PartitionNthOptions, arrow::compute::PivotWiderOptions, arrow::compute::QuantileOptions, arrow::compute::RandomOptions, arrow::compute::RankOptions, arrow::compute::RankQuantileOptions, arrow::compute::ReplaceSliceOptions, arrow::compute::ReplaceSubstringOptions, arrow::compute::RoundBinaryOptions, arrow::compute::RoundOptions, arrow::compute::RoundTemporalOptions, arrow::compute::RoundToMultipleOptions, arrow::compute::RunEndEncodeOptions, arrow::compute::ScalarAggregateOptions, arrow::compute::ScatterOptions, arrow::compute::SelectKOptions, arrow::compute::SetLookupOptions, arrow::compute::SkewOptions, arrow::compute::SliceOptions, arrow::compute::SortOptions, arrow::compute::SplitOptions, arrow::compute::SplitPatternOptions, arrow::compute::StrftimeOptions, arrow::compute::StrptimeOptions, arrow::compute::StructFieldOptions, arrow::compute::TDigestOptions, arrow::compute::TakeOptions, arrow::compute::TrimOptions, arrow::compute::Utf8NormalizeOptions, arrow::compute::VarianceOptions, arrow::compute::WeekOptions, arrow::compute::WinsorizeOptions, arrow::compute::ZeroFillOptions
Public Functions
Public Static Functions
- staticResult<std::unique_ptr<FunctionOptions>>Deserialize(conststd::string&type_name,constBuffer&buffer)#
Deserialize an options struct from a buffer.
Note: this will only look for type_name in the default FunctionRegistry; to use a custom FunctionRegistry, look up the FunctionOptionsType, then call FunctionOptionsType::Deserialize().
Function execution#
- Result<std::shared_ptr<FunctionExecutor>>GetFunctionExecutor(conststd::string&func_name,std::vector<TypeHolder>in_types,constFunctionOptions*options=NULLPTR,FunctionRegistry*func_registry=NULLPTR)#
One-shot executor provider for all types of functions.
This function creates and initializes a FunctionExecutor appropriate for the given function name, input types and function options.
- Result<std::shared_ptr<FunctionExecutor>>GetFunctionExecutor(conststd::string&func_name,conststd::vector<Datum>&args,constFunctionOptions*options=NULLPTR,FunctionRegistry*func_registry=NULLPTR)#
One-shot executor provider for all types of functions.
This function creates and initializes a FunctionExecutor appropriate for the given function name, input types (taken from the Datum arguments) and function options.
Function registry#
- classFunctionRegistry#
A mutable central function registry for built-in functions as well as user-defined functions.
Functions are implementations of arrow::compute::Function.
Generally, each function contains kernels which are implementations of a function for a specific argument signature. After looking up a function in the registry, one can either execute it eagerly with Function::Execute or use one of the function’s dispatch methods to pick a suitable kernel for lower-level function execution.
Public Functions
- StatusCanAddFunction(std::shared_ptr<Function>function,boolallow_overwrite=false)#
Check whether a new function can be added to the registry.
- Returns:
Status::KeyError if a function with the same name is already registered.
- StatusAddFunction(std::shared_ptr<Function>function,boolallow_overwrite=false)#
Add a new function to the registry.
- Returns:
Status::KeyError if a function with the same name is already registered.
- StatusCanAddAlias(conststd::string&target_name,conststd::string&source_name)#
Check whether an alias can be added for the given function name.
- Returns:
Status::KeyError if the function with the given name is not registered.
- StatusAddAlias(conststd::string&target_name,conststd::string&source_name)#
Add alias for the given function name.
- Returns:
Status::KeyError if the function with the given name is not registered.
- StatusCanAddFunctionOptionsType(constFunctionOptionsType*options_type,boolallow_overwrite=false)#
Check whether a new function options type can be added to the registry.
- Returns:
Status::KeyError if a function options type with the same name is already registered.
- StatusAddFunctionOptionsType(constFunctionOptionsType*options_type,boolallow_overwrite=false)#
Add a new function options type to the registry.
- Returns:
Status::KeyError if a function options type with the same name is already registered.
- Result<std::shared_ptr<Function>>GetFunction(conststd::string&name)const#
Retrieve a function by name from the registry.
- std::vector<std::string>GetFunctionNames()const#
Return vector of all entry names in the registry.
Helpful for displaying a manifest of available functions.
- Result<constFunctionOptionsType*>GetFunctionOptionsType(conststd::string&name)const#
Retrieve a function options type by name from the registry.
- intnum_functions()const#
The number of currently registered functions.
Public Static Functions
- staticstd::unique_ptr<FunctionRegistry>Make()#
Construct a new registry.
Most users only need to use the global registry.
- staticstd::unique_ptr<FunctionRegistry>Make(FunctionRegistry*parent)#
Construct a new nested registry with the given parent.
Most users only need to use the global registry. The returned registry never changes its parent, even when an operation allows overwriting.
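The registry rules above (an error on duplicate names unless allow_overwrite is set, and a nested registry whose parent is never modified) can be modeled with a small toy class. ToyRegistry is hypothetical: it stores strings instead of Function objects, returns false where the real API returns Status::KeyError, and assumes that a nested registry falls back to its parent on lookup and also counts parent entries as name conflicts.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy registry; it stores descriptions instead of Function objects.
class ToyRegistry {
 public:
  explicit ToyRegistry(const ToyRegistry* parent = nullptr) : parent_(parent) {}

  // Returns false (standing in for Status::KeyError) when the name is already
  // visible from this registry and overwriting is not allowed.
  bool AddFunction(const std::string& name, const std::string& impl,
                   bool allow_overwrite = false) {
    assert(!name.empty());
    if (!allow_overwrite && Lookup(name) != nullptr) return false;
    functions_[name] = impl;  // only this level is modified, never the parent
    return true;
  }

  // Assumption of this sketch: lookups fall back to the parent registry when
  // the name is not found locally.
  const std::string* Lookup(const std::string& name) const {
    const auto it = functions_.find(name);
    if (it != functions_.end()) return &it->second;
    return parent_ != nullptr ? parent_->Lookup(name) : nullptr;
  }

 private:
  const ToyRegistry* parent_;
  std::map<std::string, std::string> functions_;
};
```

A nested registry built this way can shadow a parent entry with allow_overwrite while the parent keeps its original entry.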
- FunctionRegistry*arrow::compute::GetFunctionRegistry()#
Return the process-global function registry.
Convenience functions#
- Result<Datum>CallFunction(conststd::string&func_name,conststd::vector<Datum>&args,constFunctionOptions*options,ExecContext*ctx=NULLPTR)#
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
- Result<Datum>CallFunction(conststd::string&func_name,conststd::vector<Datum>&args,ExecContext*ctx=NULLPTR)#
Variant of CallFunction which uses a function’s default options.
NB: Some functions require FunctionOptions be provided.
- Result<Datum>CallFunction(conststd::string&func_name,constExecBatch&batch,constFunctionOptions*options,ExecContext*ctx=NULLPTR)#
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
- Result<Datum>CallFunction(conststd::string&func_name,constExecBatch&batch,ExecContext*ctx=NULLPTR)#
Variant of CallFunction which uses a function’s default options.
NB: Some functions require FunctionOptions be provided.
Concrete options classes#
- enumclassRoundMode:int8_t#
Rounding and tie-breaking modes for round compute functions.
Additional details and examples are provided in compute.rst.
Values:
- enumeratorDOWN#
Round to nearest integer less than or equal in magnitude (aka “floor”)
- enumeratorUP#
Round to nearest integer greater than or equal in magnitude (aka “ceil”)
- enumeratorTOWARDS_ZERO#
Get the integral part without fractional digits (aka “trunc”)
- enumeratorTOWARDS_INFINITY#
Round negative values with DOWN rule and positive values with UP rule (aka “away from zero”)
- enumeratorHALF_DOWN#
Round ties with DOWN rule (also called “round half towards negative infinity”)
- enumeratorHALF_UP#
Round ties with UP rule (also called “round half towards positive infinity”)
- enumeratorHALF_TOWARDS_ZERO#
Round ties with TOWARDS_ZERO rule (also called “round half away from infinity”)
- enumeratorHALF_TOWARDS_INFINITY#
Round ties with TOWARDS_INFINITY rule (also called “round half away from zero”)
- enumeratorHALF_TO_EVEN#
Round ties to nearest even integer.
- enumeratorHALF_TO_ODD#
Round ties to nearest odd integer.
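The ten modes listed above can be compared side by side with a small standalone function. This is a hypothetical re-implementation for rounding a double to an integer, written from the descriptions in this list; ToyRoundMode, AwayFromZero, RoundDirected and RoundToInteger are names invented for the sketch and it is not Arrow's kernel code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical re-implementation for illustration; this is not Arrow's kernel code.
enum class ToyRoundMode {
  DOWN, UP, TOWARDS_ZERO, TOWARDS_INFINITY,
  HALF_DOWN, HALF_UP, HALF_TOWARDS_ZERO, HALF_TOWARDS_INFINITY,
  HALF_TO_EVEN, HALF_TO_ODD
};

// Round to the neighbouring integer that is farther from zero.
inline double AwayFromZero(double x) { return x < 0 ? std::floor(x) : std::ceil(x); }

// Apply one of the four non-HALF rules unconditionally.
inline double RoundDirected(double x, ToyRoundMode mode) {
  switch (mode) {
    case ToyRoundMode::DOWN: return std::floor(x);
    case ToyRoundMode::UP: return std::ceil(x);
    case ToyRoundMode::TOWARDS_ZERO: return std::trunc(x);
    default: return AwayFromZero(x);  // TOWARDS_INFINITY
  }
}

double RoundToInteger(double x, ToyRoundMode mode) {
  assert(std::isfinite(x));
  if (mode <= ToyRoundMode::TOWARDS_INFINITY) return RoundDirected(x, mode);

  // HALF_* modes: anything that is not an exact tie goes to the nearest integer.
  const double lower = std::floor(x);
  const double frac = x - lower;
  if (frac < 0.5) return lower;
  if (frac > 0.5) return lower + 1.0;

  // Exact tie: apply the tie-breaking rule named by the mode.
  const bool lower_is_even = std::fmod(lower, 2.0) == 0.0;
  switch (mode) {
    case ToyRoundMode::HALF_DOWN: return lower;
    case ToyRoundMode::HALF_UP: return lower + 1.0;
    case ToyRoundMode::HALF_TOWARDS_ZERO: return std::trunc(x);
    case ToyRoundMode::HALF_TOWARDS_INFINITY: return AwayFromZero(x);
    case ToyRoundMode::HALF_TO_EVEN: return lower_is_even ? lower : lower + 1.0;
    default: return lower_is_even ? lower + 1.0 : lower;  // HALF_TO_ODD
  }
}
```

The modes only disagree on exact ties and on the directed rules; for example, -2.5 becomes -2 under HALF_UP, HALF_TOWARDS_ZERO and HALF_TO_EVEN, but -3 under HALF_DOWN and HALF_TOWARDS_INFINITY.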
- enumclassCalendarUnit:int8_t#
Values:
- enumeratorNANOSECOND#
- enumeratorMICROSECOND#
- enumeratorMILLISECOND#
- enumeratorSECOND#
- enumeratorMINUTE#
- enumeratorHOUR#
- enumeratorDAY#
- enumeratorWEEK#
- enumeratorMONTH#
- enumeratorQUARTER#
- enumeratorYEAR#
- enumCompareOperator#
Values:
- enumeratorEQUAL#
- enumeratorNOT_EQUAL#
- enumeratorGREATER#
- enumeratorGREATER_EQUAL#
- enumeratorLESS#
- enumeratorLESS_EQUAL#
- usingCumulativeSumOptions=CumulativeOptions#
- classScalarAggregateOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control general scalar aggregate kernel behavior.
By default, null values are ignored (skip_nulls = true).
Public Functions
- explicitScalarAggregateOptions(boolskip_nulls=true,uint32_tmin_count=1)#
Public Members
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineScalarAggregateOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ScalarAggregateOptions"#
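How skip_nulls and min_count interact can be sketched with a toy sum over optional values, where std::nullopt plays the role of a null. ToyAggregateOptions and ToySum are hypothetical names; the defaults mirror the constructor shown above, but the function is only an illustration of the two rules as documented, not Arrow's aggregation code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of the two documented fields and their defaults.
struct ToyAggregateOptions {
  bool skip_nulls = true;
  std::uint32_t min_count = 1;
};

// Sum over nullable values; std::nullopt stands in for null on input and output.
std::optional<double> ToySum(const std::vector<std::optional<double>>& values,
                             const ToyAggregateOptions& options = {}) {
  double total = 0.0;
  std::uint64_t non_null = 0;
  for (const auto& v : values) {
    if (!v.has_value()) {
      // With skip_nulls == false, a single null makes the whole result null.
      if (!options.skip_nulls) return std::nullopt;
      continue;
    }
    total += *v;
    ++non_null;
  }
  // Too few non-null observations also yields null.
  if (non_null < options.min_count) return std::nullopt;
  return total;
}
```

Under these rules an empty input sums to null with the default min_count of 1, and to 0 when min_count is set to 0.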
- classCountOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control count aggregate kernel behavior.
By default, only non-null values are counted.
Public Types
Public Functions
- explicitCountOptions(CountModemode=CountMode::ONLY_VALID)#
Public Static Functions
- staticinlineCountOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="CountOptions"#
- classModeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Mode kernel behavior.
Returns top-n common values and counts. By default, returns the most common value and count.
Public Functions
- explicitModeOptions(int64_tn=1,boolskip_nulls=true,uint32_tmin_count=0)#
Public Members
- int64_tn=1#
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineModeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ModeOptions"#
- classVarianceOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Delta Degrees of Freedom (ddof) of Variance and Stddev kernel.
The divisor used in calculations is N - ddof, where N is the number of elements. By default, ddof is zero, and population variance or stddev is returned.
Public Functions
- explicitVarianceOptions(intddof=0,boolskip_nulls=true,uint32_tmin_count=0)#
Public Members
- intddof=0#
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineVarianceOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="VarianceOptions"#
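The effect of ddof on the divisor N - ddof can be checked with a short standalone calculation. ToyVariance is a hypothetical helper using a plain two-pass algorithm; returning std::nullopt when N - ddof is not positive is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical two-pass variance with a configurable ddof; not Arrow's kernel.
std::optional<double> ToyVariance(const std::vector<double>& values, int ddof = 0) {
  assert(ddof >= 0);
  const long long n = static_cast<long long>(values.size());
  // Assumption: a non-positive divisor is reported as "no result".
  if (n - ddof <= 0) return std::nullopt;

  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= static_cast<double>(n);

  double sum_sq = 0.0;
  for (double v : values) sum_sq += (v - mean) * (v - mean);

  // The divisor is N - ddof: ddof = 0 gives the population variance,
  // ddof = 1 the usual sample variance.
  return sum_sq / static_cast<double>(n - ddof);
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the squared deviations sum to 32, so ddof = 0 gives 32 / 8 = 4 and ddof = 1 gives 32 / 7.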
- classSkewOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Skew and Kurtosis kernel behavior.
Public Functions
- explicitSkewOptions(boolskip_nulls=true,boolbiased=true,uint32_tmin_count=0)#
Public Members
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- boolbiased#
If true (the default), the calculated value is biased.
If false, the calculated value includes a correction factor to reduce bias, making it more accurate for small sample sizes.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineSkewOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SkewOptions"#
- classQuantileOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Quantile kernel behavior.
By default, returns the median value.
Public Types
Public Functions
- explicitQuantileOptions(doubleq=0.5,enumInterpolationinterpolation=LINEAR,boolskip_nulls=true,uint32_tmin_count=0)#
- explicitQuantileOptions(std::vector<double>q,enumInterpolationinterpolation=LINEAR,boolskip_nulls=true,uint32_tmin_count=0)#
Public Members
- std::vector<double>q#
Probability level(s) of the quantile; each must be between 0 and 1 inclusive.
- enumInterpolationinterpolation#
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineQuantileOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="QuantileOptions"#
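The default arguments above (q = 0.5 with LINEAR interpolation) describe a median computed by linear interpolation. The sketch below uses the common textbook definition, position = q * (N - 1) with interpolation between the two neighbouring sorted values; ToyQuantileLinear is a hypothetical helper, and it is an assumption that this definition matches the LINEAR mode exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: quantile by linear interpolation between sorted neighbours.
std::optional<double> ToyQuantileLinear(std::vector<double> values, double q = 0.5) {
  assert(q >= 0.0 && q <= 1.0);  // probability level must lie in [0, 1]
  if (values.empty()) return std::nullopt;
  std::sort(values.begin(), values.end());

  // Fractional index of the requested quantile in the sorted data.
  const double position = q * static_cast<double>(values.size() - 1);
  const auto lower = static_cast<std::size_t>(std::floor(position));
  const std::size_t upper = std::min(lower + 1, values.size() - 1);
  const double fraction = position - static_cast<double>(lower);

  return values[lower] + fraction * (values[upper] - values[lower]);
}
```

For the input 1, 2, 3, 4 the median falls at index 1.5, halfway between 2 and 3, so the result is 2.5.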
- classTDigestOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control TDigest approximate quantile kernel behavior.
By default, returns the median value.
Public Functions
- explicitTDigestOptions(doubleq=0.5,uint32_tdelta=100,uint32_tbuffer_size=500,boolskip_nulls=true,uint32_tmin_count=0)#
- explicitTDigestOptions(std::vector<double>q,uint32_tdelta=100,uint32_tbuffer_size=500,boolskip_nulls=true,uint32_tmin_count=0)#
Public Members
- std::vector<double>q#
Probability level(s) of the quantile; each must be between 0 and 1 inclusive.
- uint32_tdelta#
compression parameter, default 100
- uint32_tbuffer_size#
input buffer size, default 500
- boolskip_nulls#
If true (the default), null values are ignored.
Otherwise, if any value is null, emit null.
- uint32_tmin_count#
If less than this many non-null values are observed, emit null.
Public Static Functions
- staticinlineTDigestOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="TDigestOptions"#
- classPivotWiderOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Pivot kernel behavior.
These options apply to the “pivot_wider” and “hash_pivot_wider” functions.
Constraints:
- The corresponding Aggregate::target must have two FieldRef elements; the first one points to the pivot key column, the second points to the pivoted data column.
- The pivot key column can be string, binary or integer; its values will be matched against key_names in order to dispatch the pivoted data into the output. If the pivot key column is not string-like, the key_names will be cast to the pivot key type.
“pivot_wider” example
Assuming the following two input columns with types utf8 and int16 (respectively):
    width  | 11
    height | 13
and the options PivotWiderOptions(.key_names={"height","width"}), then the output will be a scalar with the type struct{"height":int16,"width":int16} and the value {"height":13,"width":11}.
“hash_pivot_wider” example
Assuming the following input with schema {"group":int32,"key":utf8,"value":int16}:
    group | key    | value
    ----------------------
    1     | height | 11
    1     | width  | 12
    2     | width  | 13
    3     | height | 14
    3     | depth  | 15
and the following settings:
- a hash grouping key “group”
- Aggregate(.function = "hash_pivot_wider", .options = PivotWiderOptions(.key_names = {"height", "width"}), .target = {"key", "value"}, .name = {"properties"})
then the output will have the schema {"group":int32,"properties":struct{"height":int16,"width":int16}} and the following value:
    group | properties
          | height | width
    ----------------------
    1     | 11     | 12
    2     | null   | 13
    3     | 14     | null
Public Types
Public Functions
- explicitPivotWiderOptions(std::vector<std::string>key_names,UnexpectedKeyBehaviorunexpected_key_behavior=kIgnore)#
- PivotWiderOptions()#
Public Members
- std::vector<std::string>key_names#
The values expected in the pivot key column.
- UnexpectedKeyBehaviorunexpected_key_behavior=kIgnore#
The behavior when pivot keys not in key_names are encountered.
Public Static Functions
- staticinlinePivotWiderOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="PivotWiderOptions"#
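The pivoting rule from the examples in this class description can be reproduced with a toy function that handles the rows of one group. ToyPivotWider and ToyPivotRow are hypothetical names; unexpected keys are skipped (as with kIgnore), missing keys stay null (std::nullopt), and the sketch assumes each expected key occurs at most once per group.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One output slot per entry of key_names; std::nullopt stands for null.
// Note: std::map orders slots alphabetically, not in key_names order.
using ToyPivotRow = std::map<std::string, std::optional<std::int16_t>>;

// Hypothetical helper: pivot the (key, value) rows of a single group.
ToyPivotRow ToyPivotWider(
    const std::vector<std::pair<std::string, std::int16_t>>& key_value_rows,
    const std::vector<std::string>& key_names) {
  ToyPivotRow out;
  for (const auto& name : key_names) out[name] = std::nullopt;
  for (const auto& [key, value] : key_value_rows) {
    const auto it = out.find(key);
    if (it == out.end()) continue;    // unexpected key: ignored in this sketch
    assert(!it->second.has_value());  // assumption: no duplicate keys per group
    it->second = value;
  }
  return out;
}
```

Feeding it group 3 from the example (height = 14, depth = 15) with key_names {"height", "width"} yields height = 14 and width = null, matching the last row of the output table.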
- classIndexOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_aggregate.h>
Control Index kernel behavior.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="IndexOptions"#
- structAggregate#
- #include <arrow/compute/api_aggregate.h>
Configure a grouped aggregation.
Public Functions
- Aggregate()=default#
- inlineAggregate(std::stringfunction,std::shared_ptr<FunctionOptions>options,std::vector<FieldRef>target,std::stringname="")#
- inlineAggregate(std::stringfunction,std::shared_ptr<FunctionOptions>options,FieldReftarget,std::stringname="")#
- inlineAggregate(std::stringfunction,std::stringname)#
- classArithmeticOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitArithmeticOptions(boolcheck_overflow=false)#
Public Members
- boolcheck_overflow#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ArithmeticOptions"#
- classElementWiseAggregateOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitElementWiseAggregateOptions(boolskip_nulls=true)#
Public Members
- boolskip_nulls#
Public Static Functions
- staticinlineElementWiseAggregateOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ElementWiseAggregateOptions"#
- classRoundOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitRoundOptions(int64_tndigits=0,RoundModeround_mode=RoundMode::HALF_TO_EVEN)#
Public Members
- int64_tndigits#
Rounding precision (number of digits to round to)
Public Static Functions
- staticinlineRoundOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RoundOptions"#
- classRoundBinaryOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitRoundBinaryOptions(RoundModeround_mode=RoundMode::HALF_TO_EVEN)#
Public Static Functions
- staticinlineRoundBinaryOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RoundBinaryOptions"#
- classRoundTemporalOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitRoundTemporalOptions(intmultiple=1,CalendarUnitunit=CalendarUnit::DAY,boolweek_starts_monday=true,boolceil_is_strictly_greater=false,boolcalendar_based_origin=false)#
Public Members
- intmultiple#
Number of units to round to.
- CalendarUnitunit#
The unit used for rounding of time.
- boolweek_starts_monday#
What day does the week start with (Monday=true, Sunday=false)
- boolceil_is_strictly_greater#
Enable this flag to return a rounded value that is strictly greater than the input.
For example: ceiling 1970-01-01T00:00:00 to 3 hours would yield 1970-01-01T03:00:00 if set to true and 1970-01-01T00:00:00 if set to false. This applies for ceiling only.
- boolcalendar_based_origin#
By default time is rounded to a multiple of units since 1970-01-01T00:00:00.
By setting calendar_based_origin to true, time will be rounded to a number of units since the last greater calendar unit. For example: rounding to a multiple of days since the beginning of the month or to hours since the beginning of the day. Exceptions: week and quarter are not used as greater units, therefore days will be rounded to the beginning of the month, not the week. The greater unit of week is year. Note that ceiling and rounding might change the sorting order of an array near a greater unit change. For example, rounding YYYY-mm-dd 23:00:00 to 5 hours will ceil and round to YYYY-mm-dd+1 01:00:00 and floor to YYYY-mm-dd 20:00:00. On the other hand, YYYY-mm-dd+1 00:00:00 will ceil, round and floor to YYYY-mm-dd+1 00:00:00. This can break the order of an already ordered array.
Public Static Functions
- staticinlineRoundTemporalOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RoundTemporalOptions"#
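The effect of ceil_is_strictly_greater and calendar_based_origin can be illustrated with a minimal sketch over integer seconds. It does not use Arrow: FloorToMultiple, CeilToMultiple and CeilFromOrigin are hypothetical helpers, and the origin is passed in explicitly instead of being derived from a calendar.

```cpp
#include <cstdint>

// Floors t to a multiple of step (step > 0), rounding toward negative infinity.
int64_t FloorToMultiple(int64_t t, int64_t step) {
  int64_t remainder = t % step;
  if (remainder < 0) remainder += step;
  return t - remainder;
}

// Ceils t to a multiple of step. When t already lies on a multiple, the
// result is t itself unless ceil_is_strictly_greater is set, in which case
// the next multiple is returned.
int64_t CeilToMultiple(int64_t t, int64_t step, bool ceil_is_strictly_greater) {
  const int64_t floored = FloorToMultiple(t, step);
  if (floored == t && !ceil_is_strictly_greater) return t;
  return floored + step;
}

// Ceils relative to an explicit origin (for example the start of the day),
// which stands in for calendar_based_origin = true in this sketch.
int64_t CeilFromOrigin(int64_t t, int64_t origin, int64_t step,
                       bool ceil_is_strictly_greater) {
  return origin + CeilToMultiple(t - origin, step, ceil_is_strictly_greater);
}
```

Taking a 3-hour step (10800 seconds), CeilToMultiple(0, 10800, true) gives 10800 while CeilToMultiple(0, 10800, false) gives 0, matching the 1970-01-01T00:00:00 example. With a 5-hour step and the start of each day as origin, 23:00:00 ceils to 01:00:00 of the next day, whereas 00:00:00 of the next day stays put, so the two outputs swap order.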
- classRoundToMultipleOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitRoundToMultipleOptions(doublemultiple=1.0,RoundModeround_mode=RoundMode::HALF_TO_EVEN)#
- explicitRoundToMultipleOptions(std::shared_ptr<Scalar>multiple,RoundModeround_mode=RoundMode::HALF_TO_EVEN)#
Public Members
Public Static Functions
- staticinlineRoundToMultipleOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RoundToMultipleOptions"#
- classJoinOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Options for var_args_join.
Public Types
- enumNullHandlingBehavior#
How to handle null values. (A null separator always results in a null output.)
Values:
- enumeratorEMIT_NULL#
A null in any input results in a null in the output.
- enumeratorSKIP#
Nulls in inputs are skipped.
- enumeratorREPLACE#
Nulls in inputs are replaced with the replacement string.
Public Functions
- explicitJoinOptions(NullHandlingBehaviornull_handling=EMIT_NULL,std::stringnull_replacement="")#
Public Static Functions
- staticinlineJoinOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="JoinOptions"#
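The three null handling behaviors can be compared side by side in a small standalone sketch. It does not call Arrow: JoinSketch and the NullHandling enum are hypothetical stand-ins, and nulls are modeled with std::optional.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of JoinOptions::NullHandlingBehavior.
enum class NullHandling { EMIT_NULL, SKIP, REPLACE };

// Joins the parts with the separator. A null separator always gives a null
// result; null parts are handled according to the chosen behavior.
std::optional<std::string> JoinSketch(
    const std::vector<std::optional<std::string>>& parts,
    const std::optional<std::string>& separator, NullHandling null_handling,
    const std::string& null_replacement = "") {
  if (!separator.has_value()) return std::nullopt;
  std::string out;
  bool first = true;
  for (const auto& part : parts) {
    std::string piece;
    if (part.has_value()) {
      piece = *part;
    } else if (null_handling == NullHandling::EMIT_NULL) {
      return std::nullopt;  // any null input makes the output null
    } else if (null_handling == NullHandling::SKIP) {
      continue;  // drop the null input entirely
    } else {
      piece = null_replacement;  // REPLACE
    }
    if (!first) out += *separator;
    out += piece;
    first = false;
  }
  return out;
}
```

For the parts "a", null, "b" and the separator "-", the sketch returns null under EMIT_NULL, "a-b" under SKIP, and "a-?-b" under REPLACE with "?" as the replacement string.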
- classMatchSubstringOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitMatchSubstringOptions(std::stringpattern,boolignore_case=false)#
- MatchSubstringOptions()#
Public Members
- std::stringpattern#
The exact substring (or regex, depending on kernel) to look for inside input values.
- boolignore_case#
Whether to perform a case-insensitive match.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="MatchSubstringOptions"#
- classSplitOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitSplitOptions(int64_tmax_splits=-1,boolreverse=false)#
Public Members
- int64_tmax_splits#
Maximum number of splits allowed, or unlimited when -1.
- boolreverse#
Start splitting from the end of the string (only relevant when max_splits != -1)
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SplitOptions"#
- classSplitPatternOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitSplitPatternOptions(std::stringpattern,int64_tmax_splits=-1,boolreverse=false)#
- SplitPatternOptions()#
Public Members
- std::stringpattern#
The exact substring to split on.
- int64_tmax_splits#
Maximum number of splits allowed, or unlimited when -1.
- boolreverse#
Start splitting from the end of the string (only relevant when max_splits != -1)
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SplitPatternOptions"#
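How max_splits and reverse interact is easiest to see on a concrete string. The following is a standalone sketch, not the Arrow kernel: SplitPatternSketch is a hypothetical helper that operates on bytes, and treating an empty pattern as "no split" is an assumption made only to keep the example safe.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Splits s on an exact substring. max_splits = -1 means unlimited; reverse
// starts splitting from the end of the string, which only matters when
// max_splits limits the number of splits.
std::vector<std::string> SplitPatternSketch(const std::string& s,
                                            const std::string& pattern,
                                            int64_t max_splits = -1,
                                            bool reverse = false) {
  std::vector<std::string> parts;
  if (pattern.empty()) {  // assumption: an empty pattern performs no split
    parts.push_back(s);
    return parts;
  }
  int64_t splits = 0;
  if (!reverse) {
    std::size_t begin = 0;
    while (max_splits < 0 || splits < max_splits) {
      const std::size_t pos = s.find(pattern, begin);
      if (pos == std::string::npos) break;
      parts.push_back(s.substr(begin, pos - begin));
      begin = pos + pattern.size();
      ++splits;
    }
    parts.push_back(s.substr(begin));
  } else {
    std::size_t end = s.size();
    while (max_splits < 0 || splits < max_splits) {
      if (end < pattern.size()) break;
      const std::size_t pos = s.rfind(pattern, end - pattern.size());
      if (pos == std::string::npos) break;
      parts.push_back(s.substr(pos + pattern.size(), end - pos - pattern.size()));
      end = pos;
      ++splits;
    }
    parts.push_back(s.substr(0, end));
    std::reverse(parts.begin(), parts.end());  // restore left-to-right order
  }
  return parts;
}
```

Splitting "a,b,c" on "," with max_splits = 1 gives "a" and "b,c" in the forward direction and "a,b" and "c" with reverse = true. With max_splits = -1 both directions give "a", "b", "c".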
- classReplaceSliceOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitReplaceSliceOptions(int64_tstart,int64_tstop,std::stringreplacement)#
- ReplaceSliceOptions()#
Public Members
- int64_tstart#
Index to start slicing at.
- int64_tstop#
Index to stop slicing at.
- std::stringreplacement#
String to replace the slice with.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ReplaceSliceOptions"#
- classReplaceSubstringOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitReplaceSubstringOptions(std::stringpattern,std::stringreplacement,int64_tmax_replacements=-1)#
- ReplaceSubstringOptions()#
Public Members
- std::stringpattern#
Pattern to match, literal, or regular expression depending on which kernel is used.
- std::stringreplacement#
String to replace the pattern with.
- int64_tmax_replacements#
Max number of substrings to replace (-1 means unbounded)
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ReplaceSubstringOptions"#
- classExtractRegexOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- std::stringpattern#
Regular expression with named capture fields.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ExtractRegexOptions"#
- classExtractRegexSpanOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- std::stringpattern#
Regular expression with named capture fields.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ExtractRegexSpanOptions"#
- classSetLookupOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Options for IsIn and IndexIn functions.
Public Types
- enumNullMatchingBehavior#
How to handle null values.
Values:
- enumeratorMATCH#
MATCH, any null in value_set is successfully matched in the input.
- enumeratorSKIP#
SKIP, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output.
- enumeratorEMIT_NULL#
EMIT_NULL, any null in value_set is ignored and nulls in the input produce null (IndexIn and IsIn) values in the output.
- enumeratorINCONCLUSIVE#
INCONCLUSIVE, null values are regarded as unknown values, which is SQL-compatible.
Nulls in the input produce null (IndexIn and IsIn) values in the output. Besides, if value_set contains a null, non-null unmatched values in the input also produce null values (IndexIn and IsIn) in the output.
Public Functions
- explicitSetLookupOptions(Datumvalue_set,NullMatchingBehavior=MATCH)#
- SetLookupOptions()#
- NullMatchingBehaviorGetNullMatchingBehavior()const#
Public Members
- NullMatchingBehaviornull_matching_behavior#
- std::optional<bool>skip_nulls#
Whether nulls in value_set count for lookup.
If true, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output. If false, any null in value_set is successfully matched in the input.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SetLookupOptions"#
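The four NullMatchingBehavior values can be summarized for the IsIn case with a standalone sketch. It does not call Arrow: IsInSketch and the NullMatching enum are hypothetical, nulls are modeled with std::optional, and a null result is returned as std::nullopt. The result for a null input under MATCH when value_set holds no null is assumed to be false.

```cpp
#include <optional>
#include <vector>

// Hypothetical mirror of SetLookupOptions::NullMatchingBehavior.
enum class NullMatching { MATCH, SKIP, EMIT_NULL, INCONCLUSIVE };

// Returns true/false for a definite answer and std::nullopt for a null output.
std::optional<bool> IsInSketch(const std::optional<int>& value,
                               const std::vector<std::optional<int>>& value_set,
                               NullMatching behavior) {
  bool set_has_null = false;
  bool found = false;
  for (const auto& candidate : value_set) {
    if (!candidate.has_value()) {
      set_has_null = true;
    } else if (value.has_value() && *candidate == *value) {
      found = true;
    }
  }
  if (!value.has_value()) {
    if (behavior == NullMatching::MATCH) return set_has_null;  // assumption: false if no null in set
    if (behavior == NullMatching::SKIP) return false;
    return std::nullopt;  // EMIT_NULL and INCONCLUSIVE
  }
  if (found) return true;
  // Under INCONCLUSIVE an unmatched value might equal the unknown (null) entry.
  if (behavior == NullMatching::INCONCLUSIVE && set_has_null) return std::nullopt;
  return false;
}
```

With value_set = {1, null}, a null input yields true under MATCH, false under SKIP, and null under EMIT_NULL and INCONCLUSIVE. The unmatched input 2 yields false for the first three behaviors and null under INCONCLUSIVE.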
- classStructFieldOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Options for struct_field function.
Public Functions
- explicitStructFieldOptions(std::vector<int>indices)#
- explicitStructFieldOptions(std::initializer_list<int>)#
- StructFieldOptions()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="StructFieldOptions"#
- classStrptimeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- StrptimeOptions()#
Public Members
- std::stringformat#
The desired format string.
- boolerror_is_null#
Return null on parsing errors if true or raise if false.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="StrptimeOptions"#
- classStrftimeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
- classPadOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitPadOptions(int64_twidth,std::stringpadding="",boollean_left_on_odd_padding=true)#
- PadOptions()#
Public Members
- int64_twidth#
The desired string length.
- std::stringpadding#
What to pad the string with. Should be one codepoint (Unicode)/byte (ASCII).
- boollean_left_on_odd_padding=true#
What to do if there is an odd number of padding characters (in case of centered padding).
Defaults to aligning on the left (i.e. adding the extra padding character on the right)
Public Static Attributes
- staticconstexprconstcharkTypeName[]="PadOptions"#
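The meaning of lean_left_on_odd_padding for centered padding can be shown with a short standalone sketch. It is byte-based (ASCII only) and does not call Arrow; CenterSketch is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Centers s within width bytes using a single padding character. When the
// amount of padding is odd, lean_left_on_odd_padding = true puts the extra
// character on the right (the text leans left); false puts it on the left.
std::string CenterSketch(const std::string& s, std::size_t width,
                         char padding = ' ',
                         bool lean_left_on_odd_padding = true) {
  if (s.size() >= width) return s;  // already long enough, nothing to add
  const std::size_t total = width - s.size();
  std::size_t left = total / 2;
  if (!lean_left_on_odd_padding) left = total - left;
  const std::size_t right = total - left;
  return std::string(left, padding) + s + std::string(right, padding);
}
```

Centering "ab" to width 5 with '*' gives "*ab**" by default and "**ab*" when lean_left_on_odd_padding is false.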
- classZeroFillOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- int64_twidth#
The desired string length.
- std::stringpadding#
What to pad the string with. Should be one codepoint (Unicode).
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ZeroFillOptions"#
- classTrimOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- std::stringcharacters#
The individual characters to be trimmed from the string.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="TrimOptions"#
- classSliceOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitSliceOptions(int64_tstart,int64_tstop=std::numeric_limits<int64_t>::max(),int64_tstep=1)#
- SliceOptions()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SliceOptions"#
- classListSliceOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitListSliceOptions(int64_tstart,std::optional<int64_t>stop=std::nullopt,int64_tstep=1,std::optional<bool>return_fixed_size_list=std::nullopt)#
- ListSliceOptions()#
Public Members
- int64_tstart#
The start of list slicing.
- std::optional<int64_t>stop#
Optional stop of list slicing. If not set, then slice to end. (NotImplemented)
- int64_tstep#
Slicing step.
- std::optional<bool>return_fixed_size_list#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ListSliceOptions"#
- classNullOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitNullOptions(boolnan_is_null=false)#
Public Members
- boolnan_is_null#
Public Static Functions
- staticinlineNullOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="NullOptions"#
- structCompareOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- enumCompareOperatorop#
- classMakeStructOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- MakeStructOptions(std::vector<std::string>n,std::vector<bool>r,std::vector<std::shared_ptr<constKeyValueMetadata>>m)#
- explicitMakeStructOptions(std::vector<std::string>n)#
- MakeStructOptions()#
Public Members
- std::vector<std::string>field_names#
Names for wrapped columns.
- std::vector<bool>field_nullability#
Nullability bits for wrapped columns.
- std::vector<std::shared_ptr<constKeyValueMetadata>>field_metadata#
Metadata attached to wrapped columns.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="MakeStructOptions"#
- structDayOfWeekOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitDayOfWeekOptions(boolcount_from_zero=true,uint32_tweek_start=1)#
Public Members
- boolcount_from_zero#
Number days from 0 if true and from 1 if false.
- uint32_tweek_start#
What day does the week start with (Monday=1, Sunday=7).
The numbering is unaffected by the count_from_zero parameter.
Public Static Functions
- staticinlineDayOfWeekOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="DayOfWeekOptions"#
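One way to read the interplay of count_from_zero and week_start is the following standalone sketch. It does not call Arrow: DayOfWeekSketch is a hypothetical helper, the input is assumed to be an ISO weekday number (Monday=1 through Sunday=7), and the formula is an interpretation of the member descriptions above rather than the kernel's code.

```cpp
#include <cstdint>

// iso_weekday and week_start both use Monday=1 ... Sunday=7. The result
// counts days since the start of the week, beginning at 0 or at 1.
int64_t DayOfWeekSketch(uint32_t iso_weekday, bool count_from_zero = true,
                        uint32_t week_start = 1) {
  const uint32_t offset = (iso_weekday + 7 - week_start) % 7;
  return static_cast<int64_t>(offset) + (count_from_zero ? 0 : 1);
}
```

Under this reading, the defaults number Monday as 0 and Sunday as 6. With week_start = 7, Sunday becomes 0, and with count_from_zero = false as well, Monday becomes 2.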
- structAssumeTimezoneOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Used to control timestamp timezone conversion and handling ambiguous/nonexistent times.
Public Types
- enumAmbiguous#
How to interpret ambiguous local times that can be interpreted as multiple instants (normally two) due to DST shifts.
AMBIGUOUS_EARLIEST emits the earliest instant amongst possible interpretations. AMBIGUOUS_LATEST emits the latest instant amongst possible interpretations.
Values:
- enumeratorAMBIGUOUS_RAISE#
- enumeratorAMBIGUOUS_EARLIEST#
- enumeratorAMBIGUOUS_LATEST#
- enumNonexistent#
How to handle local times that do not exist due to DST shifts.
NONEXISTENT_EARLIEST emits the instant “just before” the DST shift instant in the given timestamp precision (for example, for a nanoseconds precision timestamp, this is one nanosecond before the DST shift instant). NONEXISTENT_LATEST emits the DST shift instant.
Values:
- enumeratorNONEXISTENT_RAISE#
- enumeratorNONEXISTENT_EARLIEST#
- enumeratorNONEXISTENT_LATEST#
Public Functions
- explicitAssumeTimezoneOptions(std::stringtimezone,Ambiguousambiguous=AMBIGUOUS_RAISE,Nonexistentnonexistent=NONEXISTENT_RAISE)#
- AssumeTimezoneOptions()#
Public Members
- std::stringtimezone#
Timezone to convert timestamps from.
- Nonexistentnonexistent#
How to interpret nonexistent local times (due to DST shifts)
Public Static Attributes
- staticconstexprconstcharkTypeName[]="AssumeTimezoneOptions"#
- structWeekOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Functions
- explicitWeekOptions(boolweek_starts_monday=true,boolcount_from_zero=false,boolfirst_week_is_fully_in_year=false)#
Public Members
- boolweek_starts_monday#
What day does the week start with (Monday=true, Sunday=false)
- boolcount_from_zero#
Dates from current year that fall into last ISO week of the previous year return 0 if true and 52 or 53 if false.
- boolfirst_week_is_fully_in_year#
Must the first week be fully in January (true), or is a week that begins on December 29, 30, or 31 considered to be the first week of the new year (false)?
Public Static Functions
- staticinlineWeekOptionsDefaults()#
- staticinlineWeekOptionsISODefaults()#
- staticinlineWeekOptionsUSDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="WeekOptions"#
- structUtf8NormalizeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Static Functions
- staticinlineUtf8NormalizeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="Utf8NormalizeOptions"#
- classRandomOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Public Members
- Initializerinitializer#
The type of initialization for random number generation - system or provided seed.
- uint64_tseed#
The seed value used to initialize the random number generation.
Public Static Functions
- staticinlineRandomOptionsFromSystemRandom()#
- staticinlineRandomOptionsFromSeed(uint64_tseed)#
- staticinlineRandomOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RandomOptions"#
- classMapLookupOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_scalar.h>
Options for map_lookup function.
Public Types
Public Functions
- explicitMapLookupOptions(std::shared_ptr<Scalar>query_key,Occurrenceoccurrence)#
- MapLookupOptions()#
Public Members
- Occurrenceoccurrence#
Whether to return the first, last, or all matching values.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="MapLookupOptions"#
- classFilterOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Public Types
Public Functions
- explicitFilterOptions(NullSelectionBehaviornull_selection=DROP)#
Public Members
- NullSelectionBehaviornull_selection_behavior=DROP#
Public Static Functions
- staticinlineFilterOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="FilterOptions"#
- classTakeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Public Functions
- explicitTakeOptions(boolboundscheck=true)#
Public Members
- boolboundscheck=true#
Public Static Functions
- staticinlineTakeOptionsBoundsCheck()#
- staticinlineTakeOptionsNoBoundsCheck()#
- staticinlineTakeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="TakeOptions"#
- classDictionaryEncodeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for the dictionary encode function.
Public Types
Public Functions
- explicitDictionaryEncodeOptions(NullEncodingBehaviornull_encoding=MASK)#
Public Members
- NullEncodingBehaviornull_encoding_behavior=MASK#
Public Static Functions
- staticinlineDictionaryEncodeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="DictionaryEncodeOptions"#
- classRunEndEncodeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for the run-end encode function.
Public Static Functions
- staticinlineRunEndEncodeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RunEndEncodeOptions"#
- classArraySortOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Public Functions
- explicitArraySortOptions(SortOrderorder=SortOrder::Ascending,NullPlacementnull_placement=NullPlacement::AtEnd)#
Public Members
- SortOrderorder#
Sorting order.
- NullPlacementnull_placement#
Whether nulls and NaNs are placed at the start or at the end.
Public Static Functions
- staticinlineArraySortOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ArraySortOptions"#
- classSortOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Public Functions
- explicitSortOptions(std::vector<SortKey>sort_keys={},NullPlacementnull_placement=NullPlacement::AtEnd)#
- explicitSortOptions(constOrdering&ordering)#
- inlineOrderingAsOrdering()&&#
Convenience constructor to create an ordering from SortOptions.
Note: Both classes contain the exact same information. However, sort_options should only be used in a “function options” context while Ordering is used more generally.
- inlineOrderingAsOrdering()const&#
Public Members
- std::vector<SortKey>sort_keys#
Column key(s) to order by and how to order by these sort keys.
- NullPlacementnull_placement#
Whether nulls and NaNs are placed at the start or at the end.
Public Static Functions
- staticinlineSortOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SortOptions"#
- classSelectKOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
SelectK options.
Public Functions
- explicitSelectKOptions(int64_tk=-1,std::vector<SortKey>sort_keys={})#
Public Members
- int64_tk#
The number of k elements to keep.
- std::vector<SortKey>sort_keys#
Column key(s) to order by and how to order by these sort keys.
Public Static Functions
- staticinlineSelectKOptionsDefaults()#
- staticinlineSelectKOptionsTopKDefault(int64_tk,std::vector<std::string>key_names={})#
- staticinlineSelectKOptionsBottomKDefault(int64_tk,std::vector<std::string>key_names={})#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="SelectKOptions"#
- classRankOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Rank options.
Public Types
- enumTiebreaker#
Configure how ties between equal values are handled.
Values:
- enumeratorMin#
Ties get the smallest possible rank in sorted order.
- enumeratorMax#
Ties get the largest possible rank in sorted order.
- enumeratorFirst#
Ranks are assigned in order of when ties appear in the input.
This ensures the ranks are a stable permutation of the input.
- enumeratorDense#
The ranks span a dense [1, M] interval where M is the number of distinct values in the input.
Public Functions
- explicitRankOptions(std::vector<SortKey>sort_keys={},NullPlacementnull_placement=NullPlacement::AtEnd,Tiebreakertiebreaker=RankOptions::First)#
- inlineexplicitRankOptions(SortOrderorder,NullPlacementnull_placement=NullPlacement::AtEnd,Tiebreakertiebreaker=RankOptions::First)#
Convenience constructor for array inputs.
Public Members
- std::vector<SortKey>sort_keys#
Column key(s) to order by and how to order by these sort keys.
- NullPlacementnull_placement#
Whether nulls and NaNs are placed at the start or at the end.
- Tiebreakertiebreaker#
Tiebreaker for dealing with equal values in ranks.
Public Static Functions
- staticinlineRankOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RankOptions"#
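The difference between the four tiebreakers shows up as soon as the input contains equal values. The sketch below is standalone and does not call Arrow: RankSketch and the Tiebreaker enum are hypothetical, the input is a plain vector of ints without nulls, and ranks are 1-based in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical mirror of RankOptions::Tiebreaker.
enum class Tiebreaker { Min, Max, First, Dense };

// Returns the 1-based ascending rank of every input element.
std::vector<int64_t> RankSketch(const std::vector<int>& values, Tiebreaker tb) {
  const std::size_t n = values.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), 0);
  // A stable sort keeps tied elements in input order, which First relies on.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });
  std::vector<int64_t> ranks(n);
  int64_t dense_rank = 0;
  for (std::size_t i = 0; i < n;) {
    std::size_t j = i;  // [i, j) is one run of equal values in sorted order
    while (j < n && values[order[j]] == values[order[i]]) ++j;
    ++dense_rank;
    for (std::size_t k = i; k < j; ++k) {
      int64_t rank = 0;
      switch (tb) {
        case Tiebreaker::Min:   rank = static_cast<int64_t>(i) + 1; break;
        case Tiebreaker::Max:   rank = static_cast<int64_t>(j); break;
        case Tiebreaker::First: rank = static_cast<int64_t>(k) + 1; break;
        case Tiebreaker::Dense: rank = dense_rank; break;
      }
      ranks[order[k]] = rank;
    }
    i = j;
  }
  return ranks;
}
```

For the input {10, 20, 10, 30} the sketch produces {1, 3, 1, 4} with Min, {2, 3, 2, 4} with Max, {1, 3, 2, 4} with First, and {1, 2, 1, 3} with Dense.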
- classRankQuantileOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Quantile rank options.
Public Functions
- explicitRankQuantileOptions(std::vector<SortKey>sort_keys={},NullPlacementnull_placement=NullPlacement::AtEnd)#
- inlineexplicitRankQuantileOptions(SortOrderorder,NullPlacementnull_placement=NullPlacement::AtEnd)#
Convenience constructor for array inputs.
Public Members
- std::vector<SortKey>sort_keys#
Column key(s) to order by and how to order by these sort keys.
- NullPlacementnull_placement#
Whether nulls and NaNs are placed at the start or at the end.
Public Static Functions
- staticinlineRankQuantileOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="RankQuantileOptions"#
- classPartitionNthOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Partitioning options for NthToIndices.
Public Functions
- explicitPartitionNthOptions(int64_tpivot,NullPlacementnull_placement=NullPlacement::AtEnd)#
- inlinePartitionNthOptions()#
Public Members
- int64_tpivot#
The index into the equivalent sorted array of the partition pivot element.
- NullPlacementnull_placement#
Whether nulls and NaNs are partitioned at the start or at the end.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="PartitionNthOptions"#
- classWinsorizeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Public Members
- doublelower_limit#
The quantile below which all values are replaced with the quantile’s value.
For example, if lower_limit = 0.05, then all values below the 5th percentile will be replaced with the 5th percentile value.
- doubleupper_limit#
The quantile above which all values are replaced with the quantile’s value.
For example, if upper_limit = 0.95, then all values above the 95th percentile will be replaced with the 95th percentile value.
Public Static Attributes
- staticconstexprconstcharkTypeName[]="WinsorizeOptions"#
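The clamping that lower_limit and upper_limit describe can be sketched in a few lines of standalone C++. This does not call Arrow: WinsorizeSketch is a hypothetical helper, and the way the quantile values are picked here (an index into the sorted data, rounded down for the lower limit and up for the upper limit) is an assumption; the actual kernel may compute quantiles differently. Limits are assumed to satisfy 0 <= lower_limit <= upper_limit <= 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Replaces values below the lower quantile, or above the upper quantile, with
// the respective quantile value. The input order is preserved.
std::vector<double> WinsorizeSketch(std::vector<double> values,
                                    double lower_limit, double upper_limit) {
  if (values.empty()) return values;
  std::vector<double> sorted = values;
  std::sort(sorted.begin(), sorted.end());
  const double last = static_cast<double>(sorted.size() - 1);
  // Assumed quantile rule: pick an existing element of the sorted data.
  const double low = sorted[static_cast<std::size_t>(std::floor(lower_limit * last))];
  const double high = sorted[static_cast<std::size_t>(std::ceil(upper_limit * last))];
  for (double& v : values) v = std::clamp(v, low, high);
  return values;
}
```

With the input {1, 100, 3, 2, -50} and the limits 0.25 and 0.75, the quantile values under this rule are 1 and 3, so the result is {1, 3, 3, 2, 1}.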
- classCumulativeOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for cumulative functions.
Note
Also aliased as CumulativeSumOptions for backward compatibility
Public Functions
- explicitCumulativeOptions(boolskip_nulls=false)#
- explicitCumulativeOptions(doublestart,boolskip_nulls=false)#
Public Members
- std::optional<std::shared_ptr<Scalar>>start#
Optional starting value for cumulative operation computation, default depends on the operation and input type.
sum: 0
prod: 1
min: maximum of the input type
max: minimum of the input type
mean: start is ignored because it has no meaning for mean
- boolskip_nulls=false#
If true, nulls in the input are ignored and produce a corresponding null output.
When false, the first null encountered is propagated through the remaining output.
Public Static Functions
- staticinlineCumulativeOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="CumulativeOptions"#
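The two skip_nulls modes and the start value can be illustrated for a cumulative sum with a standalone sketch. It does not call Arrow: CumulativeSumSketch is a hypothetical helper over doubles, with nulls modeled as std::nullopt.

```cpp
#include <optional>
#include <vector>

// Running sum beginning at start. With skip_nulls = true a null input yields
// a null output but the running sum carries on; with skip_nulls = false every
// output from the first null onward is null.
std::vector<std::optional<double>> CumulativeSumSketch(
    const std::vector<std::optional<double>>& input, bool skip_nulls = false,
    double start = 0.0) {
  std::vector<std::optional<double>> out;
  out.reserve(input.size());
  double running = start;
  bool seen_null = false;
  for (const auto& v : input) {
    if (!v.has_value()) seen_null = true;
    if (!v.has_value() || (seen_null && !skip_nulls)) {
      out.push_back(std::nullopt);
      continue;
    }
    running += *v;
    out.push_back(running);
  }
  return out;
}
```

For the input {1, null, 2} the sketch gives {1, null, 3} with skip_nulls = true and {1, null, null} with skip_nulls = false. Passing start = 10 for the input {1, 2} gives {11, 13}.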
- classPairwiseOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for pairwise functions.
Public Functions
- explicitPairwiseOptions(int64_tperiods=1)#
Public Members
- int64_tperiods=1#
Periods to shift for applying the binary operation, accepts negative values.
Public Static Functions
- staticinlinePairwiseOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="PairwiseOptions"#
- classListFlattenOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for list_flatten function.
Public Functions
- explicitListFlattenOptions(boolrecursive=false)#
Public Members
- boolrecursive=false#
If true, the list is flattened recursively until a non-list array is formed.
Public Static Functions
- staticinlineListFlattenOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ListFlattenOptions"#
- classInversePermutationOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for inverse_permutation function.
Public Functions
Public Members
- int64_tmax_index=-1#
The max value in the input indices to allow.
The length of the function’s output will be this value plus 1. If negative, this value will be set to the length of the input indices minus 1 and the length of the function’s output will be the length of the input indices.
Public Static Functions
- staticinlineInversePermutationOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="InversePermutationOptions"#
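How max_index determines the output length can be seen in a standalone sketch of an inverse permutation. It does not call Arrow: InversePermutationSketch is a hypothetical helper, input indices are assumed to be non-null, output positions that no index points to are modeled as std::nullopt, and throwing on an out-of-range index is a choice made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// output[indices[i]] = i. The output has max_index + 1 entries; a negative
// max_index means "length of indices minus 1", so the output then has the
// same length as the input.
std::vector<std::optional<int64_t>> InversePermutationSketch(
    const std::vector<int64_t>& indices, int64_t max_index = -1) {
  if (max_index < 0) max_index = static_cast<int64_t>(indices.size()) - 1;
  std::vector<std::optional<int64_t>> out(static_cast<std::size_t>(max_index + 1));
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int64_t idx = indices[i];
    if (idx < 0 || idx > max_index) throw std::out_of_range("index exceeds max_index");
    out[static_cast<std::size_t>(idx)] = static_cast<int64_t>(i);
  }
  return out;
}
```

The indices {2, 0, 1} with the default max_index give {1, 2, 0}. The single index {1} with max_index = 2 gives an output of length 3: {null, 0, null}.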
- classScatterOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/api_vector.h>
Options for scatter function.
Public Functions
- explicitScatterOptions(int64_tmax_index=-1)#
Public Members
- int64_tmax_index=-1#
The max value in the input indices to allow.
The length of the function’s output will be this value plus 1. If negative, this value will be set to the length of the input indices minus 1 and the length of the function’s output will be the length of the input indices.
Public Static Functions
- staticinlineScatterOptionsDefaults()#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="ScatterOptions"#
- classCastOptions:publicarrow::compute::FunctionOptions#
- #include <arrow/compute/cast.h>
Public Functions
- explicitCastOptions(boolsafe=true)#
- boolis_safe()const#
true if the safety options all match CastOptions::Safe
Note, if this returns false it does not mean is_unsafe will return true
- boolis_unsafe()const#
true if the safety options all match CastOptions::Unsafe
Note that if this returns false, it does not follow that is_safe will return true.
Public Members
- TypeHolderto_type#
- boolallow_int_overflow#
- boolallow_time_truncate#
- boolallow_time_overflow#
- boolallow_decimal_truncate#
- boolallow_float_truncate#
- boolallow_invalid_utf8#
Public Static Functions
- staticinlineCastOptionsSafe(TypeHolderto_type={})#
- staticinlineCastOptionsUnsafe(TypeHolderto_type={})#
Public Static Attributes
- staticconstexprconstcharkTypeName[]="CastOptions"#
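Why is_safe and is_unsafe can both be false is easiest to see with a small stand-in. The struct below is a hypothetical model, not the Arrow class: it mirrors the six allow_* members listed above and assumes that the safe preset clears every flag while the unsafe preset sets every flag.

```cpp
// Hypothetical stand-in for the six allow_* flags of CastOptions.
struct CastFlags {
  bool allow_int_overflow;
  bool allow_time_truncate;
  bool allow_time_overflow;
  bool allow_decimal_truncate;
  bool allow_float_truncate;
  bool allow_invalid_utf8;

  // Assumption: safe == true clears every flag, safe == false sets every flag.
  explicit CastFlags(bool safe = true)
      : allow_int_overflow(!safe),
        allow_time_truncate(!safe),
        allow_time_overflow(!safe),
        allow_decimal_truncate(!safe),
        allow_float_truncate(!safe),
        allow_invalid_utf8(!safe) {}

  // True only when no flag is set.
  bool is_safe() const {
    return !allow_int_overflow && !allow_time_truncate && !allow_time_overflow &&
           !allow_decimal_truncate && !allow_float_truncate && !allow_invalid_utf8;
  }

  // True only when every flag is set.
  bool is_unsafe() const {
    return allow_int_overflow && allow_time_truncate && allow_time_overflow &&
           allow_decimal_truncate && allow_float_truncate && allow_invalid_utf8;
  }
};
```

Starting from the safe preset and enabling a single flag, for instance allow_float_truncate, leaves a mixed configuration for which both predicates return false.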
Compute Expressions#
- inlinebooloperator==(constExpression&l,constExpression&r)#
- inlinebooloperator!=(constExpression&l,constExpression&r)#
- voidPrintTo(constExpression&,std::ostream*)#
- Expressionliteral(Datumlit)#
- Expressionfield_ref(FieldRefref)#
- Expressioncall(std::stringfunction,std::vector<Expression>arguments,std::shared_ptr<FunctionOptions>options=NULLPTR)#
- template<typenameOptions,typename=typenamestd::enable_if<std::is_base_of<FunctionOptions,Options>::value>::type>
Expressioncall(std::stringfunction,std::vector<Expression>arguments,Optionsoptions)#
- std::vector<FieldRef>FieldsInExpression(constExpression&)#
Assemble a list of all fields referenced by an Expression at any depth.
- boolExpressionHasFieldRefs(constExpression&)#
Check if the expression references any fields.
- Result<KnownFieldValues>ExtractKnownFieldValues(constExpression&guaranteed_true_predicate)#
Assemble a mapping from field references to known values.
This derives known values from “equal” and “is_null” Expressions referencing a field and a literal.
- classExpression#
- #include <arrow/compute/expression.h>
An unbound expression which maps a single Datum to another Datum.
An expression is one of:
A literal Datum.
A reference to a single (potentially nested) field of the input Datum.
A call to a compute function, with arguments specified by other Expressions.
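The three-way structure above can be modeled with a small variant-based tree. This is a plain C++ sketch, not the Arrow class: every name in it (LiteralNode, FieldNode, CallNode, Lit, Field, Call, FieldsIn, HasFieldRefs) is hypothetical, and literals are reduced to int for brevity. It also shows how a depth-first walk can gather the field names an expression refers to, in the spirit of FieldsInExpression.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct LiteralNode { int value; };         // a literal value
struct FieldNode { std::string name; };    // a reference to an input field
struct CallNode {                          // a function call over sub-expressions
  std::string function;
  std::vector<ExprPtr> arguments;
};

struct Expr {
  std::variant<LiteralNode, FieldNode, CallNode> node;
};

ExprPtr Lit(int value) { return std::make_shared<Expr>(Expr{LiteralNode{value}}); }
ExprPtr Field(std::string name) {
  return std::make_shared<Expr>(Expr{FieldNode{std::move(name)}});
}
ExprPtr Call(std::string function, std::vector<ExprPtr> arguments) {
  return std::make_shared<Expr>(
      Expr{CallNode{std::move(function), std::move(arguments)}});
}

// Depth-first walk that appends every referenced field name to *out.
void CollectFields(const Expr& expr, std::vector<std::string>* out) {
  if (const auto* field = std::get_if<FieldNode>(&expr.node)) {
    out->push_back(field->name);
  } else if (const auto* call = std::get_if<CallNode>(&expr.node)) {
    for (const auto& argument : call->arguments) CollectFields(*argument, out);
  }
}

std::vector<std::string> FieldsIn(const Expr& expr) {
  std::vector<std::string> out;
  CollectFields(expr, &out);
  return out;
}

bool HasFieldRefs(const Expr& expr) { return !FieldsIn(expr).empty(); }
```

For the model expression greater(x, add(y, 1)), FieldsIn returns the names x and y in that order, while a bare literal has no field references.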
Public Functions
- Result<Expression>Bind(constTypeHolder&in,ExecContext*=NULLPTR)const#
Bind this expression to the given input type, looking up Kernels and field types.
Some expression simplification may be performed and implicit casts will be inserted. Any state necessary for execution will be initialized and returned.
- boolIsBound()const#
Return true if all of an expression’s field references have explicit types and all of its functions’ kernels are looked up.
- boolIsScalarExpression()const#
Return true if this expression is composed only of Scalar literals, field references, and calls to ScalarFunctions.
- boolIsNullLiteral()const#
Return true if this expression is literal and entirely null.
- boolIsSatisfiable()const#
Return true if this expression could evaluate to true.
Will return true for any unbound or non-boolean Expressions. IsSatisfiable does not (currently) do any canonicalization or simplification of the expression, so even Expressions which are unsatisfiable may spuriously return true here. This function is intended for use in predicate pushdown where a filter expression is simplified by a guarantee, so it assumes that trying to simplify again would be redundant.
- structCall#
- #include <arrow/compute/expression.h>
- structHash#
- #include <arrow/compute/expression.h>
- structParameter#
- #include <arrow/compute/expression.h>
- Expressionproject(std::vector<Expression>values,std::vector<std::string>names)#
- Expressionequal(Expressionlhs,Expressionrhs)#
- Expressionnot_equal(Expressionlhs,Expressionrhs)#
- Expressionless(Expressionlhs,Expressionrhs)#
- Expressionless_equal(Expressionlhs,Expressionrhs)#
- Expressiongreater(Expressionlhs,Expressionrhs)#
- Expressiongreater_equal(Expressionlhs,Expressionrhs)#
- Expressionis_null(Expressionlhs,boolnan_is_null=false)#
- Expressionis_valid(Expressionlhs)#
- Expressionand_(Expressionlhs,Expressionrhs)#
- Expressionand_(conststd::vector<Expression>&)#
- Expressionor_(Expressionlhs,Expressionrhs)#
- Expressionor_(conststd::vector<Expression>&)#
- Expressionnot_(Expressionoperand)#
- Result<Expression>Canonicalize(Expression,ExecContext*=NULLPTR)#
Weak canonicalization which establishes guarantees for subsequent passes.
Even equivalent Expressions may result in different canonicalized expressions. TODO: this could be a strong canonicalization.
- Result<Expression>FoldConstants(Expression)#
Simplify Expressions based on literal arguments (for example, add(null, x) will always be null so replace the call with a null literal).
Includes early evaluation of all calls whose arguments are entirely literal.
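The two rules described here, null propagation and early evaluation of all-literal calls, can be sketched on a toy expression type. This is a plain C++ model, not Arrow's implementation: it supports only an add call over int literals, and the names Expr, Lit, Field, Add and Fold are hypothetical.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Toy expression: a literal (possibly null), a field reference, or add(lhs, rhs).
struct Expr {
  enum Kind { kLiteral, kField, kAdd };
  Kind kind;
  std::optional<int> value;        // kLiteral: std::nullopt models a null literal
  std::string name;                // kField
  std::shared_ptr<Expr> lhs, rhs;  // kAdd
};

Expr Lit(std::optional<int> value) {
  return Expr{Expr::kLiteral, value, "", nullptr, nullptr};
}
Expr Field(std::string name) {
  return Expr{Expr::kField, std::nullopt, std::move(name), nullptr, nullptr};
}
Expr Add(Expr lhs, Expr rhs) {
  return Expr{Expr::kAdd, std::nullopt, "", std::make_shared<Expr>(std::move(lhs)),
              std::make_shared<Expr>(std::move(rhs))};
}

// Bottom-up folding: simplify the arguments first, then the call itself.
Expr Fold(const Expr& expr) {
  if (expr.kind != Expr::kAdd) return expr;  // literals and fields are left alone
  Expr lhs = Fold(*expr.lhs);
  Expr rhs = Fold(*expr.rhs);
  const bool lhs_literal = lhs.kind == Expr::kLiteral;
  const bool rhs_literal = rhs.kind == Expr::kLiteral;
  // add(null, x) is always null, so the call becomes a null literal.
  if ((lhs_literal && !lhs.value) || (rhs_literal && !rhs.value)) {
    return Lit(std::nullopt);
  }
  // Both arguments are non-null literals: evaluate the call now.
  if (lhs_literal && rhs_literal) return Lit(*lhs.value + *rhs.value);
  return Add(std::move(lhs), std::move(rhs));
}
```

In this model, add(1, 2) folds to the literal 3, add(null, x) folds to a null literal, and add(x, add(1, 2)) keeps the outer call but replaces its second argument with the literal 3.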
- Result<Expression>ReplaceFieldsWithKnownValues(constKnownFieldValues&known_values,Expression)#
Simplify Expressions by replacing the fields they reference with their known values.
- Result<Expression>SimplifyWithGuarantee(Expression,constExpression&guaranteed_true_predicate)#
Simplify an expression by replacing subexpressions based on a guarantee: a boolean expression which is guaranteed to evaluate to true.
For example, this is used to remove redundant function calls from a filter expression or to replace a reference to a constant-value field with a literal.
- Result<Expression>RemoveNamedRefs(Expressionexpression)#
Replace all named field refs (e.g. “x” or “x.y”) with field paths (e.g. [0] or [1,3]).
This isn’t usually needed and does not offer any simplification by itself. However, it can be useful to normalize an expression to paths to make it simpler to work with.

