Using Acero with Substrait#

In order to use Acero you will need to create an execution plan. This is themodel that describes the computation you want to apply to your data. Acero hasits own internal representation for execution plans but most users should notinteract with this directly as it will couple their code to Acero.

Substrait is an open standard for execution plans.Acero implements the Substrait “consumer” interface. This means that Acero canaccept a Substrait plan and fulfill the plan, loading the requested data andapplying the desired computation. By using Substrait plans users can easilyswitch out to a different execution engine at a later time.

Substrait Conformance#

Substrait defines a broad set of operators and functions for many differentsituations and it is unlikely that Acero will ever completely satisfy alldefined Substrait operators and functions. To help understand what featuresare available the following sections define which features have been currentlyimplemented in Acero and any caveats that apply.

Plans#

  • A plan should have a single top-level relation.

  • The consumer is currently based on version 0.20.0 of Substrait.Any features added that are newer will not be supported.

  • Due to a breaking change in 0.20.0 any Substrait plan older than 0.20.0will be rejected.

Extensions#

  • If a plan contains any extension type variations it will be rejected.

  • Advanced extensions can be provided by supplying a custom implementation ofarrow::engine::ExtensionProvider.

Relations (in general)#

  • Any relation not explicitly listed below will not be supportedand will cause the plan to be rejected.

Read Relations#

  • Theprojection property is not supported and plans containing thisproperty will be rejected.

  • TheVirtualTable andExtensionTable read types are not supported.Plans containing these types will be rejected.

  • Only the parquet and arrow file formats are currently supported.

  • All URIs must use thefile scheme

  • partition_index,start, andlength are not supported. Plans containingnon-default values for these properties will be rejected.

  • The Substrait spec requires that afilter be completely satisfied by a readrelation. However, Acero only uses a read filter for pushdown projection andit may not be fully satisfied. Users should generally attach an additionalfilter relation with the same filter expression after the read relation.

Filter Relations#

  • No known caveats

Project Relations#

  • No known caveats

Join Relations#

  • The join typeJOIN_TYPE_SINGLE is not supported and plans containing thiswill be rejected.

  • The join expression must be a call to either theequal oris_not_distinct_fromfunctions. Both arguments to the call must be direct references. Only a singlejoin key is supported.

  • Thepost_join_filter property is not supported and will be ignored.

Aggregate Relations#

  • At most one grouping set is supported.

  • Each grouping expression must be a direct reference.

  • Each measure’s arguments must be direct references.

  • A measure may not have a filter

  • A measure may not have sorts

  • A measure’s invocation must be AGGREGATION_INVOCATION_ALL orAGGREGATION_INVOCATION_UNSPECIFIED

  • A measure’s phase must be AGGREGATION_PHASE_INITIAL_TO_RESULT

Expressions (general)#

  • Various places in the Substrait spec allow for expressions to be used outsideof a filter or project relation. For example, a join expression or an aggregategrouping set. Acero typically expects these expressions to be direct references.Planners should extract the implicit projection into a formal project relationbefore delivering the plan to Acero.

Literals#

  • A literal with non-default nullability will cause a plan to be rejected.

Types#

  • Acero does not have full support for non-nullable types and may allow inputto have nulls without rejecting it.

  • The table below shows the mapping between Arrow types and Substrait typeclasses that are currently supported

Substrait / Arrow Type Mapping#

Substrait Type

Arrow Type

Caveat

boolean

boolean

i8

int8

i16

int16

i32

int32

i64

int64

fp32

float32

fp64

float64

string

string

binary

binary

timestamp

timestamp<MICRO,””>

timestamp_tz

timestamp<MICRO,”UTC”>

date

date32<DAY>

time

time64<MICRO>

interval_year

Not currently supported

interval_day

Not currently supported

uuid

Not currently supported

FIXEDCHAR<L>

Not currently supported

VARCHAR<L>

Not currently supported

FIXEDBINARY<L>

fixed_size_binary<L>

DECIMAL<P,S>

decimal128<P,S>

STRUCT<T1…TN>

struct<T1…TN>

Arrow struct fields will have no name (empty string)

NSTRUCT<N:T1…N:Tn>

Not currently supported

LIST<T>

list<T>

MAP<K,V>

map<K,V>

K must not be nullable

Functions#

  • The following functions have caveats or are not supported at all. Note thatthis is not a comprehensive list. Functions are being added to Substrait ata rapid pace and new functions may be missing.

    • Acero does not support the SATURATE option for overflow

    • Acero does not support kernels that take more than two argumentsfor the functionsand,or,xor

  • Substrait has not yet clearly identified the form that URIs should take forstandard functions. Acero will look for the URIs to themain GitHub branch.In other words, for the filefunctions_arithmetic.yaml Acero expectshttps://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml

    • Acero has functions that are not yet a part of Substrait (or may never be addedas official functions). To invoke these functions you can use the special URIurn:arrow:substrait_simple_extension_function. If this URI is encounteredthen Acero will match only on function name and will ignore any function options.

    • Alternatively, the URI can be left completely empty and Acero will matchbased only on function name. This fallback mechanism is non-standard and shouldbe considered deprecated in favor of the special URI above.