pyarrow.substrait.run_query

pyarrow.substrait.run_query(plan, *, table_provider=None, use_threads=True)

Execute a Substrait plan and read the results as a RecordBatchReader.

Parameters:
plan : Union[Buffer, bytes]

The serialized Substrait plan to execute.

table_provider : object (optional)

A function that resolves any NamedTable relation to a table. The function receives two arguments, a list of strings representing the table name and a pyarrow.Schema representing the expected schema, and should return a pyarrow.Table.

use_threads : bool, default True

If True, multiple threads will be used to run the query. If False, all CPU-intensive work will be done on the calling thread.

Returns:
RecordBatchReader

A reader containing the result of the executed query.

Examples

>>> import pyarrow as pa
>>> from pyarrow.lib import tobytes
>>> import pyarrow.substrait as substrait
>>> test_table_1 = pa.Table.from_pydict({"x": [1, 2, 3]})
>>> test_table_2 = pa.Table.from_pydict({"x": [4, 5, 6]})
>>> def table_provider(names, schema):
...     if not names:
...         raise Exception("No names provided")
...     elif names[0] == "t1":
...         return test_table_1
...     elif names[0] == "t2":
...         return test_table_2
...     else:
...         raise Exception("Unrecognized table name")
...
>>> substrait_query = '''
...     {
...         "relations": [
...             {"rel": {
...                 "read": {
...                     "base_schema": {
...                         "struct": {
...                             "types": [
...                                 {"i64": {}}
...                             ]
...                         },
...                         "names": [
...                             "x"
...                         ]
...                     },
...                     "namedTable": {
...                         "names": ["t1"]
...                     }
...                 }
...             }}
...         ]
...     }
... '''
>>> buf = pa._substrait._parse_json_plan(tobytes(substrait_query))
>>> reader = pa.substrait.run_query(buf, table_provider=table_provider)
>>> reader.read_all()
pyarrow.Table
x: int64
----
x: [[1,2,3]]