pyarrow.substrait.run_query#
- pyarrow.substrait.run_query(plan,*,table_provider=None,use_threads=True)#
Execute a Substrait plan and read the results as a RecordBatchReader.
- Parameters:
- plan
Union[Buffer,bytes] The serialized Substrait plan to execute.
- table_providerobject (optional)
A function to resolve any NamedTable relation to a table.The function will receive two arguments which will be a listof strings representing the table name and a pyarrow.Schema representingthe expected schema and should return a pyarrow.Table.
- use_threadsbool, default
True If True then multiple threads will be used to run the query. If False thenall CPU intensive work will be done on the calling thread.
- plan
- Returns:
- RecordBatchReader
A reader containing the result of the executed query
Examples
>>>importpyarrowaspa>>>frompyarrow.libimporttobytes>>>importpyarrow.substraitassubstrait>>>test_table_1=pa.Table.from_pydict({"x":[1,2,3]})>>>test_table_2=pa.Table.from_pydict({"x":[4,5,6]})>>>deftable_provider(names,schema):...ifnotnames:...raiseException("No names provided")...elifnames[0]=="t1":...returntest_table_1...elifnames[1]=="t2":...returntest_table_2...else:...raiseException("Unrecognized table name")...>>>substrait_query='''... {... "relations": [... {"rel": {... "read": {... "base_schema": {... "struct": {... "types": [... {"i64":{}}... ]... },... "names": [... "x"... ]... },... "namedTable": {... "names": ["t1"]... }... }... }}... ]... }...'''>>>buf=pa._substrait._parse_json_plan(tobytes(substrait_query))>>>reader=pa.substrait.run_query(buf,table_provider=table_provider)>>>reader.read_all()pyarrow.Tablex: int64----x: [[1,2,3]]
On this page

