pyarrow.record_batch
- pyarrow.record_batch(data, names=None, schema=None, metadata=None)
Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.
- Parameters:
  - data
    dict, list, pandas.DataFrame, Arrow-compatible table
    A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (has an __arrow_c_array__ or __arrow_c_device_array__ method).
  - names
    list, default None
    Column names if a list of arrays is passed as data. Mutually exclusive with the ‘schema’ argument.
  - schema
    Schema, default None
    The expected schema of the RecordBatch. If not passed, it will be inferred from the data. Mutually exclusive with the ‘names’ argument.
  - metadata
    dict or Mapping, default None
    Optional metadata for the schema (if schema is not passed).
- Returns:
  RecordBatch
Examples
>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse",
...                     "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]
Construct a RecordBatch from a Python dictionary:
>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede
Creating a RecordBatch from a list of arrays with names:
>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
Creating a RecordBatch from a list of arrays with names and metadata:
>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'
Creating a RecordBatch from a pandas DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede
Creating a RecordBatch from a pandas DataFrame with schema:
>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, schema=my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, schema=my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede

