pyarrow.record_batch

pyarrow.record_batch(data, names=None, schema=None, metadata=None)

Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.

Parameters:
data : dict, list, pandas.DataFrame, Arrow-compatible table

A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (has an __arrow_c_array__ or __arrow_c_device_array__ method).

names : list, default None

Column names if a list of arrays is passed as data. Mutually exclusive with the ‘schema’ argument.

schema : Schema, default None

The expected schema of the RecordBatch. If not passed, it will be inferred from the data. Mutually exclusive with the ‘names’ argument.

metadata : dict or Mapping, default None

Optional metadata for the schema (if schema is not passed).

Returns:
RecordBatch

Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Creating a RecordBatch from a Python dictionary:

>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede

Creating a RecordBatch from a list of arrays with names:

>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]

Creating a RecordBatch from a list of arrays with names and metadata:

>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'

Creating a RecordBatch from a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede

Creating a RecordBatch from a pandas DataFrame with schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede