pyarrow.record_batch

pyarrow.record_batch(data, names=None, schema=None, metadata=None)

Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.

Parameters:

data : dict, list, pandas.DataFrame, Arrow-compatible table
    A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (has an __arrow_c_array__ or __arrow_c_device_array__ method).
names : list, default None
    Column names, used when a list of arrays is passed as data. Mutually exclusive with the 'schema' argument.
schema : Schema, default None
    The expected schema of the RecordBatch. If not passed, it is inferred from the data. Mutually exclusive with the 'names' argument.
metadata : dict or Mapping, default None
    Optional metadata for the schema (if schema is not passed).

Returns:

RecordBatch

See also

RecordBatch.from_arrays, RecordBatch.from_pandas, table

Examples

>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]

Creating a RecordBatch from a Python dictionary:

>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede

Creating a RecordBatch from a list of arrays with names:

>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]

Creating a RecordBatch from a list of arrays with names and metadata:

>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'

Creating a RecordBatch from a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]

>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede

Creating a RecordBatch from a pandas DataFrame with schema:

>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede
