pyarrow.RecordBatchReader
- class pyarrow.RecordBatchReader

Bases: _Weakrefable

Base class for reading a stream of record batches.

Record batch readers function as iterators of record batches that also provide the schema (without the need to get any batches).
Warning

Do not call this class’s constructor directly, use one of the RecordBatchReader.from_* functions instead.

Notes

To import and export using the Arrow C stream interface, use the _import_from_c and _export_to_c methods. However, keep in mind this interface is intended for expert users.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([('x', pa.int64())])
>>> def iter_record_batches():
...     for i in range(2):
...         yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)
>>> reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches())
>>> print(reader.schema)
x: int64
>>> for batch in reader:
...     print(batch)
pyarrow.RecordBatch
x: int64
----
x: [1,2,3]
pyarrow.RecordBatch
x: int64
----
x: [1,2,3]
- __init__(*args, **kwargs)
Methods

__init__(*args, **kwargs)

cast(self, target_schema)
    Wrap this reader with one that casts each batch lazily as it is pulled.

close(self)
    Release any resources associated with the reader.

from_batches(Schema schema, batches)
    Create RecordBatchReader from an iterable of batches.

from_stream(data[, schema])
    Create RecordBatchReader from an Arrow-compatible stream object.

iter_batches_with_custom_metadata(self)
    Iterate over record batches from the stream along with their custom metadata.

read_all(self)
    Read all record batches as a pyarrow.Table.

read_next_batch(self)
    Read next RecordBatch from the stream.

read_next_batch_with_custom_metadata(self)
    Read next RecordBatch from the stream along with its custom metadata.

read_pandas(self, **options)
    Read contents of stream to a pandas.DataFrame.
Attributes

schema
    Shared schema of the record batches in the stream.
- cast(self, target_schema)

Wrap this reader with one that casts each batch lazily as it is pulled. Currently only a safe cast to target_schema is implemented.

- Parameters:
    - target_schema : Schema
        Schema to cast to; the names and order of fields must match.
- Returns:
    - RecordBatchReader
- close(self)

Release any resources associated with the reader.
- static from_batches(Schema schema, batches)

Create RecordBatchReader from an iterable of batches.

- Parameters:
    - schema : Schema
        The shared schema of the record batches.
    - batches : Iterable[RecordBatch]
        The batches that this reader will return.
- Returns:
    - reader : RecordBatchReader
- static from_stream(data, schema=None)

Create RecordBatchReader from an Arrow-compatible stream object.

This accepts objects implementing the Arrow PyCapsule Protocol for streams, i.e. objects that have a __arrow_c_stream__ method.
- iter_batches_with_custom_metadata(self)

Iterate over record batches from the stream along with their custom metadata.

- Yields:
    RecordBatchWithMetadata
- read_next_batch(self)

Read next RecordBatch from the stream.

- Returns:
    - RecordBatch
- Raises:
    - StopIteration
        At end of stream.
- read_next_batch_with_custom_metadata(self)

Read next RecordBatch from the stream along with its custom metadata.

- Returns:
    - batch : RecordBatch
    - custom_metadata : KeyValueMetadata
- Raises:
    - StopIteration
        At end of stream.
- read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Read all record batches as a pyarrow.Table then convert it to a pandas.DataFrame using Table.to_pandas.

- Parameters:
    - **options
        Arguments to forward to Table.to_pandas().
- Returns:
    - df : pandas.DataFrame