Apply a function to a stream of RecordBatches

Source: R/dataset-scan.R

map_batches.Rd

As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you do more complex operations in R that operate on chunks of data without having to hold the entire Dataset in memory at once. You can include map_batches() in a dplyr pipeline and do additional dplyr methods on the stream of data in Arrow after it.
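The following is a minimal sketch of that pipeline usage, not an official example. The temporary dataset built from mtcars, the derived kpl column, and the conversion factor are illustrative assumptions; the sketch also assumes a recent arrow release in which dplyr verbs can follow map_batches().

```r
library(arrow)
library(dplyr)

# Illustrative data only: write mtcars to a temporary directory as a Dataset
tmp <- tempfile()
write_dataset(mtcars, tmp)
ds <- open_dataset(tmp)

result <- ds %>%
  filter(mpg > 20) %>%
  map_batches(function(batch) {
    # Each batch arrives as a RecordBatch; work on it as a data.frame in R
    df <- as.data.frame(batch)
    df$kpl <- df$mpg * 0.425144  # miles per US gallon to km per litre
    df                           # a data.frame is coercible via as_record_batch()
  }) %>%
  select(mpg, kpl) %>%           # further dplyr verbs on the resulting stream
  collect()
```

Only one batch at a time is converted to a data.frame inside the function, so memory use in R is bounded by the batch size rather than by the size of the Dataset.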

Usage

map_batches(X, FUN, ..., .schema = NULL, .lazy = TRUE, .data.frame = NULL)

Arguments

X: A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
FUN: A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via as_record_batch().
...: Additional arguments passed to FUN.
.schema: An optional schema(). If NULL, the schema will be inferred from the first batch.
.lazy: Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.
.data.frame: Deprecated argument, ignored.
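As a hedged illustration of how these arguments fit together, the sketch below passes an extra argument to FUN through ..., supplies an explicit .schema so nothing has to be inferred from the first batch, and sets .lazy = FALSE so FUN runs on every batch before the reader is returned. The helper count_above(), its threshold argument, and the n_above column are hypothetical names introduced for this example, and the mtcars-based dataset is illustrative only.

```r
library(arrow)
library(dplyr)

# Illustrative data only
tmp <- tempfile()
write_dataset(mtcars, tmp)
ds <- open_dataset(tmp)

# Hypothetical helper: count the rows in one batch whose mpg exceeds a threshold
count_above <- function(batch, threshold) {
  data.frame(n_above = sum(as.vector(batch$mpg) > threshold))
}

out <- map_batches(
  ds,
  count_above,
  threshold = 20,                        # forwarded to FUN through ...
  .schema = schema(n_above = int32()),   # declared up front, not inferred
  .lazy = FALSE                          # evaluate FUN on all batches now
) %>%
  collect()

# One row per batch; summing gives the total across the Dataset
total_above <- sum(out$n_above)
```

Declaring the schema is mainly useful when the first batch might not be representative, for example when a column could come back with a different type in later batches.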

Value

An arrow_dplyr_query.

Details

This is experimental and not recommended for production use. It is also single-threaded and runs in R, not C++, so it won't be as fast as core Arrow methods.
