As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you do more complex operations in R that operate on chunks of data without having to hold the entire Dataset in memory at once. You can include map_batches() in a dplyr pipeline and do additional dplyr methods on the stream of data in Arrow after it.
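A minimal sketch of the pipeline usage described above. The temporary dataset built from mtcars and the derived column hp_per_wt are illustrative assumptions, not part of the original documentation; the division stands in for any computation that has to run in R on one batch at a time.

```r
library(arrow)
library(dplyr)

# Assumption: a small throwaway dataset written to a temporary directory
tmp <- tempfile()
write_dataset(mtcars, tmp)
ds <- open_dataset(tmp)

result <- ds %>%
  filter(mpg > 20) %>%                  # evaluated by Arrow before batches reach R
  map_batches(function(batch) {
    # `batch` is a single RecordBatch; only this chunk is held in R memory
    df <- as.data.frame(batch)
    df$hp_per_wt <- df$hp / df$wt       # hypothetical per-batch computation
    as_record_batch(df)                 # return a RecordBatch, as FUN must
  }) %>%
  select(mpg, hp, wt, hp_per_wt) %>%    # further dplyr verbs on the stream
  collect()                             # materialize the result as a data frame
```

Here collect() is called only at the end, so the R function sees one batch at a time rather than the whole Dataset.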
Arguments
- X
A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.
- FUN
A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via as_record_batch().
- ...
Additional arguments passed to FUN.
- .schema
An optional schema(). If NULL, the schema will be inferred from the first batch.
- .lazy
Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.
- .data.frame
Deprecated argument, ignored.
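The arguments above can be combined as in the following sketch. It assumes a version of arrow whose map_batches() accepts .schema and .lazy; the helper add_ratio(), its digits parameter, and the mtcars-based dataset are hypothetical names introduced only for illustration.

```r
library(arrow)

# Assumption: a small throwaway dataset written to a temporary directory
tmp <- tempfile()
write_dataset(mtcars, tmp)
ds <- open_dataset(tmp)

# Hypothetical FUN: returns a data.frame, which is coercible via as_record_batch()
add_ratio <- function(batch, digits) {
  df <- as.data.frame(batch)
  data.frame(mpg = df$mpg, hp_per_wt = round(df$hp / df$wt, digits))
}

# `digits = 1` is forwarded to FUN through `...`.
# Supplying .schema avoids inferring the schema from the first batch.
reader <- map_batches(
  ds,
  add_ratio,
  digits = 1,
  .schema = schema(mpg = float64(), hp_per_wt = float64()),
  .lazy = TRUE   # FUN runs only as batches are pulled from the reader
)
ratios <- as.data.frame(reader$read_table())

# purrr-style lambda; .lazy = FALSE applies FUN to every batch up front
eager <- map_batches(ds, ~ as.data.frame(.x)[, c("mpg", "cyl")], .lazy = FALSE)
mpg_cyl <- as.data.frame(eager$read_table())
```

With .lazy = TRUE nothing is computed until the reader is consumed (here by read_table()), which suits results too large to hold at once; .lazy = FALSE trades that for having all batches evaluated before the reader is returned.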