
Apply a function to a stream of RecordBatches

Source: R/dataset-scan.R
map_batches.Rd

As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you do more complex operations in R that operate on chunks of data without having to hold the entire Dataset in memory at once. You can include map_batches() in a dplyr pipeline and apply additional dplyr methods to the stream of data in Arrow after it.
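As a minimal sketch of the pattern described above, the following writes the built-in mtcars data to a temporary directory, opens it as a Dataset, and adds a derived column batch by batch. The temporary path and the mpg_per_hp column are illustrative assumptions and are not part of the arrow API; collect() is called at the end only to materialize the result for inspection.

```r
library(arrow)
library(dplyr)

# Illustrative setup: a small on-disk dataset in a temporary directory
tmp <- tempfile()
dir.create(tmp)
write_dataset(mtcars, tmp)

ds <- open_dataset(tmp)

result <- ds |>
  select(mpg, hp) |>
  map_batches(function(batch) {
    # Each call receives one RecordBatch; convert it to a data.frame,
    # do ordinary R work on that chunk, and convert it back
    df <- as.data.frame(batch)
    df$mpg_per_hp <- df$mpg / df$hp  # hypothetical derived column
    as_record_batch(df)
  }) |>
  collect()
```

Only one batch at a time is converted to a data.frame inside the function, which is what keeps memory use bounded for larger datasets.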

Usage

map_batches(X, FUN, ..., .schema = NULL, .lazy = TRUE, .data.frame = NULL)

Arguments

X

A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.

FUN

A function or purrr-style lambda expression to apply to each batch. It must return a RecordBatch or something coercible to one via as_record_batch().

...

Additional arguments passed to FUN.

.schema

An optional schema(). If NULL, the schema will be inferred from the first batch.

.lazy

Use TRUE to evaluate FUN lazily as batches are read from the result; use FALSE to evaluate FUN on all batches before returning the reader.

.data.frame

Deprecated argument, ignored
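The arguments above can be combined as in the following sketch, which supplies an explicit .schema so that the output schema does not have to be inferred from the first batch, uses a purrr-style lambda for FUN, and sets .lazy = FALSE so that FUN runs on all batches up front. The kpl column, the out_schema name, and the conversion factor (about 0.425144 km/L per US mpg) are assumptions made for illustration only.

```r
library(arrow)
library(dplyr)

# Illustrative setup: a small on-disk dataset in a temporary directory
tmp <- tempfile()
dir.create(tmp)
write_dataset(mtcars, tmp)

ds <- open_dataset(tmp)

# Declare the schema that every batch returned by FUN is expected to have
out_schema <- schema(mpg = float64(), kpl = float64())

kpl <- ds |>
  select(mpg) |>
  map_batches(
    # purrr-style lambda: .x is the current RecordBatch; the returned
    # data.frame is coerced to a RecordBatch
    ~ data.frame(
      mpg = as.vector(.x$mpg),
      kpl = as.vector(.x$mpg) * 0.425144  # hypothetical unit conversion
    ),
    .schema = out_schema,
    .lazy = FALSE  # evaluate FUN on all batches before returning
  ) |>
  collect()
```

Passing .schema is mainly useful when the first batch may not be representative, or when you want a mismatch between FUN's output and the expected types to surface as an error.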

Value

An arrow_dplyr_query.

Details

This is experimental and not recommended for production use. It is also single-threaded and runs in R, not C++, so it won't be as fast as core Arrow methods.

