
Scan the contents of a dataset

Source: R/dataset-scan.R
Scanner.Rd

A Scanner iterates over a Dataset's fragments and returns data according to given row filtering and column projection. A ScannerBuilder can help create one.

Factory

Scanner$create() wraps the ScannerBuilder interface to make a Scanner. It takes the following arguments:

  • dataset: A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.

  • projection: A character vector of column names to select columns, or a named list of expressions.

  • filter: An Expression to filter the scanned rows by, or TRUE (default) to keep all rows.

  • use_threads: logical: should scanning use multithreading? Default TRUE.

  • ...: Additional arguments, currently ignored
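As a rough illustration of the factory arguments listed above, the following sketch builds a Scanner in a single call. It assumes the arrow package is installed and reuses the mtcars dataset layout from the Examples section; the column names chosen for projection and the hp threshold are illustrative only.

```r
library(arrow)

# Write a small partitioned dataset to a temporary directory
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

# Build the Scanner in one step: select two columns and keep rows with hp > 100
scanner <- Scanner$create(
  ds,
  projection = c("mpg", "hp"),
  filter = Expression$field_ref("hp") > 100,
  use_threads = TRUE
)

# Evaluate the query and convert the resulting Arrow Table to a data frame
df <- as.data.frame(scanner$ToTable())

# Clean up the temporary directory
unlink(tf, recursive = TRUE)
```

Passing filter = TRUE (the default) instead would keep all rows.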

Methods

ScannerBuilder has the following methods:

  • $Project(cols): Indicate that the scan should only return columns given by cols, a character vector of column names or a named list of Expression.

  • $Filter(expr): Filter rows by an Expression.

  • $UseThreads(threads): logical: should the scan use multithreading? The method's default input is TRUE, but you must call the method to enable multithreading because the scanner default is FALSE.

  • $BatchSize(batch_size): integer: Maximum row count of scanned record batches, default is 32K. If scanned record batches are overflowing memory, then this method can be called to reduce their size.

  • $schema: Active binding, returns the Schema of the Dataset

  • $Finish(): Returns a Scanner

Scanner provides $ToTable(), which evaluates the query and returns an Arrow Table. As the example below shows, results can also be obtained as a RecordBatchReader with $ToRecordBatchReader().
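The $UseThreads() and $BatchSize() methods are not exercised in the Examples section, so here is a minimal sketch of how they might be combined. It assumes the arrow package is installed; the batch size of 5 is an arbitrary illustrative value, and the exact way rows are split into batches may depend on the file format and Arrow version.

```r
library(arrow)

# Write a small partitioned dataset to a temporary directory
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

builder <- ds$NewScan()
# Multithreading must be enabled explicitly on the builder
builder$UseThreads(TRUE)
# Ask for small record batches (illustrative value: at most 5 rows each)
builder$BatchSize(5L)
scanner <- builder$Finish()

# Read the scan as a stream of record batches and count the rows in each
reader <- scanner$ToRecordBatchReader()
batches <- reader$batches()
batch_rows <- vapply(batches, function(b) b$num_rows, numeric(1))
total_rows <- sum(batch_rows)

# Clean up the temporary directory
unlink(tf, recursive = TRUE)
```

Inspecting batch_rows gives a quick check of whether the requested batch size was honored; total_rows should match the number of rows in the dataset regardless of how the batches were split.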

Examples

# Set up directory for examples
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write_dataset(mtcars, tf, partitioning = "cyl")

ds <- open_dataset(tf)

scan_builder <- ds$NewScan()
scan_builder$Filter(Expression$field_ref("hp") > 100)
#> ScannerBuilder
scan_builder$Project(list(hp_times_ten = 10 * Expression$field_ref("hp")))
#> ScannerBuilder

# Once configured, call $Finish()
scanner <- scan_builder$Finish()

# Can get results as a table
as.data.frame(scanner$ToTable())
#>    hp_times_ten
#> 1          1130
#> 2          1090
#> 3          1100
#> 4          1100
#> 5          1100
#> 6          1050
#> 7          1230
#> 8          1230
#> 9          1750
#> 10         1750
#> 11         2450
#> 12         1800
#> 13         1800
#> 14         1800
#> 15         2050
#> 16         2150
#> 17         2300
#> 18         1500
#> 19         1500
#> 20         2450
#> 21         1750
#> 22         2640
#> 23         3350

# Or as a RecordBatchReader
scanner$ToRecordBatchReader()
#> RecordBatchReader
#> 1 columns
#> hp_times_ten: double
#>
#> See $metadata for additional Schema metadata
