Type:Package
Title:File-Backed Array for Out-of-Memory Computation
Version:0.2.0
Language:en-US
Encoding:UTF-8
License:LGPL-3
URL:https://dipterix.org/filearray/,https://github.com/dipterix/filearray
BugReports:https://github.com/dipterix/filearray/issues
Description:Stores large arrays in files to avoid occupying large memories. Implemented with super fast gigabyte-level multi-threaded reading/writing via 'OpenMP'. Supports multiple non-character data types (double, float, complex, integer, logical, and raw).
Imports:digest, fastmap (≥ 1.1.1), methods, Rcpp, uuid (≥ 1.1.0)
Suggests:bit64, knitr, rmarkdown, testthat (≥ 3.0.0)
RoxygenNote:7.3.2
LinkingTo:BH, Rcpp
Config/testthat/edition:3
VignetteBuilder:knitr
NeedsCompilation:yes
Packaged:2025-04-01 15:57:07 UTC; dipterix
Author:Zhengjia Wang [aut, cre, cph]
Maintainer:Zhengjia Wang <dipterix.wang@gmail.com>
Repository:CRAN
Date/Publication:2025-04-01 16:30:01 UTC

Definition of file array

Description

S4 class definition of FileArray. Please use filearray_create and filearray_load to create instances.

Public Methods

get_header(key, default = NULL)

Get header information; returns default if key is missing

set_header(key, value)

Set header information; the extra headers will be stored in meta file. Please do not store large headers as they will be loaded into memory frequently.

can_write()

Whether the array data can be altered

create(filebase, dimension, type = "double", partition_size = 1)

Create a file array instance

delete(force = FALSE)

Remove array from local file system and reset

dimension()

Get dimension vector

dimnames(v)

Set/get dimension names

element_size()

Internal storage: bytes per element

fill_partition(part, value)

Fill a partition with given scalar

get_partition(part, reshape = NULL)

Get partition data, and reshape (if not null) to desired dimension

expand(n)

Expand the array along the last margin; returns true if expanded. If the dimnames have been assigned prior to expansion, the new names along the last dimension will be filled with NA

initialize_partition()

Make sure a partition file exists; if not, create one and fill it with NAs, or with 0 when type='raw'

load(filebase, mode = c("readwrite", "readonly"))

Load file array from existing directory

partition_path(part)

Get partition file path

partition_size()

Get partition size; see filearray

set_partition(part, value, ..., strict = TRUE)

Set partition value

sexp_type()

Get data SEXP type; see R internal manuals

show()

Print information

type()

Get data type

valid()

Check if the array is valid.

See Also

filearray


'S3' methods for 'FileArray'

Description

These are 'S3' methods for 'FileArray'

Usage

## S3 method for class 'FileArray'
x[
  i,
  ...,
  drop = TRUE,
  reshape = NULL,
  strict = TRUE,
  dimnames = TRUE,
  split_dim = 0
]

## S3 replacement method for class 'FileArray'
x[i, ..., lazy = FALSE] <- value

## S3 method for class 'FileArray'
x[[i]]

## S3 method for class 'FileArray'
as.array(x, reshape = NULL, drop = FALSE, ...)

## S3 method for class 'FileArray'
dim(x)

## S3 method for class 'FileArray'
dimnames(x)

## S3 replacement method for class 'FileArray'
dimnames(x) <- value

## S3 method for class 'FileArray'
length(x)

## S3 method for class 'FileArray'
max(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
min(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
range(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
sum(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
subset(x, ..., drop = FALSE, .env = parent.frame())

Arguments

x

a file array

i,...

index set, or passed to other methods

drop

whether to drop dimensions; see topic Extract

reshape

a new dimension to set before returning subset results; default is NULL (use default dimensions)

strict

whether to allow indices to exceed bound; currently only accepts TRUE

dimnames

whether to preserve dimnames

split_dim

internally used; split dimension and calculate indices to manually speed up the subset; value ranges from 0 to the size of dimension minus one.

lazy

whether to lazy-evaluate the method; only works when assigning arrays with a logical array index

value

value to substitute or set

na.rm

whether to remove NA values during the calculation

.env

environment to evaluate formula when evaluating subset margin indices.
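The methods above can be tried on a small array. The following is a minimal sketch that only uses calls listed in this manual; the dimension and the values are arbitrary choices made for illustration.

```r
library(filearray)

# Small 4 x 3 x 2 double array stored under a temporary directory
x <- filearray_create(tempfile(), dimension = c(4, 3, 2))
x[] <- 1:24

array_dim <- dim(x)          # dimension vector of the file array
n_elements <- length(x)      # total number of elements
slice <- x[, , 2]            # second slice of the last margin, read into memory
total <- sum(x)              # reduction over the whole file array
value_range <- range(x)      # smallest and largest value
tenth <- x[[10]]             # single element addressed by linear index

# Remove the files once the results are in memory
x$delete()
```

Only the objects returned by the calls live in memory; the array data itself stays on disk until it is subset.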

Functions


'S4' methods for FileArray

Description

'S4' methods for FileArray

Usage

## S4 methods for the binary operators listed below.
## Each operator is defined for every one of these signatures:
##   'FileArray,FileArray'
##   'FileArray,numeric'    'numeric,FileArray'
##   'FileArray,complex'    'complex,FileArray'
##   'FileArray,logical'    'logical,FileArray'
##   'FileArray,array'      'array,FileArray'
e1 + e2
e1 - e2
e1 * e2
e1 / e2
e1 ^ e2
e1 %% e2
e1 %/% e2
e1 == e2
e1 > e2
e1 < e2
e1 != e2
e1 >= e2
e1 <= e2
e1 & e2
e1 | e2

## S4 methods for signature 'FileArray'
!x
exp(x)
expm1(x)
log(x, base = exp(1))
log10(x)
log2(x)
log1p(x)
abs(x)
sqrt(x)
sign(x)
signif(x, digits = 6)
trunc(x, ...)
floor(x)
ceiling(x)
round(x, digits = 0)
acos(x)
acosh(x)
asin(x)
asinh(x)
atan(x)
atanh(x)
cos(x)
cosh(x)
cospi(x)
sin(x)
sinh(x)
sinpi(x)
tan(x)
tanh(x)
tanpi(x)
gamma(x)
lgamma(x)
digamma(x)
trigamma(x)
Arg(z)
Conj(z)
Im(z)
Mod(z)
Re(z)
is.na(x)

Arguments

x,z,e1,e2

FileArray or compatible data

base,digits,...

passed to other methods

Value

See S4groupGeneric


Apply functions over file array margins (extended)

Description

Apply functions over file array margins (extended)

Usage

apply(X, MARGIN, FUN, ..., simplify = TRUE)

## S4 method for signature 'FileArray'
apply(X, MARGIN, FUN, ..., simplify = TRUE)

## S4 method for signature 'FileArrayProxy'
apply(X, MARGIN, FUN, ..., simplify = TRUE)

Arguments

X

a file array

MARGIN

scalar giving the subscript over which the function will be applied. The current implementation only allows a single margin (MARGIN of length one)

FUN

the function to be applied

...

optional arguments to FUN

simplify

a logical indicating whether results should be simplified if possible

Value

See Section 'Value' in apply.
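As a hedged illustration of the single-margin restriction, the sketch below sums every slice of the second margin and compares the result with base R applied to an in-memory copy. The array shape and values are arbitrary.

```r
library(filearray)

x <- filearray_create(tempfile(), dimension = c(5, 4, 3))
x[] <- seq_len(60)

# MARGIN is a single index: one result per slice of the second margin
col_sums <- apply(x, 2, sum)

# Reference result computed by base R on the in-memory copy
expected <- base::apply(x[], 2, sum)

x$delete()
```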


Create or load existing file arrays

Description

Create or load existing file arrays

Usage

as_filearray(x, ...)

as_filearrayproxy(x, ...)

filearray_create(
  filebase,
  dimension,
  type = c("double", "float", "integer", "logical", "raw", "complex"),
  partition_size = NA,
  initialize = FALSE,
  ...
)

filearray_load(filebase, mode = c("readwrite", "readonly"))

filearray_checkload(
  filebase,
  mode = c("readonly", "readwrite"),
  ...,
  symlink_ok = TRUE
)

filearray_load_or_create(
  filebase,
  dimension,
  on_missing = NULL,
  type = NA,
  ...,
  mode = c("readonly", "readwrite"),
  symlink_ok = TRUE,
  initialize = FALSE,
  partition_size = NA,
  verbose = FALSE,
  auto_set_headers = TRUE
)

Arguments

x

R object such as array, file array proxy, or character that can be transformed into file array

...

additional headers to check, used by filearray_checkload (see 'Details'). This argument is ignored by filearray_create and is reserved for future compatibility.

filebase

a directory path to store arrays in the local file system. When creating an array, the path must not exist.

dimension

dimension of the array; must have a length of at least 2

type

storage type of the array; default is 'double'. Other options include 'float', 'integer', 'logical', 'raw', and 'complex'.

partition_size

positive partition size for the last margin, or NA to automatically guess; see 'Details'.

initialize

whether to initialize partition files; default is false for performance considerations. However, if the array is dense, it is recommended to set this to true

mode

whether to allow writing to the file; choices are 'readwrite' and 'readonly'.

symlink_ok

whether arrays with symbolic-link partitions can pass the test; this is usually used on bound arrays with symbolic links; see filearray_bind

on_missing

function to handle the file array (such as initialization) when a new array is created; must take only one argument, the array object

verbose

whether to print out some debug messages

auto_set_headers

whether to automatically set headers if array is missing or to be created; default is true

Details

File arrays partition out-of-memory array objects and store them separately in the local file system. Since R stores matrices/arrays in column-major order, file array uses the slowest margin (the last margin) to slice the partitions. This helps to align the elements within the files with the corresponding memory order. An array with dimension 100x200x300x400 has 4 margins. The length of the last margin is 400, which is also the maximum number of potential partitions. The number of partitions is determined by the last margin size divided by partition_size, rounded up. For example, if the partition size is 1, then there will be 400 partitions. If the partition size is 3, there will be 134 partitions. The default partition sizes are determined internally following these priorities:

1. the file size of each partition does not exceed 1GB

2. the number of partitions does not exceed 100

These two rules are not hard requirements. The goal is to reduce the number of partitions as much as possible.
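The partition arithmetic described above can be checked in plain R. This is a sketch only: the helper names are hypothetical and not part of filearray, and the byte count assumes 8 bytes per element for type "double".

```r
dimension <- c(100, 200, 300, 400)
last_margin <- dimension[length(dimension)]

# Hypothetical helper: number of partitions for a given partition size
# (the last partition may hold fewer slices than the others)
n_partitions <- function(partition_size) {
    ceiling(last_margin / partition_size)
}
parts_size_1 <- n_partitions(1)
parts_size_3 <- n_partitions(3)

# Elements and bytes held by one slice of the last margin,
# assuming 8 bytes per element for type "double"
slice_elements <- prod(dimension[-length(dimension)])
slice_bytes <- slice_elements * 8

# Column-major layout: slice k of the last margin covers one contiguous
# run of linear indices, which is why partitions are cut along that margin
slice_index_range <- function(k) {
    c((k - 1) * slice_elements + 1, k * slice_elements)
}
second_slice <- slice_index_range(2)
```

Here `parts_size_1` is 400 and `parts_size_3` is 134, matching the figures in the text, and one slice of doubles occupies 48,000,000 bytes under the stated assumption.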

The arguments ... in filearray_checkload should be named arguments that provide additional checks for the header information. The check will fail if at least one header is not identical. For example, if an array contains a header key-signature pair, one can use filearray_checkload(..., key = signature) to validate the signature. Note that the comparison is strict, meaning the storage type of the headers is considered as well. If the signature stored in the array is an integer while the one provided is a double, then the check will fail.
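The difference between numeric equality and a type-sensitive comparison can be seen with base R alone. The snippet uses identical() purely as an illustration; that the package's header check behaves the same way is an assumption based on the wording above.

```r
stored   <- 1L   # header value saved as an integer
provided <- 1    # value supplied by the caller as a double

equal_by_value <- stored == provided             # values are numerically equal
same_type_and_value <- identical(stored, provided)   # storage types differ
matches_integer <- identical(stored, 1L)         # same value, same storage type
```

`equal_by_value` is TRUE while `same_type_and_value` is FALSE, so passing the header with the storage type it was saved with (for example `1L` rather than `1`) avoids this kind of mismatch.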

Value

A FileArray-class instance.

Author(s)

Zhengjia Wang

Examples

# Prepare
library(filearray)
filebase <- tempfile()
if(file.exists(filebase)){ unlink(filebase, TRUE) }

# create array
x <- filearray_create(filebase, dimension = c(200, 30, 8))
print(x)

# Assign values
x[] <- rnorm(48000)

# Subset
x[1,2,]

# load existing array
filearray_load(filebase)

x$set_header("signature", "tom")
filearray_checkload(filebase, signature = "tom")

## Not run: 
# Trying to load with wrong signature
filearray_checkload(filebase, signature = "jerry")

## End(Not run)

# check-load, and create a new array if fail
x <- filearray_load_or_create(
    filebase = filebase, dimension = c(200, 30, 8),
    verbose = FALSE, signature = "henry"
)
x$get_header("signature")

# check-load with initialization
x <- filearray_load_or_create(
    filebase = filebase,
    dimension = c(3, 4, 5),
    verbose = FALSE, mode = "readonly",
    on_missing = function(array) {
        array[] <- seq_len(60)
    }
)
x[1:3,1,1]

# Clean up
unlink(filebase, recursive = TRUE)

Merge and bind homogeneous file arrays

Description

The file arrays to be merged must be homogeneous: same data type, partition size, and partition length

Usage

filearray_bind(
  ...,
  .list = list(),
  filebase = tempfile(),
  symlink = FALSE,
  overwrite = FALSE,
  cache_ok = FALSE
)

Arguments

...,.list

file array instances

filebase

where to create merged array

symlink

whether to use file.symlink; if true, then partition files will be symbolic-linked to the original arrays, otherwise the partition files will be copied over. If you want your data to be portable, do not use symbolic links. The default value is FALSE

overwrite

whether to overwrite when filebase already exists; default is false, which raises errors

cache_ok

see 'Details'; only used if overwrite is true.

Details

The input arrays must share the same data type and partition size. The dimension of each partition should also be the same. For example, if an array x1 has dimension 100x20x30 with partition size 1, then each partition has dimension 100x20x1, and there are 30 partitions. x1 can bind with another array of the same partition size. This means that if x2 has dimension 100x20x40 and partition size 1, then x1 and x2 can be merged.

If filebase exists and overwrite is FALSE, an error is always raised. If overwrite=TRUE and cache_ok=FALSE, then the existing filebase will be erased and any data stored within will be lost. If both overwrite and cache_ok are TRUE, then, before erasing filebase, the function validates the existing array header and compares the header signatures. If the existing header signature is the same as that of the array to be created, then the existing array will be returned. cache_ok can be extremely useful when binding large arrays with symlink=FALSE, as the cache might avoid moving files around. However, cache_ok should be enabled with caution, because only the header information is compared; the partition data is not. If the existing array was generated from an old version of the source arrays and the data in the source arrays has since been altered, then cache_ok=TRUE is rarely appropriate because the cache is outdated.

The symlink option should be used with extra caution. Creating symbolic links is definitely faster than copying partition files. However, since the partition files are simply linked to the original partition files, changes to the input arrays will also affect the merged arrays, and vice versa; see 'Examples'. Also, for arrays created from symbolic links, if the original arrays are deleted, the merged arrays will not be invalidated, but the corresponding partitions will no longer be accessible. Attempts to set deleted partitions will likely result in failure. Therefore symlink should be set to true when the merged arrays are temporary and used for read-only purposes, and when speed and disk space are a concern. For extended reading, please check files for details.

Value

A bound array in 'FileArray' class.

Examples

partition_size <- 1
type <- "double"
x1 <- filearray_create(
    tempfile(), c(2,2), type = type,
    partition_size = partition_size)
x1[] <- 1:4
x2 <- filearray_create(
    tempfile(), c(2,1), type = type,
    partition_size = partition_size)
x2[] <- 5:6

y1 <- filearray_bind(x1, x2, symlink = FALSE)
# symlink defaults to FALSE (see 'Usage'), so links are requested explicitly
y2 <- filearray_bind(x1, x2, symlink = TRUE)

# y1 copies partition files, and y2 simply creates links 
# if symlink is supported

y1[] - y2[]

# change x1
x1[1,1] <- NA

# y1 is not affected
y1[]

# y2 changes 
y2[]

Set or get file array threads

Description

Enables or disables multi-threaded reading or writing at the C++ level.

Usage

filearray_threads(n, ...)

Arguments

n

number of threads to set. If n is negative, then it defaults to the number of cores that the computer has.

...

internally used

Value

An integer giving the current number of threads


Map multiple file arrays and save results

Description

Advanced mapping function for multiple file arrays. fmap runs the mapping functions and stores the results in file arrays. fmap2 stores results in memory. This feature is experimental. There are several constraints to the input. Failure to meet these constraints may result in undefined results, or even crashes. Please read Section 'Details' carefully before using this function.

Usage

fmap(
  x,
  fun,
  .y = NULL,
  .buffer_count = NA_integer_,
  .output_size = NA_integer_,
  ...
)

fmap2(x, fun, .buffer_count = NA, .simplify = TRUE, ...)

fmap_element_wise(x, fun, .y, ..., .input_size = NA)

Arguments

x

a list of file arrays to map; each element of x must share the same dimensions.

fun

function that takes one list

.y

a file array object, used to save results

.buffer_count

number of total buffers (chunks) to run

.output_size

length of the vector returned by fun

...

other arguments passed to fun

.simplify

whether to apply simplify2array to the result

.input_size

number of elements to read from each array of x

Details

Denote the first argument of fun as input. The length of input equals the length of x. The size of each element of input is defined by .input_size, except for the last loop. For example, given that each input array has dimension 10x10x10x10, if .input_size=100, then length(input[[1]])=100. The total number of runs equals length(x[[1]])/100. If .input_size=300, then length(input[[1]]) will be 300 except for the last run. This is because 10000 cannot be divided by 300. The element length of the last run will be 100.

The length of the value returned by fun will be checked against .output_size. If the output length exceeds .output_size, an error will be raised.

Please make sure that length(.y)/length(x[[1]]) equals .output_size/.input_size.

For fmap_element_wise, the length of input[[1]] and the output length must be consistent.
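The run counts quoted in the Details above follow from simple chunking arithmetic, sketched here in plain R. The helper `chunk_sizes` is hypothetical and only mirrors the description; it is not a filearray function.

```r
# Hypothetical helper (not part of filearray): number of elements read per run
chunk_sizes <- function(total_length, input_size) {
    n_runs <- ceiling(total_length / input_size)
    sizes <- rep(input_size, n_runs)
    # The last run only holds whatever is left over
    sizes[n_runs] <- total_length - input_size * (n_runs - 1)
    sizes
}

total_length <- prod(c(10, 10, 10, 10))   # elements in each input array

even <- chunk_sizes(total_length, 100)    # every run reads 100 elements
uneven <- chunk_sizes(total_length, 300)  # last run reads the remainder

n_even_runs <- length(even)
n_uneven_runs <- length(uneven)
last_uneven_size <- uneven[length(uneven)]
```

With `.input_size = 100` there are 100 equal runs; with `.input_size = 300` there are 34 runs and the last one holds 100 elements, as stated above.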

Value

File array instance .y

Examples

set.seed(1)
x1 <- filearray_create(tempfile(), dimension = c(100,20,3))
x1[] <- rnorm(6000)
x2 <- filearray_create(tempfile(), dimension = c(100,20,3))
x2[] <- rnorm(6000)

# Add two arrays
output <- filearray_create(tempfile(), dimension = c(100,20,3))
fmap(list(x1, x2), function(input){
    input[[1]] + input[[2]]
}, output)

# check
range(output[] - (x1[] + x2[]))

output$delete()

# Calculate the maximum of x1/x2 for every 100 elements
# total 60 batches/loops (`.buffer_count`)
output <- filearray_create(tempfile(), dimension = c(20,3))
fmap(list(x1, x2), function(input){
    max(input[[1]] / input[[2]])
}, .y = output, .buffer_count = 60)

# check
range(output[] - apply(x1[] / x2[], c(2,3), max))

output$delete()

# A large array example
if(interactive()){
    x <- filearray_create(tempfile(), dimension = c(287, 100, 301, 4))
    dimnames(x) <- list(
        Trial = 1:287,
        Marker = 1:100,
        Time = 1:301,
        Location = 1:4
    )

    for(i in 1:4){
        x[,,,i] <- runif(8638700)
    }

    # Step 1:
    # for each location, trial, and marker, calibrate (baseline)
    # according to first 50 time-points

    output <- filearray_create(tempfile(), dimension = dim(x))

    # baseline-percentage change
    fmap(
        list(x),
        function(input){
            # get locational data
            location_data <- input[[1]]
            dim(location_data) <- c(287, 100, 301)

            # collapse over first 50 time points for
            # each trial, and marker
            baseline <- apply(location_data[,,1:50], c(1,2), mean)

            # calibrate
            calibrated <- sweep(location_data, c(1,2), baseline,
                                FUN = function(data, bl){
                                    (data / bl - 1) * 100
                                })
            return(calibrated)
        },
        .y = output,

        # input dimension is 287 x 100 x 301 for each location
        # hence 4 loops in total
        .buffer_count = 4
    )

    # cleanup
    x$delete()
}

# cleanup
x1$delete()
x2$delete()
output$delete()

A generic function of which that is 'FileArray' compatible

Description

A generic function of which that is 'FileArray' compatible

Usage

fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)

## Default S3 method:
fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)

## S3 method for class 'FileArray'
fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)

Arguments

x

any R vector, matrix, array or file-array

val

values to find, or a function that takes one argument (a slice of the data vector) and returns either a logical vector with the same length as the slice or indices into the slice; see 'Examples'

arr.ind

logical; should array indices be returned when x is an array?

ret.values

whether to return the values at the corresponding indices as an attribute; default is false

...

passed to val if val is a function

Value

The indices of x elements that are listed in val.
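For an in-memory array, the two index forms controlled by arr.ind can be illustrated with base R. This sketch does not call fwhich and makes no claim about how it is implemented; it only shows what linear indices and array indices look like for the array used in the 'Examples'.

```r
x <- array(1:27 + 2, rep(3, 3))
val <- c(4, 5)

# Linear (column-major) indices of the matching elements
linear_index <- which(x %in% val)

# The same positions expressed as one row of subscripts per match
array_index <- arrayInd(linear_index, dim(x))

# Values found at those positions
matched_values <- x[linear_index]
```

The matches sit at linear indices 2 and 3, that is, at subscripts (2, 1, 1) and (3, 1, 1), which is why the 'Examples' inspect `arr[2:3, 1, 1]`.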

Examples

# ---- Default case ------------------------------------
x <- array(1:27 + 2, rep(3,3))

# find index of `x` equal to either 4 or 5
fwhich(x, c(4,5))
res <- fwhich(x, c(4,5), ret.values = TRUE)
res
attr(res, "values")

# ---- file-array case --------------------------------
arr <- filearray_create(tempfile(), dim(x))
arr[] <- x
fwhich(arr, c(4,5))
fwhich(arr, c(4,5), arr.ind = TRUE, ret.values = TRUE)

arr[2:3, 1, 1]

# Clean up this example
arr$delete()

# ---- `val` is a function ----------------------------
x <- as_filearray(c(sample(15), 15), dimension = c(4,4))

ret <- fwhich(x, val = which.max,
              ret.values = TRUE, arr.ind = FALSE)

# ret is the index
ret == which.max(x[])

# attr(ret, "values") is the max value
max(x[]) == attr(ret, "values")

# customize `val`
fwhich(x, ret.values = TRUE, arr.ind = FALSE,
       val = function( slice ) {
           slice > 10 # or which(slice > 10)
       })

A map-reduce method to iterate blocks of file-array data with little memory usage

Description

A map-reduce method to iterate blocks of file-array data with little memory usage

Usage

mapreduce(x, map, reduce, ...)

## S4 method for signature 'FileArray,ANY,function'
mapreduce(x, map, reduce, buffer_size = NA, ...)

## S4 method for signature 'FileArray,ANY,NULL'
mapreduce(x, map, reduce, buffer_size = NA, ...)

## S4 method for signature 'FileArray,ANY,missing'
mapreduce(x, map, reduce, buffer_size = NA, ...)

Arguments

x

a file array object

map

mapping function that receives 3 arguments; see 'Details'

reduce

NULL, or a function that takes a list as input

...

passed to other methods

buffer_size

control how we split the array; see 'Details'

Details

When handling out-of-memory arrays, it is recommended to load one block of the array at a time and operate at the block level. See apply for an implementation. When an array is too large and there are too many blocks, this operation becomes very slow if computer memory is low, because R performs garbage collection frequently. Implemented in C++, mapreduce creates a buffer to store the block data. By reusing the memory over and over again, it is possible to iterate through the array with minimal garbage collection. Many statistics, including min, max, sum, and mean, can be calculated efficiently in this way.

The function map takes three arguments: data (mandatory), size (optional), and first_index (optional). data is the buffer, whose length is consistent across iterations. size indicates the effective size of the buffer. If the partition size is not divisible by the buffer size, only the first size elements of data come from the array, and the remaining elements are NA. This situation can only occur when buffer_size is manually specified. By default, all of data belongs to the array. The last argument, first_index, is the index of the first element data[1] in the whole array. It is useful when positional data is needed.

The buffer size, specified by buffer_size, is an additional optional argument in .... Its default is NA, in which case it is calculated automatically. If specified manually, a large buffer size is desirable to speed up the calculation. The default buffer size will not exceed nThreads x 2MB, where nThreads is the number of threads set by filearray_threads. When the partition length cannot be divided by the buffer size, instead of trimming the buffer, NAs are filled into the buffer and passed to the map function; see the previous paragraph for how to treat them.

The function mapreduce ignores missing partitions. That means if a partition is missing, its data will be neither read nor passed to the map function. Please run x$initialize_partition() to make sure partition files exist.
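The calling convention of map (data, size, first_index) and the NA padding can be mimicked on an ordinary vector. The function `mapreduce_sketch` below is a hypothetical in-memory stand-in written for illustration; it allocates a fresh buffer per block and therefore does not reproduce the C++ buffer reuse or the partition handling of the real mapreduce.

```r
# Hypothetical in-memory stand-in for the iteration pattern (not filearray code)
mapreduce_sketch <- function(values, map, reduce, buffer_size) {
    starts <- seq(1, length(values), by = buffer_size)
    mapped <- lapply(starts, function(first_index) {
        size <- min(buffer_size, length(values) - first_index + 1)
        # Fixed-length buffer; positions beyond `size` stay NA
        buffer <- rep(NA_real_, buffer_size)
        buffer[seq_len(size)] <- values[first_index + seq_len(size) - 1]
        map(buffer, size, first_index)
    })
    reduce(mapped)
}

values <- as.numeric(1:10)

# Sum: drop the NA padding by keeping only the first `size` elements
total <- mapreduce_sketch(
    values,
    map = function(data, size, first_index) sum(data[seq_len(size)]),
    reduce = function(mapped_list) do.call(sum, mapped_list),
    buffer_size = 4
)

# Positions in the whole vector: shift block-local indices by first_index
positions <- mapreduce_sketch(
    values,
    map = function(data, size, first_index) {
        which(data[seq_len(size)] > 8) + (first_index - 1)
    },
    reduce = function(mapped_list) do.call(c, mapped_list),
    buffer_size = 4
)
```

Because 10 is not divisible by the buffer size 4, the last block carries two NA values, which the map functions discard by using `size`.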

Value

If reduce is NULL, returns the mapped results; otherwise returns the reduced results from the reduce function

Examples

x <- filearray_create(tempfile(), c(100, 100, 10))
x[] <- rnorm(1e5)

## calculate summation
# identical to sum(x[]), but is more feasible in large cases
mapreduce(x, map = function(data, size){
    # make sure `data` is all from array
    if(length(data) != size){
        data <- data[1:size]
    }
    sum(data)
}, reduce = function(mapped_list){
    do.call(sum, mapped_list)
})

## Find elements that are less than -3
positions <- mapreduce(
    x,
    map = function(data, size, first_index) {
        if (length(data) != size) {
            data <- data[1:size]
        }
        which(data < -3) + (first_index - 1)
    },
    reduce = function(mapped_list) {
        do.call(c, mapped_list)
    }
)

if(length(positions)){
    x[[positions[1]]]
}

The type of a file array (extended)

Description

The type of a file array (extended)

Usage

typeof(x)

## S4 method for signature 'FileArray'
typeof(x)

## S4 method for signature 'FileArrayProxy'
typeof(x)

Arguments

x

any file array

Value

A character string. The possible values are "double", "integer", "logical", and "raw"

