| Type: | Package |
| Title: | File-Backed Array for Out-of-Memory Computation |
| Version: | 0.2.0 |
| Language: | en-US |
| Encoding: | UTF-8 |
| License: | LGPL-3 |
| URL: | https://dipterix.org/filearray/, https://github.com/dipterix/filearray |
| BugReports: | https://github.com/dipterix/filearray/issues |
| Description: | Stores large arrays in files to avoid occupying large memories. Implemented with super fast gigabyte-level multi-threaded reading/writing via 'OpenMP'. Supports multiple non-character data types (double, float, complex, integer, logical, and raw). |
| Imports: | digest, fastmap (≥ 1.1.1), methods, Rcpp, uuid (≥ 1.1.0) |
| Suggests: | bit64, knitr, rmarkdown, testthat (≥ 3.0.0) |
| RoxygenNote: | 7.3.2 |
| LinkingTo: | BH, Rcpp |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2025-04-01 15:57:07 UTC; dipterix |
| Author: | Zhengjia Wang [aut, cre, cph] |
| Maintainer: | Zhengjia Wang <dipterix.wang@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-04-01 16:30:01 UTC |
Definition of file array
Description
S4 class definition of FileArray. Please use filearray_create and filearray_load to create instances.
Public Methods
get_header(key, default = NULL) | Get header information; returns default if key is missing |
set_header(key, value) | Set header information; the extra headers will be stored in the meta file. Please do not store large headers, as they will be loaded into memory frequently. |
can_write() | Whether the array data can be altered |
create(filebase, dimension, type = "double", partition_size = 1) | Create a file array instance |
delete(force = FALSE) | Remove array from local file system and reset |
dimension() | Get dimension vector |
dimnames(v) | Set/get dimension names |
element_size() | Internal storage: bytes per element |
fill_partition(part, value) | Fill a partition with given scalar |
get_partition(part, reshape = NULL) | Get partition data, and reshape (if not null) to desired dimension |
expand(n) | Expand array along the last margin; returns true if expanded; if the dimnames have been assigned prior to expansion, the last dimension names will be filled with NA |
initialize_partition() | Make sure a partition file exists; if not, create one and fill with NAs or 0 (type='raw') |
load(filebase, mode = c("readwrite", "readonly")) | Load file array from existing directory |
partition_path(part) | Get partition file path |
partition_size() | Get partition size; see filearray |
set_partition(part, value, ..., strict = TRUE) | Set partition value |
sexp_type() | Get data SEXP type; see R internal manuals |
show() | Print information |
type() | Get data type |
valid() | Check if the array is valid. |
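As a minimal sketch of how the methods listed above might be called on an instance: the `$` call style follows the package's own examples (such as `x$set_header(...)`), while the variable names `dims`, `sig`, and `fallback` are illustrative only.

```r
library(filearray)

# Create a small array in a temporary directory
x <- filearray_create(tempfile(), dimension = c(4, 3, 2),
                      type = "double", partition_size = 1)

dims <- x$dimension()      # dimension vector
x$type()                   # data type
x$can_write()              # whether the array data can be altered
x$partition_size()         # partition size along the last margin

# Store a small header and read it back
x$set_header("signature", "tom")
sig <- x$get_header("signature")

# A missing key returns the supplied default
fallback <- x$get_header("no_such_key", default = "none")
```

Calling `x$delete()` afterwards removes the temporary array from disk.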
See Also
'S3' methods for 'FileArray'
Description
These are 'S3' methods for 'FileArray'
Usage
## S3 method for class 'FileArray'
x[
  i,
  ...,
  drop = TRUE,
  reshape = NULL,
  strict = TRUE,
  dimnames = TRUE,
  split_dim = 0
]

## S3 replacement method for class 'FileArray'
x[i, ..., lazy = FALSE] <- value

## S3 method for class 'FileArray'
x[[i]]

## S3 method for class 'FileArray'
as.array(x, reshape = NULL, drop = FALSE, ...)

## S3 method for class 'FileArray'
dim(x)

## S3 method for class 'FileArray'
dimnames(x)

## S3 replacement method for class 'FileArray'
dimnames(x) <- value

## S3 method for class 'FileArray'
length(x)

## S3 method for class 'FileArray'
max(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
min(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
range(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
sum(x, na.rm = FALSE, ...)

## S3 method for class 'FileArray'
subset(x, ..., drop = FALSE, .env = parent.frame())
Arguments
x | a file array |
i,... | index set, or passed to other methods |
drop | whether to drop dimensions; see topic |
reshape | a new dimension to set before returning subset results; default is |
strict | whether to allow indices to exceed bound; currently only accept |
dimnames | whether to preserve |
split_dim | internally used; split dimension and calculate indices to manually speed up the subset; value ranges from 0 to the size of the dimension minus one. |
lazy | whether to lazy-evaluate the method; only works when assigning arrays with a logical array index |
value | value to substitute or set |
na.rm | whether to remove |
.env | environment to evaluate formula when evaluating subset margin indices. |
Functions
- [: subset array
- `[`(FileArray) <- value: subset assign array
- [[: get element by index
- as.array(FileArray): converts file array to native array in R
- dim(FileArray): get dimensions
- dimnames(FileArray): get dimension names
- dimnames(FileArray) <- value: set dimension names
- length(FileArray): get array length
- max(FileArray): get max value
- min(FileArray): get min value
- range(FileArray): get value range
- sum(FileArray): get summation
- subset(FileArray): get subset file array with formulae
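The following is a small sketch of the S3 methods above applied to a tiny array. It assumes the default storage type ("double") and relies on R's column-major ordering; the variable names are illustrative.

```r
library(filearray)

x <- filearray_create(tempfile(), dimension = c(4, 3, 2))
x[] <- 1:24                 # values are laid out in column-major order

dim(x)                      # dimensions of the file array
n <- length(x)              # total number of elements
sub <- x[1, 2, ]            # elements at (1,2,1) and (1,2,2)
fifth <- x[[5]]             # a single element by linear index
total <- sum(x)
rng <- range(x)
full <- as.array(x)         # load the whole array into memory
```

Loading the whole array with `as.array(x)` or `x[]` is only reasonable here because the array is tiny; for large arrays, subsetting or the summary methods avoid materializing everything at once.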
'S4' methods for FileArray
Description
'S4' methods for FileArray
Usage
## S4 methods for the binary operators below; each operator is defined
## for every one of the following signatures:
##   'FileArray,FileArray'
##   'FileArray,numeric'
##   'numeric,FileArray'
##   'FileArray,complex'
##   'complex,FileArray'
##   'FileArray,logical'
##   'logical,FileArray'
##   'FileArray,array'
##   'array,FileArray'
e1 + e2
e1 - e2
e1 * e2
e1 / e2
e1 ^ e2
e1 %% e2
e1 %/% e2
e1 == e2
e1 > e2
e1 < e2
e1 != e2
e1 >= e2
e1 <= e2
e1 & e2
e1 | e2

## S4 methods for signature 'FileArray'
!x
exp(x)
expm1(x)
log(x, base = exp(1))
log10(x)
log2(x)
log1p(x)
abs(x)
sqrt(x)
sign(x)
signif(x, digits = 6)
trunc(x, ...)
floor(x)
ceiling(x)
round(x, digits = 0)
acos(x)
acosh(x)
asin(x)
asinh(x)
atan(x)
atanh(x)
cos(x)
cosh(x)
cospi(x)
sin(x)
sinh(x)
sinpi(x)
tan(x)
tanh(x)
tanpi(x)
gamma(x)
lgamma(x)
digamma(x)
trigamma(x)
Arg(z)
Conj(z)
Im(z)
Mod(z)
Re(z)
is.na(x)
Arguments
x,z,e1,e2 |
|
base,digits,... | passed to other methods |
Value
Apply functions over file array margins (extended)
Description
Apply functions over file array margins (extended)
Usage
apply(X, MARGIN, FUN, ..., simplify = TRUE)

## S4 method for signature 'FileArray'
apply(X, MARGIN, FUN, ..., simplify = TRUE)

## S4 method for signature 'FileArrayProxy'
apply(X, MARGIN, FUN, ..., simplify = TRUE)
Arguments
X | a file array |
MARGIN | scalar giving the subscripts which the function will be applied over. Current implementation only allows margin size to be one |
FUN | the function to be applied |
... | optional arguments to |
simplify | a logical indicating whether results should be simplified if possible |
Value
See Section 'Value' in apply;
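A brief sketch of the extended `apply` on a file array, using a single margin as the argument table requires. The comparison against `apply` on the in-memory copy `x[]` is only a sanity check for this tiny array; the variable names are illustrative.

```r
library(filearray)

x <- filearray_create(tempfile(), dimension = c(4, 3, 2))
x[] <- 1:24

# Sum each slice along the last margin (MARGIN has length one)
slice_sums <- apply(x, 3, sum)

# Same computation on the in-memory array, for comparison
in_memory <- apply(x[], 3, sum)
```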
Create or load existing file arrays
Description
Create or load existing file arrays
Usage
as_filearray(x, ...)

as_filearrayproxy(x, ...)

filearray_create(
  filebase,
  dimension,
  type = c("double", "float", "integer", "logical", "raw", "complex"),
  partition_size = NA,
  initialize = FALSE,
  ...
)

filearray_load(filebase, mode = c("readwrite", "readonly"))

filearray_checkload(
  filebase,
  mode = c("readonly", "readwrite"),
  ...,
  symlink_ok = TRUE
)

filearray_load_or_create(
  filebase,
  dimension,
  on_missing = NULL,
  type = NA,
  ...,
  mode = c("readonly", "readwrite"),
  symlink_ok = TRUE,
  initialize = FALSE,
  partition_size = NA,
  verbose = FALSE,
  auto_set_headers = TRUE
)
Arguments
x | R object such as array, file array proxy, or character that can be transformed into file array |
... | additional headers to check used by |
filebase | a directory path to store arrays in the local file system. When creating an array, the path must not exist. |
dimension | dimension of the array, at least length of 2 |
type | storage type of the array; default is |
partition_size | positive partition size for the last margin, or |
initialize | whether to initialize partition files; default is false for performance considerations. However, if the array is dense, it is recommended to set this to true |
mode | whether allows writing to the file; choices are |
symlink_ok | whether arrays with symbolic-link partitions can pass the test; this is usually used on bound arrays with symbolic-links; see |
on_missing | function to handle file array (such as initialization) when a new array is created; must take only one argument, the array object |
verbose | whether to print out some debug messages |
auto_set_headers | whether to automatically set headers if array is missing or to be created; default is true |
Details
File arrays partition out-of-memory array objects and store them separately in the local file system. Since R stores matrices/arrays in column-major order, file array uses the slowest margin (the last margin) to slice the partitions. This helps to align the elements within the files with the corresponding memory order. An array with dimension 100x200x300x400 has 4 margins. The length of the last margin is 400, which is also the maximum number of potential partitions. The number of partitions is determined by the last margin size divided by partition_size, rounded up. For example, if the partition size is 1, then there will be 400 partitions. If the partition size is 3, there will be 134 partitions. The default partition sizes are determined internally following these priorities:
1. the file size of each partition does not exceed 1GB
2. the number of partitions does not exceed 100
These two rules are not hard requirements. The goal is to reduce the number of partitions as much as possible.
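The partition arithmetic above can be sketched in base R. The helpers `n_partitions` and `partition_bytes` are hypothetical names introduced for illustration, not package functions, and the 8 bytes per element is an assumption that applies to the "double" storage type.

```r
# Hypothetical helper: number of partitions for a given partition size
n_partitions <- function(dimension, partition_size) {
  last_margin <- dimension[length(dimension)]
  ceiling(last_margin / partition_size)
}

# Hypothetical helper: size in bytes of one full partition,
# assuming 8 bytes per element (double storage)
partition_bytes <- function(dimension, partition_size, element_bytes = 8) {
  prod(dimension[-length(dimension)]) * partition_size * element_bytes
}

dimension <- c(100, 200, 300, 400)

parts_1 <- n_partitions(dimension, 1)     # one partition per last-margin slice
parts_3 <- n_partitions(dimension, 3)     # 400 / 3, rounded up
bytes_1 <- partition_bytes(dimension, 1)  # 100 * 200 * 300 * 8 bytes
```

With a partition size of 3 the last partition holds only one slice (400 = 133 x 3 + 1), which is why the count rounds up to 134.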
The arguments ... in filearray_checkload should be named arguments that provide additional checks for the header information. The check will fail if at least one header is not identical. For example, if an array contains a header key-signature pair, one can use filearray_checkload(..., key = signature) to validate the signature. Note that the comparison is strict, meaning the storage type of the headers is considered as well. If the signature stored in the array is an integer while the one provided is a double, then the check will fail.
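The strictness described above can be illustrated with base R's `identical`, which also distinguishes integer from double storage. Whether the package uses `identical` internally is an assumption; the snippet only shows why an integer header and a double header of the same value would not match under a type-aware comparison.

```r
stored   <- 1L   # header saved as an integer
provided <- 1    # value supplied as a double

same_value      <- stored == provided            # numeric comparison ignores type
strictly_same   <- identical(stored, provided)   # type-aware comparison
matching_type   <- identical(stored, 1L)         # same value and same type
```

In practice this suggests passing header values with the same storage type that was used when they were set, for example `1L` rather than `1` for an integer header.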
Value
A FileArray-class instance.
Author(s)
Zhengjia Wang
Examples
# Prepare
library(filearray)
filebase <- tempfile()
if(file.exists(filebase)){ unlink(filebase, TRUE) }

# create array
x <- filearray_create(filebase, dimension = c(200, 30, 8))
print(x)

# Assign values
x[] <- rnorm(48000)

# Subset
x[1,2,]

# load existing array
filearray_load(filebase)

x$set_header("signature", "tom")
filearray_checkload(filebase, signature = "tom")

## Not run: 
# Trying to load with wrong signature
filearray_checkload(filebase, signature = "jerry")

## End(Not run)

# check-load, and create a new array if fail
x <- filearray_load_or_create(
  filebase = filebase, dimension = c(200, 30, 8),
  verbose = FALSE, signature = "henry"
)
x$get_header("signature")

# check-load with initialization
x <- filearray_load_or_create(
  filebase = filebase,
  dimension = c(3, 4, 5),
  verbose = FALSE, mode = "readonly",
  on_missing = function(array) {
    array[] <- seq_len(60)
  }
)
x[1:3,1,1]

# Clean up
unlink(filebase, recursive = TRUE)
Merge and bind homogeneous file arrays
Description
The file arrays to be merged must be homogeneous: same data type, partition size, and partition length
Usage
filearray_bind(
  ...,
  .list = list(),
  filebase = tempfile(),
  symlink = FALSE,
  overwrite = FALSE,
  cache_ok = FALSE
)
Arguments
...,.list | file array instances |
filebase | where to create merged array |
symlink | whether to use |
overwrite | whether to overwrite when |
cache_ok | see 'Details', only used if |
Details
The input arrays must share the same data type and partition size. The dimension of each partition should also be the same. For example, if an array x1 has dimension 100x20x30 with partition size 1, then each partition dimension is 100x20x1, and there are 30 partitions. x1 can bind with another array of the same partition size. This means that if x2 has dimension 100x20x40 and its partition size is 1, then x1 and x2 can be merged.
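The compatibility rule above can be sketched in base R. The helpers `partition_dim` and `can_bind` are hypothetical and only mirror the dimension rule stated in the text; they do not check data types, and they are not the package's own validation code.

```r
# Hypothetical helper: dimension of a single partition
partition_dim <- function(dimension, partition_size) {
  as.numeric(c(dimension[-length(dimension)], partition_size))
}

# Hypothetical helper: do two arrays have matching partition dimensions?
can_bind <- function(dim1, dim2, partition_size1, partition_size2) {
  identical(partition_dim(dim1, partition_size1),
            partition_dim(dim2, partition_size2))
}

x1_part <- partition_dim(c(100, 20, 30), 1)             # 100 x 20 x 1
ok      <- can_bind(c(100, 20, 30), c(100, 20, 40), 1, 1)
not_ok  <- can_bind(c(100, 20, 30), c(100, 21, 40), 1, 1)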
If filebase exists and overwrite is FALSE, an error will always be raised. If overwrite=TRUE and cache_ok=FALSE, then the existing filebase will be erased and any data stored within will be lost. If both overwrite and cache_ok are TRUE, then, before erasing filebase, the function validates the existing array header and compares the header signatures. If the existing header signature is the same as that of the array to be created, then the existing array will be returned. The cache_ok option can be extremely useful when binding large arrays with symlink=FALSE, as the cache might avoid moving files around. However, cache_ok should be enabled with caution, because only the header information is compared; the partition data is not. If the existing array was generated from an old version of the source arrays, and the data in the source arrays has since been altered, then cache_ok=TRUE is rarely appropriate, as the cache is outdated.
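The rules in the paragraph above can be summarized as a small decision function. This is a sketch in base R under the assumption that the text describes all cases; `bind_target_action` and its argument `header_matches` are hypothetical names, and the function only returns a label rather than touching any files.

```r
# Hypothetical helper: which action the rules imply for a target filebase
bind_target_action <- function(exists, overwrite, cache_ok,
                               header_matches = FALSE) {
  if (!exists) return("create")
  if (!overwrite) return("error")
  if (cache_ok && header_matches) return("reuse existing array")
  "erase and recreate"
}

a1 <- bind_target_action(exists = TRUE, overwrite = FALSE, cache_ok = FALSE)
a2 <- bind_target_action(exists = TRUE, overwrite = TRUE,  cache_ok = FALSE)
a3 <- bind_target_action(exists = TRUE, overwrite = TRUE,  cache_ok = TRUE,
                         header_matches = TRUE)
a4 <- bind_target_action(exists = TRUE, overwrite = TRUE,  cache_ok = TRUE,
                         header_matches = FALSE)
```

Note that `header_matches` stands in for the header-signature comparison only; as the text warns, a matching header does not guarantee that the partition data is current.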
The symlink option should be used with extra caution. Creating symbolic links is definitely faster than copying partition files. However, since the partition files are simply linked to the original partition files, changes to the input arrays will also affect the merged arrays, and vice versa; see 'Examples'. Also, for arrays created from symbolic links, if the original arrays are deleted, the merged arrays will not be invalidated, but the corresponding partitions will no longer be accessible. Attempts to set deleted partitions will likely result in failure. Therefore symlink should be set to true when the merged arrays are temporary and read-only, and when speed and disk space are a concern. For extended reading, please check files for details.
Value
A bound array in 'FileArray' class.
Examples
partition_size <- 1
type <- "double"
x1 <- filearray_create(
  tempfile(), c(2,2), type = type,
  partition_size = partition_size)
x1[] <- 1:4
x2 <- filearray_create(
  tempfile(), c(2,1), type = type,
  partition_size = partition_size)
x2[] <- 5:6

y1 <- filearray_bind(x1, x2, symlink = FALSE)
# request symbolic links explicitly, since the default
# shown in 'Usage' is symlink = FALSE
y2 <- filearray_bind(x1, x2, symlink = TRUE)

# y1 copies partition files, and y2 simply creates links 
# if symlink is supported

y1[] - y2[]

# change x1
x1[1,1] <- NA

# y1 is not affected
y1[]

# y2 changes 
y2[]
Set or get file array threads
Description
Will enable/disable multi-threaded reading or writing at C++ level.
Usage
filearray_threads(n, ...)
Arguments
n | number of threads to set. If |
... | internally used |
Value
An integer of current number of threads
Map multiple file arrays and save results
Description
Advanced mapping function for multiple file arrays. fmap runs the mapping functions and stores the results in file arrays. fmap2 stores results in memory. This feature is experimental. There are several constraints on the input. Failure to meet these constraints may result in undefined results, or even crashes. Please read Section 'Details' carefully before using this function.
Usage
fmap(
  x,
  fun,
  .y = NULL,
  .buffer_count = NA_integer_,
  .output_size = NA_integer_,
  ...
)

fmap2(x, fun, .buffer_count = NA, .simplify = TRUE, ...)

fmap_element_wise(x, fun, .y, ..., .input_size = NA)
Arguments
x | a list of file arrays to map; each element of |
fun | function that takes one list |
.y | a file array object, used to save results |
.buffer_count | number of total buffers (chunks) to run |
.output_size |
|
... | other arguments passing to |
.simplify | whether to apply |
.input_size | number of elements to read from each array of |
Details
Denote the first argument of fun as input. The length of input equals the length of x. The size of each element of input is defined by .input_size, except for the last loop. For example, given that each input array has dimension 10x10x10x10, if .input_size=100, then length(input[[1]])=100. The total number of runs equals length(x[[1]])/100. If .input_size=300, then length(input[[1]]) will be 300 except for the last run. This is because 10000 is not divisible by 300. The element length of the last run will be 100.
The length of the value returned by fun will be checked against .output_size. If the output length exceeds .output_size, an error will be raised.
Please make sure that length(.y)/length(x[[1]]) equals .output_size/.input_size.
For fmap_element_wise, the input[[1]] and output lengths must be consistent.
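The chunk arithmetic described above can be checked with a few lines of base R. The helper `chunk_lengths` is hypothetical and only reproduces the numbers in the text; the ratio check at the end uses the sizes from this topic's own example (inputs of 6000 elements, an output of 60 elements, 60 buffers).

```r
# Hypothetical helper: lengths of input[[1]] for each run
chunk_lengths <- function(total, input_size) {
  n_full <- total %/% input_size
  remainder <- total %% input_size
  c(rep(input_size, n_full), if (remainder > 0) remainder)
}

total <- prod(c(10, 10, 10, 10))          # 10000 elements per input array

runs_100 <- chunk_lengths(total, 100)     # every run has 100 elements
runs_300 <- chunk_lengths(total, 300)     # last run is shorter

n_runs_100 <- length(runs_100)
n_runs_300 <- length(runs_300)
last_300   <- runs_300[length(runs_300)]

# Ratio rule: length(.y) / length(x[[1]]) equals .output_size / .input_size
input_size  <- 6000 / 60                  # 100 elements read per run
output_size <- 1                          # one value written per run
ratio_ok <- isTRUE(all.equal(60 / 6000, output_size / input_size))
```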
Value
File array instance .y
Examples
set.seed(1)
x1 <- filearray_create(tempfile(), dimension = c(100,20,3))
x1[] <- rnorm(6000)
x2 <- filearray_create(tempfile(), dimension = c(100,20,3))
x2[] <- rnorm(6000)

# Add two arrays
output <- filearray_create(tempfile(), dimension = c(100,20,3))
fmap(list(x1, x2), function(input){
  input[[1]] + input[[2]]
}, output)

# check
range(output[] - (x1[] + x2[]))

output$delete()

# Calculate the maximum of x1/x2 for every 100 elements
# total 60 batches/loops (`.buffer_count`)
output <- filearray_create(tempfile(), dimension = c(20,3))
fmap(list(x1, x2), function(input){
  max(input[[1]] / input[[2]])
}, .y = output, .buffer_count = 60)

# check
range(output[] - apply(x1[] / x2[], c(2,3), max))

output$delete()

# A large array example
if(interactive()){
  x <- filearray_create(tempfile(), dimension = c(287, 100, 301, 4))
  dimnames(x) <- list(
    Trial = 1:287,
    Marker = 1:100,
    Time = 1:301,
    Location = 1:4
  )

  for(i in 1:4){
    x[,,,i] <- runif(8638700)
  }
  # Step 1:
  # for each location, trial, and marker, calibrate (baseline)
  # according to first 50 time-points

  output <- filearray_create(tempfile(), dimension = dim(x))

  # baseline-percentage change
  fmap(
    list(x),
    function(input){
      # get locational data
      location_data <- input[[1]]
      dim(location_data) <- c(287, 100, 301)

      # collapse over first 50 time points for
      # each trial, and marker
      baseline <- apply(location_data[,,1:50], c(1,2), mean)

      # calibrate
      calibrated <- sweep(location_data, c(1,2), baseline,
                          FUN = function(data, bl){
                            (data / bl - 1) * 100
                          })
      return(calibrated)
    },
    .y = output,
    # input dimension is 287 x 100 x 301 for each location
    # hence 4 loops in total
    .buffer_count = 4
  )

  # cleanup
  x$delete()
}

# cleanup
x1$delete()
x2$delete()
output$delete()
A generic function of which that is 'FileArray' compatible
Description
A generic function of which that is 'FileArray' compatible
Usage
fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)

## Default S3 method:
fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)

## S3 method for class 'FileArray'
fwhich(x, val, arr.ind = FALSE, ret.values = FALSE, ...)
Arguments
x | any R vector, matrix, array or file-array |
val | values to find, or a function taking one argument (a slice of data vector) and returns either logical vector with the same length as the slice or index of the slice; see 'Examples' |
arr.ind | logical; should array indices be returned when |
ret.values | whether to return the values of corresponding indices as an attributes; default is false |
... | passed to |
Value
The indices of x elements that are listed in val.
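For an in-memory array, the value described above corresponds to what base R computes with `which` and `%in%`. The sketch below uses only base functions to show the expected indices for the data used in this topic's examples; it is an illustration of the concept, not the implementation of fwhich.

```r
x <- array(1:27 + 2, rep(3, 3))     # values 3, 4, ..., 29

# Linear indices of elements equal to 4 or 5
idx <- which(x %in% c(4, 5))

# The same positions as array subscripts (one row per match)
arr_idx <- arrayInd(idx, dim(x))

# Values found at those positions
vals <- x[idx]
```

The subscripts in `arr_idx` point at rows 2 and 3 of the first column of the first slice, which matches the `arr[2:3, 1, 1]` lookup used in the examples for this topic.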
Examples
# ---- Default case ------------------------------------
x <- array(1:27 + 2, rep(3,3))

# find index of `x` equal to either 4 or 5
fwhich(x, c(4,5))
res <- fwhich(x, c(4,5), ret.values = TRUE)
res
attr(res, "values")

# ---- file-array case --------------------------------
arr <- filearray_create(tempfile(), dim(x))
arr[] <- x
fwhich(arr, c(4,5))
fwhich(arr, c(4,5), arr.ind = TRUE, ret.values = TRUE)

arr[2:3, 1, 1]

# Clean up this example
arr$delete()

# ---- `val` is a function ----------------------------
x <- as_filearray(c(sample(15), 15), dimension = c(4,4))

ret <- fwhich(x, val = which.max,
              ret.values = TRUE, arr.ind = FALSE)

# ret is the index
ret == which.max(x[])

# attr(ret, "values") is the max value
max(x[]) == attr(ret, "values")

# customize `val`
fwhich(x, ret.values = TRUE, arr.ind = FALSE,
       val = function( slice ) {
           slice > 10 # or which(slice > 10)
       })
A map-reduce method to iterate blocks of file-array data with little memory usage
Description
A map-reduce method to iterate blocks of file-array data with little memory usage
Usage
mapreduce(x, map, reduce, ...)

## S4 method for signature 'FileArray,ANY,function'
mapreduce(x, map, reduce, buffer_size = NA, ...)

## S4 method for signature 'FileArray,ANY,NULL'
mapreduce(x, map, reduce, buffer_size = NA, ...)

## S4 method for signature 'FileArray,ANY,missing'
mapreduce(x, map, reduce, buffer_size = NA, ...)
Arguments
x | a file array object |
map | mapping function that receives 3 arguments; see 'Details' |
reduce |
|
... | passed to other methods |
buffer_size | control how we split the array; see 'Details' |
Details
When handling out-of-memory arrays, it is recommended to load a block of the array at a time and execute at block level. See apply for an implementation. When an array is too large, and when there are too many blocks, this operation will become very slow if computer memory is low. This is because R will perform garbage collection frequently. Implemented in C++, mapreduce creates a buffer to store the block data. By reusing the memory over and over again, it is possible to iterate through the array with minimal garbage collection. Many statistics, including min, max, sum, mean, ..., can be calculated efficiently in this way.
The function map takes three arguments: data (mandatory), size (optional), and first_index (optional). data is the buffer, whose length is consistent across iterations. size indicates the effective size of the buffer. If the partition size is not divisible by the buffer size, only the first size elements of the data are from the array, and the remaining elements will be NA. This situation can only occur when buffer_size is manually specified. By default, all of data should belong to the array. The last argument, first_index, is the index of the first element data[1] in the whole array. It is useful when positional data is needed.
The buffer size, specified by buffer_size, is an additional optional argument in .... Its default is NA, in which case it is calculated automatically. If manually specified, a large buffer size is desirable to speed up the calculation. The default buffer size will not exceed nThreads x 2MB, where nThreads is the number of threads set by filearray_threads. When the partition length is not divisible by the buffer size, instead of trimming the buffer, NAs are filled into the buffer before it is passed to the map function; see the previous paragraph for treatments.
The function mapreduce ignores missing partitions. That means if a partition is missing, its data will be neither read nor passed to the map function. Please run x$initialize_partition() to make sure partition files exist.
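The roles of data, size, and first_index can be illustrated with a pure-R simulation on an ordinary vector. This is a conceptual sketch only: `mapreduce_sketch` is a hypothetical function, it allocates a fresh buffer on every step (whereas the text says the package reuses one buffer in C++), it ignores partitions, and it always passes all three arguments to `map`.

```r
# Hypothetical simulation of buffer-wise map-reduce over a plain vector
mapreduce_sketch <- function(x, map, reduce, buffer_size) {
  n <- length(x)
  starts <- seq(1, n, by = buffer_size)
  mapped <- lapply(starts, function(first_index) {
    # effective number of elements available for this buffer
    size <- min(buffer_size, n - first_index + 1)
    # fixed-length buffer, padded with NA when the data runs out
    buffer <- rep(NA_real_, buffer_size)
    buffer[seq_len(size)] <- x[first_index:(first_index + size - 1)]
    map(buffer, size, first_index)
  })
  reduce(mapped)
}

x <- as.numeric(1:10)

# Sum: drop the NA padding using `size` before summing
total <- mapreduce_sketch(
  x,
  map = function(data, size, first_index) sum(data[seq_len(size)]),
  reduce = function(mapped_list) do.call(sum, mapped_list),
  buffer_size = 4
)

# Positions of elements greater than 8, shifted by `first_index`
positions <- mapreduce_sketch(
  x,
  map = function(data, size, first_index) {
    which(data[seq_len(size)] > 8) + (first_index - 1)
  },
  reduce = function(mapped_list) do.call(c, mapped_list),
  buffer_size = 4
)
```

With a buffer of 4 and 10 elements, the third buffer holds two real values followed by two NAs, which is the situation in which trimming to `size` matters.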
Value
If reduce is NULL, return mapped results; otherwise return reduced results from the reduce function
Examples
x <- filearray_create(tempfile(), c(100, 100, 10))
x[] <- rnorm(1e5)

## calculate summation
# identical to sum(x[]), but is more feasible in large cases
mapreduce(x, map = function(data, size){
  # make sure `data` is all from array
  if(length(data) != size){
    data <- data[1:size]
  }
  sum(data)
}, reduce = function(mapped_list){
  do.call(sum, mapped_list)
})

## Find elements that are less than -3
positions <- mapreduce(
  x,
  map = function(data, size, first_index) {
    if (length(data) != size) {
      data <- data[1:size]
    }
    which(data < -3) + (first_index - 1)
  },
  reduce = function(mapped_list) {
    do.call(c, mapped_list)
  }
)

if(length(positions)){
  x[[positions[1]]]
}
The type of a file array (extended)
Description
The type of a file array (extended)
Usage
typeof(x)

## S4 method for signature 'FileArray'
typeof(x)

## S4 method for signature 'FileArrayProxy'
typeof(x)
Arguments
x | any file array |
Value
A character string. The possible values are "double", "integer", "logical", and "raw"