| Version: | 4.6.4 |
| Title: | Manage Massive Matrices with Shared Memory and Memory-Mapped Files |
| Depends: | R (≥ 3.2.0), |
| Imports: | bigmemory.sri, methods, utils, Rcpp, uuid (≥ 1.0-2) |
| Enhances: | biganalytics, bigtabulate |
| LinkingTo: | BH, uuid (≥ 1.0-2), Rcpp |
| Encoding: | UTF-8 |
| Description: | Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages 'biganalytics', 'bigtabulate', 'synchronicity', and 'bigalgebra' provide advanced functionality. |
| License: | LGPL-3 or Apache License 2.0 |
| URL: | https://github.com/kaneplusplus/bigmemory |
| BugReports: | https://github.com/kaneplusplus/bigmemory/issues |
| LazyLoad: | yes |
| Suggests: | testthat, remotes |
| RoxygenNote: | 7.2.3 |
| NeedsCompilation: | yes |
| Packaged: | 2024-01-09 17:18:13 UTC; mike |
| Author: | Michael J. Kane |
| Maintainer: | Michael J. Kane <bigmemoryauthors@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-01-09 20:20:08 UTC |
Manage massive matrices with shared memory and memory-mapped files.
Description
Create, store, access, and manipulate massive matrices. Matrices are, by default, allocated to shared memory and may use memory-mapped files. Packages biganalytics, synchronicity, bigalgebra, and bigtabulate provide advanced functionality. Access to and manipulation of a big.matrix object is exposed in an S4 class whose interface is similar to that of a matrix. Use of these packages in parallel environments can provide substantial speed and memory efficiencies. bigmemory also provides a C++ framework for the development of new tools that can work both with big.matrix and native matrix objects.
Details
Index of functions/methods (grouped in a friendly way):
big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush
Multi-gigabyte data sets challenge and frustrate users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The package bigmemory and associated packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.
This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.
Memory considerations
Because the memory that a big.matrix uses is managed outside the R memory pool available to the garbage collector, the memory occupied by the big.matrix is not visible to R. This has subtle implications:
- Memory usage is not visible via general R functions (e.g. the gc() function).
- The garbage collector is misled by the very small memory footprint of the big.matrix object (which acts merely as a pointer to the external memory structure), which can result in much less eagerness to garbage-collect unused big.matrix objects. After removing the last reference to a big big.matrix, the user should manually run gc() to reclaim the memory.
- Attaching the description of an already finalized big.matrix and accessing this object will result in undefined behavior, which simply means it will crash the current R session with no hope of saving the data in it. To prevent R from de-allocating (finalizing) the matrices, the user should keep at least one big.matrix object somewhere in R memory in at least one R session on the current machine.
- An abruptly closed R session (e.g. one killed using a task manager) will not have a chance to finalize the big.matrix objects, which will result in a memory leak, as the big.matrices will remain in memory (perhaps under obfuscated names) with no easy way to reconnect R to them.
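A minimal sketch of the clean-up pattern described above, assuming the bigmemory package is installed; the matrix size is arbitrary and kept small so the example is cheap to run.

```r
library(bigmemory)

# Allocate a small in-memory big.matrix; its data live outside R's heap.
x <- big.matrix(1000, 10, type = "double", init = 0)

# gc() reports only R's own heap, so the big.matrix data are not counted here.
gc()

# Drop the last R reference, then run the collector explicitly so the
# finalizer gets a chance to release the external memory.
rm(x)
invisible(gc())
```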
Note
Various options are available. options(bigmemory.typecast.warning) can be set to avoid annoying warnings that might occur if, for example, you assign objects (typically of type double) to char, short, or integer big.matrix objects. options(bigmemory.print.warning) protects against extracting and printing a massive matrix (which would involve the creation of a second massive copy of the matrix). options(bigmemory.allow.dimnames) by default prevents the setting of dimnames attributes, because they aren't allocated to shared memory and changes will not be visible across processes. options(bigmemory.default.type) is "double" by default (a change in default behavior as of 4.1.1) but may be changed by the user.
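The following is a small sketch of adjusting two of these options in a session, assuming bigmemory is installed. It silences the down-cast warning, changes the default element type, and then restores the documented defaults so the rest of the session is unaffected.

```r
library(bigmemory)

# Silence warnings when doubles are assigned into an integer big.matrix.
options(bigmemory.typecast.warning = FALSE)
x <- big.matrix(2, 2, type = "integer", init = 0L)
x[1, 1] <- 3          # 3 is a double literal in R; stored as integer 3

# Make newly created matrices default to the 4-byte integer type.
options(bigmemory.default.type = "integer")
y <- big.matrix(2, 2, init = 0L)
typeof(y)             # "integer"

# Restore the defaults described above.
options(bigmemory.typecast.warning = TRUE,
        bigmemory.default.type = "double")
```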
Note that you can't simply use a big.matrix with many (most) existing functions (e.g. lm, kmeans). One nice exception is split, because this function only accesses subsets of the matrix.
Author(s)
Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.
Maintainers: Michael J. Kane bigmemoryauthors@gmail.com
See Also
For example, big.matrix, mwhich, read.big.matrix
Examples
# Our examples are all trivial in size, rather than burning huge amounts
# of memory.
x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x
x[1:2,]
x[,1] <- 1:5
x[,"alpha"]
colnames(x)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- NULL
x[,]
Extract or Replace
Description
Extract or replace big.matrix elements
Usage
## S4 method for signature 'big.matrix,ANY,ANY,missing'
x[i, j, drop]
## S4 method for signature 'big.matrix,ANY,ANY,logical'
x[i, j, drop]
## S4 method for signature 'big.matrix,missing,ANY,missing'
x[i, j, drop]
## S4 method for signature 'big.matrix,missing,ANY,logical'
x[i, j, drop]
## S4 method for signature 'big.matrix,ANY,missing,missing'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'big.matrix,ANY,missing,logical'
x[i, j, drop]
## S4 method for signature 'big.matrix,missing,missing,missing'
x[i, j, drop]
## S4 method for signature 'big.matrix,missing,missing,logical'
x[i, j, drop]
## S4 method for signature 'big.matrix,matrix,missing,missing'
x[i, j, drop]
## S4 replacement method for signature 'big.matrix,numeric,numeric,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,numeric,logical,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,logical,numeric,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,logical,logical,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,logical,character,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,numeric,character,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,missing,missing,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,missing,numeric,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,missing,logical,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,numeric,missing,numeric'
x[i, j, ...] <- value
## S4 replacement method for signature 'big.matrix,logical,missing,numeric'
x[i, j, ...] <- value
## S4 replacement method for signature 'big.matrix,numeric,missing,matrix'
x[i, j, ...] <- value
## S4 replacement method for signature 'big.matrix,logical,missing,matrix'
x[i, j, ...] <- value
## S4 replacement method for signature 'big.matrix,character,character,ANY'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,missing,character,ANY'
x[j] <- value
## S4 replacement method for signature 'big.matrix,character,missing,ANY'
x[i] <- value
## S4 replacement method for signature 'big.matrix,missing,missing,numeric'
x[i, j] <- value
## S4 replacement method for signature 'big.matrix,matrix,missing,numeric'
x[i, j] <- value
Arguments
x | A big.matrix object |
i | Indices specifying the rows |
j | Indices specifying the columns |
drop | Logical indicating whether to reduce the result to the minimum dimensions |
... | Additional arguments |
value | typically an array-like R object of similar class |
big.matrix size
Description
Returns the size of the created matrix in bytes
Usage
GetMatrixSize(bigMat)
Arguments
bigMat | a big.matrix object |
Create a “big.matrix” from a matrix or vector.
Description
Create a big.matrix from a matrix or vector or data.frame; a vector will result in a big.matrix with one column. A data frame will have character vectors converted to factors, and then all factors converted to numeric factor levels. All labels or character values will be lost.
Methods
signature(x = "matrix")...
signature(x = "vector")...
signature(x = "data.frame")...
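A short sketch of the vector and data frame conversions described above, assuming bigmemory is installed. The data frame method may emit a warning about coercing via factor level numberings, so the call is wrapped in suppressWarnings() here.

```r
library(bigmemory)

# A vector becomes a one-column big.matrix.
v <- as.big.matrix(1:5)
dim(v)       # 5 rows, 1 column

# Character columns become factors, then numeric level codes;
# the labels "a" and "b" are lost and only the codes 1 and 2 remain.
df <- data.frame(id = 1:3, grp = c("b", "a", "b"))
m <- suppressWarnings(as.big.matrix(df))
m[, 2]       # level codes: "b" -> 2, "a" -> 1, "b" -> 2
```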
Convert to base R matrix
Description
Extract values from a big.matrix object and convert to a base R matrix object
Usage
## S4 method for signature 'big.matrix'
as.matrix(x)
Arguments
x | A big.matrix object |
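The sketch below, assuming bigmemory is installed, illustrates that as.matrix() produces an ordinary, independent R matrix: modifying the result does not touch the big.matrix it came from.

```r
library(bigmemory)

x <- big.matrix(3, 2, type = "double", init = 1)
m <- as.matrix(x)   # copies all values into an ordinary R matrix
is.matrix(m)        # TRUE

m[1, 1] <- 99       # modifies only the R copy
x[1, 1]             # the big.matrix still holds 1
```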
The core "big.matrix" operations.
Description
Create a big.matrix (or check to see if an object is a big.matrix, or create a big.matrix from a matrix, and so on). The big.matrix may be file-backed.
Usage
big.matrix(nrow, ncol, type = options()$bigmemory.default.type, init = NULL,
  dimnames = NULL, separated = FALSE, backingfile = NULL, backingpath = NULL,
  descriptorfile = NULL, binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared)

filebacked.big.matrix(nrow, ncol, type = options()$bigmemory.default.type,
  init = NULL, dimnames = NULL, separated = FALSE, backingfile = NULL,
  backingpath = NULL, descriptorfile = NULL, binarydescriptor = FALSE)

as.big.matrix(x, type = NULL, separated = FALSE, backingfile = NULL,
  backingpath = NULL, descriptorfile = NULL, binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared)

is.big.matrix(x)
## S4 method for signature 'big.matrix'
is.big.matrix(x)
## S4 method for signature 'ANY'
is.big.matrix(x)

is.separated(x)
## S4 method for signature 'big.matrix'
is.separated(x)

is.filebacked(x)
## S4 method for signature 'big.matrix'
is.filebacked(x)

shared.name(x)
## S4 method for signature 'big.matrix'
shared.name(x)

file.name(x)
## S4 method for signature 'big.matrix'
file.name(x)

dir.name(x)
## S4 method for signature 'big.matrix'
dir.name(x)

is.shared(x)
## S4 method for signature 'big.matrix'
is.shared(x)

is.readonly(x)
## S4 method for signature 'big.matrix'
is.readonly(x)

is.nil(address)
Arguments
nrow | number of rows. |
ncol | number of columns. |
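type | the type of the atomic element (double, integer, short, or char; defaults to options()$bigmemory.default.type). |
init | a scalar value for initializing the matrix (NULL by default, in which case no initialization is performed). |
dimnames | a list of the row and column names; use with caution for large objects. |
separated | use separated column organization of the data; see details. |
backingfile | the root name for the file(s) for the cache of x. |
backingpath | the path to the directory containing the file-backing cache. |
descriptorfile | the name of the file to hold the backingfile description, for subsequent use with attach.big.matrix. |
binarydescriptor | the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix. |
shared | logical; whether the in-memory big.matrix is shared across processes (defaults to options()$bigmemory.default.shared). |
x | a matrix, vector, or data.frame for as.big.matrix; otherwise, the object to be queried. |
address | an externalptr object. |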
Details
A big.matrix consists of an object in R that does nothing more than point to the data structure implemented in C++. The object acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames.
There are two big.matrix types which manage data in different ways. A standard, shared big.matrix is constrained to available RAM, and may be shared across separate R processes. A file-backed big.matrix may exceed available RAM by using hard drive space, and may also be shared across processes. The atomic types of these matrices may be double, integer, short, or char (8, 4, 2, and 1 bytes, respectively).
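A minimal sketch contrasting the two kinds of big.matrix, assuming bigmemory is installed. The backing and descriptor file names are arbitrary temporary names generated for this example (derived from tempfile()), not names the package requires.

```r
library(bigmemory)

# In-memory big.matrix of 2-byte "short" integers (no file-backing).
s <- big.matrix(4, 3, type = "short", init = 0L)
typeof(s)          # "short"
is.filebacked(s)   # FALSE

# File-backed big.matrix in the session's temporary directory.
path <- tempdir()
bf   <- basename(tempfile(fileext = ".bin"))   # hypothetical backing name
dfn  <- sub("\\.bin$", ".desc", bf)            # hypothetical descriptor name
f <- filebacked.big.matrix(4, 3, type = "double", init = 0,
                           backingfile = bf, descriptorfile = dfn,
                           backingpath = path)
is.filebacked(f)   # TRUE

# Both the backing file and the descriptor file now exist on disk.
file.exists(file.path(path, c(bf, dfn)))
```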
If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x. If x is of type double, then the result will be numeric; otherwise, the result will be an integer R matrix. The expression x alone will display information about the R object (e.g. the external pointer) rather than evaluating the matrix itself (the user should try x[,] with extreme caution, recognizing that a huge R matrix will be created).
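A small sketch of the extraction behavior just described, assuming bigmemory is installed: subsetting returns an ordinary R matrix whose storage type follows the big.matrix type.

```r
library(bigmemory)

xd <- big.matrix(10, 2, type = "double",  init = 0.5)
xi <- big.matrix(10, 2, type = "integer", init = 1L)

r1 <- xd[1:5, ]   # ordinary 5 x 2 R matrix of doubles
r2 <- xi[1:5, ]   # ordinary 5 x 2 R matrix of integers
typeof(r1)        # "double"
typeof(r2)        # "integer"
```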
If x has a huge number of rows and/or columns, then the use of rownames and/or colnames will be extremely memory-intensive and should be avoided. If x has a huge number of columns and separated=TRUE is used (this isn't typically recommended), the user might want to store the transpose, as there is the overhead of a pointer for each column in the matrix. If separated is TRUE, then the memory is allocated into separate vectors for each column. Use this option with caution if you have a large number of columns, as shared-memory segments are limited by OS and hardware combinations. If separated is FALSE, the matrix is stored in traditional column-major format. The function is.separated() returns the separation type of the big.matrix.
When a big.matrix, x, is passed as an argument to a function, it is essentially providing call-by-reference rather than call-by-value behavior. If the function modifies any of the values of x, the changes are not limited in scope to a local copy within the function. This introduces the possibility of side-effects, in contrast to standard R behavior.
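The sketch below, assuming bigmemory is installed, makes the side-effect concrete. The helper zero_first_row() is a hypothetical function written for this illustration; the same call leaves a base R matrix untouched but changes a big.matrix in the caller.

```r
library(bigmemory)

# Hypothetical helper that writes into its argument.
zero_first_row <- function(m) {
  m[1, ] <- 0
  invisible(NULL)
}

x <- big.matrix(2, 2, type = "double", init = 5)
zero_first_row(x)
x[, ]   # row 1 is now 0 in the caller's big.matrix as well

y <- matrix(5, 2, 2)
zero_first_row(y)
y       # unchanged: base R matrices are copied on modification
```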
A file-backed big.matrix may exceed available RAM in size by using a file cache (or possibly multiple file caches, if separated=TRUE). This can incur a substantial performance penalty for such large matrices, but less of a penalty than most other approaches for handling such large objects. A side-effect of creating a file-backed object is not only the file-backing(s), but a descriptor file (in the same directory) that is needed for subsequent attachments (see attach.big.matrix).
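A minimal sketch of re-attaching through the descriptor file, assuming bigmemory is installed. The file names are arbitrary temporary names generated for the example; in practice the attach.big.matrix() call would run in a different R process that can see the same directory.

```r
library(bigmemory)

path <- tempdir()
bf   <- basename(tempfile(fileext = ".bin"))   # hypothetical backing name
dfn  <- sub("\\.bin$", ".desc", bf)            # hypothetical descriptor name
x <- filebacked.big.matrix(3, 3, type = "integer", init = 0L,
                           backingfile = bf, descriptorfile = dfn,
                           backingpath = path)
x[2, 2] <- 7L

# Another R process could run this line to map the same file-backing.
y <- attach.big.matrix(file.path(path, dfn))
y[2, 2]   # 7, written through x
```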
Note that we do not allow setting or changing the dimnames attributes by default; such changes would not be reflected in the descriptor objects or in shared memory. To override this, set options(bigmemory.allow.dimnames=TRUE).
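A short sketch of the override, assuming bigmemory is installed and the option is still at its default (FALSE) when the block starts. The first assignment is expected to fail; the option is restored at the end.

```r
library(bigmemory)

x <- big.matrix(2, 2, type = "double", init = 0)

# Expected to fail while bigmemory.allow.dimnames is FALSE.
res <- try(colnames(x) <- c("a", "b"), silent = TRUE)

options(bigmemory.allow.dimnames = TRUE)
colnames(x) <- c("a", "b")
colnames(x)   # "a" "b"

# Restore the default.
options(bigmemory.allow.dimnames = FALSE)
```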
It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the backingfile argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution, since even anonymous backings use disk space, which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.
Finally, note that as.big.matrix can coerce data frames. It does this by making any character columns into factors, and then making all factors numeric before forming the big.matrix. Level labels are not preserved and must be managed by the user if desired.
Value
A big.matrix is returned (for big.matrix, filebacked.big.matrix, and as.big.matrix), and TRUE or FALSE for is.big.matrix and the other functions.
Author(s)
John W. Emerson and Michael J. Kane bigmemoryauthors@gmail.com
References
The Bigmemory Project:http://www.bigmemory.org/.
See Also
bigmemory, and perhaps the class documentation of big.matrix; attach.big.matrix and describe. Sister packages biganalytics, bigtabulate, synchronicity, and bigalgebra provide advanced functionality.
Examples
x <- big.matrix(10, 2, type='integer', init=-5)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[,]
x[1:8,1] <- 11:18
colnames(x) <- NULL
x[,]
# The following shared memory example is quite silly, as you wouldn't
# likely do this in a single R session. But if zdescription were
# passed to another R session via SNOW, foreach, or even by a
# simple file read/write, then the attach.big.matrix() within the
# second R process would give access to the same object in memory.
# Please see the package vignette for real examples.
z <- big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
y[1,1] <- -100
y[,]
z[,]
Class "big.matrix"
Description
The big.matrix class is designed for matrices with elements of type double, integer, short, or char. A big.matrix acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames. The objects are allocated to shared memory, and if file-backing is used they may exceed virtual memory in size. Sadly, 32-bit operating system constraints (largely Windows and some macOS versions) will be a limiting factor with file-backed matrices; 64-bit operating systems are recommended.
Objects from the Class
Unlike many R objects, objects should not be created by calls of the form new("big.matrix", ...). The functions big.matrix() and filebacked.big.matrix() are intended for the user.
Slots
address: Object of class "externalptr" that points to the memory location of the C++ data structure.
Methods
As you would expect:
- [<- signature(x = "big.matrix", i = "ANY", j = "ANY"): ...
- [<- signature(x = "big.matrix", i = "ANY", j = "missing"): ...
- [<- signature(x = "big.matrix", i = "missing", j = "ANY"): ...
- [<- signature(x = "big.matrix", i = "missing", j = "missing"): ...
- [<- signature(x = "big.matrix", i = "matrix", j = "missing"): ...
- [ signature(x = "big.matrix", i = "ANY", j = "ANY", drop = "missing"): ...
- [ signature(x = "big.matrix", i = "ANY", j = "ANY", drop = "logical"): ...
- [ signature(x = "big.matrix", i = "ANY", j = "missing", drop = "missing"): ...
- [ signature(x = "big.matrix", i = "ANY", j = "missing", drop = "logical"): ...
- [ signature(x = "big.matrix", i = "matrix", j = "missing", drop = "logical"): ...
- [ signature(x = "big.matrix", i = "missing", j = "ANY", drop = "missing"): ...
- [ signature(x = "big.matrix", i = "missing", j = "ANY", drop = "logical"): ...
- [ signature(x = "big.matrix", i = "missing", j = "missing", drop = "missing"): ...
- [ signature(x = "big.matrix", i = "missing", j = "missing", drop = "logical"): ...
The following are probably more interesting:
- describe signature(x = "big.matrix"): provide necessary and sufficient information for the sharing or re-attaching of the object.
- dim signature(x = "big.matrix"): returns the dimension of the big.matrix.
- length signature(x = "big.matrix"): returns the product of the dimensions of the big.matrix.
- dimnames<- signature(x = "big.matrix", value = "list"): set the row and column names, prohibited by default (see bigmemory to override).
- dimnames signature(x = "big.matrix"): get the row and column names.
- head signature(x = "big.matrix"): get the first 6 (or n) rows.
- as.matrix signature(x = "big.matrix"): coerce a big.matrix to a matrix.
- is.big.matrix signature(x = "big.matrix"): return TRUE if it's a big.matrix.
- is.filebacked signature(x = "big.matrix"): return TRUE if there is a file-backing.
- is.separated signature(x = "big.matrix"): return TRUE if the big.matrix is organized as separated column vectors.
- is.sub.big.matrix signature(x = "big.matrix"): return TRUE if this is a sub-matrix of a big.matrix.
- ncol signature(x = "big.matrix"): returns the number of columns.
- nrow signature(x = "big.matrix"): returns the number of rows.
- print signature(x = "big.matrix"): a traditional print() is intentionally disabled, and returns head(x) unless options()$bigmemory.print.warning==FALSE; in this case, print(x[,]) is the result, which could be very big!
- sub.big.matrix signature(x = "big.matrix"): for contiguous submatrices.
- tail signature(x = "big.matrix"): returns the last 6 (or n) rows.
- typeof signature(x = "big.matrix"): return the type of the atomic elements of the big.matrix.
- write.big.matrix signature(x = "big.matrix", filename = "character"): produce an ASCII file from the big.matrix.
- apply signature(x = "big.matrix"): apply() where MARGIN may only be 1 or 2, but otherwise conforming to what you would expect from apply().
Author(s)
Michael J. Kane and John W. Emerson bigmemoryauthors@gmail.com
See Also
Examples
showClass("big.matrix")
Class "big.matrix.descriptor"
Description
An object of this class contains necessary and sufficient information to “attach” a shared or file-backed big.matrix.
Usage
## S4 method for signature 'character'
attach.resource(obj, ...)
## S4 method for signature 'big.matrix.descriptor'
attach.resource(obj, ...)
Arguments
obj | The filename of the descriptor for a filebacked matrix,assumed to be in the directory specified |
... | possibly |
Objects from the Class
Objects should not be created by calls of the form new("big.matrix.descriptor", ...), but should use the describe function.
Slots
description: Object of class "list"; details omitted.
Extends
Class"descriptor", directly.
Methods
- attach.resource signature(obj = "big.matrix.descriptor"): ...
- sub.big.matrix signature(x = "big.matrix.descriptor"): ...
Note
We provide attach.resource for convenience, but expect most users will prefer attach.big.matrix.
Author(s)
John W. Emerson and Michael J. Kane
References
Other types of descriptors are defined in package synchronicity.
See Also
See also attach.big.matrix.
Examples
showClass("big.matrix.descriptor")
Produces a physical copy of a “big.matrix”
Description
This is needed to make a duplicate of a big.matrix, with the new copy optionally file-backed.
Usage
deepcopy(x, cols = NULL, rows = NULL, y = NULL, type = NULL,
  separated = NULL, backingfile = NULL, backingpath = NULL,
  descriptorfile = NULL, binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared)
Arguments
x | a big.matrix. |
cols | possible subset of columns for the deepcopy; could be numeric, named, or logical. |
rows | possible subset of rows for the deepcopy; could be numeric, named, or logical. |
y | optional destination object ( |
type | preferably specified, |
separated | use separated column organization of the data instead of column-major organization; use with caution if the number of columns is large. |
backingfile | the root name for the file(s) for the cache of |
backingpath | the path to the directory containing the file-backing cache. |
descriptorfile | we recommend specifying this for file-backing. |
binarydescriptor | the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix. |
shared | logical; whether the in-memory copy is shared across processes (defaults to options()$bigmemory.default.shared). |
Details
This is needed to make a duplicate of a big.matrix, because traditional syntax would only copy the object (the pointer to the big.matrix rather than the big.matrix itself). It can also make a copy of only a subset of columns and/or rows.
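A minimal sketch of the difference between ordinary assignment and deepcopy(), assuming bigmemory is installed; the variable names alias and copy are chosen only for this illustration.

```r
library(bigmemory)

x <- big.matrix(3, 2, type = "integer", init = 1L)
alias <- x            # copies only the R object (the pointer)
copy  <- deepcopy(x)  # allocates new memory and copies the values

x[1, 1] <- 42L
alias[1, 1]   # 42: same underlying data as x
copy[1, 1]    # 1: an independent copy
```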
Value
See Also
Examples
x <- as.big.matrix(matrix(1:30, 10, 3))
y <- deepcopy(x, -1) # Don't include the first column.
x
y
head(x)
head(y)
The basic “big.matrix” operations for sharing and re-attaching.
Description
The describe function returns the information needed by attach.big.matrix to reference a shared or file-backed big.matrix object. The attach.big.matrix and attach.resource functions create a new big.matrix object based on the descriptor information referencing previously allocated shared-memory or file-backed matrices.
Usage
## S4 method for signature 'big.matrix'
describe(x)
attach.big.matrix(obj, ...)
Arguments
x | a big.matrix object |
obj | an object as returned by describe() |
... | possibly |
Details
The describe function returns a list of the information needed to attach to a big.matrix object. A descriptor file is automatically created when a new file-backed big.matrix is created.
Value
describe returns a list of the information needed to attach to a big.matrix object.
attach.big.matrix returns a new instance of type big.matrix corresponding to a shared-memory or file-backed big.matrix.
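The examples on this page pass the descriptor within one session. As a sketch of the "simple file read/write" route, assuming bigmemory is installed, the descriptor can be serialized with saveRDS() and read back with readRDS(); the temporary file name is arbitrary and both halves would normally run in different R processes on the same machine.

```r
library(bigmemory)

z <- big.matrix(3, 3, type = "integer", init = 3L)

desc_file <- tempfile(fileext = ".rds")
saveRDS(describe(z), desc_file)   # e.g. written by one R process...
zd <- readRDS(desc_file)          # ...and read by another

w <- attach.big.matrix(zd)
w[1, 1] <- -1L
z[1, 1]   # -1: w and z refer to the same shared memory
```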
Author(s)
Michael J. Kane and John W. Emerson bigmemoryauthors@gmail.com
See Also
bigmemory, big.matrix, or the class documentation big.matrix.
Examples
# The example is quite silly, as you wouldn't likely do this in a
# single R session. But if zdescription were passed to another R session
# via SNOW, foreach, or even by a simple file read/write,
# then the attach of the second R process would give access to the
# same object in memory. Please see the package vignette for real examples.
z <- big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
zz <- attach.resource(zdescription)
zz[1,1] <- -100
y[,]
z[,]
Dimensions of a big.matrix object
Description
Retrieve the dimensions of a big.matrix object
Usage
## S4 method for signature 'big.matrix'
dim(x)
Arguments
x | A big.matrix object |
Dimnames of a big.matrix Object
Description
Retrieve or set the dimnames of an object
Usage
## S4 method for signature 'big.matrix'
dimnames(x)
## S4 replacement method for signature 'big.matrix,list'
dimnames(x) <- value
Arguments
x | A big.matrix object |
value | A possible value for dimnames(x) |
Updating a big.matrix filebacking.
Description
For a file-backed big.matrix object, flush() forces any modified information to be written to the file-backing.
Usage
flush(con)
## S4 method for signature 'big.matrix'
flush(con)
Arguments
con | a file-backed big.matrix |
Details
This function flushes any modified data (in RAM) of a file-backed big.matrix to disk. This may be useful for improving performance in cases where allowing the operating system to decide on flushing creates a bottleneck (likely near the threshold of available RAM).
Value
TRUE or FALSE (invisible), indicating whether or not the flush was successful.
Author(s)
John W. Emerson and Michael J. Kane
Examples
temp_dir = tempdir()
if (!dir.exists(temp_dir)) dir.create(temp_dir)
x <- big.matrix(nrow=3, ncol=3, backingfile='flushtest.bin',
                descriptorfile='flushtest.desc', backingpath=temp_dir,
                type='integer')
x[1,1] <- 0
flush(x)
Return First or Last Part of a big.matrix Object
Description
Returns the first or last parts of a big.matrix object.
Usage
## S4 method for signature 'big.matrix'
head(x, n = 6)
## S4 method for signature 'big.matrix'
tail(x, n = 6)
Arguments
x | A big.matrix object |
n | A single integer for the number of rows to return |
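A brief sketch, assuming bigmemory is installed, showing that head() and tail() pull only a few rows into an ordinary R matrix, which is a safe way to inspect a large big.matrix.

```r
library(bigmemory)

x <- as.big.matrix(matrix(1:40, 20, 2))
h <- head(x)          # first 6 rows, as an ordinary R matrix
t3 <- tail(x, n = 3)  # last 3 rows
t3[, 1]               # 18 19 20
```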
Check if Float
Description
Check to see if the elements of a big.matrix object are floats.
Usage
is.float(x)
Arguments
x | An object to be evaluated if float |
Is Float?
Description
Check if R numeric value has float flag
Usage
## S4 method for signature 'numeric'
is.float(x)
Arguments
x | A numeric value |
Submatrix support
Description
This doesn't create a copy; it just provides a new version of the class which provides behavior for a contiguous submatrix of the big.matrix. Non-contiguous submatrices are not supported.
Usage
is.sub.big.matrix(x)
## S4 method for signature 'big.matrix'
is.sub.big.matrix(x)

sub.big.matrix(x, firstRow = 1, lastRow = NULL, firstCol = 1, lastCol = NULL,
  backingpath = NULL)
## S4 method for signature 'big.matrix'
sub.big.matrix(x, firstRow = 1, lastRow = NULL, firstCol = 1, lastCol = NULL,
  backingpath = NULL)
## S4 method for signature 'big.matrix.descriptor'
sub.big.matrix(x, firstRow = 1, lastRow = NULL, firstCol = 1, lastCol = NULL,
  backingpath = NULL)
Arguments
x | A big.matrix or big.matrix.descriptor object |
firstRow | the first row of the submatrix |
lastRow | the last row of the submatrix if not NULL |
firstCol | the first column of the submatrix |
lastCol | the last column of the submatrix if not NULL |
backingpath | required path to the filebacked object, if applicable |
Details
The sub.big.matrix function allows a user to create a big.matrix object that references a contiguous set of columns and rows of another big.matrix object.
The is.sub.big.matrix function returns TRUE if the specified argument is a sub.big.matrix object and FALSE otherwise.
Value
A big.matrix which is actually a submatrix of a larger big.matrix. It is not a physical copy. Only contiguous blocks may form a submatrix.
Author(s)
John W. Emerson and Michael J. Kane
See Also
Examples
x <- big.matrix(10, 5, init=0, type="double")
x[,] <- 1:50
y <- sub.big.matrix(x, 2, 9, 2, 3)
y[,]
y[1,1] <- -99
x[,]
rm(x)
Length of a big.matrix object
Description
Get the length of a big.matrix object
Usage
## S4 method for signature 'big.matrix'
length(x)
Arguments
x | A big.matrix object |
Ordering and Permuting functions for “big.matrix” and “matrix” objects
Description
The morder function returns a permutation of row indices which can be used to rearrange an object according to the values in the specified columns (a multi-column ordering). The mpermute function actually reorders the rows of a big.matrix or matrix based on an order vector or a desired ordering on a set of columns.
Usage
morder(x, cols, na.last = TRUE, decreasing = FALSE)
morderCols(x, rows, na.last = TRUE, decreasing = FALSE)
mpermute(x, order = NULL, cols = NULL, allow.duplicates = FALSE, ...)
mpermuteCols(x, order = NULL, rows = NULL, allow.duplicates = FALSE, ...)
Arguments
x | A big.matrix or matrix object |
cols | The columns of |
na.last | for controlling the treatment of |
decreasing | logical. Should the sort order be increasing or decreasing? |
rows | The rows of |
order | A vector specifying the reordering of rows, i.e. theresult of a call to |
allow.duplicates | ff |
... | optional parameters to pass to |
Details
The morder function behaves similarly to order, returning a permutation of 1:nrow(x) which rearranges objects according to the values in the specified columns. However, morder takes a big.matrix or an R matrix (with numeric type) and a set of columns (cols) with which to determine the ordering; morder does not incur the same memory overhead required by order, and runs more quickly.
The mpermute function changes the row ordering of a big.matrix or matrix based on a vector order or an ordering based on a set of columns specified by cols. It should be noted that this function has side-effects; that is, x is changed when this function is called.
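A small sketch of both functions on a big.matrix, assuming bigmemory is installed. morder() only returns an ordering, while mpermute() rewrites the rows of x in place, carrying the second column along with the first.

```r
library(bigmemory)

x <- as.big.matrix(matrix(c(3, 1, 2, 30, 10, 20), 3, 2))

o <- morder(x, 1)        # like order(x[, 1]); x itself is not changed
o                        # 2 3 1

mpermute(x, cols = 1)    # reorders the rows of x in place
x[, ]                    # column 1: 1 2 3; column 2: 10 20 30
```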
Value
morder returns an ordering vector. mpermute returns nothing but does change the contents of x. This type of side-effect is generally frowned upon in R, but we “break” the rules here to avoid memory overhead and improve performance.
Author(s)
Michael J. Kane bigmemoryauthors@gmail.com
See Also
Examples
m = matrix(as.double(as.matrix(iris)), nrow=nrow(iris))
morder(m, 1)
order(m[,1])
m[order(m[,1]), 2]
mpermute(m, cols=1)
m[,2]
Expanded “which”-like functionality.
Description
Implements which-like functionality for a big.matrix, with additional options for efficient comparisons (executed in C++); also works for regular numeric matrices without the memory overhead.
Usage
mwhich(x, cols, vals, comps, op = "AND")
Arguments
x | a big.matrix or a numeric matrix |
cols | a vector of column indices or names. |
vals | a list (one component for each of |
comps | a list of operators (one component for each of |
op | the comparison operator for combining the results of the individual tests, either |
Details
To improve performance and avoid the creation of massive temporary vectors in R when doing comparisons, mwhich() efficiently executes column-by-column comparisons of values to the specified values or ranges, and then returns the row indices satisfying the comparison specified by the op operator. More advanced comparisons are then possible (and memory-efficient) in R by doing set operations (union and intersect, for example) on the results of multiple mwhich() calls.
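A minimal sketch of combining two mwhich() results with base R set operations, assuming bigmemory is installed. Only vectors of row indices are materialized, never full copies of the columns.

```r
library(bigmemory)

x <- as.big.matrix(matrix(1:30, 10, 3))

a <- mwhich(x, 1, 5, 'le')    # rows where column 1 <= 5  (rows 1-5)
b <- mwhich(x, 3, 24, 'ge')   # rows where column 3 >= 24 (rows 4-10)

both   <- intersect(a, b)     # rows satisfying both conditions: 4 5
either <- union(a, b)         # rows satisfying at least one: all 10 rows
x[sort(both), ]
```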
Note that NA is a valid argument in conjunction with 'eq' or 'neq', replacing traditional is.na() calls. And both -Inf and Inf can be used for one-sided inequalities.
If mwhich() is used with a regular numeric R matrix, we access the data directly and thus incur no memory overhead. Interested developers might want to look at our code for this case, which uses a handy pointer trick (accessor) in C++.
Value
a vector of row indices satisfying the criteria.
Author(s)
John W. Emerson bigmemoryauthors@gmail.com
See Also
Examples
x <- as.big.matrix(matrix(1:30, 10, 3))
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("A", "B", "C")
x[,]
x[mwhich(x, 1:2, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'OR'),]
x[mwhich(x, c("A","B"), list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'AND'),]
# These should produce the same answer with a regular matrix:
y <- matrix(1:30, 10, 3)
y[mwhich(y, 1:2, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'OR'),]
y[mwhich(y, -3, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'AND'),]
x[1,1] <- NA
mwhich(x, 1:2, NA, 'eq', 'OR')
mwhich(x, 1:2, NA, 'neq', 'AND')
# Column 1 equal to 4 and/or column 2 less than or equal to 16:
mwhich(x, 1:2, list(4, 16), list('eq', 'le'), 'OR')
mwhich(x, 1:2, list(4, 16), list('eq', 'le'), 'AND')
# Column 2 less than or equal to 15:
mwhich(x, 2, 15, 'le')
# No NAs in either column, and column 2 strictly less than 15:
mwhich(x, c(1:2,2), list(NA, NA, 15), list('neq', 'neq', 'lt'), 'AND')
x <- big.matrix(4, 2, init=1, type="double")
x[1,1] <- Inf
mwhich(x, 1, Inf, 'eq')
mwhich(x, 1, 1, 'gt')
mwhich(x, 1, 1, 'le')
Expanded “which”-like functionality.
Description
Implements which-like functionality for a big.matrix, with additional options for efficient comparisons (executed in C++); also works for regular numeric matrices without the memory overhead.
Methods
- signature(x = "big.matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "character")
...
- signature(x = "big.matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "missing")
...
- signature(x = "matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "character")
...
- signature(x = "matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "missing")
...
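The methods differ only in whether op is supplied. The sketch below runs the same two-column query three ways on illustrative data. It assumes, as we read the method table, that omitting op behaves like 'AND'.

```r
library(bigmemory)

# Row i holds i, 10 + i, 20 + i.
x <- as.big.matrix(matrix(1:30, 10, 3))
vals  <- list(c(2, 5), c(13, 20))
comps <- list(c('ge', 'le'), c('ge', 'le'))

# Same query with op missing, and with op given explicitly.
no_op  <- mwhich(x, 1:2, vals, comps)
and_op <- mwhich(x, 1:2, vals, comps, 'AND')
or_op  <- mwhich(x, 1:2, vals, comps, 'OR')
```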
See Also
The Number of Rows/Columns of a big.matrix
Description
nrow and ncol return the number of rows or columns present in a big.matrix object.
Usage
## S4 method for signature 'big.matrix'
ncol(x)
## S4 method for signature 'big.matrix'
nrow(x)
Arguments
x | A big.matrix object |
Value
An integer of length 1
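Here is a quick check on a small big.matrix of arbitrary size. dim(), which appears in the function index, returns both values at once.

```r
library(bigmemory)

# A 5 x 3 integer big.matrix filled with zeros.
x <- big.matrix(5, 3, type = "integer", init = 0)
nrow(x)
ncol(x)
dim(x)
```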
Print Values
Description
print prints the elements of a big.matrix object.
Usage
## S4 method for signature 'big.matrix'
print(x)
Arguments
x | A big.matrix object |
Note
By default, print shows only the head of a big.matrix to prevent console overflow. If you turn off the bigmemory.print.warning option, it will convert the matrix to a base R matrix and print all elements.
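The following sketch shows the two printing modes described in the Note. It assumes that the bigmemory.print.warning option is TRUE once the package is loaded. capture.output() is used only so the two outputs can be compared, and the option is restored afterwards.

```r
library(bigmemory)

x <- big.matrix(20, 2, type = "double", init = 1)

# Default mode: only the head of x is printed.
short_out <- capture.output(print(x))

# Turn the option off to print every element, then restore it.
old_opts <- options(bigmemory.print.warning = FALSE)
full_out <- capture.output(print(x))
options(old_opts)
```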
The Type of a big.matrix Object
Description
typeof returns the storage type of a big.matrix object.
Usage
## S4 method for signature 'big.matrix'
typeof(x)
Arguments
x | A big.matrix object |
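For illustration, the snippet creates one tiny big.matrix per storage type and calls typeof() on each. The list is limited to the four types used elsewhere in this manual.

```r
library(bigmemory)

# One 2 x 2 big.matrix per storage type; typeof() should echo the type name.
types <- c("char", "short", "integer", "double")
reported <- sapply(types, function(t) typeof(big.matrix(2, 2, type = t, init = 0)))
reported
```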
File interface for a “big.matrix”
Description
Create a big.matrix by reading from a suitably-formatted ASCII file, or write the contents of a big.matrix to a file.
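Here is a minimal round trip, assuming a writable temporary directory. The file name is hypothetical. The values are written as plain comma-separated text and read back with an explicit type.

```r
library(bigmemory)

x <- as.big.matrix(matrix(1:6, 3, 2))
f <- file.path(tempdir(), "roundtrip.csv")   # hypothetical file name

# Write the values only (no row or column names), comma-separated.
write.big.matrix(x, f)

# Read them back into a new big.matrix, stating the type explicitly.
y <- read.big.matrix(f, type = "integer")
y[,]
```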
Usage
write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
## S4 method for signature 'big.matrix,character'
write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)
## S4 method for signature 'character'
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)
Arguments
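x | a big.matrix. |
filename | the name of an input/output file. |
row.names | a vector of names; use them even if row names appear to exist in the file. |
col.names | a vector of names; use them even if column names exist in the file. |
sep | a field delimiter. |
header | if TRUE, the first line (after a possible skip) should contain column names. |
has.row.names | if TRUE, then the first column contains row names. |
ignore.row.names | if TRUE when has.row.names == TRUE, the row names will be ignored. |
type | preferably specified, "integer" for example. |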
skip | number of lines to skip at the head of the file. |
separated | use separated column organization of the data instead of column-major organization. |
backingfile | the root name for the file(s) for the cache of x. |
backingpath | the path to the directory containing the file backing cache. |
descriptorfile | the file to be used for the description of the filebacked matrix. |
binarydescriptor | the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix; if NULL or FALSE, the dput() file format is used. |
extraCols | the optional number of extra columns to be appended to the matrix for future use. |
shared | if TRUE, the resulting big.matrix can be shared across processes. |
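This sketch demonstrates skip using a hypothetical file whose first line is not data. The file is generated on the fly with writeLines(), so its contents are known exactly.

```r
library(bigmemory)

f <- file.path(tempdir(), "with_preamble.csv")   # hypothetical file name
# One non-data line followed by a 2 x 3 block of integers.
writeLines(c("exported by some other tool", "1,2,3", "4,5,6"), f)

# skip = 1 drops the first line before parsing; type is given explicitly
# rather than relying on the type guess.
y <- read.big.matrix(f, sep = ",", skip = 1, type = "integer")
y[,]
```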
Details
Files must contain only one atomic type (all integer, for example). You, the user, should know whether your file has row and/or column names, and various combinations of options should be helpful in obtaining the desired behavior.
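The sketch below shows how to match the header option to the file. It writes a big.matrix with column names and reads it back with header = TRUE. The file name is hypothetical. bigmemory.allow.dimnames is switched on only so that colnames() can be assigned, and it is restored at the end.

```r
library(bigmemory)

# Allow dimnames on big.matrix objects for this example only.
old_opts <- options(bigmemory.allow.dimnames = TRUE)

x <- as.big.matrix(matrix(1:6, 3, 2))
colnames(x) <- c("A", "B")
f <- file.path(tempdir(), "with_header.csv")   # hypothetical file name

# col.names = TRUE puts the column names on the first line of the file.
write.big.matrix(x, f, col.names = TRUE)

# header = TRUE marks that first line as names rather than data,
# so y should have the same 3 x 2 shape as x.
y <- read.big.matrix(f, header = TRUE, type = "integer")
dim(y)

options(old_opts)
```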
When reading from a file, if type is not specified we try to make a reasonable guess for you, without making any guarantees at this point. Unless your integer values are too large for 2-byte storage, we recommend you consider "short". If you have something that is essentially categorical, you might even be able to use "char" (1-byte integers), with huge memory savings for large data sets.
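The following sketch reads the same small integers with three storage types. It then computes an approximate data size from the bytes per element: 1 for char, 2 for short and 4 for integer. The data and file name are made up, and the figures ignore any per-object overhead.

```r
library(bigmemory)

x <- as.big.matrix(matrix(c(1L, 7L, 42L, 100L), 2, 2))
f <- file.path(tempdir(), "small_ints.csv")   # hypothetical file name
write.big.matrix(x, f)

# Read the same values with three different storage types.
as_char  <- read.big.matrix(f, type = "char")
as_short <- read.big.matrix(f, type = "short")
as_int   <- read.big.matrix(f, type = "integer")

# Approximate data size in bytes: number of elements times bytes per element.
n_elem <- nrow(x) * ncol(x)
sizes <- c(char = n_elem * 1, short = n_elem * 2, integer = n_elem * 4)
sizes
```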
Any non-numeric entry will be ignored and replaced with NA, so reading something that traditionally would be a data.frame won't cause an error. A warning is issued.
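Here is a sketch of that behavior on a hypothetical three-line file containing one non-numeric field. It assumes the entry is parsed as NA when type = "integer". suppressWarnings() silences the warning mentioned above.

```r
library(bigmemory)

f <- file.path(tempdir(), "mixed.csv")   # hypothetical file name
writeLines(c("1,2", "abc,4", "5,6"), f)

# "abc" is not numeric, so it should come back as NA.
y <- suppressWarnings(read.big.matrix(f, type = "integer"))
y[,]
```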
Wishlist: we'd like to provide an option to ignore specified columns while doing reads, or perhaps to specify columns targeted for factor or character conversion to numeric values. Would you use such features? Email us and let us know!
Value
A big.matrix object is returned by read.big.matrix, while write.big.matrix creates an output file (a path could be part of filename).
Author(s)
John W. Emerson and Michael J. Kane bigmemoryauthors@gmail.com
See Also
Examples
# Without specifying the type, this big.matrix x will hold integers.
x <- as.big.matrix(matrix(1:10, 5, 2))
x[2,2] <- NA
x[,]
temp_dir <- tempdir()
if (!dir.exists(temp_dir)) dir.create(temp_dir)
write.big.matrix(x, file.path(temp_dir, "foo.txt"))
# Just for fun, I'll read it back in as character (1-byte integers):
y <- read.big.matrix(file.path(temp_dir, "foo.txt"), type="char")
y[,]
# Other examples:
w <- as.big.matrix(matrix(1:10, 5, 2), type='double')
w[1,2] <- NA
w[2,2] <- -Inf
w[3,2] <- Inf
w[4,2] <- NaN
w[,]
write.big.matrix(w, file.path(temp_dir, "bar.txt"))
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="double")
w[,]
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="short")
w[,]
# Another example using row names (which we don't like).
# Setting row names on a big.matrix requires this option.
options(bigmemory.allow.dimnames=TRUE)
x <- as.big.matrix(as.matrix(iris), type='double')
rownames(x) <- as.character(1:nrow(x))
head(x)
write.big.matrix(x, file.path(temp_dir, 'IrisData.txt'), col.names=TRUE,
                 row.names=TRUE)
y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"), header=TRUE,
                     has.row.names=TRUE)
head(y)
# The following would fail with a dimension mismatch:
if (FALSE) y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"),
                                header=TRUE)