
Version: 4.6.4
Title: Manage Massive Matrices with Shared Memory and Memory-Mapped Files
Depends: R (≥ 3.2.0)
Imports: bigmemory.sri, methods, utils, Rcpp, uuid (≥ 1.0-2)
Enhances: biganalytics, bigtabulate
LinkingTo: BH, uuid (≥ 1.0-2), Rcpp
Encoding: UTF-8
Description: Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages 'biganalytics', 'bigtabulate', 'synchronicity', and 'bigalgebra' provide advanced functionality.
License: LGPL-3 | Apache License 2.0
URL: https://github.com/kaneplusplus/bigmemory
BugReports: https://github.com/kaneplusplus/bigmemory/issues
LazyLoad: yes
Suggests: testthat, remotes
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2024-01-09 17:18:13 UTC; mike
Author: Michael J. Kane [aut, cre], John W. Emerson [aut], Peter Haverty [aut], Charles Determan [aut]
Maintainer: Michael J. Kane <bigmemoryauthors@gmail.com>
Repository: CRAN
Date/Publication: 2024-01-09 20:20:08 UTC

Manage massive matrices with shared memory and memory-mapped files.

Description

Create, store, access, and manipulate massive matrices. Matrices are, by default, allocated to shared memory and may use memory-mapped files. Packages biganalytics, synchronicity, bigalgebra, and bigtabulate provide advanced functionality. Access to and manipulation of a big.matrix object is exposed in an S4 class whose interface is similar to that of a matrix. Use of these packages in parallel environments can provide substantial speed and memory efficiencies. bigmemory also provides a C++ framework for the development of new tools that can work both with big.matrix and native matrix objects.

Details

Index of functions/methods (grouped in a friendly way):

big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush

Multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The package bigmemory and associated packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.

This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.

Memory considerations

For obvious reasons, the memory that a big.matrix uses is managed outside the R memory pool available to the garbage collector, and the memory occupied by the big.matrix is not visible to R. This has subtle implications: for example, the memory usage reported by general R functions such as gc() does not include it.
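One consequence can be seen directly with base R tools. The sketch below is illustrative only (the matrix size is arbitrary): it compares what object.size() reports for a big.matrix handle with what it reports for an ordinary matrix holding the same amount of data.

```r
library(bigmemory)

# A 1000 x 1000 double matrix holds about 8 MB of data outside R's heap.
x <- big.matrix(1000, 1000, type = "double", init = 0)

# object.size() only measures the small R-side handle (an S4 object
# wrapping an external pointer), not the data it points to.
handle_bytes <- as.numeric(object.size(x))

# The same amount of data held as an ordinary R matrix is counted in full.
m <- matrix(0, 1000, 1000)
matrix_bytes <- as.numeric(object.size(m))

handle_bytes < matrix_bytes

# After dropping a large big.matrix, an explicit gc() gives R a chance
# to finalize the handle so the external memory can be released.
rm(x)
invisible(gc())
```

Because the handle looks tiny to R, it is reasonable to call gc() by hand after removing large objects rather than wait for the collector to run on its own schedule.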

Note

Various options are available. options(bigmemory.typecast.warning) can be set to avoid annoying warnings that might occur if, for example, you assign objects (typically type double) to char, short, or integer big.matrix objects. options(bigmemory.print.warning) protects against extracting and printing a massive matrix (which would involve the creation of a second massive copy of the matrix). options(bigmemory.allow.dimnames) by default prevents the setting of dimnames attributes, because they aren't allocated to shared memory and changes will not be visible across processes. options(bigmemory.default.type) is "double" by default (a change in default behavior as of 4.1.1) but may be changed by the user.

Note that you can't simply use a big.matrix with many (most) existing functions (e.g. lm, kmeans). One nice exception is split, because this function only accesses subsets of the matrix.
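A common workaround, sketched here with made-up data and hypothetical column names a and b, is to extract the rows and columns a model needs into an ordinary R object and pass that to the existing function. This only makes sense when the extracted piece fits comfortably in RAM.

```r
library(bigmemory)

# Toy data: column 2 is exactly 2 * column 1 + 1.
x <- as.big.matrix(matrix(c(1:10, 2 * (1:10) + 1), ncol = 2))

# x[, ] materialises an ordinary R matrix; for a genuinely large
# big.matrix, extract only the rows and columns that are needed.
d <- as.data.frame(x[, ])
names(d) <- c("a", "b")   # hypothetical names, assigned on the R side

fit <- lm(b ~ a, data = d)
coef(fit)
```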

Author(s)

Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.

Maintainers: Michael J. Kane bigmemoryauthors@gmail.com

See Also

For example, big.matrix, mwhich, read.big.matrix

Examples

# Our examples are all trivial in size, rather than burning huge amounts
# of memory.
x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x
x[1:2,]
x[,1] <- 1:5
x[,"alpha"]
colnames(x)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- NULL
x[,]

Extract or Replace

Description

Extract or replace big.matrix elements

Usage

## S4 method for signature 'big.matrix,ANY,ANY,missing'
x[i, j, drop]

## S4 method for signature 'big.matrix,ANY,ANY,logical'
x[i, j, drop]

## S4 method for signature 'big.matrix,missing,ANY,missing'
x[i, j, drop]

## S4 method for signature 'big.matrix,missing,ANY,logical'
x[i, j, drop]

## S4 method for signature 'big.matrix,ANY,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'big.matrix,ANY,missing,logical'
x[i, j, drop]

## S4 method for signature 'big.matrix,missing,missing,missing'
x[i, j, drop]

## S4 method for signature 'big.matrix,missing,missing,logical'
x[i, j, drop]

## S4 method for signature 'big.matrix,matrix,missing,missing'
x[i, j, drop]

## S4 replacement method for signature 'big.matrix,numeric,numeric,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,numeric,logical,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,logical,numeric,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,logical,logical,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,logical,character,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,numeric,character,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,missing,missing,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,missing,numeric,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,missing,logical,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,numeric,missing,numeric'
x[i, j, ...] <- value

## S4 replacement method for signature 'big.matrix,logical,missing,numeric'
x[i, j, ...] <- value

## S4 replacement method for signature 'big.matrix,numeric,missing,matrix'
x[i, j, ...] <- value

## S4 replacement method for signature 'big.matrix,logical,missing,matrix'
x[i, j, ...] <- value

## S4 replacement method for signature 'big.matrix,character,character,ANY'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,missing,character,ANY'
x[j] <- value

## S4 replacement method for signature 'big.matrix,character,missing,ANY'
x[i] <- value

## S4 replacement method for signature 'big.matrix,missing,missing,numeric'
x[i, j] <- value

## S4 replacement method for signature 'big.matrix,matrix,missing,numeric'
x[i, j] <- value

Arguments

x

A big.matrix object

i

Indices specifying the rows

j

Indices specifying the columns

drop

Logical; indicates whether the result should be reduced to its minimum dimensions

...

Additional arguments

value

typically an array-like R object of similar class


big.matrix size

Description

Returns the size of the created matrix in bytes

Usage

GetMatrixSize(bigMat)

Arguments

bigMat

a big.matrix object


Create a “big.matrix” from a matrix or vector.

Description

Create a big.matrix from a matrix or vector or data.frame; a vector will result in a big.matrix with one column. A data frame will have character vectors converted to factors, and then all factors converted to numeric factor levels. All labels or character values will be lost.

Methods

signature(x = "matrix")

...

signature(x = "vector")

...

signature(x = "data.frame")

...


Convert to base R matrix

Description

Extract values from a big.matrix object and convert to a base R matrix object

Usage

## S4 method for signature 'big.matrix'
as.matrix(x)

Arguments

x

A big.matrix object


The core "big.matrix" operations.

Description

Create a big.matrix (or check to see if an object is a big.matrix, or create a big.matrix from a matrix, and so on). The big.matrix may be file-backed.

Usage

big.matrix(
  nrow,
  ncol,
  type = options()$bigmemory.default.type,
  init = NULL,
  dimnames = NULL,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared
)

filebacked.big.matrix(
  nrow,
  ncol,
  type = options()$bigmemory.default.type,
  init = NULL,
  dimnames = NULL,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE
)

as.big.matrix(
  x,
  type = NULL,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared
)

is.big.matrix(x)

## S4 method for signature 'big.matrix'
is.big.matrix(x)

## S4 method for signature 'ANY'
is.big.matrix(x)

is.separated(x)

## S4 method for signature 'big.matrix'
is.separated(x)

is.filebacked(x)

## S4 method for signature 'big.matrix'
is.filebacked(x)

shared.name(x)

## S4 method for signature 'big.matrix'
shared.name(x)

file.name(x)

## S4 method for signature 'big.matrix'
file.name(x)

dir.name(x)

## S4 method for signature 'big.matrix'
dir.name(x)

is.shared(x)

## S4 method for signature 'big.matrix'
is.shared(x)

is.readonly(x)

## S4 method for signature 'big.matrix'
is.readonly(x)

is.nil(address)

Arguments

nrow

number of rows.

ncol

number of columns.

type

the type of the atomic element (options()$bigmemory.default.type by default, which is "double", but can be changed by the user to "integer", "short", or "char").

init

a scalar value for initializing the matrix (NULL by default to avoid unnecessary time spent doing the initializing).

dimnames

a list of the row and column names; use with caution for large objects.

separated

use separated column organization of the data; see details.

backingfile

the root name for the file(s) for the cache of x.

backingpath

the path to the directory containing the file backing cache.

descriptorfile

the name of the file to hold the backingfile description, for subsequent use with attach.big.matrix; if NULL, the backingfile is used as the root part of the descriptor file name. The descriptor file is placed in the same directory as the backing files.

binarydescriptor

the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix; if NULL or FALSE, the dput() file format is used.

shared

TRUE by default, and always TRUE if the big.matrix is file-backed. For a non-filebacked big.matrix, shared=FALSE uses non-shared memory, which can be more stable for large (say, >50% of RAM) objects. Shared memory allocation can sometimes fail in such cases due to exhausted shared-memory resources in the system.

x

a matrix, vector, or data.frame for as.big.matrix; if a vector, a one-column big.matrix is created by as.big.matrix; if a data.frame, see details. For the is.* functions, x is likely a big.matrix.

address

an externalptr, so is.nil(x@address) might be a sensible thing to want to check, but it's pretty obscure.

Details

A big.matrix consists of an object in R that does nothing more than point to the data structure implemented in C++. The object acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames.

There are two big.matrix types which manage data in different ways. A standard, shared big.matrix is constrained to available RAM, and may be shared across separate R processes. A file-backed big.matrix may exceed available RAM by using hard drive space, and may also be shared across processes. The atomic types of these matrices may be double, integer, short, or char (8, 4, 2, and 1 bytes, respectively).
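The element sizes above make it easy to estimate the data footprint before allocating anything. The helper below is a hypothetical convenience function, not part of bigmemory, and it counts only the raw element storage (no descriptor or per-column pointer overhead).

```r
# Bytes per element for each atomic type, as listed above.
bytes_per_element <- c(double = 8, integer = 4, short = 2, char = 1)

# Hypothetical helper: raw data size in bytes of an nrow x ncol big.matrix.
estimate_bytes <- function(nrow, ncol, type = "double") {
  nrow * ncol * bytes_per_element[[type]]
}

# A million rows by 100 columns, expressed in MiB for two element types.
estimate_bytes(1e6, 100, "double") / 2^20
estimate_bytes(1e6, 100, "char") / 2^20
```

Comparing the estimate with available RAM is one way to decide between a shared in-memory big.matrix and a file-backed one.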

If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x. If x is of type double, then the result will be numeric; otherwise, the result will be an integer R matrix. The expression x alone will display information about the R object (e.g. the external pointer) rather than evaluating the matrix itself (the user should try x[,] with extreme caution, recognizing that a huge R matrix will be created).
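A small sketch of this extraction behavior, using tiny matrices so that pulling rows into R is harmless:

```r
library(bigmemory)

xi <- big.matrix(6, 2, type = "integer", init = 1L)
xd <- big.matrix(6, 2, type = "double", init = 1)

# Extraction returns ordinary R matrices, copied out of the big.matrix.
sub_i <- xi[1:5, ]
sub_d <- xd[1:5, ]

is.matrix(sub_i)
dim(sub_i)
typeof(sub_i)   # integer storage for a non-double big.matrix
typeof(sub_d)   # double storage for a double big.matrix
```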

If x has a huge number of rows and/or columns, then the use of rownames and/or colnames will be extremely memory-intensive and should be avoided. If x has a huge number of columns and separated=TRUE is used (this isn't typically recommended), the user might want to store the transpose as there is overhead of a pointer for each column in the matrix. If separated is TRUE, then the memory is allocated into separate vectors for each column. Use this option with caution if you have a large number of columns, as shared-memory segments are limited by OS and hardware combinations. If separated is FALSE, the matrix is stored in traditional column-major format. The function is.separated() returns the separation type of the big.matrix.

When a big.matrix, x, is passed as an argument to a function, it is essentially providing call-by-reference rather than call-by-value behavior. If the function modifies any of the values of x, the changes are not limited in scope to a local copy within the function. This introduces the possibility of side-effects, in contrast to standard R behavior.
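The difference from ordinary R semantics can be sketched with one function applied to both kinds of object. The function name set_first_column is made up for this illustration.

```r
library(bigmemory)

# Hypothetical helper that overwrites the first column of its argument.
set_first_column <- function(m, value) {
  m[, 1] <- value
  invisible(NULL)
}

x <- big.matrix(3, 2, type = "double", init = 0)
r <- matrix(0, 3, 2)

set_first_column(x, -1)   # writes through to the shared data
set_first_column(r, -1)   # modifies only a local copy

x[, 1]   # the caller's big.matrix has changed
r[, 1]   # the caller's ordinary matrix has not
```

When a function must not alter its input, working on a deepcopy() of the big.matrix is the usual way to restore copy semantics.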

A file-backed big.matrix may exceed available RAM in size by using a file cache (or possibly multiple file caches, if separated=TRUE). This can incur a substantial performance penalty for such large matrices, but less of a penalty than most other approaches for handling such large objects. A side-effect of creating a file-backed object is not only the file-backing(s), but a descriptor file (in the same directory) that is needed for subsequent attachments (see attach.big.matrix).
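The round trip through a backing file and its descriptor can be sketched as follows. The file names are generated for this illustration and written to the session's temporary directory. Passing the full descriptor path to attach.big.matrix is one way to locate it; the obj/path combination described under attach.big.matrix is another.

```r
library(bigmemory)

backing_dir <- tempdir()
root <- basename(tempfile("bm_demo_"))   # unique root name, avoids clashes
bin_name <- paste0(root, ".bin")
desc_name <- paste0(root, ".desc")

x <- filebacked.big.matrix(4, 2, type = "integer", init = 0L,
                           backingfile = bin_name,
                           descriptorfile = desc_name,
                           backingpath = backing_dir)
x[1, 1] <- 42L
flush(x)   # push modified data to the backing file

# Both the backing file and the descriptor now exist on disk.
file.exists(file.path(backing_dir, c(bin_name, desc_name)))

# Re-attach through the descriptor file, as a second R process might.
y <- attach.big.matrix(file.path(backing_dir, desc_name))
y[1, 1]
is.filebacked(y)
```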

Note that we do not allow setting or changing the dimnames attributes by default; such changes would not be reflected in the descriptor objects or in shared memory. To override this, set options(bigmemory.allow.dimnames=TRUE).

It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the backingfile argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.

Finally, note that as.big.matrix can coerce data frames. It does this by making any character columns into factors, and then making all factors numeric before forming the big.matrix. Level labels are not preserved and must be managed by the user if desired.
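Because labels are dropped, it can help to save a lookup table before coercing. The sketch below uses a made-up data frame. The base R lines show the character-to-factor-to-numeric mapping described above, and suppressWarnings() is used on the assumption that the coercion may emit a warning.

```r
library(bigmemory)

df <- data.frame(group = c("b", "a", "b"),
                 value = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# The mapping described above, reproduced in base R:
# levels are sorted ("a", "b"), so "b" "a" "b" becomes 2 1 2.
codes <- as.numeric(factor(df$group))
labels <- levels(factor(df$group))   # keep this to decode later

x <- suppressWarnings(as.big.matrix(df))
x[, ]

# Decode the numeric codes stored in the first column.
labels[x[, 1]]
```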

Value

A big.matrix is returned (for big.matrix and filebacked.big.matrix, and as.big.matrix), and TRUE or FALSE for is.big.matrix and the other functions.

Author(s)

John W. Emerson and Michael J. Kane bigmemoryauthors@gmail.com

References

The Bigmemory Project: http://www.bigmemory.org/.

See Also

bigmemory, and perhaps the class documentation of big.matrix; attach.big.matrix and describe. Sister packages biganalytics, bigtabulate, synchronicity, and bigalgebra provide advanced functionality.

Examples

x <- big.matrix(10, 2, type='integer', init=-5)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[,]
x[1:8,1] <- 11:18
colnames(x) <- NULL
x[,]

# The following shared memory example is quite silly, as you wouldn't
# likely do this in a single R session.  But if zdescription were
# passed to another R session via SNOW, foreach, or even by a
# simple file read/write, then the attach.big.matrix() within the
# second R process would give access to the same object in memory.
# Please see the package vignette for real examples.
z <- big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
y[1,1] <- -100
y[,]
z[,]

Class "big.matrix"

Description

The big.matrix class is designed for matrices with elements of type double, integer, short, or char. A big.matrix acts much like a traditional R matrix, but helps protect the user from many inadvertent memory-consuming pitfalls of traditional R matrices and data frames. The objects are allocated to shared memory, and if file-backing is used they may exceed virtual memory in size. Sadly, 32-bit operating system constraints (largely Windows and some MacOS versions) will be a limiting factor with file-backed matrices; 64-bit operating systems are recommended.

Objects from the Class

Unlike many R objects, objects should not be created by calls of the form new("big.matrix", ...). The functions big.matrix() and filebacked.big.matrix() are intended for the user.

Slots

address:

Object of class "externalptr"; points to the memory location of the C++ data structure.

Methods

As you would expect:

[<-

signature(x = "big.matrix", i = "ANY", j = "ANY"): ...

[<-

signature(x = "big.matrix", i = "ANY", j = "missing"): ...

[<-

signature(x = "big.matrix", i = "missing", j = "ANY"): ...

[<-

signature(x = "big.matrix", i = "missing", j = "missing"): ...

[<-

signature(x = "big.matrix", i = "matrix", j = "missing"): ...

[

signature(x = "big.matrix", i = "ANY", j = "ANY", drop = "missing"): ...

[

signature(x = "big.matrix", i = "ANY", j = "ANY", drop = "logical"): ...

[

signature(x = "big.matrix", i = "ANY", j = "missing", drop = "missing"): ...

[

signature(x = "big.matrix", i = "ANY", j = "missing", drop = "logical"): ...

[

signature(x = "big.matrix", i = "matrix", j = "missing", drop = "logical"): ...

[

signature(x = "big.matrix", i = "missing", j = "ANY", drop = "missing"): ...

[

signature(x = "big.matrix", i = "missing", j = "ANY", drop = "logical"): ...

[

signature(x = "big.matrix", i = "missing", j = "missing", drop = "missing"): ...

[

signature(x = "big.matrix", i = "missing", j = "missing", drop = "logical"): ...

The following are probably more interesting:

describe

signature(x = "big.matrix"): provide necessary and sufficient information for the sharing or re-attaching of the object.

dim

signature(x = "big.matrix"): returns the dimension of the big.matrix.

length

signature(x = "big.matrix"): returns the product of the dimensions of the big.matrix.

dimnames<-

signature(x = "big.matrix", value = "list"): set the row and column names, prohibited by default (see bigmemory to override).

dimnames

signature(x = "big.matrix"): get the row and column names.

head

signature(x = "big.matrix"): get the first 6 (or n) rows.

as.matrix

signature(x = "big.matrix"): coerce a big.matrix to a matrix.

is.big.matrix

signature(x = "big.matrix"): return TRUE if it's a big.matrix.

is.filebacked

signature(x = "big.matrix"): return TRUE if there is a file-backing.

is.separated

signature(x = "big.matrix"): return TRUE if the big.matrix is organized as separated column vectors.

is.sub.big.matrix

signature(x = "big.matrix"): return TRUE if this is a sub-matrix of a big.matrix.

ncol

signature(x = "big.matrix"): returns the number of columns.

nrow

signature(x = "big.matrix"): returns the number of rows.

print

signature(x = "big.matrix"): a traditional print() is intentionally disabled, and returns head(x) unless options()$bigmemory.print.warning==FALSE; in this case, print(x[,]) is the result, which could be very big!

sub.big.matrix

signature(x = "big.matrix"): for contiguous submatrices.

tail

signature(x = "big.matrix"): returns the last 6 (or n) rows.

typeof

signature(x = "big.matrix"): return the type of the atomic elements of the big.matrix.

write.big.matrix

signature(bigMat = "big.matrix", fileName = "character"): produce an ASCII file from the big.matrix.

apply

signature(x = "big.matrix"): apply() where MARGIN may only be 1 or 2, but otherwise conforming to what you would expect from apply().

Author(s)

Michael J. Kane and John W. Emerson bigmemoryauthors@gmail.com

See Also

big.matrix

Examples

showClass("big.matrix")

Class "big.matrix.descriptor"

Description

An object of this class contains necessary and sufficient information to “attach” a shared or filebacked big.matrix.

Usage

## S4 method for signature 'character'
attach.resource(obj, ...)

## S4 method for signature 'big.matrix.descriptor'
attach.resource(obj, ...)

Arguments

obj

The filename of the descriptor for a filebacked matrix, assumed to be in the directory specified by the path (if one is provided)

...

possibly path which gives the path where the descriptor and/or filebacking can be found.

Objects from the Class

Objects should not be created by calls of the form new("big.matrix.descriptor", ...), but should use the describe function.

Slots

description:

Object of class "list"; details omitted.

Extends

Class "descriptor", directly.

Methods

attach.resource

signature(obj = "big.matrix.descriptor"): ...

sub.big.matrix

signature(x = "big.matrix.descriptor"): ...

Note

We provide attach.resource for convenience, but expect most users will prefer attach.big.matrix.

Author(s)

John W. Emerson and Michael J. Kane

References

Other types of descriptors are defined in package synchronicity.

See Also

See also attach.big.matrix.

Examples

showClass("big.matrix.descriptor")

Produces a physical copy of a “big.matrix”

Description

This is needed to make a duplicate of a big.matrix, with the new copy optionally filebacked.

Usage

deepcopy(
  x,
  cols = NULL,
  rows = NULL,
  y = NULL,
  type = NULL,
  separated = NULL,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  shared = options()$bigmemory.default.shared
)

Arguments

x

a big.matrix.

cols

possible subset of columns for the deepcopy; could be numeric, named, or logical.

rows

possible subset of rows for the deepcopy; could be numeric, named, or logical.

y

optional destination object (matrix or big.matrix); if not specified, a big.matrix will be created.

type

preferably specified, "integer" for example.

separated

use separated column organization of the data instead of column-major organization; use with caution if the number of columns is large.

backingfile

the root name for the file(s) for the cache of x.

backingpath

the path to the directory containing the file-backing cache.

descriptorfile

we recommend specifying this for file-backing.

binarydescriptor

the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix; if NULL or FALSE, the dput() file format is used.

shared

TRUE by default, and always TRUE if the big.matrix is file-backed. For a non-filebacked big.matrix, shared=FALSE uses non-shared memory, which can be more stable for large (say, >50% of RAM) objects. Shared memory allocation can sometimes fail in such cases due to exhausted shared-memory resources in the system.

Details

This is needed to make a duplicate of a big.matrix, because traditional syntax would only copy the object (the pointer to the big.matrix rather than the big.matrix itself). It can also make a copy of only a subset of columns.
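The distinction between copying the pointer and copying the data can be sketched in a few lines. The variable names alias and copy are chosen for this illustration.

```r
library(bigmemory)

x <- big.matrix(2, 2, type = "integer", init = 1L)

alias <- x            # copies only the R object, i.e. the pointer
copy <- deepcopy(x)   # allocates a new big.matrix and copies the data

x[1, 1] <- 99L

alias[1, 1]   # follows the change, because it refers to the same data
copy[1, 1]    # keeps the value it had when the copy was made
```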

Value

a big.matrix.

See Also

big.matrix

Examples

x <- as.big.matrix(matrix(1:30, 10, 3))
y <- deepcopy(x, -1)    # Don't include the first column.
x
y
head(x)
head(y)

The basic “big.matrix” operations for sharing and re-attaching.

Description

The describe function returns the information needed by attach.big.matrix to reference a shared or file-backed big.matrix object. The attach.big.matrix and attach.resource functions create a new big.matrix object based on the descriptor information referencing previously allocated shared-memory or file-backed matrices.

Usage

## S4 method for signature 'big.matrix'
describe(x)

attach.big.matrix(obj, ...)

Arguments

x

a big.matrix object

obj

an object as returned by describe() or, optionally, the filename of the descriptor for a filebacked matrix, assumed to be in the directory specified by the path (if one is provided)

...

possibly path which gives the path where the descriptor and/or filebacking can be found

Details

The describe function returns a list of the information needed to attach to a big.matrix object. A descriptor file is automatically created when a new filebacked big.matrix is created.

Value

describe returns a list of the information needed to attach to a big.matrix object.

attach.big.matrix returns a new instance of type big.matrix corresponding to a shared-memory or file-backed big.matrix.

Author(s)

Michael J. Kane and John W. Emerson bigmemoryauthors@gmail.com

See Also

bigmemory, big.matrix, or the class documentation big.matrix.

Examples

# The example is quite silly, as you wouldn't likely do this in a
# single R session.  But if zdescription were passed to another R session
# via SNOW, foreach, or even by a simple file read/write,
# then the attach of the second R process would give access to the
# same object in memory.  Please see the package vignette for real examples.
z <- big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
zz <- attach.resource(zdescription)
zz[1,1] <- -100
y[,]
z[,]

Dimensions of a big.matrix object

Description

Retrieve the dimensions of a big.matrix object

Usage

## S4 method for signature 'big.matrix'
dim(x)

Arguments

x

A big.matrix object


Dimnames of a big.matrix Object

Description

Retrieve or set the dimnames of an object

Usage

## S4 method for signature 'big.matrix'
dimnames(x)

## S4 replacement method for signature 'big.matrix,list'
dimnames(x) <- value

Arguments

x

A big.matrix object

value

A possible value for dimnames(x)


Updating a big.matrix filebacking.

Description

For a file-backed big.matrix object, flush() forces any modified information to be written to the file-backing.

Usage

flush(con)

## S4 method for signature 'big.matrix'
flush(con)

Arguments

con

a filebacked big.matrix.

Details

This function flushes any modified data (in RAM) of a file-backed big.matrix to disk. This may be useful for improving performance in cases where allowing the operating system to decide on flushing creates a bottleneck (likely near the threshold of available RAM).

Value

TRUE or FALSE (invisible), indicating whether or not the flush was successful.

Author(s)

John W. Emerson and Michael J. Kane

Examples

temp_dir = tempdir()
if (!dir.exists(temp_dir)) dir.create(temp_dir)
x <- big.matrix(nrow=3, ncol=3, backingfile='flushtest.bin',
                descriptorfile='flushtest.desc', backingpath=temp_dir,
                type='integer')
x[1,1] <- 0
flush(x)

Return First or Last Part of a big.matrix Object

Description

Returns the first or last parts of a big.matrix object.

Usage

## S4 method for signature 'big.matrix'
head(x, n = 6)

## S4 method for signature 'big.matrix'
tail(x, n = 6)

Arguments

x

A big.matrix object

n

A single integer for the number of rows to return


Check if Float

Description

Check to see if the elements of a big.matrix object are floats.

Usage

is.float(x)

Arguments

x

An object to be tested for float elements


Is Float?

Description

Check if R numeric value has float flag

Usage

## S4 method for signature 'numeric'is.float(x)

Arguments

x

A numeric value


Submatrix support

Description

This doesn't create a copy; it just provides a new version of the class which provides behavior for a contiguous submatrix of the big.matrix. Non-contiguous submatrices are not supported.

Usage

is.sub.big.matrix(x)

## S4 method for signature 'big.matrix'
is.sub.big.matrix(x)

sub.big.matrix(
  x,
  firstRow = 1,
  lastRow = NULL,
  firstCol = 1,
  lastCol = NULL,
  backingpath = NULL
)

## S4 method for signature 'big.matrix'
sub.big.matrix(
  x,
  firstRow = 1,
  lastRow = NULL,
  firstCol = 1,
  lastCol = NULL,
  backingpath = NULL
)

## S4 method for signature 'big.matrix.descriptor'
sub.big.matrix(
  x,
  firstRow = 1,
  lastRow = NULL,
  firstCol = 1,
  lastCol = NULL,
  backingpath = NULL
)

Arguments

x

A big.matrix or a big.matrix.descriptor object

firstRow

the first row of the submatrix

lastRow

the last row of the submatrix if not NULL

firstCol

the first column of the submatrix

lastCol

the last column of the submatrix if not NULL

backingpath

required path to the filebacked object, if applicable

Details

The sub.big.matrix function allows a user to create a big.matrix object that references a contiguous set of columns and rows of another big.matrix object.

The is.sub.big.matrix function returns TRUE if the specified argument is a sub.big.matrix object and returns FALSE otherwise.

Value

A big.matrix which is actually a submatrix of a larger big.matrix. It is not a physical copy. Only contiguous blocks may form a submatrix.

Author(s)

John W. Emerson and Michael J. Kane

See Also

big.matrix

Examples

x <- big.matrix(10, 5, init=0, type="double")
x[,] <- 1:50
y <- sub.big.matrix(x, 2, 9, 2, 3)
y[,]
y[1,1] <- -99
x[,]
rm(x)

Length of a big.matrix object

Description

Get the length of a big.matrix object

Usage

## S4 method for signature 'big.matrix'
length(x)

Arguments

x

A big.matrix object


Ordering and Permuting functions for “big.matrix” and “matrix” objects

Description

The morder function returns a permutation of row indices which can be used to rearrange an object according to the values in the specified columns (a multi-column ordering). The mpermute function actually reorders the rows of a big.matrix or matrix based on an order vector or a desired ordering on a set of columns.

Usage

morder(x, cols, na.last = TRUE, decreasing = FALSE)

morderCols(x, rows, na.last = TRUE, decreasing = FALSE)

mpermute(x, order = NULL, cols = NULL, allow.duplicates = FALSE, ...)

mpermuteCols(x, order = NULL, rows = NULL, allow.duplicates = FALSE, ...)

Arguments

x

A big.matrix or matrix object with numeric values.

cols

The columns of x to get the ordering for or reorder on

na.last

for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed.

decreasing

logical. Should the sort order be increasing or decreasing?

rows

The rows of x to get the ordering for or reorder on

order

A vector specifying the reordering of rows, i.e. the result of a call to order or morder.

allow.duplicates

if TRUE, allows a row to be duplicated in the resulting big.matrix or matrix (i.e. in this case, order would not need to be a permutation of 1:nrow(x)).

...

optional parameters to pass to morder when cols is specified instead of just using order.

Details

The morder function behaves similarly to order, returning a permutation of 1:nrow(x) which rearranges objects according to the values in the specified columns. However, morder takes a big.matrix or an R matrix (with numeric type) and a set of columns (cols) with which to determine the ordering; morder does not incur the same memory overhead required by order, and runs more quickly.

The mpermute function changes the row ordering of a big.matrix or matrix based on a vector order or an ordering based on a set of columns specified by cols. It should be noted that this function has side-effects; that is, x is changed when this function is called.
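The relationship between the two functions can be sketched on a three-row big.matrix. The data values are arbitrary, and a snapshot of the original contents is taken first because mpermute overwrites x in place.

```r
library(bigmemory)

# Column 1 holds the sort key; column 2 tags each original row.
x <- as.big.matrix(matrix(c(3, 1, 2,
                            10, 20, 30), ncol = 2))

ord <- morder(x, 1)   # ordering only; x is left untouched
before <- x[, ]       # snapshot of the original contents

mpermute(x, cols = 1) # reorders the rows of x in place

x[, ]                 # rows are now sorted by column 1
before[ord, ]         # same arrangement, built as a copy in R
```

Using mpermute avoids creating the reordered copy that before[ord, ] produces, which is the point of the in-place design for large matrices.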

Value

morder returns an ordering vector. mpermute returns nothing but does change the contents of x. This type of a side-effect is generally frowned upon in R, but we “break” the rules here to avoid memory overhead and improve performance.

Author(s)

Michael J. Kane bigmemoryauthors@gmail.com

See Also

order

Examples

m = matrix(as.double(as.matrix(iris)), nrow=nrow(iris))
morder(m, 1)
order(m[,1])
m[order(m[,1]), 2]
mpermute(m, cols=1)
m[,2]

Expanded “which”-like functionality.

Description

Implements which-like functionality for a big.matrix, with additional options for efficient comparisons (executed in C++); also works for regular numeric matrices without the memory overhead.

Usage

mwhich(x, cols, vals, comps, op = "AND")

Arguments

x

a big.matrix (or a numeric matrix; see below).

cols

a vector of column indices or names.

vals

a list (one component for each of cols) of vectors of length 1 or 2; length 1 is used to test equality (or inequality), while vectors of length 2 are used for checking values in the range (-Inf and Inf are allowed). If a scalar or vector of length 2 is provided instead of a list, it will be replicated length(cols) times.

comps

a list of operators (one component for each of cols), including 'eq', 'neq', 'le', 'lt', 'ge' and 'gt'. If a single operator, it will be replicated length(cols) times.

op

the comparison operator for combining the results of the individual tests, either 'AND' or 'OR'.

Details

To improve performance and avoid the creation of massive temporary vectors in R when doing comparisons, mwhich() efficiently executes column-by-column comparisons of values to the specified values or ranges, and then returns the row indices satisfying the comparison specified by the op operator. More advanced comparisons are then possible (and memory-efficient) in R by doing set operations (union and intersect, for example) on the results of multiple mwhich() calls.

Note that NA is a valid argument in conjunction with 'eq' or 'neq', replacing traditional is.na() calls. Both -Inf and Inf can be used for one-sided inequalities.

If mwhich() is used with a regular numeric R matrix, we access the data directly and thus incur no memory overhead. Interested developers might want to look at our code for this case, which uses a handy pointer trick (accessor) in C++.
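The set-operation approach mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the 10-by-3 integer matrix, the thresholds, and the variable names (a, b, both, either) are chosen only for the sketch, and it assumes bigmemory is installed.

```r
library(bigmemory)

# Illustrative data: column 1 holds 1:10, column 2 holds 11:20.
x <- as.big.matrix(matrix(1:30, 10, 3))

# Rows where column 1 lies in the closed range [2, 5].
a <- mwhich(x, 1, list(c(2, 5)), list(c("ge", "le")))

# Rows where column 2 is strictly greater than 13.
b <- mwhich(x, 2, 13, "gt")

# Combine the two index vectors with ordinary R set operations.
both <- intersect(a, b)   # rows satisfying both conditions
either <- union(a, b)     # rows satisfying at least one condition
```

Only the index vectors returned by mwhich() are held in R memory here; no logical vector of length nrow(x) is created for either comparison.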

Value

a vector of row indices satisfying the criteria.

Author(s)

John W. Emerson <bigmemoryauthors@gmail.com>

See Also

big.matrix, which

Examples

x <- as.big.matrix(matrix(1:30, 10, 3))
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("A", "B", "C")
x[,]
x[mwhich(x, 1:2, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'OR'),]
x[mwhich(x, c("A","B"), list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'AND'),]
# These should produce the same answer with a regular matrix:
y <- matrix(1:30, 10, 3)
y[mwhich(y, 1:2, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'OR'),]
y[mwhich(y, -3, list(c(2,3), c(11,17)),
         list(c('ge','le'), c('gt', 'lt')), 'AND'),]
x[1,1] <- NA
mwhich(x, 1:2, NA, 'eq', 'OR')
mwhich(x, 1:2, NA, 'neq', 'AND')
# Column 1 equal to 4 and/or column 2 less than or equal to 16:
mwhich(x, 1:2, list(4, 16), list('eq', 'le'), 'OR')
mwhich(x, 1:2, list(4, 16), list('eq', 'le'), 'AND')
# Column 2 less than or equal to 15:
mwhich(x, 2, 15, 'le')
# No NAs in either column, and column 2 strictly less than 15:
mwhich(x, c(1:2,2), list(NA, NA, 15), list('neq', 'neq', 'lt'), 'AND')
x <- big.matrix(4, 2, init=1, type="double")
x[1,1] <- Inf
mwhich(x, 1, Inf, 'eq')
mwhich(x, 1, 1, 'gt')
mwhich(x, 1, 1, 'le')

Expanded “which”-like functionality.

Description

Implements which-like functionality for a big.matrix, with additional options for efficient comparisons (executed in C++); also works for regular numeric matrices without the memory overhead.

Methods

signature(x = "big.matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "character")

...

signature(x = "big.matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "missing")

...

signature(x = "matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "character")

...

signature(x = "matrix", cols = "ANY", vals = "ANY", comps = "ANY", op = "missing")

...

See Also

big.matrix, which, mwhich


The Number of Rows/Columns of a big.matrix

Description

nrow and ncol return the number of rows or columns present in a big.matrix object.

Usage

## S4 method for signature 'big.matrix'
ncol(x)
## S4 method for signature 'big.matrix'
nrow(x)

Arguments

x

A big.matrix object

Value

An integer of length 1


Print Values

Description

print will print out the elements within a big.matrix object.

Usage

## S4 method for signature 'big.matrix'
print(x)

Arguments

x

A big.matrix object

Note

By default, this will only print the head of a big.matrix to prevent console overflow. If you turn off the bigmemory.print.warning option, it will convert the object to a base R matrix and print all elements.


The Type of a big.matrix Object

Description

typeof returns the storage type of a big.matrix object.

Usage

## S4 method for signature 'big.matrix'
typeof(x)

Arguments

x

A big.matrix object


File interface for a “big.matrix”

Description

Create a big.matrix by reading from a suitably-formatted ASCII file, or write the contents of a big.matrix to a file.

Usage

write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
## S4 method for signature 'big.matrix,character'
write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)
## S4 method for signature 'character'
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)

Arguments

x

a big.matrix.

filename

the name of an input/output file.

row.names

a vector of names, use them even if row names appear to exist in the file.

col.names

a vector of names, use them even if column names exist in the file.

sep

a field delimiter.

header

if TRUE, the first line (after a possible skip) should contain column names.

has.row.names

if TRUE, then the first column contains row names.

ignore.row.names

if TRUE when has.row.names==TRUE, the row names will be ignored.

type

preferably specified, "integer" for example.

skip

number of lines to skip at the head of the file.

separated

use separated column organization of the data instead of column-major organization.

backingfile

the root name for the file(s) for the cache of x.

backingpath

the path to the directory containing the file backing cache.

descriptorfile

the file to be used for the description of the filebacked matrix.

binarydescriptor

the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix; if NULL or FALSE, the dput() file format is used.

extraCols

the optional number of extra columns to be appended to the matrix for future use.

shared

if TRUE, the resulting big.matrix can be shared across processes.

Details

Files must contain only one atomic type (all integer, for example). You, the user, should know whether your file has row and/or column names, and various combinations of options should be helpful in obtaining the desired behavior.

When reading from a file, if type is not specified we try to make a reasonable guess for you without making any guarantees at this point. Unless you have really large integer values, we recommend you consider "short". If you have something that is essentially categorical, you might even be able to use "char", with huge memory savings for large data sets.
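The advice about choosing a compact type can be sketched as a small round trip. This is a minimal illustration, not taken from the package: the file name small_values.txt and the variable names are assumptions made up for the sketch, the values are chosen so that they fit comfortably in a "short", and it assumes bigmemory is installed.

```r
library(bigmemory)

# Hypothetical output file in the session's temporary directory.
path <- file.path(tempdir(), "small_values.txt")

# A small integer big.matrix with values 1..10.
x <- as.big.matrix(matrix(1:10, 5, 2), type = "integer")
write.big.matrix(x, path)

# Read the same file back with an explicit, smaller storage type
# instead of relying on the type guess.
y <- read.big.matrix(path, type = "short")

stored_as <- typeof(y)          # storage type of the new big.matrix
same <- all(y[, ] == x[, ])     # values survive the round trip
```

Specifying type explicitly avoids depending on the guess made when type is left as NA.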

Any non-numeric entry will be ignored and replaced with NA, so reading something that traditionally would be a data.frame won't cause an error. A warning is issued.

Wishlist: we'd like to provide an option to ignore specified columns while doing reads. Or perhaps to specify columns targeted for factor or character conversion to numeric values. Would you use such features? Email us and let us know!

Value

a big.matrix object is returned by read.big.matrix, while write.big.matrix creates an output file (a path could be part of filename).

Author(s)

John W. Emerson and Michael J. Kane <bigmemoryauthors@gmail.com>

See Also

big.matrix

Examples

# Without specifying the type, this big.matrix x will hold integers.
x <- as.big.matrix(matrix(1:10, 5, 2))
x[2,2] <- NA
x[,]
temp_dir = tempdir()
if (!dir.exists(temp_dir)) dir.create(temp_dir)
write.big.matrix(x, file.path(temp_dir, "foo.txt"))
# Just for fun, I'll read it back in as character (1-byte integers):
y <- read.big.matrix(file.path(temp_dir, "foo.txt"), type="char")
y[,]
# Other examples:
w <- as.big.matrix(matrix(1:10, 5, 2), type='double')
w[1,2] <- NA
w[2,2] <- -Inf
w[3,2] <- Inf
w[4,2] <- NaN
w[,]
write.big.matrix(w, file.path(temp_dir, "bar.txt"))
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="double")
w[,]
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="short")
w[,]
# Another example using row names (which we don't like).
x <- as.big.matrix(as.matrix(iris), type='double')
rownames(x) <- as.character(1:nrow(x))
head(x)
write.big.matrix(x, file.path(temp_dir, 'IrisData.txt'), col.names=TRUE,
                 row.names=TRUE)
y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"), header=TRUE,
                     has.row.names=TRUE)
head(y)
# The following would fail with a dimension mismatch:
if (FALSE) y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"),
                                header=TRUE)
