| Type: | Package |
| Title: | Partial Principal Component Analysis of Partitioned Large Sparse Matrices |
| Version: | 1.1 |
| Date: | 2024-10-22 |
| Maintainer: | Srika Raja <sri1919@iastate.edu> |
| Description: | Performs partial principal component analysis of a large sparse matrix. The matrix may be stored as a list of matrices to be concatenated (implicitly) horizontally. Useful applications include cases where the total number of nonzero entries exceeds the capacity of 32-bit integers (e.g., with large Single Nucleotide Polymorphism data). |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.0.2), methods, RSpectra (≥ 0.16-1) |
| Imports: | Matrix (≥ 1.1-0), Rcpp (≥ 0.11.5) |
| LinkingTo: | Rcpp |
| NeedsCompilation: | yes |
| Suggests: | ggbiplot |
| RoxygenNote: | 7.3.2 |
| Packaged: | 2024-10-22 06:14:10 UTC; srika |
| Author: | Srika Raja [aut, cre], Somak Dutta [aut] |
| Repository: | CRAN |
| Date/Publication: | 2024-10-22 06:30:02 UTC |
Performs a principal component analysis on a large sparse matrix or a list of large sparse matrices and returns the results as an object compatible with class prcomp
Description
Performs a partial principal component analysis on a large sparse matrix or a list of large sparse matrices and returns the results as an object compatible with class prcomp. Uses the RSpectra library to compute the largest eigenvalues.
Usage
pPCA(x, rank, retX = TRUE, scale. = TRUE, normalize = FALSE, sd.tol = 1e-05)
Arguments
x | A matrix, sparse matrix (Matrix::dgCMatrix), or a list of these. When a list is supplied, the entries are concatenated horizontally (implicitly). See Details. |
rank | An integer specifying the number of principal components to compute. |
retX | A logical value indicating whether the rotated variables (PC scores) should be returned. |
scale. | A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
normalize | A logical value indicating whether the principal component scores should be normalized. |
sd.tol | A positive number. A warning is printed if the standard deviation of any column is less than this threshold. |
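The sd.tol check concerns columns that are (nearly) constant, which cannot be scaled to unit variance. A minimal sketch of how such columns could be located in a sparse matrix without converting it to dense form is given below. The one-pass variance formula and the variable names are assumptions made for illustration; the source does not say how pPCA computes column standard deviations.

```r
library(Matrix)

set.seed(4)
# Hypothetical data: column 3 is identically zero, so its standard deviation is 0
x <- cbind(rsparsematrix(100, 2, density = 0.2),
           Matrix(0, 100, 1, sparse = TRUE),
           rsparsematrix(100, 3, density = 0.2))

n <- nrow(x)
col_mean <- colSums(x) / n
# Sample variance from sums and sums of squares; x^2 stays sparse
col_var <- (colSums(x^2) - n * col_mean^2) / (n - 1)
col_sd  <- sqrt(pmax(col_var, 0))   # guard against tiny negative rounding errors

sd.tol <- 1e-05
flagged <- which(col_sd < sd.tol)   # columns that would fall below the threshold
flagged
```

Columns flagged this way are candidates for removal before calling pPCA with scale. = TRUE.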
Details
When the input argument is a matrix (of class "matrix" or "dgCMatrix"), principal component analysis is performed to extract the few largest components. When a list of matrices is passed, the partial PCA is performed on the horizontally concatenated matrix, i.e., if x = list(X1,X2,X3) then the partial PCA is done on the matrix [X1 X2 X3], without concatenating the matrices explicitly. This can be useful when the matrix is so high-dimensional that the total number of non-zero entries exceeds 2^31-1 (roughly 2.15e9), the capacity of a 32-bit integer. For example, in PCA with very high-dimensional SNP data, the sparse matrices can be stored for each chromosome within the capacity of 32-bit integers.
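The idea of working with [X1 X2 X3] without forming it can be sketched with block-wise matrix-vector products, which are the operations an iterative eigen or SVD solver needs. The helper names times_blocks and crossprod_blocks are hypothetical, and this is an illustration of the principle under those assumptions, not the package's internal code.

```r
library(Matrix)

set.seed(1)
# Three hypothetical blocks with the same number of rows (e.g., one per chromosome)
blocks <- list(rsparsematrix(20, 5, density = 0.3),
               rsparsematrix(20, 8, density = 0.3),
               rsparsematrix(20, 3, density = 0.3))

# Column range occupied by each block inside the implicit matrix [X1 X2 X3]
ends   <- cumsum(vapply(blocks, ncol, integer(1)))
starts <- c(1L, head(ends, -1L) + 1L)

# y = [X1 X2 X3] %*% v, accumulated one block at a time
times_blocks <- function(blocks, v) {
  y <- numeric(nrow(blocks[[1]]))
  for (i in seq_along(blocks)) {
    y <- y + as.numeric(blocks[[i]] %*% v[starts[i]:ends[i]])
  }
  y
}

# z = t([X1 X2 X3]) %*% u, assembled one block at a time
crossprod_blocks <- function(blocks, u) {
  unlist(lapply(blocks, function(b) as.numeric(crossprod(b, u))))
}

v <- rnorm(ends[length(ends)])
u <- rnorm(20)
full <- do.call(cbind, blocks)   # explicit concatenation, used only as a check

ok_times <- all.equal(times_blocks(blocks, v), as.numeric(full %*% v))
ok_cross <- all.equal(crossprod_blocks(blocks, u), as.numeric(crossprod(full, u)))

.Machine$integer.max             # 2147483647, i.e. 2^31 - 1
```

Each block stays within the 32-bit limit on its own, so only the small dense vectors v, u, y, and z ever span the full problem.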
Value
pPCA returns a list with class "pPCA" (compatible with "prcomp") containing the following components:
sdev | A vector of the singular values (standard deviations of the principal components). |
rotation | A matrix whose columns contain the eigenvectors (loadings). |
x | A matrix of the principal component scores, returned if retX is true. This is the centred (and scaled if requested) data multiplied by the rotation matrix. |
center | column means. |
scale | column standard deviations, if scale. is true. Otherwise, FALSE. |
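Since the returned object is described as compatible with "prcomp", the relationships among these components can be illustrated with base R's prcomp on a small dense matrix. The sketch below does not call pPCA; it assumes pPCA follows the same conventions, and with normalize = TRUE the scores may be scaled differently.

```r
set.seed(2)
X <- matrix(rnorm(50 * 6), nrow = 50)

fit <- prcomp(X, center = TRUE, scale. = TRUE)

# Scores are the centred and scaled data multiplied by the rotation matrix
Z <- scale(X, center = fit$center, scale = fit$scale)
scores_match <- all.equal(unname(Z %*% fit$rotation), unname(fit$x))

# sdev holds the standard deviations of the score columns
sdev_match <- all.equal(unname(apply(fit$x, 2, sd)), fit$sdev)

c(scores_match, sdev_match)
```

A partial PCA of rank k corresponds to keeping only the first k columns of rotation and x.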
Note
The partial SVD is computed through the RSpectra package. All elements in the first row of the rotation matrix are positive.
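Singular vectors are determined only up to sign, so a convention such as a positive first row makes results reproducible across runs. One way to impose that convention is sketched below using base R's svd; whether pPCA applies it in exactly this way is an assumption.

```r
set.seed(3)
X  <- scale(matrix(rnorm(40 * 5), nrow = 40))
sv <- svd(X, nu = 0, nv = 2)
V  <- sv$v                          # loadings; the sign of each column is arbitrary

s <- sign(V[1, ])                   # sign of the first-row entry in each column
s[s == 0] <- 1                      # leave a column unchanged if that entry is exactly 0
V_fixed <- sweep(V, 2, s, `*`)      # flip columns so that the first row is positive
scores  <- X %*% V_fixed            # scores change sign together with the loadings

all(V_fixed[1, ] > 0)
```

Flipping a column of the loadings and the matching column of the scores leaves the fitted low-rank approximation unchanged.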
Author(s)
Srika Raja and Somak Dutta
References
Raja, S. and Dutta, S. (2024). Matrix-free partial PCA of partitioned genetic data. REU project 2024, Iowa State University.
Dai, F., Dutta, S., and Maitra, R. (2020). A Matrix-Free Likelihood Method for Exploratory Factor Analysis of High-Dimensional Gaussian Data. Journal of Computational and Graphical Statistics, 29(3), 675–680.
See Also
Examples
library(Matrix)
set.seed(20190329)
m <- rsparsematrix(50, 100, density = 0.35)
results <- pPCA(m, rank = 2)
biplot(results)

# Using a list of matrices
data <- list(rsparsematrix(nrow = 50, ncol = 10, density = 0.35),
             rsparsematrix(nrow = 50, ncol = 40, density = 0.35))
result <- pPCA(data, rank = 3)
print(result)
biplot(result)
Print the Output of Principal Component Analysis (pPCA)
Description
Prints the output of pPCA.
Usage
## S3 method for class 'pPCA'
print(x, digits = 3, ...)
Arguments
x | An object of class |
digits | The number of decimal places to use in printing results such as variance explained and PC scores. Defaults to 3. |
... | Further arguments passed to |
Value
None.