- Notifications
You must be signed in to change notification settings - Fork55
Description
During my last dive in sparse matrix code, I came across multiple uses of.push_back(...) when building some vectors, e.g. around RcppArmadilloAs.h:178. It is used notably in the code converting dgt matrices to default dgc type (may be somewhere else too, I didn't check). Such vector building method can require big number of memory operations that are known to be relatively slow. Yet there is already native armadillo code for batch building dgc matrices from triplets ijv. My test shows that the native method is much faster on big matrices, e.g. on 3000x4000 randomly sparse matrix with about 8000 non zero entries, actual code converts such matrix in 4.5 ms while native batch construction takes only 0.9 ms on my Intel Xeon E5-2609 v2 @ 2.50GHz.
I think we could rely on native batch construction not only for the speed but also simply for avoiding code duplication.
Here is my test code:
// [[Rcpp::depends(RcppArmadillo)]]#include<RcppArmadillo.h>usingnamespacearma;// [[Rcpp::export]]arma::sp_matasSpMat(SEXP S) {return Rcpp::as<arma::sp_mat>(S);}// [[Rcpp::export]]arma::sp_matdgt2spmatNative(Rcpp::S4 mat) {if (Rcpp::as<std::string>(mat.slot("class")) !="dgTMatrix")Rcpp::stop("matrix must be of dgTMatrix type"); urowvec ti = mat.slot("i"); urowvec tj = mat.slot("j"); vec tx = mat.slot("x"); Rcpp::IntegerVector dims = mat.slot("Dim");int nrow = dims[0];int ncol = dims[1]; umat loc=join_cols(ti, tj);//sp_mat(add_values, locations, values, n_rows, n_cols, sort_locations = true, check_for_zeros = true)returnsp_mat(true, loc, tx, nrow, ncol,true,false);}/*** Rlibrary(Matrix)library(Rcpp)library(microbenchmark)nrow=3000ncol=4000nnz=2*ncoli=sample(1:nrow, nnz, replace=TRUE)j=sample(1:ncol, nnz, replace=TRUE)set.seed(7)x=rnorm(nnz)x[1]=0. # see if it's kept in conversionsdgt=sparseMatrix(i = i, j = j, x=x, dims=c(nrow, ncol), giveCsparse=FALSE)(test=microbenchmark(dgc <- as(dgt, "dgCMatrix"), adgc <- asSpMat(dgt), ndgc <- dgt2spmatNative(dgt)))stopifnot(range(dgc-adgc)==c(0,0))stopifnot(range(dgc-ndgc)==c(0,0))*/
If you decide to go in this direction, I can prepare a patch request.