Rusting Faster: Simulation using Rcpp

PaulNorthrop

2024-08-18

Source:vignettes/rust-c-using-rcpp-vignette.Rmd

rust-c-using-rcpp-vignette.Rmd

Therust package implements the multivariategeneralized ratio-of-uniforms method of simulating random variates froma\(d\)-dimensional continuousdistribution. The user specifies (the log of) a positive target function\(f(x)\) proportional to the densityfunction of the distribution. For an introduction torust seethe vignetteIntroducing rust.

This vignette describes a new feature ofrust: theoption for the user to provide a C++ function to evaluate the targetlog-density, rather than an R function. TheRcppEddelbuettel (2013) andRcppArmadillo(Eddelbuettel andSanderson 2014) packages are used to speed up simulation from thetarget density. The improvement results from faster function evaluationsand (in particular) from performing using C++ the looping in theratio-of-uniforms algorithm. The new functionru_rcpprequires the target log-density to be specified using (externalspointers to) C++ functions, whereas the existingrurequires input R functions. Otherwise, the functionality of these twofunctions is the same. There are also Rcpp-based versions of functionsfor setting Box-Cox transformation parameters:find_lambda_rcpp andfind_lambda_one_d_rcpp

In this vignette we describe in general terms the general setup ofthe Rcpp-based functions and use examples to illustrate their use. Formore information about these examples see the vignetteIntroducing rust

Providing a C++ function to`ru_rcpp`

The general way thatrust enables users to providetheir own C++ functions uses external pointers and is based on theRcpp Gallery articlePassinguser-supplied C++ functions by Dirk Eddelbuettel. For a detailedcase study of the general approach see theRcppDEpackage(Eddelbuettel 2016) vignette attheRcppDE page onCRAN.

The user writes a C++ function to calculate\(\log f(x)\). The current implementation inrust requires this function to have a particularstructure: it must take a constant reference to anRcpp::NumericVector, sayx, a constantreference to anRcpp::List, saypars, andreturn adouble precision scalar.x is theargument\(x\) of\(f(x)\).pars is a listcontaining the values of parameters whose values are not specifiedinside the function. This allows the user to change the values of anyparameters in the target density without editing the function. If thereare no such parameters then the user must still include the argumentpars in their function, even though the list provided tothe function when it is called will be empty.

A simple way for the user to provide their C++ functions to createthem in a file, sayuser_fns.cpp. Example content isprovided below. The full file is available on therustGithub page. The functions in this file are compiled and madeavailable to R, either using theRcpp::sourceCpp function(e.g. Rcpp::sourceCpp("user_fns.cpp")) or using RStudio’sSource button on the editor toolbar. The example content below alsoincludes the functioncreate_xptr, which creates anexternal pointer to a C++ function. SeePassinguser-supplied C++ functions. It is this external pointer that ispassed toru_rcpp to perform ratio-of-uniforms sampling. Ifthe user has written a C++ function, saynew_name, thenthey need to add tocreate_xptr two lines of code:

else if (fstr == "new_name")    return(Rcpp::XPtr<funcPtr>(new funcPtr(&new_name))) ;

to create an external pointer fornew_name usingcreate_xptr. The following excerpt from the exampleuser_fns.cpp file contains code for a standard normaldensity. Note that for this particular example we don’t needRcppArmadillo: we could replace#include <RcppArmadillo.h> with#include <Rcpp.h> and deleteusing namespace arma;. However, RcppArmadillo is used inthe themultivariate normal example below and will beuseful in many examples.

// [[Rcpp::depends(RcppArmadillo)]]#include <RcppArmadillo.h>using namespace arma;using namespace Rcpp;// [[Rcpp::interfaces(r, cpp)]]// User-supplied C++ functions for logf.// Note that currently the only interface available in rust is// double fun(const Rcpp::NumericVector& x, const Rcpp::List& pars).// However, as shown in the function logdmvnorm below RcppArmadillo// functions can be used inside the function.// Each function must be prefaced by the line: // [[Rcpp::export]]// One-dimensional standard normal.// [[Rcpp::export]]  double logdN01(const Rcpp::NumericVector& x, const Rcpp::List& pars) {    return (-pow(x[0], 2.0) / 2.0) ;}// A function to create external pointers for any of the functions above.  // See http://gallery.rcpp.org/articles/passing-cpp-function-pointers/  // If you write a new function above called new_name then add the following//// else if (fstr == "new_name")  //   return(Rcpp::XPtr<funcPtr>(new funcPtr(&new_name))) ;  // [[Rcpp::export]]  SEXP create_xptr(std::string fstr) {    typedef double (*funcPtr)(const Rcpp::NumericVector& x,                    const Rcpp::List& pars) ;    if (fstr == "logdN01")      return(Rcpp::XPtr<funcPtr>(new funcPtr(&logdN01))) ;    else      return(Rcpp::XPtr<funcPtr>(R_NilValue)) ;  }  // We could create the external pointers when this file is sourced using   // the embedded R code below and/or (re)create them using create_xptr() in // an R session or R package../*** Rptr_N01 <- create_xptr("logdN01")*/

Examples :`ru_rcpp`

All the examples in the documentation forru arereplicated in the documentation forru_rcpp. Here weconsider a subset of the examples from theIntroducing rust vignette, to illustratehow to provide user-supplied C++ functions toru_rcpp andto compare the performances ofru andru_rcpp.We use themicrobenchmark package(Mersmann 2015) to make the comparison.

library(rust)library(Rcpp)# Is microbenchmark available?got_microbenchmark<-requireNamespace("microbenchmark", quietly=TRUE)if(got_microbenchmark){library(microbenchmark)}# Set the size of the simulated samplen<-1000

It is assumed that the user has already compiled their C++ functionsand made them available to their R session, either using theRcpp::sourceCpp function(e.g. Rcpp::sourceCpp("user_fns.cpp")) or using RStudio’sSource button on the editor toolbar.

Standard normal density

We start with a simple example: the (1-dimensional) standard normaldensity, based on the C++ functionlogdN01 in the exampleuser_fns.cpp file above.

# Normal density ===================# Create a pointer to the logdN01 C++ function# (not necessary if this was created when the file of C++ functions was sourced)ptr_N01<-create_xptr("logdN01")# Use ru and ru_rcpp starting from the same random number seed and check# that the simulated values are the same.set.seed(47)x_old<-ru(logf=function(x)-x^2/2, d=1, n=n, init=0.1)head(x_old$sim_vals)#>            [,1]#> [1,]  0.7764728#> [2,]  0.5310434#> [3,] -0.1046049#> [4,]  1.2111509#> [5,]  1.1391379#> [6,]  0.5180914set.seed(47)x_new<-ru_rcpp(logf=ptr_N01, d=1, n=n, init=0.1)head(x_new$sim_vals)#>            [,1]#> [1,]  0.7764728#> [2,]  0.5310434#> [3,] -0.1046049#> [4,]  1.2111509#> [5,]  1.1391379#> [6,]  0.5180914# Compare performances of ru and ru_rcppif(got_microbenchmark){res<-microbenchmark(    old=ru(logf=function(x)-x^2/2, d=1, n=n, init=0.1),    new=ru_rcpp(logf=ptr_N01, d=1, n=n, init=0.1))print(res, signif=4)}#> Unit: milliseconds#>  expr    min     lq  mean median     uq    max neval#>   old 14.800 15.690 17.81  17.30 19.500 26.940   100#>   new  2.075  2.216  2.53   2.31  2.446  6.824   100

As we would hope,ru_rcpp is faster thanru. If we start from the same random number seed we get thesame simulated values fromru andru_rcpp.

Multivariate normal density

To execute this example we add the following function touser_fns.cpp

// d-dimensional normal with zero-mean and covariance matrix sigma.// [[Rcpp::export]]double logdmvnorm(const Rcpp::NumericVector& x, const Rcpp::List& pars) {  arma::mat sigma = as<arma::mat>(pars["sigma"]) ;  arma::vec y = Rcpp::as<arma::vec>(x) ;  double qform = arma::as_scalar(y.t() * arma::inv(sigma) * y) ;  return -qform / 2.0  ;}

and add

else if (fstr == "logdmvnorm")  return(Rcpp::XPtr<funcPtr>(new funcPtr(&logdmvnorm))) ;

to the functioncreate_xptr inuser_fns.cpp.

# Three-dimensional normal with positive association ----------------rho<-0.9covmat<-matrix(rho,3,3)+diag(1-rho,3)# R functionlog_dmvnorm<-function(x,mean=rep(0,d),sigma=diag(d)){x<-matrix(x, ncol=length(x))d<-ncol(x)-0.5*(x-mean)%*%solve(sigma)%*%t(x-mean)}# Create a pointer to the logdmvnorm C++ functionptr_mvn<-create_xptr("logdmvnorm")if(got_microbenchmark){res<-microbenchmark(    old=ru(logf=log_dmvnorm, sigma=covmat, d=3, n=n,             init=c(0,0,0)),    new=ru_rcpp(logf=ptr_mvn, sigma=covmat, d=3, n=n,                  init=c(0,0,0)))print(res, signif=4)}#> Unit: milliseconds#>  expr     min      lq   mean  median    uq    max neval#>   old 178.400 189.100 197.50 194.100 200.1 271.10   100#>   new   8.426   8.881  10.27   9.234  10.0  67.25   100

Again, the improvement in speed obtained using Rcpp is clear.

Log-normal density after Box-Cox transformation

In this example we use a log transform (Box-Cox parameter\(\lambda = 0\)) so that theratio-of-uniforms sampling is based on a normal distribution. The C++function to calculate the log-density of a lognormal distributionis:

// Lognormal(mu, sigma).// [[Rcpp::export]]double logdlnorm(const Rcpp::NumericVector& x, const Rcpp::List& pars) {  double mu = pars["mu"] ;  double sigma = pars["sigma"] ;  if (x[0] > 0)    return -log(x[0]) - pow(log(x[0]) - mu, 2.0) / (2.0 * pow(sigma, 2.0)) ;  else    return R_NegInf ;}

ptr_lnorm<-create_xptr("logdlnorm")if(got_microbenchmark){res<-microbenchmark(   old=ru(logf=dlnorm, log=TRUE, d=1, n=n, lower=0, init=0.1,            trans="BC", lambda=0),   new=ru_rcpp(logf=ptr_lnorm, mu=0, sigma=1, d=1, n=n,                 lower=0, init=0.1, trans="BC", lambda=0))print(res, signif=4)}#> Unit: milliseconds#>  expr    min     lq   mean median     uq   max neval#>   old 32.780 35.530 38.580 37.380 39.970 97.63   100#>   new  4.828  5.011  5.435  5.119  5.508 10.51   100

Generalized Pareto posterior density

The C++ function to calculate the log-posterior density is:

// Generalized Pareto posterior based on an MDI prior truncated to// shape parameter xi >= -1.// [[Rcpp::export]]double loggp(const Rcpp::NumericVector& x, const Rcpp::List& ss) {  Rcpp::NumericVector gpd_data = ss["gpd_data"] ;  int m = ss["m"] ;  double xm = ss["xm"] ;  double sum_gp = ss["sum_gp"] ;  if (x[0] <= 0 || x[1] <= -x[0] / xm)    return R_NegInf ;  double loglik ;  Rcpp::NumericVector sdat = gpd_data / x[0] ;  Rcpp::NumericVector zz = 1 + x[1] * sdat ;  if (std::abs(x[1]) > 1e-6) {    loglik = -m * log(x[0]) - (1.0 + 1.0 / x[1]) * sum(log(zz)) ;  } else {    double t1, t2, sdatj ;    double total = 0;    for(int j = 0; j < m; ++j) {      sdatj = sdat[j] ;      for(int i = 1; i < 5; ++i) {        t1 = pow(sdatj, i) ;        t2 = (i * sdatj - i - 1) ;        total += pow(-1.0, i) * t1 * t2 * pow(x[1], i) / i / (i + 1) ;      }    }    loglik = -m * log(x[0]) - sum_gp / x[0] - total ;  }  // MDI prior.  if (x[1] < -1)    return R_NegInf ;  double logprior = -log(x[0]) - x[1] - 1 ;  return (logprior + loglik) ;}

We simulate some data from a Generalized Pareto distribution,calculate summary statistics involved in the likelihood and calculate aninitial value in the search for the posterior mode.

set.seed(46)# Sample data from a GP(sigma, xi) distributiongpd_data<-rgpd(m=100, xi=-0.5, sigma=1)# Calculate summary statistics for use in the log-likelihoodss<-gpd_sum_stats(gpd_data)# Calculate an initial estimateinit<-c(mean(gpd_data),0)

Again we see thatru_rcpp is substantially faster thanru.

# Arguments for ru_rcppptr_gp<-create_xptr("loggp")for_ru_rcpp<-c(list(logf=ptr_gp, init=init, d=2, n=n,                 lower=c(0,-Inf)),ss)if(got_microbenchmark){res<-microbenchmark(   old=ru(logf=gpd_logpost, ss=ss, d=2, n=n, init=init,            lower=c(0,-Inf)),   new=do.call(ru_rcpp,for_ru_rcpp))print(res, signif=4)}#> Unit: milliseconds#>  expr   min    lq  mean median    uq   max neval#>   old 65.66 73.32 76.26  75.86 78.71 86.12   100#>   new 15.14 16.53 17.83  17.01 18.62 25.85   100

Examples :`find_lambda_one_d_rcpp` and`find_lambda_rcpp`

We repeat two examples from theIntroducing rust vignette.

Gamma density: example for`find_lambda_one_d_rcpp`

We make use of theRcppsugar functiondgamma.

// Gamma(alpha, 1).// [[Rcpp::export]]double logdgamma(const Rcpp::NumericVector& x, const Rcpp::List& pars) {  double shp = pars["alpha"] ;  return Rcpp::dgamma(x, shp, 1.0, 1)[0] ;}

alpha<-1max_phi<-qgamma(0.999, shape=alpha)ptr_gam<-create_xptr("logdgamma")lambda<-find_lambda_one_d_rcpp(logf=ptr_gam, alpha=alpha,                                 max_phi=max_phi)lambda#> $lambda#> [1] 0.2727968#>#> $gm#> [1] 0.5689906#>#> $init_psi#> [1] -0.2016904#>#> $sd_psi#> [1] 0.7835109#>#> $user_args#> list()

Generalized Pareto posterior density: example for`find_lambda_rcpp`

In this example we supply an external pointer to a C++ functionphi_to_theta that ensures that both parameters of the modelare strictly positive, a requirement for the Box-Cox transformation tobe applicable. The functionphi_to_theta must have the samestructure as the function used to calculate\(\log f\). SeeProvidinga C++ function toru_rcpp for details. See theIntroducing rust vignette for the formof the transformation.

temp<-do.call(gpd_init,ss)min_phi<-pmax(0,temp$init_phi-2*temp$se_phi)max_phi<-pmax(0,temp$init_phi+2*temp$se_phi)# Create external pointersptr_gp<-create_xptr("loggp")ptr_phi_to_theta_gp<-create_phi_to_theta_xptr("gp")# Note: log_j is set to zero by default inside find_lambda_rcpp()lambda<-find_lambda_rcpp(logf=ptr_gp, ss=ss, d=2, min_phi=min_phi,                           max_phi=max_phi, user_args=list(xm=ss$xm),                           phi_to_theta=ptr_phi_to_theta_gp)lambda#> $lambda#> [1] 0.1624226 0.3678549#>#> $gm#> [1] 1.10542493 0.03225836#>#> $init_psi#> [1]  0.1054021 -0.2184344#>#> $sd_psi#>       Var1       Var2#> 0.12670792 0.02477219#>#> $phi_to_theta#> <pointer: 0x000001dd2adf6810>#>#> $log_j#> <pointer: 0x000001dd2adf66e0>#>#> $user_args#> $user_args$xm#> [1] 1.846219

References

Eddelbuettel, D. 2013.SeamlessR andC++Integration with Rcpp. New York: Springer.

Eddelbuettel, D. 2016.RcppDE: Global Optimization by DifferentialEvolution inC++.https://CRAN.R-project.org/package=RcppDE.

Eddelbuettel, D., and R. Francois. 2011.“Rcpp: SeamlessR andC++ Integration.”Journal ofStatistical Software 40 (8): 1–18. doi:10.18637/jss.v040.i08.

Eddelbuettel, D., and C. Sanderson. 2014.“RcppArmadillo:AcceleratingR with High-PerformanceC++Linear Algebra.”Computational Statistics and DataAnalysis 71: 1054–1063. doi:10.1016/j.csda.2013.02.005.

Mersmann, Olaf. 2015.microbenchmark:Accurate Timing Functions.https://CRAN.R-project.org/package=microbenchmark.

Movatterモバイル変換