Package: VCBART
Type: Package
Title: Fit Varying Coefficient Models with Bayesian Additive Regression Trees
Version: 1.2.4
Date: 2025-12-03
Description: Fits linear varying coefficient (VC) models, which assert a linear relationship between an outcome and several covariates but allow that relationship (i.e., the coefficients or slopes in the linear regression) to change as functions of additional variables known as effect modifiers, by approximating the coefficient functions with Bayesian Additive Regression Trees. Implements a Metropolis-within-Gibbs sampler to simulate draws from the posterior over coefficient function evaluations. VC models with independent observations or repeated observations can be fit. For more details see Deshpande et al. (2024) <doi:10.1214/24-BA1470>.
License: GPL (≥ 3)
LinkingTo: Rcpp, RcppArmadillo
Imports: Rcpp, MASS
URL: https://github.com/skdeshpande91/VCBART
NeedsCompilation: yes
Packaged: 2025-12-03 14:19:58 UTC; sameer
Author: Sameer K. Deshpande [aut, cre], Ray Bai [aut], Cecilia Balocchi [aut], Jennifer Starling [aut], Jordan Weiss [aut]
Maintainer: Sameer K. Deshpande <sameer.deshpande@wisc.edu>
Repository: CRAN
Date/Publication: 2025-12-09 16:00:02 UTC

Fit a VCBART model with compound symmetry error structure

Description

Fit a varying coefficient model to panel data. Assumes a compound symmetry error structure in which the residual errors for a given subject are equally correlated. This is equivalent to assuming that there is a normally distributed random effect per subject.

Usage

VCBART_cs(Y_train, subj_id_train, ni_train, X_train,
          Z_cont_train = matrix(0, nrow = 1, ncol = 1),
          Z_cat_train = matrix(0L, nrow = 1, ncol = 1),
          X_test = matrix(0, nrow = 1, ncol = 1),
          Z_cont_test = matrix(0, nrow = 1, ncol = 1),
          Z_cat_test = matrix(0, nrow = 1, ncol = 1),
          unif_cuts = rep(TRUE, times = ncol(Z_cont_train)),
          cutpoints_list = NULL,
          cat_levels_list = NULL,
          edge_mat_list = NULL,
          graph_split = rep(FALSE, times = ncol(Z_cat_train)),
          sparse = TRUE,
          rho = 0.9,
          M = 50,
          mu0 = NULL, tau = NULL, nu = NULL, lambda = NULL,
          nd = 1000, burn = 1000, thin = 1,
          save_samples = TRUE, save_trees = TRUE,
          verbose = TRUE, print_every = floor((nd*thin + burn)/10))

Arguments

Y_train

Vector of continuous responses for training data.

ni_train

Vector containing the number of observations per subject in the training data.

subj_id_train

Vector of length length(Y_train) that records which subject contributed each observation. Subjects should be numbered sequentially from 1 to length(ni_train).

X_train

Matrix of covariates for training observations. Do not include intercept as the first column.

Z_cont_train

Matrix of continuous modifiers for training data. Note that modifiers must be rescaled to lie in the interval [-1,1]. Default is a 1x1 matrix, which signals that there are no continuous modifiers in the training data. (A data-preparation sketch follows this argument list.)

Z_cat_train

Integer matrix of categorical modifiers for training data. Note categorical levels should be 0-indexed. That is, if a categorical modifier has 10 levels, the values should run from 0 to 9. Default is a 1x1 matrix, which signals that there are no categorical modifiers in the training data.

X_test

Matrix of covariates for testing observations. Default is a 1x1 matrix, which signals that testing data is not provided.

Z_cont_test

Matrix of continuous modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.

Z_cat_test

Integer matrix of categorical modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.

unif_cuts

Vector of logical values indicating whether cutpoints for each continuous modifier should be drawn from a continuous uniform distribution (TRUE) or a discrete set (FALSE) specified in cutpoints_list. Default is TRUE for each variable in Z_cont_train.

cutpoints_list

List of length ncol(Z_cont_train) containing a vector of cutpoints for each continuous modifier. By default, this is set to NULL so that cutpoints are drawn uniformly from a continuous distribution.

cat_levels_list

List of length ncol(Z_cat_train) containing a vector of levels for each categorical modifier. If the j-th categorical modifier contains L levels, cat_levels_list[[j]] should be the vector 0:(L-1). Default is NULL, which corresponds to the case that no categorical modifiers are available.

edge_mat_list

List of adjacency matrices if any of the categorical modifiers are network-structured. Default is NULL, which corresponds to the case that there are no network-structured categorical modifiers.

graph_split

Vector of logicals indicating whether each categorical modifier is network-structured. Default is rep(FALSE, times = ncol(Z_cat_train)).

sparse

Logical, indicating whether or not to perform variable selection in each tree ensemble based on a sparse Dirichlet prior rather than a uniform prior; see Linero (2018). Default is TRUE.

rho

Initial auto-correlation parameter for compound symmetry error structure. Must be between 0 and 1. Default is 0.9.

M

Number of trees in each ensemble. Default is 50.

mu0

Prior mean for jumps/leaf parameters. Default is 0 for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).

tau

Prior standard deviation for jumps/leaf parameters. Default is 1/sqrt(M) for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).

nu

Degrees of freedom for scaled-inverse chi-square prior on sigma^2. Default is 3.

lambda

Scale hyperparameter for scaled-inverse chi-square prior on sigma^2. Default places 90% prior probability that sigma is less than sd(Y_train).

nd

Number of posterior draws to return. Default is 1000.

burn

Number of MCMC iterations to be treated as "warmup" or "burn-in". Default is 1000.

thin

Number of post-warmup MCMC iterations by which to thin. Default is 1.

save_samples

Logical, indicating whether to return all posterior samples. Default is TRUE. If FALSE, only the posterior mean is returned.

save_trees

Logical, indicating whether or not to save a text-based representation of the tree samples. This representation can be passed to predict_VCBART to make predictions at a later time. Default is TRUE.

verbose

Logical, indicating whether to print progress to the R console. Default is TRUE.

print_every

As the MCMC runs, a message is printed every print_every iterations. Default is floor((nd*thin + burn)/10) so that only 10 messages are printed.
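The rescaling and indexing conventions above are easy to get wrong, so here is a small data-preparation sketch. It is illustrative only: the data frame df and its columns age and region are hypothetical and not part of the package.

## Hypothetical data frame df with continuous modifier age and categorical modifier region
z <- df$age
Z_cont_train <- matrix(2 * (z - min(z))/(max(z) - min(z)) - 1, ncol = 1) # rescale to [-1,1]
region <- factor(df$region)
Z_cat_train <- matrix(as.integer(region) - 1L, ncol = 1) # 0-indexed levels 0, ..., L-1
cat_levels_list <- list(0:(nlevels(region) - 1L))
## Optional: split at a discrete grid of cutpoints instead of uniform continuous draws
cutpoints_list <- list(seq(-1, 1, length = 100))
unif_cuts <- c(FALSE) # one entry per continuous modifier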

Details

Given p covariates X_{1}, \ldots, X_{p} and r effect modifiers Z_{1}, \ldots, Z_{r}, the varying coefficient model asserts that

E[Y \vert X = x, Z = z] = \beta_0(z) + \beta_1(z) x_1 + \ldots + \beta_p(z) x_p.

That is, for any r-vector Z, the relationship between X and Y is linear. However, the specific relationship is allowed to vary with respect to Z. VCBART approximates the covariate effect functions \beta_0(Z), \ldots, \beta_p(Z) using ensembles of regression trees. This function assumes that the within-subject errors are equi-correlated (i.e., a compound symmetry error structure).
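To make the stated random-effect equivalence concrete, the following simulation check (a sketch with assumed values sigma = 1 and rho = 0.9, not package code) draws a subject-level intercept with variance rho*sigma^2 and independent noise with variance (1-rho)*sigma^2; the resulting within-subject errors have variance sigma^2 and pairwise correlation rho.

set.seed(1)
sigma <- 1; rho <- 0.9; n_subj <- 2000; n_i <- 4
u <- rep(rnorm(n_subj, sd = sigma * sqrt(rho)), each = n_i) # subject random effect
eps <- rnorm(n_subj * n_i, sd = sigma * sqrt(1 - rho)) # idiosyncratic noise
e <- matrix(u + eps, nrow = n_subj, ncol = n_i, byrow = TRUE) # one row per subject
round(cor(e), 2) # off-diagonal entries concentrate near rho = 0.9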

Value

A list containing

y_mean

Mean of the training observations (needed by predict_VCBART)

y_sd

Standard deviation of the training observations (needed by predict_VCBART)

x_mean

Vector of means of columns of X_train, including the intercept (needed by predict_VCBART).

x_sd

Vector of standard deviations of columns of X_train, including the intercept (needed by predict_VCBART).

yhat.train.mean

Vector containing posterior mean of evaluations of regression function E[y|x,z] on training data.

betahat.train.mean

Matrix with length(Y_train) rows and ncol(X_train)+1 columns containing the posterior mean of each coefficient function evaluated on the training data. Each row corresponds to a training set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.

yhat.train

Matrix with nd rows and length(Y_train) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a training set observation. Only returned if save_samples == TRUE.

betahat.train

Array of dimension nd x length(Y_train) x ncol(X_train)+1 containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual training set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.

yhat.test.mean

Vector containing posterior mean of evaluations of regression function E[y|x,z] on testing data.

betahat.test.mean

Matrix with nrow(X_test) rows and ncol(X_test)+1 columns containing the posterior mean of each coefficient function evaluated on the testing data. Each row corresponds to a testing set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.

yhat.test

Matrix with nd rows and nrow(X_test) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a testing set observation. Only returned if save_samples == TRUE.

betahat.test

Array of size nd x nrow(X_test) x ncol(X_test)+1 containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual testing set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.

sigma

Vector containing ALL samples of the residual standard deviation, including warmup.

rho

Vector containing ALL samples of the auto-correlation parameter rho, including warmup.

varcounts

Array of size nd x R x ncol(X_train)+1 that counts the number of times a variable was used in a decision rule in each posterior sample of each ensemble. Here R is the total number of potential modifiers (i.e. R = ncol(Z_cont_train) + ncol(Z_cat_train)).

theta

If sparse=TRUE, an array of size nd x R x ncol(X_train)+1 containing samples of the variable splitting probabilities.

trees

A list (of length nd) of lists (of length ncol(X_train)+1) of character vectors (of length M) containing textual representations of the regression trees. The string for the s-th sample of the m-th tree in the j-th ensemble is contained in trees[[s]][[j]][m]. These strings are parsed by predict_VCBART to reconstruct the C++ representations of the sampled trees.
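Examples

A token VCBART_cs run on simulated panel data (a sketch; the data-generating process below is hypothetical and chosen only to exercise the sampler):

set.seed(99)
n <- 50; ni <- rep(4, times = n); N <- sum(ni)
subj_id <- rep(1:n, each = 4)
X <- matrix(rnorm(N * 2), nrow = N, ncol = 2)
Z_cont <- matrix(runif(N, min = -1, max = 1), nrow = N, ncol = 1)
u <- rep(rnorm(n, sd = 0.3), each = 4) # subject-level random intercept
Y <- sin(pi * Z_cont[,1]) + X[,1] * Z_cont[,1] + u + rnorm(N, sd = 0.1)
fit_cs <- VCBART_cs(Y_train = Y, subj_id_train = subj_id, ni_train = ni,
                    X_train = X, Z_cont_train = Z_cont,
                    nd = 5, burn = 5, verbose = FALSE)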


Fit a VCBART model with independent error structure

Description

Fit a varying coefficient model to panel data. Assumes residual errors are independent within and between subjects. See Deshpande et al. (2024) for details about the model and MCMC sampler.

Usage

VCBART_ind(Y_train, subj_id_train, ni_train, X_train,
           Z_cont_train = matrix(0, nrow = 1, ncol = 1),
           Z_cat_train = matrix(0L, nrow = 1, ncol = 1),
           X_test = matrix(0, nrow = 1, ncol = 1),
           Z_cont_test = matrix(0, nrow = 1, ncol = 1),
           Z_cat_test = matrix(0, nrow = 1, ncol = 1),
           unif_cuts = rep(TRUE, times = ncol(Z_cont_train)),
           cutpoints_list = NULL,
           cat_levels_list = NULL,
           edge_mat_list = NULL,
           graph_split = rep(FALSE, times = ncol(Z_cat_train)),
           sparse = TRUE,
           M = 50,
           mu0 = NULL, tau = NULL, nu = NULL, lambda = NULL,
           nd = 1000, burn = 1000, thin = 1,
           save_samples = TRUE, save_trees = TRUE,
           verbose = TRUE, print_every = floor((nd*thin + burn)/10))

Arguments

Y_train

Vector of continuous responses for training data.

ni_train

Vector containing the number of observations per subject in the training data.

subj_id_train

Vector of length length(Y_train) that records which subject contributed each observation. Subjects should be numbered sequentially from 1 to length(ni_train).

X_train

Matrix of covariates for training observations. Do not include intercept as the first column.

Z_cont_train

Matrix of continuous modifiers for training data. Note that modifiers must be rescaled to lie in the interval [-1,1]. Default is a 1x1 matrix, which signals that there are no continuous modifiers in the training data.

Z_cat_train

Integer matrix of categorical modifiers for training data. Note categorical levels should be 0-indexed. That is, if a categorical modifier has 10 levels, the values should run from 0 to 9. Default is a 1x1 matrix, which signals that there are no categorical modifiers in the training data.

X_test

Matrix of covariates for testing observations. Default is a 1x1 matrix, which signals that testing data is not provided.

Z_cont_test

Matrix of continuous modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.

Z_cat_test

Integer matrix of categorical modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided.

unif_cuts

Vector of logical values indicating whether cutpoints for each continuous modifier should be drawn from a continuous uniform distribution (TRUE) or a discrete set (FALSE) specified in cutpoints_list. Default is TRUE for each variable in Z_cont_train.

cutpoints_list

List of length ncol(Z_cont_train) containing a vector of cutpoints for each continuous modifier. By default, this is set to NULL so that cutpoints are drawn uniformly from a continuous distribution.

cat_levels_list

List of length ncol(Z_cat_train) containing a vector of levels for each categorical modifier. If the j-th categorical modifier contains L levels, cat_levels_list[[j]] should be the vector 0:(L-1). Default is NULL, which corresponds to the case that no categorical modifiers are available.

edge_mat_list

List of adjacency matrices if any of the categorical modifiers are network-structured. Default is NULL, which corresponds to the case that there are no network-structured categorical modifiers.

graph_split

Vector of logicals indicating whether each categorical modifier is network-structured. Default is rep(FALSE, times = ncol(Z_cat_train)).

sparse

Logical, indicating whether or not to perform variable selection in each tree ensemble based on a sparse Dirichlet prior rather than a uniform prior; see Linero (2018). Default is TRUE.

M

Number of trees in each ensemble. Default is 50.

mu0

Prior mean for jumps/leaf parameters. Default is 0 for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).

tau

Prior standard deviation for jumps/leaf parameters. Default is 1/sqrt(M) for each beta function. If supplied, must be a vector of length 1 + ncol(X_train).

nu

Degrees of freedom for scaled-inverse chi-square prior on sigma^2. Default is 3.

lambda

Scale hyperparameter for scaled-inverse chi-square prior on sigma^2. Default places 90% prior probability that sigma is less than sd(Y_train).

nd

Number of posterior draws to return. Default is 1000.

burn

Number of MCMC iterations to be treated as "warmup" or "burn-in". Default is 1000.

thin

Number of post-warmup MCMC iterations by which to thin. Default is 1.

save_samples

Logical, indicating whether to return all posterior samples. Default is TRUE. If FALSE, only the posterior mean is returned.

save_trees

Logical, indicating whether or not to save a text-based representation of the tree samples. This representation can be passed to predict_VCBART to make predictions at a later time. Default is TRUE.

verbose

Logical, indicating whether to print progress to the R console. Default is TRUE.

print_every

As the MCMC runs, a message is printed every print_every iterations. Default is floor((nd*thin + burn)/10) so that only 10 messages are printed.

Details

Given p covariates X_{1}, \ldots, X_{p} and r effect modifiers Z_{1}, \ldots, Z_{r}, the varying coefficient model asserts that

E[Y \vert X = x, Z = z] = \beta_0(z) + \beta_1(z) x_1 + \ldots + \beta_p(z) x_p.

That is, for any r-vector Z, the relationship between X and Y is linear. However, the specific relationship is allowed to vary with respect to Z. VCBART approximates the covariate effect functions \beta_0(Z), \ldots, \beta_p(Z) using ensembles of regression trees. This function assumes that the within-subject errors are independent.

Value

A list containing

y_mean

Mean of the training observations (needed by predict_VCBART)

y_sd

Standard deviation of the training observations (needed by predict_VCBART)

x_mean

Vector of means of columns of X_train, including the intercept (needed by predict_VCBART).

x_sd

Vector of standard deviations of columns of X_train, including the intercept (needed by predict_VCBART).

yhat.train.mean

Vector containing posterior mean of evaluations of regression function E[y|x,z] on training data.

betahat.train.mean

Matrix with length(Y_train) rows and ncol(X_train)+1 columns containing the posterior mean of each coefficient function evaluated on the training data. Each row corresponds to a training set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.

yhat.train

Matrix with nd rows and length(Y_train) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a training set observation. Only returned if save_samples == TRUE.

betahat.train

Array of dimension nd x length(Y_train) x ncol(X_train)+1 containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual training set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.

yhat.test.mean

Vector containing posterior mean of evaluations of regression function E[y|x,z] on testing data.

betahat.test.mean

Matrix with nrow(X_test) rows and ncol(X_test)+1 columns containing the posterior mean of each coefficient function evaluated on the testing data. Each row corresponds to a testing set observation and each column corresponds to a coefficient function. Note the first column is for the intercept function.

yhat.test

Matrix with nd rows and nrow(X_test) columns. Each row corresponds to a posterior sample of the regression function E[y|x,z] and each column corresponds to a testing set observation. Only returned if save_samples == TRUE.

betahat.test

Array of size nd x nrow(X_test) x ncol(X_test)+1 containing posterior samples of evaluations of the coefficient functions. The first dimension corresponds to posterior samples/MCMC iterations, the second dimension corresponds to individual testing set observations, and the third dimension corresponds to coefficient functions. Only returned if save_samples == TRUE.

sigma

Vector containing ALL samples of the residual standard deviation, including warmup.

varcounts

Array of size nd x R x ncol(X_train)+1 that counts the number of times a variable was used in a decision rule in each posterior sample of each ensemble. Here R is the total number of potential modifiers (i.e. R = ncol(Z_cont_train) + ncol(Z_cat_train)). A summary sketch follows this value list.

theta

If sparse=TRUE, an array of size nd x R x ncol(X_train)+1 containing samples of the variable splitting probabilities.

trees

A list (of length nd) of lists (of length ncol(X_train)+1) of character vectors (of length M) containing textual representations of the regression trees. The string for the s-th sample of the m-th tree in the j-th ensemble is contained in trees[[s]][[j]][m]. These strings are parsed by predict_VCBART to reconstruct the C++ representations of the sampled trees.
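As a post-processing sketch (assuming a fitted object fit such as the one produced in the Examples below), varcounts can be summarized to see how often each modifier is used in a given ensemble:

vc <- fit$varcounts # nd x R x (ncol(X_train)+1) array
sel_prob <- apply(vc[, , 1] > 0, 2, mean) # per-modifier usage rate in the intercept ensemble
round(sel_prob, 2)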

References

Deshpande, S.K., Bai, R., Balocchi, C., Starling, J., and Weiss, J. (2024). VCBART: Bayesian trees for varying coefficients. Bayesian Analysis. doi:10.1214/24-BA1470

Examples

################
# True beta functions
################
beta0_true <- function(Z){
  tmp_Z <- (Z+1)/2
  return( 3 * tmp_Z[,1] + 
          (2 - 5 * (tmp_Z[,2] > 0.5)) * sin(pi * tmp_Z[,1]) - 
          2 * (tmp_Z[,2] > 0.5) )
}
beta1_true <- function(Z){
  tmp_Z <- (Z+1)/2
  return(sin(2*tmp_Z[,1] + 0.5)/(4*tmp_Z[,1] + 1) + (2*tmp_Z[,1] - 0.5)^3)
}
beta2_true <- function(Z){
  tmp_Z <- (Z+1)/2
  return( (3 - 3*cos(6*pi*tmp_Z[,1]) * tmp_Z[,1]^2) * (tmp_Z[,1] > 0.6) - 
          (10 * sqrt(tmp_Z[,1])) * (tmp_Z[,1] < 0.25) )
}

################
# Set problem dimensions
################
set.seed(417)
n_all <- 500
ni_all <- rep(4, times = n_all) # 4 observations per subject
subj_id_all <- rep(1:n_all, each = 4) # give every subject an id number
N_all <- sum(ni_all) # total number of observations
p <- 2 # number of covariates
R_cont <- 20 # number of continuous modifiers
R_cat <- 0 # number of categorical modifiers
R <- R_cont + R_cat

################
# Generate covariates & modifiers
################
X_all <- matrix(rnorm(N_all*p, mean = 0, sd = 1), nrow = N_all, ncol = p)
Z_cont_all <- matrix(runif(N_all * R_cont, min = -1, max = 1), nrow = N_all, ncol = R_cont)

################
# Define true coefficient functions & noise level
################
beta0_all <- beta0_true(Z_cont_all)
beta1_all <- beta1_true(Z_cont_all)
beta2_all <- beta2_true(Z_cont_all)
beta_all <- cbind(beta0_all, beta1_all, beta2_all)
sigma <- 0.1

################
# Generate response surface & outcomes
################
mu_all <- beta0_all + X_all[,1] * beta1_all + X_all[,2] * beta2_all
Y_all <- mu_all + sigma * rnorm(n = N_all, mean = 0, sd = 1)

## Token run to ensure installation works
fit <- VCBART_ind(Y_train = Y_all,
                  subj_id_train = subj_id_all,
                  ni_train = ni_all,
                  X_train = X_all,
                  Z_cont_train = Z_cont_all,
                  nd = 5, burn = 5,
                  verbose = FALSE)

## Longer example
fit <- VCBART_ind(Y_train = Y_all,
                  subj_id_train = subj_id_all,
                  ni_train = ni_all,
                  X_train = X_all,
                  Z_cont_train = Z_cont_all,
                  verbose = FALSE)

oldpar <- par(no.readonly = TRUE)
par(mar = c(3,3,2,1), mgp = c(1.8, 0.5, 0), mfrow = c(1,2))
plot(beta_all, fit$betahat.train.mean, 
     pch = 16, cex = 0.5,
     xlab = "Actual", ylab = "Posterior Mean",
     main = "Coefficients")
abline(a = 0, b = 1, col = 'blue')
plot(mu_all, fit$yhat.train.mean,
     pch = 16, cex = 0.5,
     xlab = "Actual", ylab = "Posterior Mean",
     main = "Regression Function E[Y|X,Z]")
abline(a = 0, b = 1, col = 'blue')
par(oldpar)

Compute posterior predictive evaluations of covariate effect functions.

Description

Given an object returned by VCBART_ind or VCBART_cs and matrices of continuous and categorical modifiers, returns MCMC samples of the coefficient functions evaluated at the provided points.

Usage

predict_betas(fit,
              Z_cont = matrix(0, nrow = 1, ncol = 1),
              Z_cat = matrix(0, nrow = 1, ncol = 1),
              verbose = TRUE)

Arguments

fit

A list returned by VCBART_ind or VCBART_cs.

Z_cont

Matrix of continuous modifiers at which you wish to evaluate the covariate effect functions. Default is a 1x1 matrix, which signals that no continuous modifiers are required for these evaluations.

Z_cat

Integer matrix of categorical modifiers at which you wish to evaluate the covariate effect functions. Default is a 1x1 matrix, which signals that no categorical modifiers are required for these evaluations.

verbose

Boolean indicating whether the code should print its progress. Default is TRUE.

Value

An array of size nd x N x (p+1) where nd is the total number of MCMC draws, N is the total number of points at which you are evaluating the covariate effect functions (i.e. nrow(Z_cont) or nrow(Z_cat)), and p is the number of covariates. Note that the intercept function is included as the first slice in the third dimension.
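For instance, a fit trained with a single continuous modifier could be evaluated on a fresh grid as follows (a sketch; fit and the grid are hypothetical):

Z_grid <- matrix(seq(-1, 1, length = 100), ncol = 1) # 100 evaluation points in [-1,1]
beta_draws <- predict_betas(fit, Z_cont = Z_grid, verbose = FALSE)
dim(beta_draws) # nd x 100 x (p+1)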


Compute posterior mean and 95% credible interval for evaluations of each coefficient function.

Description

Given an array of posterior samples of coefficient function evaluations, returns the posterior mean and 95% credible interval for each evaluation.

Usage

summarize_beta(beta_samples)

Arguments

beta_samples

An array, returned by VCBART_ind, VCBART_cs, or predict_betas, of posterior samples of coefficient function evaluations.

Value

An array of size N x 3 x p where N is the number of inputs at which the coefficient functions are evaluated (i.e. N = dim(beta_samples)[2]) and p is the total number of coefficient functions including the intercept (i.e. p = dim(beta_samples)[3]). The j-th slice is an N x 3 matrix whose columns contain the posterior mean, 2.5% quantile, and 97.5% quantile of each evaluation of the (j-1)-th coefficient function. Note that the effect of predictor X_j (i.e., \beta_{j}(Z)) is the (j+1)-st coefficient function.
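Continuing the hypothetical predict_betas call above, a sketch of extracting the posterior summary for the first covariate's effect beta_1:

beta_summ <- summarize_beta(beta_draws) # one N x 3 slice per coefficient function
beta1_summ <- beta_summ[, , 2] # slice 2 holds beta_1, per the indexing above
head(beta1_summ) # columns: posterior mean, 2.5% quantile, 97.5% quantile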

