Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Title:Spatial Bayesian Factor Analysis
Version:1.4.0
Date:2025-09-30
Description:Implements a spatial Bayesian non-parametric factor analysis model with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). Spatial correlation is introduced in the columns of the factor loadings matrix using a Bayesian non-parametric prior, the probit stick-breaking process. Areal spatial data is modeled using a conditional autoregressive (CAR) prior and point-referenced spatial data is treated using a Gaussian process. The response variable can be modeled as Gaussian, probit, Tobit, or Binomial (using Polya-Gamma augmentation). Temporal correlation is introduced for the latent factors through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces", by Berchuck et al (2019), <doi:10.1214/20-BA1253> in Bayesian Analysis.
License:GPL-2 |GPL-3 [expanded from: GPL (≥ 2)]
Encoding:UTF-8
LazyData:true
LazyDataCompression:xz
RoxygenNote:7.3.2
NeedsCompilation:yes
Depends:R (≥ 3.0.2)
Imports:graphics, grDevices, msm (≥ 1.0.0), mvtnorm (≥ 1.0-0),pgdraw (≥ 1.0), Rcpp (≥ 0.12.9), stats, utils
Suggests:coda, classInt, knitr, rmarkdown, womblR (≥ 1.0.3)
LinkingTo:Rcpp, RcppArmadillo (≥ 0.7.500.0.0)
VignetteBuilder:knitr
Language:en-US
Author:Samuel I. Berchuck [aut, cre]
Maintainer:Samuel I. Berchuck <sib2@duke.edu>
Packaged:2025-09-30 12:39:45 UTC; sib2
Repository:CRAN
Date/Publication:2025-09-30 13:00:02 UTC

spBFA

Description

spBFA is a package for Bayesian spatial factor analysis. A corresponding manuscript is forthcoming.

Author(s)

Samuel I. Berchucksib2@duke.edu


Spatial factor analysis using a Bayesian hierarchical model.

Description

bfa_sp is a Markov chain Monte Carlo (MCMC) sampler for a Bayesian spatial factor analysis model. The spatial component is introduced using a Probit stick-breaking process prior on the factor loadings. The model is implemented using a Bayesian hierarchical framework.

Usage

bfa_sp(  formula,  data,  dist,  time,  K,  L = Inf,  trials = NULL,  family = "normal",  temporal.structure = "exponential",  spatial.structure = "discrete",  starting = NULL,  hypers = NULL,  tuning = NULL,  mcmc = NULL,  seed = 54,  gamma.shrinkage = TRUE,  include.space = TRUE,  clustering = TRUE)

Arguments

formula

Aformula object, corresponding to the spatial factor analysis model. The response must be on the left of a~ operator, and the terms on the right must indicate the covariates to be included in the fixed effects. If no covariates are desired a zero should be used,~ 0.

data

A requireddata.frame containing the variables in the model. The data frame must containM x O x Nu rows.Here,M represents the number of spatial locations,O the number of different observation typesandNu the number of temporal visits. The observations must be first beordered spatially, second by observation type and then temporally. This means that the firstM x O observations come from the first time point andthe firstM observations come the first spatial observation type.

dist

AM x M dimensional distance matrix. For adiscrete spatial process the matrix contains binary adjacencies that dictate thespatial neighborhood structure and forcontinuous spatial processes the matrix should be a continuous distance matrix (e.g., Euclidean).

time

ANu dimensional vector containing the observed time pointsin increasing order.

K

A scalar that indicates the dimension (i.e., quantity) of latent factors.

L

The number of latent clusters. If finite, a scalar indicating the number of clusters for each column of the factor loadings matrix. By defaultL is set atInfso that the Probit stick-breaking process becomes an infinite mixture model.

trials

A variable indata that contains the number of trials for each of the binomial observations. If there is no count data,trials should be left missing.

family

Character string indicating the distribution of the observed data. Optionsinclude:"normal","probit","tobit", and"binomial".family must have eitherO or1 dimension(s) (the one populates the rest). Any combination of likelihoods can be used.

temporal.structure

Character string indicating the temporal kernel. Options include:"exponential" and"ar1".

spatial.structure

Character string indicating the type of spatial process. Options include:"continuous" (i.e., Gaussian process with exponential kernel) and"discrete" (i.e., proper CAR).

starting

EitherNULL or alist containing starting valuesto be specified for the MCMC sampler. IfNULL is not chosen then none, some or allof the starting values may be specified.

WhenNULL is chosen then default starting values are automatically generated.Otherwise alist must be provided with namesBeta,Delta,Sigma2,Kappa,Rho,Upsilon orPsi containing appropriate objects.Beta (orDelta) must either be aP (orK) dimensionalvector or a scalar (the scalar populates the entire vector).Sigma2 must be either aM x (O - C) matrix or a scalar.Kappa must be aO x O dimensional matrix,Rho a scalar,Upsilon aK x K matrix, andPsi a scalar.

hypers

EitherNULL or alist containing hyperparameter valuesto be specified for the MCMC sampler. IfNULL is not chosen then none, some or allof the hyperparameter values may be specified.

WhenNULL is chosen then default hyperparameter values are automaticallygenerated. These default hyperparameters are described in detail in (Berchuck et al.).Otherwise alist must be provided with namesBeta,Delta,Sigma2,Kappa,Rho,Upsilon orPsi containing further hyperparameter information. These objects are themselveslists and may be constructed as follows.

Beta is alist with two objects,MuBeta andSigmaBeta. These values represent the prior mean and variance parameters for the multivariate normal prior.

Delta is alist with two objects,A1 andA2. These values represent the prior shape parameters for the multiplicative Gamma shrinkage prior.

Sigma2 is alist with two objects,A andB. These values represent the shape and scale for the variance parameters.

Kappa is alist with two objects,SmallUpsilon andBigTheta.SmallUpsilon represents the degrees of freedom parameter for theinverse-Wishart hyperprior and must be a real number scalar, whileBigTheta representsthe scale matrix and must be aO x O dimensional positive definite matrix.

Rho is alist with two objects,ARho andBRho.ARhorepresents the lower bound for the uniform hyperprior, whileBRho representsthe upper bound. The bounds must be specified carefully. This is only specified for continuous spatial processes.

Upsilon is alist with two objects,Zeta andOmega.Zeta represents the degrees of freedom parameter for theinverse-Wishart hyperprior and must be a real number scalar, whileOmega representsthe scale matrix and must be aK x K dimensional positive definite matrix.

Psi is alist with two objects, dependent on if the temporal kernel isexponential orar1.Forexponential, the two objects areAPsi andBPsi.APsirepresents the lower bound for the uniform hyperprior, whileBPsi representsthe upper bound. The bounds must be specified carefully. Forar1, the two objects areBeta andGamma, which are the two shape parameters of a Beta distribution shifted to have domain in (-1, 1).

tuning

EitherNULL or alist containing tuning valuesto be specified for the MCMC Metropolis steps. IfNULL is not chosen then allof the tuning values must be specified.

WhenNULL is chosen then default tuning values are automatically generated to1. Otherwise alist must be provided with namesPsi, orRho. Each of these entries must be scalars containing tuning variances for their corresponding Metropolis updates.Rho is only specified for continuous spatial processes.

mcmc

EitherNULL or alist containing input values to be usedfor implementing the MCMC sampler. IfNULL is not chosen then allof the MCMC input values must be specified.

NBurn: The number of sampler scans included in the burn-in phase. (default =10,000)

NSims: The number of post-burn-in scans for which to perform thesampler. (default =10,000)

NThin: Value such that during the post-burn-in phase, only everyNThin-th scan is recorded for use in posterior inference (For return valueswe define, NKeep = NSims / NThin (default =1).

NPilot: The number of times during the burn-in phase that pilot adaptationis performed (default =20)

seed

An integer value used to set the seed for the random number generator(default = 54).

gamma.shrinkage

A logical indicating whether a gamma shrinkage process prior is used for the variances of the factor loadings columns. If FALSE,the hyperparameters (A1 and A2) indicate the shape and rate for a gamma prior on the precisions. Default is TRUE.

include.space

A logical indicating whether a spatial process should be included. Default is TRUE, however if FALSE the spatial correlation matrix is fixed as an identity matrix. This specification overrides thespatial.structure input.

clustering

A logical indicating whether the Bayesian non-parametric process should be used, default is TRUE. If FALSE is specifiedeach column is instead modeled with an independent spatial process.

Details

Details of the underlying statistical model proposed byBerchuck et al. 2019. are forthcoming.

Value

bfa_sp returns a list containing the following objects

lambda

NKeep x (M x O x K)matrix of posterior samples for factor loadings matrixlambda.The labels for each column are Lambda_O_M_K.

eta

NKeep x (Nu x K)matrix of posterior samples for the latent factorseta.The labels for each column are Eta_Nu_K.

beta

NKeep x Pmatrix of posterior samples forbeta.

sigma2

NKeep x (M * (O - C))matrix of posterior samples for the variancessigma2.The labels for each column are Sigma2_O_M.

kappa

NKeep x ((O * (O + 1)) / 2)matrix of posterior samples forkappa. Thecolumns have names that describe the samples within them. The row is listed first, e.g.,Kappa3_2 refers to the entry in row3, column2.

delta

NKeep x Kmatrix of posterior samples fordelta.

tau

NKeep x Kmatrix of posterior samples fortau.

upsilon

NKeep x ((K * (K + 1)) / 2)matrix of posterior samples forUpsilon. Thecolumns have names that describe the samples within them. The row is listed first, e.g.,Upsilon3_2 refers to the entry in row3, column2.

psi

NKeep x 1matrix of posterior samples forpsi.

xi

NKeep x (M x O x K)matrix of posterior samples for factor loadings cluster labelsxi.The labels for each column are Xi_O_M_K.

rho

NKeep x 1matrix of posterior samples forrho.

metropolis

2 (or 1) x 3matrix of metropolisacceptance rates, updated tuners, and original tuners that result from the pilotadaptation.

runtime

Acharacter string giving the runtime of the MCMC sampler.

datobj

Alist of data objects that are used in futurebfa_sp functionsand should be ignored by the user.

dataug

Alist of data augmentation objects that are used in futurebfa_sp functions and should be ignored by the user.

References

Reference for Berchuck et al. 2019 is forthcoming.

Examples

###Load womblR for example visual field datalibrary(womblR)###Format data for MCMC samplerblind_spot <- c(26, 35) # define blind spotVFSeries <- VFSeries[order(VFSeries$Location), ] # sort by locationVFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visitVFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locationsdat <- data.frame(Y = VFSeries$DLS / 10) # create data frame with scaled dataTime <- unique(VFSeries$Time) / 365 # years since baseline visitW <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix (data object from womblR)M <- dim(W)[1] # number of locations###Prior bounds for psiTimeDist <- as.matrix(dist(Time))BPsi <- log(0.025) / -min(TimeDist[TimeDist > 0])APsi <- log(0.975) / -max(TimeDist)###MCMC optionsK <- 10 # number of latent factorsO <- 1 # number of spatial observation typesHypers <- list(Sigma2 = list(A = 0.001, B = 0.001),               Kappa = list(SmallUpsilon = O + 1, BigTheta = diag(O)),               Delta = list(A1 = 1, A2 = 20),               Psi = list(APsi = APsi, BPsi = BPsi),               Upsilon = list(Zeta = K + 1, Omega = diag(K)))Starting <- list(Sigma2 = 1,                 Kappa = diag(O),                 Delta = 2 * (1:K),                 Psi = (APsi + BPsi) / 2,                 Upsilon = diag(K))Tuning <- list(Psi = 1)MCMC <- list(NBurn = 1000, NSims = 1000, NThin = 2, NPilot = 5)###Fit MCMC Samplerreg.bfa_sp <- bfa_sp(Y ~ 0, data = dat, dist = W, time = Time,  K = 10,                      starting = Starting, hypers = Hypers, tuning = Tuning, mcmc = MCMC,                     L = Inf,                     family = "tobit",                     trials = NULL,                     temporal.structure = "exponential",                     spatial.structure = "discrete",                     seed = 54,                      gamma.shrinkage = TRUE,                     include.space = TRUE,                     clustering = TRUE)###Note that this code produces the pre-computed data object "reg.bfa_sp"###More details can be found in the vignette.

diagnostics

Description

Calculates diagnostic metrics using output from thespBFA model.

Usage

diagnostics(  object,  diags = c("dic", "dinf", "waic"),  keepDeviance = FALSE,  keepPPD = FALSE,  Verbose = TRUE,  seed = 54)

Arguments

object

AspBFA model object for which diagnosticsare desired from.

diags

A vector of character strings indicating the diagnostics to compute.Options include: Deviance Information Criterion ("dic"), d-infinity ("dinf") andWatanabe-Akaike information criterion ("waic"). At least one option must be included.Note: The probit model cannot compute the DIC or WAIC diagnostics due to computationalissues with computing the multivariate normal CDF.

keepDeviance

A logical indicating whether the posterior deviance distributionis returned (default = FALSE).

keepPPD

A logical indicating whether the posterior predictive distributionat each observed location is returned (default = FALSE).

Verbose

A boolean logical indicating whether progress should be output (default = TRUE).

seed

An integer value used to set the seed for the random number generator(default = 54).

Details

To assess model fit, DIC, d-infinity and WAIC are used. DIC is based on thedeviance statistic and penalizes for the complexity of a model with an effectivenumber of parameters estimate pD (Spiegelhalter et al 2002). The d-infinity posteriorpredictive measure is an alternative diagnostic tool to DIC, where d-infinity=P+G.The G term decreases as goodness of fit increases, and P, the penalty term, inflatesas the model becomes over-fit, so small values of both of these terms and, thus, smallvalues of d-infinity are desirable (Gelfand and Ghosh 1998). WAIC is invariant toparametrization and is asymptotically equal to Bayesian cross-validation(Watanabe 2010). WAIC = -2 * (lppd - p_waic_2). Where lppd is the log pointwisepredictive density and p_waic_2 is the estimated effective number of parametersbased on the variance estimator from Vehtari et al. 2016. (p_waic_1 is the meanestimator).

Value

diagnostics returns a list containing the diagnostics requested andpossibly the deviance and/or posterior predictive distribution objects.

Author(s)

Samuel I. Berchuck

References

Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 1-11.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.

Vehtari, A., Gelman, A., & Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 1-20.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec), 3571-3594.

Examples

###Load pre-computed regression resultsdata(reg.bfa_sp)###Compute and print diagnosticsdiags <- diagnostics(reg.bfa_sp)print(unlist(diags))

is.spBFA

Description

is.spBFA is a general test of an object being interpretable as aspBFA object.

Usage

is.spBFA(x)

Arguments

x

object to be tested.

Details

ThespBFA class is defined as the regression object thatresults from thespBFA regression function.

Value

is.spBFA returns a logical, depending on whether the object is of classspBFA.

Examples

###Load pre-computed resultsdata(reg.bfa_sp)###Test functionis.spBFA(reg.bfa_sp)

predict.spBFA

Description

Predicts future observations from thespBFA model.

Usage

## S3 method for class 'spBFA'predict(  object,  NewTimes,  NewX = NULL,  NewTrials = NULL,  type = "temporal",  Verbose = TRUE,  seed = 54,  ...)

Arguments

object

AspBFA model object for which predictionsare desired from.

NewTimes

A numeric vector including desired time(s) points for prediction.

NewX

A matrix including covariates at timesNewTimes for prediction.NewX must have dimension(M x O x NNewVistis) x P. WhereNNewVisits is the number of temporal locations being predicted. The default setsNewX toNULL, which assumes that the covariates for all predictions are the same as the final time point.

NewTrials

An array indicating the trials for categorical predictions. The array must have dimensionM x C x NNewVisitsand contain only non-negative integers. The default setsNewTrials toNULL, which assumes the trials for all predictionsare the same as the final time point.

type

A character string indicating the type of prediction, choices include "temporal" and "spatial". Spatial prediction has not been implemented yet.

Verbose

A boolean logical indicating whether progress should be output.

seed

An integer value used to set the seed for the random number generator(default = 54).

...

other arguments.

Details

predict.spBFA uses Bayesian krigging to predict vectors at futuretime points. The function returns the krigged factors (Eta) and also the observed outcomes (Y).

Value

predict.spBFA returns a list containing the following objects.

Eta

Alist containingNNewVistis matrices, one for each new time prediction. Each matrix is dimensionNKeep x K, whereK is the number of latent factors Each matrix contains posterior samples obtained by Bayesian krigging.

Y

Alist containingNNewVistis posterior predictive distributionmatrices. Each matrix is dimensionNKeep x (M * O), whereM is the number of spatial locations andO the number of observation types.Each matrix is obtained through Bayesian krigging.

Author(s)

Samuel I. Berchuck

Examples

###Load pre-computed regression resultsdata(reg.bfa_sp)###Compute predictionspred <- predict(reg.bfa_sp, NewTimes = 3)pred.observations <- pred$Y$Y10 # observed data predictionspred.krig <- pred$Eta$Eta10 # krigged parameters

Pre-computed regression results frombfa_sp

Description

The data objectreg.bfa_sp is a pre-computedspBFA data object for use in the package vignette and function examples.

Usage

data(reg.bfa_sp)

Format

The data objectreg.bfa_sp is aspBFA data object that is the result of implementing the MCMC code in the vignette.It is presented here because the run-time would be unecessarily long when compiling the R package.


[8]ページ先頭

©2009-2025 Movatter.jp