To account for spatial correlation between geographic measures, theinterface implements two widely used versions of the ConditionalAutoregressive (CAR) models as prior specifications within the Bayesianhierarchical models:
This vignette provides a brief description of the spatial priors andcorresponding Stan code. We also include the case study applying theBYM2 prior to a hierarchical model for COVID-19 infectionestimation.
These models specifically apply toareal data, which consistof a single aggregated measure for each areal unit or region. In spatialmodeling, the primary interest lies in therelationships betweenunits rather than in the units themselves. Common approaches todefining these relationships includerook andqueen contiguity, which identify two areal units asneighbors if they share a border (or a vertex). We use thespdep package to construct neighborhood graphs from arealunits. For ZIP codes,ZCTAs (ZIP Code Tabulation Areas)are used as proxies to infer the adjacency structure. A commonmathematical representation of this structure is via theadjacency matrix, denoted\(\mathbf{W}\). Because the edges areundirected,\(\mathbf{W}\) is asymmetric\(N \times N\) matrix for aset of\(N\) areal units. Thisrepresentation enables mathematical operations that yield valuableinsights into the neighborhood graph.
Theconnectivity of the graph can affect the choiceof spatial models. For instance, the ICAR prior requires a connectedgraph. Aconnected graph contains a singlecomponent, meaning that each node can be reached fromany other node. In contrast, a graph with multiple components isdisconnected, as nodes in one component cannot reachthose in another. A component of size one is referred to as anisland (orisolate). Since thestructured component of a spatial model relies on neighborhood edges forsmoothing, these isolates require special handling.
TheICAR model is a special case of a GaussianMarkov random field (GMRF), known as the intrinsic GMRF. Let\(\pmb{\phi} = \left(\phi_1, \ldots,\phi_N\right)^{\top}\) denote the spatial random effectsassociated with the\(N\) neighboringregions. Assuming that spatial dependence is fully captured by theneighborhood structure in\(\mathbf{W}\), the ICAR model assumes thateach\(\phi_i\) is conditionallydistributed based its neighboring values\(\phi_{-i}\):\[(\phi_i \mid \phi_{-i}) \sim {N}\!\left( \frac{\sum_j w_{ij}\phi_j}{\sum_j w_{ij}}, \frac{1}{\tau \sum_j w_{ij}}\right),\] where\(w_{ij} > 0\) ifnodes\(i\) and\(j\) are neighbors and\(w_{ij} = 0\) otherwise, and\(\tau\) represents the precision. Thus, thejoint density can be written as
\[ p(\pmb{\phi} \mid \tau ) \propto\tau^{\frac{N-1}{2}}\exp \left(-\frac{\tau}{2} \pmb{\phi}^{\top}\mathbf{L} \pmb{\phi}\right), \]
where\(\mathbf{L} = \mathbf{D} -\mathbf{W}\) is thegraph Laplacian matrix,constructed from the adjacency matrix\(\mathbf{W}\) and the degree matrix\(\mathbf{D}\), where\(\mathbf{D} = \text{diag}(d_i)\) with\(d_i = \sum_j w_{ij}\). Note that\(\mathbf{L}\) satisfies\(\mathbf{L} \mathbf{1}=\mathbf{0}\), and ispositive semidefinite but not invertible. Expanding the quadratic form,we can express the log-probability density in terms of pairwisedifferences between neighboring pairs:
\[ \log p(\pmb{\phi}) = -\frac{\tau}{2}\sum_{i \sim j} w_{ij} (\phi_i - \phi_j)^2 + \text{const.},\]
where\({i \sim j}\) means that\(i\) and\(j\) are neighbors. The joint density isimproper but characterizes local conditional relationships amongneighboring sites, penaling large differences between neighbors and thusencouraging local smoothness. The precision parameter\(\tau\) controls the degree of smoothnesscaptured by the ICAR component. This formulation explicitly shows howthe neighborhood structure is incorporated into the joint probability.It also reveals theunidentifiability problem of theICAR model, because adding a constant to all elements of\(\pmb{\phi}\) does not change thedifferences. To resolve this, asum-to-zero constraintis imposed:\[ \sum_{i=1}^{N} \phi_i = 0.\] This constraint also prevents\(\pmb{\phi}\) from confounding the modelintercept.
The implementation in Stan is straightforward with the pairwisedifference formulation.
Define the function for computing the log probability density:
functions {real icar_normal_lpdf(vector phi,array[]int node1,array[]int node2) {return -0.5 * dot_self(phi[node1] - phi[node2]); } ...Pass neighborhood information using edges defined by nodeindices:
data {int<lower =0> N;// number of areal regionsint<lower =0> N_edges;// number of neighbor pairsarray[N_edges]int<lower =1,upper = N> node1;// neighbor pairs: node1[i] adjacent to node2[i]array[N_edges]int<lower =1,upper = N> node2; ...Constrain\(\phi\) using Stan’ssum_to_zero_vector definition:
Specify the joint probability density:
While the ICAR prior imposes spatial dependence, it can be overlyrestrictive in practice. By construction, all variation in\(\pmb{\phi}\) is spatially structured, soindependent region-specific deviations are absorbed into the smoothsurface. In many real data applications, however, some variability isspatially uncorrelated (e.g., due to measurement error orregion-specific effects). To address this, the Besag–York–Mollié (BYM)model augments the ICAR component with an unstructured random effectthat captures independent noise. The BYM2 reparameterization (Riebler etal., 2016) refines this approach by standardizing the ICAR term andseparating overall scale from the spatial structure. In effect, theprecision parameter\(\tau\) from theICAR model is replaced by a fixed scaling constant derived from theneighborhood graph, ensuring that the total variance and spatialproportion parameters remain interpretable and comparable acrossdatasets.
In the BYM2 formulation, the ICAR precision matrix is rescaled by\(s\), computed from the graphLaplacian\(\mathbf{L}\) as
\[s =\operatorname{mean}\!\big[\mathrm{diag}(\mathbf{L}^{-}_{\text{pseudo}})\big],\]
where\(\mathbf{L}^{-}_{\text{pseudo}}\) denotesthe Moore–Penrose inverse of the Laplacian. This ensures that the scaledICAR component\(\pmb{\phi}^\ast =s^{-1/2}\pmb{\phi}\) has unit marginal variance under the graphstructure, i.e.,\(\operatorname{Var}(\phi_i^\ast) = 1\).
The spatial random effect for region\(i\) is then modeled as
\[a_i = \sigma\,b_i= \sigma\!\left( \sqrt{\rho}\,\phi_i^\ast + \sqrt{1-\rho}\,\theta_i\right) = \sigma\!\left( \sqrt{\frac{\rho}{s}}\,\phi_i + \sqrt{1-\rho}\,\theta_i\right),\qquad i = 1, \dots, N,\]
where:
This formulation removes the precision parameter\(\tau\) from the model by absorbing it intothe scaling of the ICAR component. As a result, the parameters\(\sigma\) and\(\rho\) have clear and interpretablemeanings:\(\sigma\) represents thetotal spatial variability, and\(\rho\)quantifies how much variability is explained by the spatialstructure.
The standard BYM2 model assumes that the spatial graph isconnected, meaning every areal unit can be reached fromany other through a path of neighboring edges. When the adjacency graphhasmultiple connected components, the Laplacian matrix\(\mathbf{L}\) becomes block-diagonal,and the ICAR prior exhibits one degree of freedom (a zero eigenvalue)per component. Consequently, the BYM2 formulation mustbe extended to handle these multiple components. To make theimplementation of this extension easier, we reparameterize, at thecomponent level, the structured ICAR field on a basis that (i) lies ineach component’s sum-to-zero subspace and (ii) is BYM2-standardized.Isolates receive only the spatially independent part.
Recall that\(W\) is the symmetricadjacency,\(D=\mathrm{diag}(d_i)\) isthe degree matrix, and\(L=D-W\) is thegraph Laplacian. Because the graph is connected,\(L\mathbf 1=0\) and\(\mathrm{null}(L)=\mathrm{span}\{\mathbf1\}\). The ICAR prior is improper on\(\mathbb R^N\) but becomes proper on thesum-to-zero subspace\[\mathbf H=\{\phi\in\mathbb R^N:\mathbf 1^\top \phi=0\}.\] Since\(L\) is symmetric andpositive semidefinite, diagonalize\(L=U\Lambda U^\top\), where the eigenvaluessatisfy\(0=\lambda_1<\lambda_2\le\cdots\le\lambda_N\),with\(u_1\propto \mathbf 1\). Write\(U_+=[u_2,\ldots,u_N]\in\mathbbR^{N\times(N-1)}\) and\(\Lambda_+=\mathrm{diag}(\lambda_2,\ldots,\lambda_N)\).The Moore–Penrose pseudo inverse is\[L^{-}=U_+\Lambda_+^{-1}U_+^\top,\] which equals the covariance of the ICAR prior restricted to\(\mathbf H\).
Define the basis\[R \;=\; U_+\,\Lambda_+^{-1/2}\in\mathbb R^{N\times(N-1)},\qquad \eta\sim \mathbf N(0,I_{N-1}),\qquad \phi \;=\; R\,\eta.\] This reparameterization is equivalent to the constrained ICARin the following sense. First, the support matches because each columnof\(R\) is orthogonal to\(\mathbf 1\), hence\(\mathbf 1^\top\phi=\mathbf 1^\top R\eta=0\)for all\(\eta\), i.e.,\(\phi\in\mathbf H\). Second, the meanmatches since\(\mathbb E[\phi]=R\,\mathbbE[\eta]=0\). Third, the covariance matches because\[\mathrm{Var}(\phi) \;=\; R\,\mathrm{Var}(\eta)\,R^\top \;=\; R R^\top\;=\; U_+\Lambda_+^{-1}U_+^\top \;=\; L^{-}.\] Linear images of a multivariate normal are normal; therefore\(\phi\stackrel{d}{=}\mathbfN(0,L^{-})\) on\(\mathbf H\).Right-orthogonal rotations of the scores,\(R\mapsto RQ\) with\(Q\) orthogonal, leave\(R R^\top\) unchanged; the parameterizationis not unique but the induced law for\(\phi\) is.
BYM2 standardization rescales the structured field so that thegeometric mean (GM) of its marginal variances equals one. Let\(v_i=(L^{-})_{ii}\) and define\[s \;=\; \exp\Big(\tfrac{1}{N}\sum_{i=1}^N \log v_i\Big),\qquadR_{\text{BYM2}} \;=\; \frac{1}{\sqrt{s}}\,R,\qquad \tilde\phi \;=\;R_{\text{BYM2}}\,\eta.\] Then\(\mathrm{GM}\big(\operatorname{diag}\mathrm{Var}(\tilde\phi)\big)=1\),while\(\mathbf 1^\top \tilde\phi=0\)still holds because the subspace is unchanged.
The reparameterization allows a clean implementation as thereduced-rank ICAR basis is already BYM2-standardized and enforces thesum-to-zero constraint by construction. The sum_to_zero vector is notlonger needed that can improve sampling.
Pass the scaled reduced-rank ICAR basis:
data {int<lower=0> N;// number of areal regionsint<lower=0> N_pos;//number of positive eigenvaluesmatrix[N_zip, N_pos] R;// scaled so that geometric mean of// marginal variance is 1 ...Define parameters:
parameters {real<lower=0> sigma;// overall standard deviation for spatial effectreal<lower=0,upper=1> rho;// mixing parametervector[N] theta;// unstructured spatial random effectvector[N] eta;// structured reduced-rank scores ...Compute the scaled ICAR component and combined spatial effect:
transformed_parameters {vector[N] phi = R * eta;vector[N] b = sqrt(rho) * phi + sqrt(1 - rho) * theta;vector[N] a = sigma * b ...Assigning priors:
One common application of the BYM2 model is infectious diseasesurveillance. Here, we fit a multilevel regression model to a datasetcontaining COVID-19 test records from a Midwest hospital. This datasetincludes over 120,000 test records from patients residing in more than1,000 ZIP codes, with most records from Michigan and more than a quarterof the ZIP codes having five or fewer test records.
Let the test result for individual\(i\) be\(y_i\), where\(y_i=1\) indicates a positive result and\(y_i=0\) indicates negative. We obtainaggregated counts as the number of tests\(n_j\) and the number of positive cases\(y^*_j\) in cell\(j\). Let\(p_j=\textrm{Pr}(y_{j[i]}=1)\) be theprobability that person\(i\) in cell\(j\) tests positive. We account forthe PCR testing sensitivity and specificity, where the positivity\(p_j\) is a function of the test sensitivity\(\delta\), specificity\(\gamma\), and the true incidence\(\pi_j\) for people in cell\(j\):
\[\begin{align}\label{positivity}p_j=(1-\gamma)(1-\pi_j )+\delta \pi_j.\end{align}\]
We fit a binomial model for\(y^*_j\),\(y^*_j\sim \textrm{binomial}(n_j, p_j)\) with a logistic regression for\(\pi_j\) with covariates—sex, age,race, ZIP codes, and time in weeks—to allow time-varying incidence inthe multilevel model.\[\begin{align}\label{pi}\textrm{logit}(\pi_j)=\beta_1+\beta_2{\rm male}_j+\alpha_{{\rma}[j]}^{\rm age}+\alpha_{{\rm r}[j]}^{\rm race}+\alpha_{{\rm s}[j]}^{\rmZIP}+\alpha_{{\rm t}[j]}^{\rm time},\end{align}\] where\({\rmmale}_j\) is an indicator for men;\({\rm a}[j]\),\({\rm r}[j]\), and\({\rm s}[j]\) represent age, race, and ZIPlevels; and\({\rm t}[j]\) denotes thetime in weeks when the test result is collected for cell\(j\).
We assign the hierarchical priors to the varying intercepts:\[\begin{align}\nonumber &\alpha^{\rm age} \sim \mbox{normal}(0,\sigma^{\rm age} ),\,\,\, \sigma^{\rm age}\sim \mbox{normal}_+ (0,3)\\&\alpha^{\rm race} \sim \mbox{normal}(0,\sigma^{\rm race} ), \,\,\,\sigma^{\rm race}\sim \mbox{normal}_+ (0,3).\\&\alpha_{\rm s}^{\rm ZIP} \sim BYM2,\end{align}\] where\(\alpha_{\rms}^{\rm ZIP}\) is the combined spatial effect (i.e.,\(a\)) defined in the BYM2 model section.
The model specification usingshinymrp is as follows.See the reference page for parameter descriptions.
library(shinymrp)# Initialize workflowworkflow<-mrp_workflow()# Preprocess sample data# ...# Construct poststratification table# ...# Create a new model objectmodel<- workflow$create_model(intercept_prior ="normal(0, 5)",fixed =list(sex ="normal(0, 3)",urbanicity ="normal(0, 3)",college ="normal(0, 3)",employment ="normal(0, 3)",poverty ="normal(0, 3)",income ="normal(0, 3)",adi ="normal(0, 3)" ),varying =list(race ="normal(0, 3)",age ="normal(0, 3)",time ="normal(0, 3)",zip ="bym2" ),sens =0.7,spec =0.999)# Run MCMCmodel$fit(n_iter =4000,n_chains =4,adapt_delta =0.95)