Introduction

This vignette provides an overview of how the specification of priordistributions works in therstanarm package. It isstill a work in progress and more content will be added in futureversions ofrstanarm. Before reading this vignette itis important to first read theHow to Use therstanarm Package vignette, which provides a generaloverview of the package.

Every modeling function inrstanarm offers a subsetof the arguments in the table below which are used for specifying priordistributions for the model parameters.

Argument	Used in	Applies to
`prior_intercept`	All modeling functions except`stan_polr` and`stan_nlmer`	Model intercept, after centering predictors.
`prior`	All modeling functions	Regression coefficients. Doesnot include coefficients thatvary by group in a multilevel model (see`prior_covariance`).
`prior_aux`	`stan_glm`,`stan_glmer`,`stan_gamm4`,`stan_nlmer`	Auxiliary parameter, e.g. error SD (interpretation depends on theGLM).
`prior_covariance`	`stan_glmer`*,`stan_gamm4`,`stan_nlmer`	Covariance matrices in multilevel models with varying slopes andintercepts. See the`stan_glmer`vignette for details on this prior.

*stan_glm also impliesstan_glm.nb.stan_glmer impliesstan_lmer andstan_glmer.nb.

Thestan_polr,stan_betareg, andstan_gamm4 functions also provide additional argumentsspecific only to those models:

Argument	Used only in	Applies to
`prior_smooth`	`stan_gamm4`	Prior for hyperparameters in GAMs (lower values yield less flexiblesmooth functions).
`prior_counts`	`stan_polr`	Prior counts of anordinal outcome (when predictors atsample means).
`prior_z`	`stan_betareg`	Coefficients in the model for`phi`.
`prior_intercept_z`	`stan_betareg`	Intercept in the model for`phi`.
`prior_phi`	`stan_betareg`	`phi`, if not modeled as function of predictors.

To specify these arguments the user provides a call to one of thevarious available functions for specifying priors (e.g.,prior = normal(0, 1),prior = cauchy(c(0, 1), c(1, 2.5))). The documentation forthese functions can be found athelp("priors"). Therstanarm documentation and the othervignettes provide many examples of using thesearguments to specify priors and the documentation for these arguments onthe help pages for the variousrstanarm modelingfunctions (e.g.,help("stan_glm")) also explains whichdistributions can be used when specifying each of the prior-relatedarguments.

Default (Weakly Informative) Prior Distributions

With very few exceptions, the default priors inrstanarm —the priors used if the arguments in thetables above are untouched— arenot flat priors. Rather, thedefaults are intended to beweakly informative. That is, theyare designed to provide moderate regularization and help stabilizecomputation. For many (if not most) applications the defaults willperform well, but this is not guaranteed (there are no default priorsthat make sense for every possible model specification).

The wayrstanarm attempts to make priors weaklyinformative by default is to internally adjust the scales of the priors.How this works (and, importantly, how to turn it off) is explainedbelow, but first we can look at the default priors in action by fittinga basic linear regression model with thestan_glm function.For specifying priors, thestan_glm function accepts theargumentsprior_intercept,prior, andprior_aux. To use the default priors we just leave thosearguments at their defaults (i.e., we don’t specify them):

library("rstanarm")default_prior_test<-stan_glm(mpg~ wt+ am,data = mtcars,chains =1)

Theprior_summary function provides a concise summary ofthe priors used:

prior_summary(default_prior_test)

Priors for model 'default_prior_test' ------Intercept (after predictors centered)  Specified prior:    ~ normal(location = 20, scale = 2.5)  Adjusted prior:    ~ normal(location = 20, scale = 15)Coefficients  Specified prior:    ~ normal(location = [0,0], scale = [2.5,2.5])  Adjusted prior:    ~ normal(location = [0,0], scale = [15.40,30.20])Auxiliary (sigma)  Specified prior:    ~ exponential(rate = 1)  Adjusted prior:    ~ exponential(rate = 0.17)------See help('prior_summary.stanreg') for more details

Starting from the bottom up, we can see that:

Auxiliary:sigma, the errorstandard deviation, has a default prior that is\(\mathsf{exponential}(1)\). However, as aresult of the automatic rescaling, the actual scale used was6.03.
Coefficients: By default the regressioncoefficients (in this case the coefficients on thewt andam variables) are treated as a priori independent withnormal priors centered at 0 and with scale (standard deviation)\(2.5\). Like forsigma, inorder for the default to be weakly informativerstanarmwill adjust the scales of the priors on the coefficients. As a result,the prior scales actually used were 15.40 and 30.20.
Intercept: For the intercept, the default prioris normal with mean\(0\) and standarddeviation\(2.5\), but in this case thestandard deviation was adjusted to 15.07. There is also a note inparentheses informing you that the prior applies to the intercept afterall predictors have been centered (a similar note can be found in thedocumentation of theprior_intercept argument). In manycases the value of\(y\) when\(x=0\) is not meaningful and it is easier tothink about the value when\(x =\bar{x}\). Therefore placing a prior on the intercept aftercentering the predictors typically makes it easier to specify areasonable prior for the intercept. (Note: the user doesnotneed to manually center the predictors.)

To disable the centering of the predictors, you need to omit theintercept from the modelformula and include a column ofones as a predictor (which cannot be named"(Intercept)" inthedata.frame). Then you can specify a prior “coefficient”for the column of ones.

The next two subsections describe how the rescaling works and how toeasily disable it if desired.

Default priors and scale adjustments

Automatic scale adjustments happen in two cases:

When the default priors are used.
When the user setsautoscale=TRUE when specifying theirown prior (e.g.,normal(0, 3, autoscale=TRUE)). Seehelp("priors") for a list of distributions to see whichhave anautoscale argument.

Here we describe how the default priors work for the intercept,regression coefficients, and (if applicable) auxiliary parameters.Autoscaling when not using default priors works analogously (ifautoscale=TRUE).

Assume we have outcome\(y\) andpredictors\(x_1,\ldots,x_k\) and ourmodel has linear predictor

\[\alpha + \beta_1 x_1 + \dots + \beta_K x_K.\]

Regression coefficients

The default prior on regression coefficients\(\beta_k\) is

\[\beta_k \sim \mathsf{Normal}(0, \, 2.5 \cdot s_y/s_x)\] where\(s_x = \text{sd}(x)\)and\[s_y =\begin{cases}\text{sd}(y) & \text{if } \:\: {\tt family=gaussian(link)}, \\1 & \text{otherwise}.\end{cases}\]

This corresponds toprior = normal(0, 2.5, autoscale = TRUE) inrstanarm code.

Intercept

The intercept is assigned a prior indirectly. Theprior_intercept argument refers to the intercept after allpredictors have been centered (internally byrstanarm).That is, instead of placing the prior on the expected value of\(y\) when\(x=0\), we place a prior on the expectedvalue of\(y\) when\(x = \bar{x}\). The default prior for thiscentered intercept, say\(\alpha_c\),is

\[\alpha_c \sim \mathsf{Normal}(m_y, \, 2.5 \cdot s_y)\] where

\[m_y =\begin{cases}\bar{y} & \text{if } \:\: {\ttfamily=gaussian(link="identity")}, \\0 & \text{otherwise}\end{cases}\] and\(s_y\) is the same asabove (either 1 or\(\text{sd(y)}\)).

Auxiliary parameters

The default prior on the auxiliary parameter (residual standarddeviation for Gaussian, shape for gamma, reciprocal dispersion fornegative binomial, etc.) is an exponential distribution with rate\(1/s_y\)

\[\text{aux} \sim \mathsf{Exponential}(1/s_y)\] where\(s_y\) is the same asabove (either 1 or\(\text{sd(y)}\)).

This corresponds toprior_aux = exponential(1, autoscale=TRUE) inrstanarm code.

Note on data-based priors

Because the scaling is based on the scales of the predictors (andpossibly the outcome) these are technically data-dependent priors.However, since these priors are quite wide (and in most cases ratherconservative), the amount of information used is weak and mainly takesinto account the order of magnitude of the variables. This enablesrstanarm to offer defaults that are reasonable for manymodels.

Disabling prior scale adjustments

To disable automatic rescaling simply specify a prior other than thedefault.rstanarm versions up to and including version2.19.3 used to require you to explicitly set theautoscale argument toFALSE, but nowautoscaling only happens by default for the default priors. To useautoscaling with manually specified priors you have to setautoscale = TRUE. For example, this prior specificationwill not include any autoscaling:

test_no_autoscale<-update(    default_prior_test,prior =normal(0,5),prior_intercept =student_t(4,0,10),prior_aux =cauchy(0,3)  )

We can verify that the prior scales weren’t adjusted by checkingprior_summary:

prior_summary(test_no_autoscale)

Priors for model 'test_no_autoscale' ------Intercept (after predictors centered) ~ student_t(df = 4, location = 0, scale = 10)Coefficients ~ normal(location = [0,0], scale = [5,5])Auxiliary (sigma) ~ half-cauchy(location = 0, scale = 3)------See help('prior_summary.stanreg') for more details

How to Specify Flat Priors (and why you typically shouldn’t)

Uninformative is usually unwarranted and unrealistic (flat isfrequently frivolous and fictional)

When “non-informative” or “uninformative” is used in the context ofprior distributions, it typically refers to a flat (uniform)distribution or a nearly flat distribution. Sometimes it may also beused to refer to the parameterization-invariant Jeffreys prior. Althoughrstanarm does not prevent you from using very diffuseor flat priors, unless the data is very strong it is wise to avoidthem.

Rarely is it appropriate in any applied setting to use a prior thatgives the same (or nearly the same) probability mass to values near zeroas it gives values bigger than the age of the universe in nanoseconds.Even a much narrower prior than that, e.g., a normal distribution with\(\sigma = 500\), will tend to put muchmore probability mass on unreasonable parameter values than reasonableones. In fact, using the prior\(\theta \sim\mathsf{Normal(\mu = 0, \sigma = 500)}\) implies some strangeprior beliefs. For example, you believe a priori that\(P(|\theta| < 250) < P(|\theta| >250)\), which can easily be verified by doing the calculationwith the normal CDF

p<-1-2*pnorm(-250,mean =0,sd =500)print(paste("Pr(-250 < theta < 250) =",round(p,2)))

[1] "Pr(-250 < theta < 250) = 0.38"

or via approximation with Monte Carlo draws:

theta<-rnorm(1e5,mean =0,sd =500)p_approx<-mean(abs(theta)<250)print(paste("Pr(-250 < theta < 250) =",round(p_approx,2)))

[1] "Pr(-250 < theta < 250) = 0.38"

d<-data.frame(theta,clr =abs(theta)>250)library(ggplot2)ggplot(d,aes(x = theta,fill = clr))+geom_histogram(binwidth =5,show.legend =FALSE)+scale_y_continuous(name ="",labels =NULL,expand =c(0,0))+scale_x_continuous(name =expression(theta),breaks =c(-1000,-250,250,1000))

There is much more probability mass outside the interval (-250,250).

This will almost never correspond to the prior beliefs of aresearcher about a parameter in a well-specified applied regressionmodel and yet priors like\(\theta \sim\mathsf{Normal(\mu = 0, \sigma = 500)}\) (and more extreme)remain quite popular.

Even when you know very little, a flat or very wide prior will almostnever be the best approximation to your beliefs about the parameters inyour model that you can express usingrstanarm (orother software).Some amount of prior information will beavailable. For example, even if there is nothing to suggest a priorithat a particular coefficient will be positive or negative, there isalmost always enough information to suggest that different orders ofmagnitude are not equally likely. Making use of this information whensetting a prior scale parameter is simple —one heuristic is to set thescale an order of magnitude bigger than you suspect it to be— and hasthe added benefit of helping to stabilize computations.

A more in-depth discussion of non-informative vs weakly informativepriors is available in the case studyHowthe Shape of a Weakly Informative Prior Affects Inferences.

Specifying flat priors

rstanarm will use flat priors ifNULLis specified rather than a distribution. For example, to use a flatprior on regression coefficients you would specifyprior=NULL:

flat_prior_test<-stan_glm(mpg~ wt,data = mtcars,prior =NULL)

In this case we letrstanarm use the default priorsfor the intercept and error standard deviation (we could change that ifwe wanted), but the coefficient on thewt variable willhave a flat prior. To double check that indeed a flat prior was used forthe coefficient onwt we can callprior_summary:

prior_summary(flat_prior_test)

Priors for model 'flat_prior_test' ------Intercept (after predictors centered)  Specified prior:    ~ normal(location = 20, scale = 2.5)  Adjusted prior:    ~ normal(location = 20, scale = 15)Coefficients ~ flatAuxiliary (sigma)  Specified prior:    ~ exponential(rate = 1)  Adjusted prior:    ~ exponential(rate = 0.17)------See help('prior_summary.stanreg') for more details

Movatterモバイル変換

Prior Distributions for rstanarmModels

Jonah Gabry and Ben Goodrich

2025-09-29

July 2020 Update