Movatterモバイル変換


[0]ホーム

URL:


Prior Distributions for rstanarmModels

Jonah Gabry and Ben Goodrich

2025-09-29

library(ggplot2)library(bayesplot)theme_set(bayesplot::theme_default())

July 2020 Update

As of July 2020 there are a few changes to prior distributions:

We recommend the new bookRegression and OtherStories, which discusses the background behind the default priors inrstanarm and also provides examples of specifyingnon-default priors.

Introduction

This vignette provides an overview of how the specification of priordistributions works in therstanarm package. It isstill a work in progress and more content will be added in futureversions ofrstanarm. Before reading this vignette itis important to first read theHow to Use therstanarm Package vignette, which provides a generaloverview of the package.

Every modeling function inrstanarm offers a subsetof the arguments in the table below which are used for specifying priordistributions for the model parameters.


ArgumentUsed inApplies to
prior_interceptAll modeling functions exceptstan_polr andstan_nlmerModel intercept, after centering predictors.
priorAll modeling functionsRegression coefficients. Doesnot include coefficients thatvary by group in a multilevel model (seeprior_covariance).
prior_auxstan_glm*,stan_glmer*,stan_gamm4,stan_nlmerAuxiliary parameter, e.g. error SD (interpretation depends on theGLM).
prior_covariancestan_glmer*,stan_gamm4,stan_nlmerCovariance matrices in multilevel models with varying slopes andintercepts. See thestan_glmervignette for details on this prior.

*stan_glm also impliesstan_glm.nb.stan_glmer impliesstan_lmer andstan_glmer.nb.


Thestan_polr,stan_betareg, andstan_gamm4 functions also provide additional argumentsspecific only to those models:

ArgumentUsed only inApplies to
prior_smoothstan_gamm4Prior for hyperparameters in GAMs (lower values yield less flexiblesmooth functions).
prior_countsstan_polrPrior counts of anordinal outcome (when predictors atsample means).
prior_zstan_betaregCoefficients in the model forphi.
prior_intercept_zstan_betaregIntercept in the model forphi.
prior_phistan_betaregphi, if not modeled as function of predictors.


To specify these arguments the user provides a call to one of thevarious available functions for specifying priors (e.g.,prior = normal(0, 1),prior = cauchy(c(0, 1), c(1, 2.5))). The documentation forthese functions can be found athelp("priors"). Therstanarm documentation and the othervignettes provide many examples of using thesearguments to specify priors and the documentation for these arguments onthe help pages for the variousrstanarm modelingfunctions (e.g.,help("stan_glm")) also explains whichdistributions can be used when specifying each of the prior-relatedarguments.


Default (Weakly Informative) Prior Distributions

With very few exceptions, the default priors inrstanarm —the priors used if the arguments in thetables above are untouched— arenot flat priors. Rather, thedefaults are intended to beweakly informative. That is, theyare designed to provide moderate regularization and help stabilizecomputation. For many (if not most) applications the defaults willperform well, but this is not guaranteed (there are no default priorsthat make sense for every possible model specification).

The wayrstanarm attempts to make priors weaklyinformative by default is to internally adjust the scales of the priors.How this works (and, importantly, how to turn it off) is explainedbelow, but first we can look at the default priors in action by fittinga basic linear regression model with thestan_glm function.For specifying priors, thestan_glm function accepts theargumentsprior_intercept,prior, andprior_aux. To use the default priors we just leave thosearguments at their defaults (i.e., we don’t specify them):

library("rstanarm")default_prior_test<-stan_glm(mpg~ wt+ am,data = mtcars,chains =1)

Theprior_summary function provides a concise summary ofthe priors used:

prior_summary(default_prior_test)
Priors for model 'default_prior_test' ------Intercept (after predictors centered)  Specified prior:    ~ normal(location = 20, scale = 2.5)  Adjusted prior:    ~ normal(location = 20, scale = 15)Coefficients  Specified prior:    ~ normal(location = [0,0], scale = [2.5,2.5])  Adjusted prior:    ~ normal(location = [0,0], scale = [15.40,30.20])Auxiliary (sigma)  Specified prior:    ~ exponential(rate = 1)  Adjusted prior:    ~ exponential(rate = 0.17)------See help('prior_summary.stanreg') for more details

Starting from the bottom up, we can see that:

To disable the centering of the predictors, you need to omit theintercept from the modelformula and include a column ofones as a predictor (which cannot be named"(Intercept)" inthedata.frame). Then you can specify a prior “coefficient”for the column of ones.

The next two subsections describe how the rescaling works and how toeasily disable it if desired.

Default priors and scale adjustments

Automatic scale adjustments happen in two cases:

  1. When the default priors are used.
  2. When the user setsautoscale=TRUE when specifying theirown prior (e.g.,normal(0, 3, autoscale=TRUE)). Seehelp("priors") for a list of distributions to see whichhave anautoscale argument.

Here we describe how the default priors work for the intercept,regression coefficients, and (if applicable) auxiliary parameters.Autoscaling when not using default priors works analogously (ifautoscale=TRUE).

Assume we have outcome\(y\) andpredictors\(x_1,\ldots,x_k\) and ourmodel has linear predictor

\[\alpha + \beta_1 x_1 + \dots + \beta_K x_K.\]

Regression coefficients

The default prior on regression coefficients\(\beta_k\) is

\[\beta_k \sim \mathsf{Normal}(0, \, 2.5 \cdot s_y/s_x)\] where\(s_x = \text{sd}(x)\)and\[s_y =\begin{cases}\text{sd}(y) & \text{if } \:\: {\tt family=gaussian(link)}, \\1 & \text{otherwise}.\end{cases}\]

This corresponds toprior = normal(0, 2.5, autoscale = TRUE) inrstanarm code.

Intercept

The intercept is assigned a prior indirectly. Theprior_intercept argument refers to the intercept after allpredictors have been centered (internally byrstanarm).That is, instead of placing the prior on the expected value of\(y\) when\(x=0\), we place a prior on the expectedvalue of\(y\) when\(x = \bar{x}\). The default prior for thiscentered intercept, say\(\alpha_c\),is

\[\alpha_c \sim \mathsf{Normal}(m_y, \, 2.5 \cdot s_y)\] where

\[m_y =\begin{cases}\bar{y} & \text{if } \:\: {\ttfamily=gaussian(link="identity")}, \\0 & \text{otherwise}\end{cases}\] and\(s_y\) is the same asabove (either 1 or\(\text{sd(y)}\)).

Auxiliary parameters

The default prior on the auxiliary parameter (residual standarddeviation for Gaussian, shape for gamma, reciprocal dispersion fornegative binomial, etc.) is an exponential distribution with rate\(1/s_y\)

\[\text{aux} \sim \mathsf{Exponential}(1/s_y)\] where\(s_y\) is the same asabove (either 1 or\(\text{sd(y)}\)).

This corresponds toprior_aux = exponential(1, autoscale=TRUE) inrstanarm code.

Note on data-based priors

Because the scaling is based on the scales of the predictors (andpossibly the outcome) these are technically data-dependent priors.However, since these priors are quite wide (and in most cases ratherconservative), the amount of information used is weak and mainly takesinto account the order of magnitude of the variables. This enablesrstanarm to offer defaults that are reasonable for manymodels.

Disabling prior scale adjustments

To disable automatic rescaling simply specify a prior other than thedefault.rstanarm versions up to and including version2.19.3 used to require you to explicitly set theautoscale argument toFALSE, but nowautoscaling only happens by default for the default priors. To useautoscaling with manually specified priors you have to setautoscale = TRUE. For example, this prior specificationwill not include any autoscaling:

test_no_autoscale<-update(    default_prior_test,prior =normal(0,5),prior_intercept =student_t(4,0,10),prior_aux =cauchy(0,3)  )

We can verify that the prior scales weren’t adjusted by checkingprior_summary:

prior_summary(test_no_autoscale)
Priors for model 'test_no_autoscale' ------Intercept (after predictors centered) ~ student_t(df = 4, location = 0, scale = 10)Coefficients ~ normal(location = [0,0], scale = [5,5])Auxiliary (sigma) ~ half-cauchy(location = 0, scale = 3)------See help('prior_summary.stanreg') for more details


How to Specify Flat Priors (and why you typically shouldn’t)

Uninformative is usually unwarranted and unrealistic (flat isfrequently frivolous and fictional)

When “non-informative” or “uninformative” is used in the context ofprior distributions, it typically refers to a flat (uniform)distribution or a nearly flat distribution. Sometimes it may also beused to refer to the parameterization-invariant Jeffreys prior. Althoughrstanarm does not prevent you from using very diffuseor flat priors, unless the data is very strong it is wise to avoidthem.

Rarely is it appropriate in any applied setting to use a prior thatgives the same (or nearly the same) probability mass to values near zeroas it gives values bigger than the age of the universe in nanoseconds.Even a much narrower prior than that, e.g., a normal distribution with\(\sigma = 500\), will tend to put muchmore probability mass on unreasonable parameter values than reasonableones. In fact, using the prior\(\theta \sim\mathsf{Normal(\mu = 0, \sigma = 500)}\) implies some strangeprior beliefs. For example, you believe a priori that\(P(|\theta| < 250) < P(|\theta| >250)\), which can easily be verified by doing the calculationwith the normal CDF

p<-1-2*pnorm(-250,mean =0,sd =500)print(paste("Pr(-250 < theta < 250) =",round(p,2)))
[1] "Pr(-250 < theta < 250) = 0.38"

or via approximation with Monte Carlo draws:

theta<-rnorm(1e5,mean =0,sd =500)p_approx<-mean(abs(theta)<250)print(paste("Pr(-250 < theta < 250) =",round(p_approx,2)))
[1] "Pr(-250 < theta < 250) = 0.38"
d<-data.frame(theta,clr =abs(theta)>250)library(ggplot2)ggplot(d,aes(x = theta,fill = clr))+geom_histogram(binwidth =5,show.legend =FALSE)+scale_y_continuous(name ="",labels =NULL,expand =c(0,0))+scale_x_continuous(name =expression(theta),breaks =c(-1000,-250,250,1000))
_There is much more probability mass outside the interval (-250, 250)._

There is much more probability mass outside the interval (-250,250).


This will almost never correspond to the prior beliefs of aresearcher about a parameter in a well-specified applied regressionmodel and yet priors like\(\theta \sim\mathsf{Normal(\mu = 0, \sigma = 500)}\) (and more extreme)remain quite popular.

Even when you know very little, a flat or very wide prior will almostnever be the best approximation to your beliefs about the parameters inyour model that you can express usingrstanarm (orother software).Some amount of prior information will beavailable. For example, even if there is nothing to suggest a priorithat a particular coefficient will be positive or negative, there isalmost always enough information to suggest that different orders ofmagnitude are not equally likely. Making use of this information whensetting a prior scale parameter is simple —one heuristic is to set thescale an order of magnitude bigger than you suspect it to be— and hasthe added benefit of helping to stabilize computations.

A more in-depth discussion of non-informative vs weakly informativepriors is available in the case studyHowthe Shape of a Weakly Informative Prior Affects Inferences.

Specifying flat priors

rstanarm will use flat priors ifNULLis specified rather than a distribution. For example, to use a flatprior on regression coefficients you would specifyprior=NULL:

flat_prior_test<-stan_glm(mpg~ wt,data = mtcars,prior =NULL)

In this case we letrstanarm use the default priorsfor the intercept and error standard deviation (we could change that ifwe wanted), but the coefficient on thewt variable willhave a flat prior. To double check that indeed a flat prior was used forthe coefficient onwt we can callprior_summary:

prior_summary(flat_prior_test)
Priors for model 'flat_prior_test' ------Intercept (after predictors centered)  Specified prior:    ~ normal(location = 20, scale = 2.5)  Adjusted prior:    ~ normal(location = 20, scale = 15)Coefficients ~ flatAuxiliary (sigma)  Specified prior:    ~ exponential(rate = 1)  Adjusted prior:    ~ exponential(rate = 0.17)------See help('prior_summary.stanreg') for more details


Informative Prior Distributions

Although the default priors tend to work well, prudent use of moreinformative priors is encouraged. For example, suppose we have a linearregression model\[y_i \sim\mathsf{Normal}\left(\alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i}, \,\sigma\right)\] and we have evidence (perhaps from previousresearch on the same topic) that approximately\(\beta_1 \in (-15, -5)\) and\(\beta_2 \in (-1, 1)\). An example of aninformative prior for\(\boldsymbol{\beta} =(\beta_1, \beta_2)'\) could be

\[\boldsymbol{\beta} \sim \mathsf{Normal} \left( \begin{pmatrix} -10 \\ 0 \end{pmatrix}, \begin{pmatrix} 5^2 & 0 \\ 0 & 2^2 \end{pmatrix}\right),\] which sets the prior means at the midpoints of the intervalsand then allows for some wiggle room on either side. If the data arehighly informative about the parameter values (enough to overwhelm theprior) then this prior will yield similar results to a non-informativeprior. But as the amount of data and/or the signal-to-noise ratiodecrease, using a more informative prior becomes increasinglyimportant.

If the variablesy,x1, andx2are in the data framedat then this model can be specifiedas

my_prior<-normal(location =c(-10,0),scale =c(5,2))stan_glm(y~ x1+ x2,data = dat,prior = my_prior)

We left the priors for the intercept and error standard deviation attheir defaults, but informative priors can be specified for thoseparameters in an analogous manner.


[8]ページ先頭

©2009-2025 Movatter.jp