Movatterモバイル変換


[0]ホーム

URL:


Estimating ANOVA Models with rstanarm

Jonah Gabry and Ben Goodrich

2025-09-29

library(ggplot2)library(bayesplot)theme_set(bayesplot::theme_default())

Introduction

This vignette explains how to estimate ANalysis Of VAriance (ANOVA)models using thestan_aov function in therstanarm package

The four steps of a Bayesian analysis are

  1. Specify a joint distribution for the outcome(s) and all theunknowns, which typically takes the form of a marginal priordistribution for the unknowns multiplied by a likelihood for theoutcome(s) conditional on the unknowns. This joint distribution isproportional to a posterior distribution of the unknowns conditional onthe observed data
  2. Draw from posterior distribution using Markov Chain Monte Carlo(MCMC).
  3. Evaluate how well the model fits the data and possibly revise themodel.
  4. Draw from the posterior predictive distribution of the outcome(s)given interesting values of the predictors in order to visualize how amanipulation of a predictor affects (a function of) the outcome(s).

Steps 3 and 4 are covered in more depth by the vignette entitled“How to Use therstanarmPackage”. This vignette focuses on Step 1 when the likelihood is theproduct of independent normal distributions. We also demonstrate thatStep 2 is not entirely automatic because it is sometimes necessary tospecify some additional tuning parameters in order to obtain optimallyefficient results.

Likelihood

The likelihood for one observation under a linear model can bewritten as a conditionally normal PDF\[\frac{1}{\sigma_{\epsilon} \sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{y -\mu}{\sigma_{\epsilon}}\right)^2},\] where\(\mu = \alpha + \mathbf{x}^\top\boldsymbol{\beta}\) is a linear predictor and\(\sigma_{\epsilon}\) is the standarddeviation of the error in predicting the outcome,\(y\). The likelihood of the entire sample isthe product of\(N\) individuallikelihood contributions.

An ANOVA model can be considered a special case of the above linearregression model where each of the\(K\) predictors in\(\mathbf{x}\) is a dummy variable indicatingmembership in a group. An equivalent linear predictor can be written as\(\mu_j = \alpha + \alpha_j\), whichexpresses the conditional expectation of the outcome in the\(j\)-th group as the sum of a common mean,\(\alpha\), and a group-specificdeviation from the common mean,\(\alpha_j\).

Priors

If we view the ANOVA model as a special case of a linear regressionmodel with only dummy variables as predictors, then the model could beestimated using the prior specification in thestan_lmfunction. In fact, this is exactly how thestan_aovfunction is coded. These functions require the user to specify a valuefor the prior location (by default the mode) of the\(R^2\), the proportion of variance in theoutcome attributable to the predictors under a linear model. This priorspecification is appealing in an ANOVA context because of thefundamental identity\[SS_{\mbox{total}} =SS_{\mbox{model}} + SS_{\mbox{error}},\] where\(SS\) stands for sum-of-squares. If wenormalize this identity, we obtain the tautology\(1 = R^2 + \left(1 - R^2\right)\) but it isreasonable to expect a researcher to have a plausible guess for\(R^2\) before conducting an ANOVA. See thevignette for thestan_lm function(regularized linear models) for more information on this approach.

If we view the ANOVA model as a difference of means, then the modelcould be estimated using the prior specification in thestan_lmer function. In the syntax popularized by thelme4 package,y ~ 1 + (1|group) representsa likelihood where\(\mu_j = \alpha +\alpha_j\) and\(\alpha_j\) isnormally distributed across the\(J\)groups with mean zero and some unknown standard deviation. Thestan_lmer function specifies that this standard deviationhas a Gamma prior with, by default, both its shape and scale parametersequal to\(1\), which is just anstandard exponential distribution. However, the shape and scaleparameters can be specified as other positive values. This approach alsorequires specifying a prior distribution on the standard deviation ofthe errors that is independent of the prior distribution for each\(\alpha_j\). See thevignette for thestan_glmer function(lme4-style models usingrstanarm) formore information on this approach.

Example

We will utilize an example from theHSAUR3 packageby Brian S. Everitt and Torsten Hothorn, which is used in their 2014bookA Handbook of Statistical Analyses Using R (3rd Edition)(Chapman & Hall / CRC). This book is frequentist in nature and wewill show how to obtain the corresponding Bayesian results.

The model in section 4.3.1 analyzes an experiment where rats weresubjected to different diets in order to see how much weight theygained. The experimental factors were whether their diet had low or highprotein and whether the protein was derived from beef or cereal. Beforeseeing the data, one might expect that a moderate proportion of thevariance in weight gain might be attributed to protein (source) in thediet. The frequentist ANOVA estimates can be obtained:

data("weightgain",package ="HSAUR3")coef(aov(weightgain~ source* type,data = weightgain))
         (Intercept)         sourceCereal              typeLow                100.0                -14.1                -20.8 sourceCereal:typeLow                 18.8

To obtain Bayesian estimates we can prependstan_ toaov and specify the prior location of the\(R^2\) as well as optionally the number ofcores that the computer is allowed to utilize:

library(rstanarm)post1<-stan_aov(weightgain~ source* type,data = weightgain,prior =R2(location =0.5),adapt_delta =0.999,seed =12345)post1
stan_aov family:       gaussian [identity] formula:      weightgain ~ source * type observations: 40 predictors:   4------                     Median MAD_SD(Intercept)           99.0    4.5 sourceCereal         -13.0    6.1 typeLow              -18.9    6.5 sourceCereal:typeLow  17.1    8.6 Auxiliary parameter(s):              Median MAD_SDR2             0.2    0.1  log-fit_ratio  0.0    0.1  sigma         14.7    1.7  ANOVA-like table:                    Median MAD_SDMean Sq source      557.1  419.6 Mean Sq type        991.8  602.5 Mean Sq source:type 729.8  688.5 ------* For help interpreting the printed output see ?print.stanreg* For info on the priors used see ?prior_summary.stanreg

Here we have specifiedadapt_delta = 0.999 to decreasethe stepsize and largely prevent divergent transitions. See theTroubleshooting section in the main rstanarmvignette for more details aboutadapt_delta. Also, our prior guess that\(R^2 = 0.5\) was overly optimistic. However,the frequentist estimates presumably overfit the data even more.

Alternatively, we could prependstan_ tolmer and specify the corresponding priors

post2<-stan_lmer(weightgain~1+ (1|source)+ (1|type)+ (1|source:type),data = weightgain,prior_intercept =cauchy(),prior_covariance =decov(shape =2,scale =2),adapt_delta =0.999,seed =12345)

Comparing these two models using theloo function in theloo package reveals a negligible preference for thefirst approach that is almost entirely due to its having a smallernumber of effective parameters as a result of the more regularizingpriors. However, the difference is so small that it may seemadvantageous to present the second results which are more in line with amainstream Bayesian approach to an ANOVA model.

Conclusion

This vignette has compared and contrasted two approaches toestimating an ANOVA model with Bayesian techniques using therstanarm package. They both have the same likelihood,so the (small in this case) differences in the results are attributableto differences in the priors.

Thestan_aov approach just callsstan_lmand thus only requires a prior location on the\(R^2\) of the linear model. This seemsrather easy to do in the context of an ANOVA decomposition of the totalsum-of-squares in the outcome into model sum-of-squares and residualsum-of-squares.

Thestan_lmer approach just callsstan_glmbut specifies a normal prior with mean zero for the deviations from\(\alpha\) across groups. This is morein line with what most Bayesians would do naturally — particularly ifthe factors were considered “random” — but also requires a prior for\(\alpha\),\(\sigma\), and the standard deviation of thenormal prior on the group-level intercepts. Thestan_lmerapproach is very flexible and might be more appropriate for morecomplicated experimental designs.


[8]ページ先頭

©2009-2025 Movatter.jp