library(ggplot2)library(bayesplot)theme_set(bayesplot::theme_default())This vignette explains how to model continuous outcomes on the openunit interval using thestan_betareg function in therstanarm package.
The four steps of a Bayesian analysis are
Steps 3 and 4 are covered in more depth by the vignette entitled“How to Use therstanarmPackage”. This vignette focuses on Step 1 when the likelihood is theproduct of beta distributions.
Beta regression uses the beta distribution as the likelihood for thedata,\[f(y_i | a, b) = \frac{y_i^{(a-1)}(1-y_i)^{(b-1)}}{B(a,b)}\] where\(B(\cdot)\) is thebeta function. The shape parameters for the distribution are\(a\) and\(b\) and enter into the model according tothe following transformations,\[a = \mu\cdot\phi \\b = (1-\mu)\cdot\phi\]
Let\(g_1(\cdot)\) be some linkfunction. Then, in the specification of the shape parameters above,\(\mu =g_1^{-1}(\mathbf{X}\boldsymbol{\beta})\), where\(\boldsymbol{X}\) is a\(N\times K\) dimensional matrix ofpredictors, and\(\boldsymbol{\beta}\)is a\(K\) dimensional vector ofparameters associated with each predictor.
In the simplest case (with only one set of regressors),\(\phi\) is a scalar parameter.Alternatively, it is possible to model\(\phi\) using a second set of regressors\(\mathbf{Z}\). In this context let\(g_2(\cdot)\) be some link functionthat is not necessarily identical to\(g_1(\cdot)\). Then\(\phi =g_2^{-1}(\mathbf{Z}\boldsymbol{\gamma})\), where\(\boldsymbol{\gamma}\) is a\(J\) dimensional vector of parametersassociated with the\(N\times J\)dimensional matrix of predictors\(\mathbf{Z}\).
After substituting the shape parameter values in, the likelihood usedin beta regression takes the following form,\[f(y_i | \mu, \phi) =\frac{y_i^{(\mu\phi-1)}(1-y_i)^{((1-\mu)\phi-1)}}{B(\mu\phi,(1-\mu)\phi)}\]
A full Bayesian analysis requires specifying prior distributions\(f(\boldsymbol{\beta})\) and\(f(\phi)\) for the vector of regressioncoefficients and\(\phi\). When usingstan_betareg, these distributions can be set using theprior_intercept,prior, andprior_phi arguments. Thestan_betareg functionsupports a variety of prior distributions, which are explained in therstanarm documentation(help(priors, package = 'rstanarm')).
When modeling\(\phi\) with a linearpredictor a full Bayesian analysis requires specifying the priordistributions\(f(\boldsymbol{\beta})\)and\(f(\boldsymbol{\gamma})\). Instan_betareg the prior distributions on\(\boldsymbol{\gamma}\) can be set using theprior_intercept_z andprior_z arguments.
As an example, suppose we have\(K\)predictors and believe — prior to seeing the data — that\(\beta_1, \dots, \beta_K\) and\(\phi\) are as likely to be positive as theyare to be negative, but are highly unlikely to be far from zero. Thesebeliefs can be represented by normal distributions with mean zero and asmall scale (standard deviation). To give\(\phi\) and each of the\(\beta\)s this prior (with a scale of 1,say), in the call tostan_betareg we would include theargumentsprior_intercept = normal(0,1),prior = normal(0,1), andprior_phi = normal(0,1).
If, on the other hand, we have less a priori confidence that theparameters will be close to zero then we could use a larger scale forthe normal distribution and/or a distribution with heavier tails thanthe normal like the Student t distribution.Step 1 inthe “How to Use therstanarm Package” vignettediscusses one such example.
After fitting the model we can use theprior_summaryfunction to print information about the prior distributions used whenfitting the model.
When using only asingle set of regressors, the posteriordistribution of\(\boldsymbol{\beta}\)and\(\phi\) is proportional to theproduct of the likelihood contributions, the\(K\) priors on the\(\beta_k\) parameters, and\(\phi\),\[f(\boldsymbol{\beta},\phi|\mathbf{y},\mathbf{X}) \propto\prod_{i=1}^N f(y_i | a, b) \times\prod_{k=1}^K f(\beta_k) \timesf(\phi)\]
When usingtwo sets of regressors, the posteriordistribution of\(\boldsymbol{\beta}\)and\(\boldsymbol{\gamma}\) isproportional to the product of the likelihood contribution, the\(K\) priors on the\(\beta_k\) parameters, and the\(J\) priors on the\(\gamma_j\) parameters,
\[f(\boldsymbol{\beta},\boldsymbol{\gamma}|\mathbf{y},\mathbf{X}) \propto\prod_{i=1}^N f(y_i | a, b) \times\prod_{k=1}^K f(\beta_k) \times\prod_{j=1}^J f(\gamma_j)\]
In this example the outcome variable\(\mathbf{y}\) is simulated in a way thatwarrants the use of beta regression. It is worth mentioning that thedata generation process is quite convoluted, which is apparent in theidentification of the likelihood above.
The data simulated below uses the logistic link function on the firstset of regressors and the log link function on the second set ofregressors.
SEED<-1234set.seed(SEED)eta<-c(1,-0.2)gamma<-c(1.8,0.4)N<-200x<-rnorm(N,2,2)z<-rnorm(N,0,2)mu<-binomial(link = logit)$linkinv(eta[1]+ eta[2]*x)phi<-binomial(link = log)$linkinv(gamma[1]+ gamma[2]*z)y<-rbeta(N, mu* phi, (1- mu)* phi)dat<-data.frame(cbind(y, x, z))hist(dat$y,col ="darkgrey",border = F,main ="Distribution of Outcome Variable",xlab ="y",breaks =20,freq = F)The model can be fit by callingstan_betareg, using theappropriate link functions.
library(rstanarm)fit1<-stan_betareg(y~ x| z,data = dat,link ="logit",link.phi ="log",cores =2,seed =12345)fit2<-stan_betareg(y~-1+ x ,data = dat,link ="logit",link.phi ="log",cores =2,seed =12345)round(coef(fit1),2)round(coef(fit2),2) (Intercept) x (phi)_(Intercept) (phi)_z 0.93 -0.20 1.84 0.31 x (phi)_(Intercept) 0.00 1.08For clarity we can useprior_summary to print theinformation about the prior distributions used to fit the models. Thepriors used infit1 are provided below.
prior_summary(fit1)Priors for model 'fit1' ------Intercept (after predictors centered) ~ normal(location = 0, scale = 2.5)Coefficients Specified prior: ~ normal(location = 0, scale = 2.5) Adjusted prior: ~ normal(location = 0, scale = 1.2)Intercept_z (after predictors centered) ~ normal(location = 0, scale = 2.5)Coefficients_z Specified prior: ~ normal(location = 0, scale = 2.5) Adjusted prior: ~ normal(location = 0, scale = 1.2)------See help('prior_summary.stanreg') for more detailsThe usual posterior analyses are available inrstanarm. The plots below illustrate simulated valuesof the outcome variable. The incorrect model noticeably fails to capturethe top of the distribution consistently in comparison to the truemodel.
library(ggplot2)library(bayesplot)bayesplot_grid(pp_check(fit1),pp_check(fit2),xlim =c(0,1),ylim =c(0,4),titles =c("True Model: y ~ x | z","False Model: y ~ x - 1"),grid_args =list(ncol =2))We can also compare models by evaluating the expected log pointwisepredictive density (elpd), which can be calculated usingtheloo method, which provides an interface forrstanarm models to the functionality in theloo package.
loo1<-loo(fit1)loo2<-loo(fit2)loo_compare(loo1, loo2) elpd_diff se_difffit1 0.0 0.0 fit2 -79.9 11.8The difference inelpd is negative indicating that theexpected predictive accuracy for the first model is higher.
In some applied contexts it may be necessary to work with an outcomevariable that is a proportion. If the proportion is bound on the openunit interval then beta regression can be considered a reasonableestimation method. Thebetareg package provides a dataseton the proportion of crude oil converted to gasoline after distillationand fractionation. This variable is defined as yield. Belowstan_betareg is used to model yield as a function oftemperature, pressure, and the batch of conditions.
library(rstanarm)data("GasolineYield",package ="betareg")gas_fit1<-stan_betareg(yield~ temp+ batch,data = GasolineYield,link ="logit",seed =12345)gas_fit2<-stan_betareg(yield~ temp+ batch| pressure,data = GasolineYield,link ="logit",seed =12345)round(coef(gas_fit1),2)round(coef(gas_fit2),2)(Intercept) temp batch1 batch2 batch3 batch4 -5.16 0.01 1.35 0.95 1.15 0.77 batch5 batch6 batch7 batch8 batch9 (phi) 0.80 0.72 0.33 0.26 0.15 12.04 (Intercept) temp batch1 batch2 -6.06 0.01 1.69 1.29 batch3 batch4 batch5 batch6 1.53 1.03 1.10 1.01 batch7 batch8 batch9 (phi)_(Intercept) 0.52 0.47 0.37 5.34 (phi)_pressure 0.04The plots below illustrate simulated values of gasoline yield. Whilethe first model accounts for variation in batch conditions itspredictions looks somewhat uniform rather than resembling the peaked andright-skewed behavior of the true data. The second model does a somewhatbetter job at capturing the shape of the distribution, however itslocation is off as it is centered around 0.50 rather than 0.20.
library(ggplot2)bayesplot_grid(pp_check(gas_fit1),pp_check(gas_fit2),xlim =c(0,1),ylim =c(0,5),titles =c("gas_fit1","gas_fit2"),grid_args =list(ncol =2))gas_loo1<-loo(gas_fit1)gas_loo2<-loo(gas_fit2)loo_compare(gas_loo1, gas_loo2) elpd_diff se_diffgas_fit2 0.0 0.0 gas_fit1 -33.9 3.3Evaluating the expected log predictive distribution usingloo reveals that the second of the two models ispreferred.
Ferrari, SLP and Cribari-Neto, F (2004) “Beta Regression for ModelingRates and Proportions”.Journal of Applied Statistics. Vol. 31,No. 07, p799-815.