This practical is entirely optional, and presents additional and advanced material you may wish to try out if you have completed all the work in the first seven practicals, and have no further questions for us.
We start with an example where we implement specific R code for running the linear regression example of Practical 5; then we repeat the examples of Practicals 6 and 7 but using INLA rather than BayesX. Note we did not include INLA examples in the main material as we did not want to have too many different packages included, which would detract from the most important part - learning about Bayesian Statistics!
We will illustrate the use of Gibbs Sampling on a simple linear regression model. Recall that we saw yesterday that we can obtain an analytical solution for a Bayesian linear regression, but that more complex models require a simulation approach; in Practical 5, we used the package MCMCpack to fit the model using Gibbs Sampling.
The approach we take here for Gibbs Sampling on a simple linear regression model is very easily generalised to more complex models. The key is that whatever your model looks like, you update one parameter at a time, using the conditional distribution of that parameter given the current values of all the other parameters.
The simple linear regression model we will analyse here is a reduced version of the general linear regression model we saw yesterday, and the same one as in Practical 5: \[Y_i = \beta_0+\beta_1x_i+\epsilon_i\] for response variable \(\mathbf{Y}=\left(Y_1,Y_2,\ldots,Y_n\right)\), explanatory variable \(\mathbf{x}=\left(x_1,x_2,\ldots,x_n\right)\) and residual vector \(\mathbf{\epsilon}=\left(\epsilon_1,\epsilon_2,\ldots,\epsilon_n\right)\) for a sample of size \(n\), where \(\beta_0\) is the regression intercept, \(\beta_1\) is the regression slope, and where the \(\epsilon_i\) are independent with \(\epsilon_i\sim\textrm{N}\left(0,\sigma^2\right)\) \(\forall i=1,2,\ldots,n\). For convenience, we refer to the combined set of \(\mathbf{Y}\) and \(\mathbf{x}\) data as \(\mathcal{D}\). We also define \(\hat{\mathbf{y}}\) to be the fitted response vector (i.e., \(\hat{y}_i = \beta_0 + \beta_1 x_i\) from the regression equation) using the current values of the parameters \(\beta_0\), \(\beta_1\) and precision \(\tau\) (remember that \(\tau=\frac{1}{\sigma^2}\)) from the Gibbs Sampling simulations.
For Bayesian inference, it is simpler to work with precisions \(\tau\) rather than with variances \(\sigma^2\). Given priors \[\begin{align*}\pi(\tau) &= \textrm{Ga}\left(\alpha, \beta\right), \\ \pi(\beta_0) &= \textrm{N}\left(\mu_{\beta_0}, \tau_{\beta_0}\right), \quad \textrm{and} \\ \pi(\beta_1) &= \textrm{N}\left(\mu_{\beta_1}, \tau_{\beta_1}\right)\end{align*}\] then by combining with the likelihood we can derive the full conditional distributions for each parameter. For \(\tau\), we have \[\begin{align*}\pi\left(\tau|\mathcal{D}, \beta_0, \beta_1\right) &\propto \pi(\tau)\prod_{i=1}^n L\left(y_i|\hat{y}_i\right), \\ &= \tau^{\alpha-1}e^{-\beta\tau}\tau^{\frac{n}{2}}\exp\left\{-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y}_i)^2\right\}, \\ &= \tau^{\alpha-1+\frac{n}{2}}\exp\left\{-\beta\tau-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y}_i)^2\right\}, \\ \textrm{so} \quad \tau|\mathcal{D}, \beta_0, \beta_1 &\sim \textrm{Ga}\left(\alpha+\frac{n}{2}, \beta+\frac{1}{2}\sum_{i=1}^n(y_i-\hat{y}_i)^2\right).\end{align*}\] For \(\beta_0\) we will need to expand \(\hat{y}_i=\beta_0+\beta_1x_i\), and then we have \[\begin{align*}\pi(\beta_0|\mathcal{D}, \beta_1, \tau) &\propto \pi(\beta_0)\prod_{i=1}^n L\left(y_i|\hat{y}_i\right), \\ &= \tau_{\beta_0}^{\frac{1}{2}}\exp\left\{-\frac{\tau_{\beta_0}}{2}(\beta_0-\mu_{\beta_0})^2\right\}\tau^{\frac{n}{2}}\exp\left\{-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y}_i)^2\right\}, \\ &\propto \exp\left\{-\frac{\tau_{\beta_0}}{2}(\beta_0-\mu_{\beta_0})^2-\frac{\tau}{2}\sum_{i=1}^n(y_i-\beta_0-x_i\beta_1)^2\right\}, \\ &= \exp\left\{-\frac{1}{2}\left(\beta_0^2(\tau_{\beta_0}+n\tau)+\beta_0\left(-2\tau_{\beta_0}\mu_{\beta_0}-2\tau\sum_{i=1}^n(y_i-x_i\beta_1)\right)\right)+C\right\}, \\ \textrm{so} \quad \beta_0|\mathcal{D}, \beta_1, \tau &\sim \textrm{N}\left((\tau_{\beta_0}+n\tau)^{-1}\left(\tau_{\beta_0}\mu_{\beta_0}+\tau\sum_{i=1}^n(y_i-x_i\beta_1)\right), \tau_{\beta_0}+n\tau\right).\end{align*}\]
In the above, \(C\) represents a quantity that is constant with respect to \(\beta_0\) and hence can be ignored. Finally, for \(\beta_1\) we have \[\begin{align*}\pi(\beta_1|\mathcal{D}, \beta_0, \tau) &\propto \pi(\beta_1)\prod_{i=1}^n L\left(y_i|\hat{y}_i\right), \\ &= \tau_{\beta_1}^{\frac{1}{2}}\exp\left\{-\frac{\tau_{\beta_1}}{2}(\beta_1-\mu_{\beta_1})^2\right\}\tau^{\frac{n}{2}}\exp\left\{-\frac{\tau}{2}\sum_{i=1}^n(y_i-\hat{y}_i)^2\right\}, \\ &\propto \exp\left\{-\frac{\tau_{\beta_1}}{2}(\beta_1-\mu_{\beta_1})^2-\frac{\tau}{2}\sum_{i=1}^n(y_i-\beta_0-x_i\beta_1)^2\right\}, \\ &= \exp\left\{-\frac{1}{2}\left(\beta_1^2\left(\tau_{\beta_1}+\tau\sum_{i=1}^n x_i^2\right)+\beta_1\left(-2\tau_{\beta_1}\mu_{\beta_1}-2\tau\sum_{i=1}^n(y_i-\beta_0)x_i\right)\right)+C\right\}, \\ \textrm{so} \quad \beta_1|\mathcal{D}, \beta_0, \tau &\sim \textrm{N}\left(\left(\tau_{\beta_1}+\tau\sum_{i=1}^n x_i^2\right)^{-1}\left(\tau_{\beta_1}\mu_{\beta_1}+\tau\sum_{i=1}^n(y_i-\beta_0)x_i\right), \tau_{\beta_1}+\tau\sum_{i=1}^n x_i^2\right).\end{align*}\] This gives us an easy way to run Gibbs Sampling for linear regression; we have standard distributions as full conditionals, and there are standard functions in R to obtain the simulations.
We shall do this now for an example in ecology, looking at the relationship between water pollution and mayfly size - the data come from the book Statistics for Ecologists Using R and Excel, 2nd edition by Mark Gardener (ISBN 9781784271398); see the publisher's webpage.
The data are as follows:
length - the length of a mayfly in mm;
BOD - biological oxygen demand in mg of oxygen per litre, effectively a measure of organic pollution (since more organic pollution requires more oxygen to break it down).
The data can be read into R:
```r
# Read in data
BOD <- c(200, 180, 135, 120, 110, 120, 95, 168, 180, 195, 158, 145, 140,
         145, 165, 187, 190, 157, 90, 235, 200, 55, 87, 97, 95)
mayfly.length <- c(20, 21, 22, 23, 21, 20, 19, 16, 15, 14, 21, 21, 21,
                   20, 19, 18, 17, 19, 21, 13, 16, 25, 24, 23, 22)

# Create data frame for the Gibbs Sampling, needs x and y
Data <- data.frame(x = BOD, y = mayfly.length)
```

We provide here some simple functions for running the analysis by Gibbs Sampling, using the full conditionals derived above. The functions assume you have a data frame with two columns, the response y and the covariate x.
```r
# Function to update tau, the precision
sample_tau <- function(data, beta0, beta1, alphatau, betatau) {
  rgamma(1,
         shape = alphatau + nrow(data) / 2,
         rate = betatau + 0.5 * sum((data$y - (beta0 + data$x * beta1))^2))
}

# Function to update beta0, the regression intercept
sample_beta0 <- function(data, beta1, tau, mu0, tau0) {
  prec <- tau0 + tau * nrow(data)
  mean <- (tau0 * mu0 + tau * sum(data$y - data$x * beta1)) / prec
  rnorm(1, mean = mean, sd = 1 / sqrt(prec))
}

# Function to update beta1, the regression slope
sample_beta1 <- function(data, beta0, tau, mu1, tau1) {
  prec <- tau1 + tau * sum(data$x * data$x)
  mean <- (tau1 * mu1 + tau * sum((data$y - beta0) * data$x)) / prec
  rnorm(1, mean = mean, sd = 1 / sqrt(prec))
}

# Function to run the Gibbs Sampling, where you specify the number of
# simulations `m`, the starting values for each of the three regression
# parameters (`tau_start` etc), and the parameters for the prior
# distributions of tau, beta0 and beta1
gibbs_sample <- function(data, tau_start, beta0_start, beta1_start, m,
                         alpha_tau, beta_tau, mu_beta0, tau_beta0,
                         mu_beta1, tau_beta1) {
  tau <- numeric(m)
  beta0 <- numeric(m)
  beta1 <- numeric(m)
  tau[1] <- tau_start
  beta0[1] <- beta0_start
  beta1[1] <- beta1_start
  for (i in 2:m) {
    tau[i] <- sample_tau(data, beta0[i - 1], beta1[i - 1], alpha_tau, beta_tau)
    beta0[i] <- sample_beta0(data, beta1[i - 1], tau[i], mu_beta0, tau_beta0)
    beta1[i] <- sample_beta1(data, beta0[i], tau[i], mu_beta1, tau_beta1)
  }
  Iteration <- 1:m
  data.frame(Iteration, beta0, beta1, tau)
}
```

We will use Gibbs Sampling to fit a Bayesian Linear Regression model to the mayfly data, with the above code. We will use the following prior distributions for the regression parameters: \[\begin{align*}\pi(\tau) &= \textrm{Ga}\left(1, 1\right), \\ \pi(\beta_0) &= \textrm{N}\left(0, 0.0001\right), \quad \textrm{and} \\ \pi(\beta_1) &= \textrm{N}\left(0, 0.0001\right).\end{align*}\] We will set the initial values of all parameters to 1, i.e. \(\beta_0^{(0)}=\beta_1^{(0)}=\tau^{(0)}=1\).
Investigate the data to see whether a linear regression model would be sensible. [You may not need to do this step if you recall the outputs from Practical 5.]
Solution
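One way to reproduce the output below is to plot the data and run a Pearson correlation test; a minimal sketch:

```r
# Scatterplot of mayfly length against BOD, plus a correlation test
plot(BOD, mayfly.length)
cor.test(BOD, mayfly.length)
```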
```
## 
##  Pearson's product-moment correlation
## 
## data:  BOD and mayfly.length
## t = -6.52, df = 23, p-value = 1.185e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9107816 -0.6020516
## sample estimates:
##        cor 
## -0.8055507
```

Run a frequentist simple linear regression using the lm function in R; this will be useful for comparison with our Bayesian analysis.
Solution
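From the Call line in the output below, the model can be fitted as follows (linreg is an assumed object name):

```r
# Frequentist simple linear regression for comparison
linreg <- lm(mayfly.length ~ BOD, data = Data)
summary(linreg)
```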
```
## 
## Call:
## lm(formula = mayfly.length ~ BOD, data = Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.453 -1.073  0.307  1.105  3.343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.697314   1.290822   21.46  < 2e-16 ***
## BOD         -0.055202   0.008467   -6.52 1.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.865 on 23 degrees of freedom
## Multiple R-squared:  0.6489, Adjusted R-squared:  0.6336 
## F-statistic: 42.51 on 1 and 23 DF,  p-value: 1.185e-06
```

Use the code above to fit a Bayesian simple linear regression using length as the response variable. Ensure you have a burn-in period so that the initial simulations are discarded. Use the output from gibbs_sample to estimate the regression parameters (with uncertainty measures). Also plot the estimates of the posterior distributions.
Solution
```r
# Bayesian Linear Regression using a Gibbs Sampler

# Set the number of iterations of the Gibbs Sampler - including how many to
# discard as the burn-in
m.burnin <- 500
m.keep <- 1000
m <- m.burnin + m.keep

# Obtain the samples
results1 <- gibbs_sample(Data, 1, 1, 1, m, 1, 1, 0, 0.0001, 0, 0.0001)

# Remove the burn-in
results1 <- results1[-(1:m.burnin), ]

# Add variance and standard deviation columns
results1$Variance <- 1 / results1$tau
results1$StdDev <- sqrt(results1$Variance)

# Estimates are the column means
apply(results1[, -1], 2, mean)
```

```
##       beta0       beta1         tau    Variance      StdDev 
## 27.65318069 -0.05486106  0.30030561  3.64651228  1.88708959
```

The posterior standard deviations and 95% credible intervals below were presumably obtained in the same way, replacing mean in the apply() call with sd and with quantile (probs = c(0.025, 0.975)) respectively:

```
##       beta0       beta1         tau    Variance      StdDev 
## 1.333977050 0.008754095 0.088994547 1.180979250 0.292387827
##          beta0       beta1       tau Variance   StdDev
## 2.5%  24.94863 -0.07188798 0.1532262 1.997486 1.413324
## 97.5% 30.31030 -0.03701739 0.5006294 6.526298 2.554662
```

```r
# Kernel density plots for beta0, beta1 and the residual standard deviation
plot(density(results1$beta0, bw = 0.5), main = "Posterior density for beta0")
```

Use the function ts.plot() to view the autocorrelation in the Gibbs sampling simulation chain. Is there any visual evidence of autocorrelation, or do the samples look independent?
How do the results compare with the frequentist output?
One method for reducing the autocorrelation in the sampling chains for regression parameters is to mean centre the covariate(s); this works because it reduces any dependence between the regression intercept and slope(s). Do this for the current example, noting that you will need to make a correction on the estimate of the regression intercept afterwards.
Solution
```r
# Mean-centre the x covariate
DataC <- Data
meanx <- mean(DataC$x)
DataC$x <- DataC$x - meanx

# Bayesian Linear Regression using a Gibbs Sampler

# Set the number of iterations of the Gibbs Sampler - including how many to
# discard as the burn-in
m.burnin <- 500
m.keep <- 1000
m <- m.burnin + m.keep

# Obtain the samples
results2 <- gibbs_sample(DataC, 1, 1, 1, m, 1, 1, 0, 0.0001, 0, 0.0001)

# Remove the burn-in
results2 <- results2[-(1:m.burnin), ]

# Correct the effect of the mean-centring on the intercept
results2$beta0 <- results2$beta0 - meanx * results2$beta1

# Add variance and standard deviation columns
results2$Variance <- 1 / results2$tau
results2$StdDev <- sqrt(results2$Variance)

# Estimates are the column means
apply(results2[, -1], 2, mean)
```

```
##      beta0      beta1        tau   Variance     StdDev 
## 27.7619863 -0.0554732  0.3055647  3.5419578  1.8627159
```

As before, the posterior standard deviations and 95% credible intervals below were presumably obtained by replacing mean in the apply() call with sd and with quantile:

```
##       beta0       beta1         tau    Variance      StdDev 
## 1.339327429 0.008712278 0.084838193 1.070089030 0.268923052
##          beta0       beta1       tau Variance   StdDev
## 2.5%  25.16694 -0.07283719 0.1603452 2.002122 1.414964
## 97.5% 30.36258 -0.03852308 0.4994701 6.236545 2.497308
```

```r
# Kernel density plots for beta0, beta1 and the residual standard deviation
plot(density(results2$beta0, bw = 0.5), main = "Posterior density for beta0")
```

INLA (https://www.r-inla.org/) is based on producing (accurate) approximations to the marginal posterior distributions of the model parameters. Although this can be enough most of the time, making multivariate inference with INLA can be difficult or impossible. However, in many cases this is not needed and INLA can fit some classes of models in a fraction of the time it takes with MCMC.
We can obtain Bayesian model fits without using MCMC with the INLA software, implemented in R via the INLA package. If you do not have this package installed already, note that it is not on CRAN; you will need to install it via
install.packages("INLA",repos =c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"),dep=TRUE)After this, the package can be loaded into R:
This section is included here as a reminder, as we are going to reanalyse this data set using INLA.
The fake_news data set in the bayesrules package in R contains information about 150 news articles, some real news and some fake news.
In this example, we will look at trying to predict whether an article of news is fake or not given three explanatory variables.
We can use the following code to extract the variables we want from the data set:
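```r
# A sketch of the extraction step: load the data without attaching the
# whole package, and keep only the variables described below (the object
# name fakenews matches the code used later)
data(fake_news, package = "bayesrules")
fakenews <- fake_news[, c("type", "title_has_excl", "title_words", "negative")]
```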
The response variable type takes values fake or real, which should be self-explanatory. The three explanatory variables are:
title_has_excl, whether or not the article contains an exclamation mark (values TRUE or FALSE);
title_words, the number of words in the title (a positive integer); and
negative, a sentiment rating, recorded on a continuous scale.
In the exercise to follow, we will examine whether the chance of an article being fake news is related to the three covariates here.
Note that the variable title_has_excl will need to be either replaced by or converted to a factor, for example:
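```r
# Convert title_has_excl to a factor; titlehasexcl is the variable name
# used in the model fits below
fakenews$titlehasexcl <- as.factor(fakenews$title_has_excl)
```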
Functions summary and confint produce a summary (including parameter estimates etc) and confidence intervals for the parameters, respectively.
Finally, we create a new version of the response variable of typelogical:
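```r
# Logical version of the response: TRUE if the article is fake
fakenews$typeFAKE <- fakenews$type == "fake"
```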
Perform an exploratory assessment of the fake news data set, in particular looking at the possible relationships between the explanatory variables and the fake/real response variable typeFAKE. You may wish to use the R function boxplot() here.
Solution
```r
# Is there a link between the fakeness and whether the title has an
# exclamation mark?
table(fakenews$title_has_excl, fakenews$typeFAKE)
```

```
##        
##         FALSE TRUE
##   FALSE    88   44
##   TRUE      2   16
```

```r
# For the quantitative variables, look at boxplots on fake vs real
boxplot(fakenews$title_words ~ fakenews$typeFAKE)
```

Fit the Bayesian model without MCMC using INLA; note that the summary output provides credible intervals for each parameter to help us make inference. Also, in INLA a Binomial response needs to be entered as type integer, so we need another conversion:
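```r
# INLA requires the Binomial response as an integer; typeFAKE.int is the
# name used in the model fit below
fakenews$typeFAKE.int <- as.integer(fakenews$typeFAKE)
```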
Solution
# Fit model - note similarity with bayesx syntaxinla.output<-inla(formula = typeFAKE.int~ titlehasexcl+ title_words+ negative,data = fakenews,family ="binomial")# Summarise outputsummary(inla.output)## Time used:## Pre = 0.274, Running = 0.172, Post = 0.00536, Total = 0.452 ## Fixed effects:## mean sd 0.025quant 0.5quant 0.975quant mode kld## (Intercept) -3.016 0.761 -4.507 -3.016 -1.525 -3.016 0## titlehasexclTRUE 2.680 0.791 1.131 2.680 4.230 2.680 0## title_words 0.116 0.058 0.002 0.116 0.230 0.116 0## negative 0.328 0.154 0.027 0.328 0.629 0.328 0## ## Marginal log-Likelihood: -100.78 ## is computed ## Posterior summaries for the linear predictor and the fitted values are computed## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')Fit a non-Bayesian model usingglm() for comparison.How do the model fits compare?
Solution
# Fit model - note similarity with bayesx syntaxglm.output<-glm(formula = typeFAKE~ titlehasexcl+ title_words+ negative,data = fakenews,family ="binomial")# Summarise outputsummary(glm.output)## ## Call:## glm(formula = typeFAKE ~ titlehasexcl + title_words + negative, ## family = "binomial", data = fakenews)## ## Coefficients:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -2.91516 0.76096 -3.831 0.000128 ***## titlehasexclTRUE 2.44156 0.79103 3.087 0.002025 ** ## title_words 0.11164 0.05801 1.925 0.054278 . ## negative 0.31527 0.15371 2.051 0.040266 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## (Dispersion parameter for binomial family taken to be 1)## ## Null deviance: 201.90 on 149 degrees of freedom## Residual deviance: 169.36 on 146 degrees of freedom## AIC: 177.36## ## Number of Fisher Scoring iterations: 4## Single term deletions## ## Model:## typeFAKE ~ titlehasexcl + title_words + negative## Df Deviance AIC LRT Pr(>Chi) ## <none> 169.36 177.36 ## titlehasexcl 1 183.51 189.51 14.1519 0.0001686 ***## title_words 1 173.17 179.17 3.8099 0.0509518 . ## negative 1 173.79 179.79 4.4298 0.0353162 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1This section is included here as a reminder, as we are going toreanalyse this data set usingINLA.
For this example we will use the esdcomp data set, which is available in the faraway package. This data set records complaints about emergency room doctors. In particular, data was recorded on 44 doctors working in an emergency service at a hospital to study the factors affecting the number of complaints received.
The response variable that we will use is complaints, an integer count of the number of complaints received. It is expected that the number of complaints will scale by the number of visits (contained in the visits column), so we are modelling the rate of complaints per visit - thus we will need to include a new variable log.visits as an offset.
The three explanatory variables we will use in the analysis are:
residency, whether or not the doctor is still in residency training (values N or Y);
gender, the gender of the doctor (values F or M); and
revenue, dollars per hour earned by the doctor, recorded as an integer.
Our simple aim here is to assess whether the seniority, gender or income of the doctor is linked with the rate of complaints against that doctor.
We can use the following code to extract the data we want without having to load the whole package:
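```r
# Load the esdcomp data without attaching the whole faraway package
data(esdcomp, package = "faraway")
```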
Again we can use INLA to fit this form of Bayesian generalised linear model.
If not loaded already, the package must be loaded into R:
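```r
library(INLA)
```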
As noted above, we need to include an offset in this analysis; since for a Poisson GLM we will be using a log() link function by default, we must compute the log of the number of visits and include that in the data set esdcomp:
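```r
# Log of the number of visits, to be used as an offset
esdcomp$log.visits <- log(esdcomp$visits)
```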
The offset term in the model is then written
offset(log.visits)
Perform an exploratory assessment of the emergency room complaints data set, particularly how the response variable complaints varies with the proposed explanatory variables relative to the number of visits. To do this, create another variable which is the ratio of complaints to visits.
Fit the Bayesian model without MCMC using INLA; note that the summary output provides credible intervals for each parameter to help us make inference.
Solution
# Fit model - note similarity with bayesx syntaxinla.output<-inla(formula = complaints~offset(log.visits)+ residency+ gender+ revenue,data = esdcomp,family ="poisson")# Summarise outputsummary(inla.output)## Time used:## Pre = 0.129, Running = 0.145, Post = 0.061, Total = 0.335 ## Fixed effects:## mean sd 0.025quant 0.5quant 0.975quant mode kld## (Intercept) -7.171 0.688 -8.520 -7.171 -5.822 -7.171 0## residencyY -0.354 0.191 -0.728 -0.354 0.021 -0.354 0## genderM 0.139 0.214 -0.281 0.139 0.559 0.139 0## revenue 0.002 0.003 -0.003 0.002 0.008 0.002 0## ## Marginal log-Likelihood: -111.90 ## is computed ## Posterior summaries for the linear predictor and the fitted values are computed## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')Fit a non-Bayesian model usingglm() for comparison.How do the model fits compare?
Solution
```r
# Fit model - note similarity with bayesx syntax
esdcomp$log.visits <- log(esdcomp$visits)
glm.output <- glm(formula = complaints ~ offset(log.visits) + residency + gender + revenue,
                  data = esdcomp,
                  family = "poisson")
# Summarise output
summary(glm.output)
```

```
## 
## Call:
## glm(formula = complaints ~ offset(log.visits) + residency + gender + 
##     revenue, family = "poisson", data = esdcomp)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -7.157087   0.688148 -10.401   <2e-16 ***
## residencyY  -0.350610   0.191077  -1.835   0.0665 .  
## genderM      0.128995   0.214323   0.602   0.5473    
## revenue      0.002362   0.002798   0.844   0.3986    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 63.435  on 43  degrees of freedom
## Residual deviance: 58.698  on 40  degrees of freedom
## AIC: 189.48
## 
## Number of Fisher Scoring iterations: 5
```

The "Single term deletions" table below was presumably produced by drop1(glm.output, test = "Chisq"):

```
## Single term deletions
## 
## Model:
## complaints ~ offset(log.visits) + residency + gender + revenue
##           Df Deviance    AIC    LRT Pr(>Chi)  
## <none>         58.698 189.48                  
## residency  1   62.128 190.91 3.4303  0.06401 .
## gender     1   59.067 187.85 0.3689  0.54361  
## revenue    1   59.407 188.19 0.7093  0.39969  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

These two final sections simply reproduce Practical 7, but using INLA rather than BayesX.
Linear mixed models were defined in the lecture as follows:
\[Y_{ij} = X_{ij}\beta +\phi_i+\epsilon_{ij}\]
Here, \(Y_{ij}\) represents observation \(j\) in group \(i\), \(X_{ij}\) is a vector of covariates with coefficients \(\beta\), the \(\phi_i\) are i.i.d. random effects, and \(\epsilon_{ij}\) is a Gaussian error term. The distribution of the random effects \(\phi_i\) is Gaussian with zero mean and precision \(\tau_{\phi}\).
Multilevel models are a particular type of mixed-effects models in which observations are nested within groups, so that group effects are modelled using random effects. A typical example is that of students nested within classes.
For the next example, the nlschools data set (in package MASS) will be used. This data set records data about students' performance (in particular, about a language score test) and other variables. The variables in this data set are:
lang, language score test.
IQ, verbal IQ.
class, class ID.
GS, class size as number of eighth-grade pupils recorded in the class.
SES, socio-economic status of pupil's family.
COMB, whether the pupils are taught in the multi-grade class with 7th-grade students.
The data set can be loaded and summarised as follows:
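```r
library(MASS)
data(nlschools)
summary(nlschools)
```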
```
##       lang            IQ            class            GS             SES        COMB    
##  Min.   : 9.00   Min.   : 4.00   15580  :  33   Min.   :10.00   Min.   :10.00   0:1658  
##  1st Qu.:35.00   1st Qu.:10.50   5480   :  31   1st Qu.:23.00   1st Qu.:20.00   1: 629  
##  Median :42.00   Median :12.00   15980  :  31   Median :27.00   Median :27.00           
##  Mean   :40.93   Mean   :11.83   16180  :  31   Mean   :26.51   Mean   :27.81           
##  3rd Qu.:48.00   3rd Qu.:13.00   18380  :  31   3rd Qu.:31.00   3rd Qu.:35.00           
##  Max.   :58.00   Max.   :18.00   5580   :  30   Max.   :39.00   Max.   :50.00           
##                                  (Other):2100                                           
```

The model to fit will take lang as the response variable and include IQ, GS, SES and COMB as covariates (i.e., fixed effects). This model can easily be fit with INLA as follows:
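```r
# Fixed-effects-only model; m1 is an assumed object name, and the
# Gaussian likelihood is INLA's default family
m1 <- inla(lang ~ IQ + GS + SES + COMB, data = nlschools)
summary(m1)
```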
```
## Time used:
##     Pre = 0.13, Running = 0.163, Post = 0.0111, Total = 0.305 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept)  9.685 1.070      7.586    9.685     11.784  9.685   0
## IQ           2.391 0.074      2.246    2.391      2.536  2.391   0
## GS          -0.026 0.025     -0.076   -0.026      0.024 -0.026   0
## SES          0.148 0.014      0.120    0.148      0.175  0.148   0
## COMB1       -1.684 0.325     -2.322   -1.684     -1.047 -1.684   0
## 
## Model hyperparameters:
##                                          mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for the Gaussian observations 0.021 0.001       0.02    0.021      0.022 0.021
## 
## Marginal log-Likelihood:  -7713.44 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

Note that the previous model only includes fixed effects. The data set includes class as the class ID to which each student belongs. Class effects can have an impact on the performance of the students, with students in the same class performing similarly in the language test.
Very conveniently, INLA can include random effects in the model by adding a term in the right hand side of the formula that defines the model. Specifically, the term to add is f(class, model = "iid") (see code below for the full model). This will create a random effect indexed over variable class and which is of type iid, i.e., the random effects are independent and identically distributed using a normal distribution with zero mean and precision \(\tau\).
Before fitting the model, the between-class variability can be explored by means of boxplots:
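```r
# Boxplots of language score by class, to explore between-class
# variability (a sketch; the axis labels are assumptions)
boxplot(lang ~ class, data = nlschools, xlab = "Class", ylab = "Language score")
```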
The code to fit the model with random effects is:
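```r
# Model with an iid random effect on class; m2 is an assumed object name
m2 <- inla(lang ~ IQ + GS + SES + COMB + f(class, model = "iid"),
           data = nlschools)
summary(m2)
```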
```
## Time used:
##     Pre = 0.814, Running = 0.19, Post = 0.0219, Total = 1.03 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) 10.626 1.495      7.698   10.624     13.565 10.624   0
## IQ           2.248 0.072      2.108    2.248      2.388  2.248   0
## GS          -0.016 0.047     -0.109   -0.016      0.078 -0.016   0
## SES          0.165 0.015      0.136    0.165      0.194  0.165   0
## COMB1       -2.017 0.598     -3.193   -2.016     -0.847 -2.016   0
## 
## Random effects:
##   Name    Model
##     class IID model
## 
## Model hyperparameters:
##                                          mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for the Gaussian observations 0.025 0.001      0.024    0.025      0.027 0.025
## Precision for class                     0.122 0.020      0.087    0.121      0.167 0.118
## 
## Marginal log-Likelihood:  -7613.01 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

Mixed effects models can also be considered within the context of generalised linear models. In this case, the linear predictor of observation \(i\), \(\eta_i\), can be defined as
\[\eta_i = X_{i}\beta + \phi_i\]
Compared to the previous setting of linear mixed effects models, note that now the distribution of the response could be other than Gaussian and that observations are not necessarily nested within groups.
In this practical we will use the North Carolina Sudden Infant Death Syndrome (SIDS) data set. It is available in the spData package and it can be loaded using:
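```r
library(spData)
data(nc.sids)
summary(nc.sids)
```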
```
##     CNTY.ID         BIR74           SID74          NWBIR74            BIR79           SID79      
##  Min.   :1825   Min.   :  248   Min.   : 0.00   Min.   :   1.0   Min.   :  319   Min.   : 0.00  
##  1st Qu.:1902   1st Qu.: 1077   1st Qu.: 2.00   1st Qu.: 190.0   1st Qu.: 1336   1st Qu.: 2.00  
##  Median :1982   Median : 2180   Median : 4.00   Median : 697.5   Median : 2636   Median : 5.00  
##  Mean   :1986   Mean   : 3300   Mean   : 6.67   Mean   :1051.0   Mean   : 4224   Mean   : 8.36  
##  3rd Qu.:2067   3rd Qu.: 3936   3rd Qu.: 8.25   3rd Qu.:1168.5   3rd Qu.: 4889   3rd Qu.:10.25  
##  Max.   :2241   Max.   :21588   Max.   :44.00   Max.   :8027.0   Max.   :30757   Max.   :57.00  
##     NWBIR79             east           north              x                 y       
##  Min.   :    3.0   Min.   : 19.0   Min.   :  6.0   Min.   :-328.04   Min.   :3757  
##  1st Qu.:  250.5   1st Qu.:178.8   1st Qu.: 97.0   1st Qu.: -60.55   1st Qu.:3920  
##  Median :  874.5   Median :285.0   Median :125.5   Median : 114.38   Median :3963  
##  Mean   : 1352.8   Mean   :271.3   Mean   :122.1   Mean   :  91.46   Mean   :3953  
##  3rd Qu.: 1406.8   3rd Qu.:361.2   3rd Qu.:151.5   3rd Qu.: 240.03   3rd Qu.:4000  
##  Max.   :11631.0   Max.   :482.0   Max.   :182.0   Max.   : 439.65   Max.   :4060  
##       lon              lat             L.id           M.id     
##  Min.   :-84.08   Min.   :33.92   Min.   :1.00   Min.   :1.00  
##  1st Qu.:-81.20   1st Qu.:35.26   1st Qu.:1.00   1st Qu.:2.00  
##  Median :-79.26   Median :35.68   Median :2.00   Median :3.00  
##  Mean   :-79.51   Mean   :35.62   Mean   :2.12   Mean   :2.67  
##  3rd Qu.:-77.87   3rd Qu.:36.05   3rd Qu.:3.00   3rd Qu.:3.25  
##  Max.   :-75.67   Max.   :36.52   Max.   :4.00   Max.   :4.00  
```

A full description of the data set is provided in the associated manual page (check with ?nc.sids), but in this practical we will only consider these variables:
BIR74, number of births (1974-78).
SID74, number of SID deaths (1974-78).
NWBIR74, number of non-white births (1974-78).
These variables are measured at the county level in North Carolina, which has 100 counties.
Because SID74 records the number of SID deaths, the model is Poisson:
\[O_i \mid \mu_i \sim Po(\mu_i),\ i=1,\ldots, 100\] Here, \(O_i\) represents the number of cases in county \(i\) and \(\mu_i\) the mean. In addition, the mean \(\mu_i\) will be written as \(\mu_i = E_i \theta_i\), where \(E_i\) is the expected number of cases and \(\theta_i\) the relative risk in county \(i\).
The relative risk \(\theta_i\) is often modelled, on the log-scale, to be equal to a linear predictor:
\[\log(\theta_i) = \beta_0 + \ldots\]
The expected number of cases is computed by multiplying the number of births in county \(i\) by the overall mortality rate
\[r = \frac{\sum_{i=1}^{100}O_i}{\sum_{i=1}^{100}B_i}\] where \(B_i\) represents the total number of births in county \(i\). Hence, the expected number of cases in county \(i\) is \(E_i = r B_i\).
```r
# Overall mortality rate
r74 <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)
# Expected cases
nc.sids$EXP74 <- r74 * nc.sids$BIR74
```

A common measure of relative risk is the standardised mortality ratio (\(O_i / E_i\)):
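```r
# Standardised mortality ratio: observed / expected (SMR74 is an
# assumed column name)
nc.sids$SMR74 <- nc.sids$SID74 / nc.sids$EXP74
```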
A summary of the SMR can be obtained:
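```r
summary(nc.sids$SMR74)
```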
Values above 1 indicate that the county has more observed deaths than expected, and that there might be an increased risk in the area.
As a covariate, we will compute the proportion of non-white births:
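```r
# Proportion of non-white births in each county; NWPROP74 is the name
# used in the model fits below
nc.sids$NWPROP74 <- nc.sids$NWBIR74 / nc.sids$BIR74
```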
There is a clear relationship between the SMR and the proportion of non-white births in a county:
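```r
# Scatterplot plus the correlation shown below (a sketch)
plot(nc.sids$NWPROP74, nc.sids$SMR74, xlab = "NWPROP74", ylab = "SMR74")
cor(nc.sids$NWPROP74, nc.sids$SMR74)
```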
```
## [1] 0.5793901
```

A simple Poisson regression can be fit by using the following code:
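```r
# Poisson regression with NWPROP74 as a covariate and no random effects;
# m1nc is an assumed object name, following m2nc and m3nc below
m1nc <- inla(SID74 ~ 1 + NWPROP74,
             family = "poisson", E = nc.sids$EXP74,
             data = as.data.frame(nc.sids))
summary(m1nc)
```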
```
## Time used:
##     Pre = 0.314, Running = 0.148, Post = 0.00488, Total = 0.467 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.647 0.090     -0.824   -0.647     -0.471 -0.647   0
## NWPROP74     1.867 0.217      1.441    1.867      2.293  1.867   0
## 
## Marginal log-Likelihood:  -226.13 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

Random effects can also be included to account for intrinsic differences between the counties:
```r
# Index for random effects
nc.sids$ID <- seq_len(nrow(nc.sids))

# Model WITH covariate
m2nc <- inla(SID74 ~ 1 + NWPROP74 + f(ID, model = "iid"),
             family = "poisson", E = nc.sids$EXP74,
             data = as.data.frame(nc.sids))
summary(m2nc)
```

```
## Time used:
##     Pre = 0.138, Running = 0.157, Post = 0.00974, Total = 0.305 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.650 0.104     -0.856   -0.649     -0.446 -0.649   0
## NWPROP74     1.883 0.254      1.386    1.882      2.385  1.882   0
## 
## Random effects:
##   Name   Model
##     ID IID model
## 
## Model hyperparameters:
##                   mean     sd 0.025quant 0.5quant 0.975quant  mode
## Precision for ID 89.43 142.78       9.45    28.92     273.93 15.69
## 
## Marginal log-Likelihood:  -227.82 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

The role of the covariate can be explored by fitting a model without it:
```r
# Model WITHOUT covariate
m3nc <- inla(SID74 ~ 1 + f(ID, model = "iid"),
             family = "poisson", E = nc.sids$EXP74,
             data = as.data.frame(nc.sids))
summary(m3nc)
```

```
## Time used:
##     Pre = 0.145, Running = 0.166, Post = 0.00903, Total = 0.32 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.029 0.063     -0.156   -0.028      0.091 -0.028   0
## 
## Random effects:
##   Name   Model
##     ID IID model
## 
## Model hyperparameters:
##                  mean   sd 0.025quant 0.5quant 0.975quant mode
## Precision for ID 7.26 2.57       3.79     6.77      13.55 6.01
## 
## Marginal log-Likelihood:  -245.52 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

Now, notice the decrease in the estimate of the precision of the random effects (i.e., the variance increases). This means that the values of the random effects are now larger than in the previous case, as the random effects pick up some of the effect explained by the covariate.
oldpar<-par(mfrow =c(1,2))boxplot(m2nc$summary.random$ID$mean,ylim =c(-1,1),main ="With NWPROP74")boxplot(m3nc$summary.random$ID$mean,ylim =c(-1,1),main ="Without NWPROP74")par(oldpar)