Practical 7: Bayesian Hierarchical Modelling

VIBASS

1 Bayesian Hierarchical Modelling

In this last practical we will consider the analysis of Bayesian hierarchical models. As explained in the previous lecture, hierarchical models provide a convenient tool to define models so that the different sources of variation in the data are clearly identified. Bayesian inference for highly structured hierarchical models can be difficult and may require the use of Markov chain Monte Carlo methods.

However, packages such as BayesX and INLA, to mention just two, provide a very convenient way to fit and make inference about certain types of Bayesian hierarchical models. We will continue using BayesX, although you can try out INLA if you look at the optional, additional material in Practical 8.

2 Linear Mixed Models

Linear mixed models were defined in the lecture as follows:

\[Y_{ij} = X_{ij}\beta +\phi_i+\epsilon_{ij}\]

Here, \(Y_{ij}\) represents observation \(j\) in group \(i\), \(X_{ij}\) is a vector of covariates with coefficients \(\beta\), \(\phi_i\) is an i.i.d. random effect and \(\epsilon_{ij}\) is a Gaussian error term. The distribution of the random effects \(\phi_i\) is Gaussian with zero mean and precision \(\tau_{\phi}\).
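
To make the notation concrete, the following short simulation sketches data generated from this model (the values of \(\beta\), \(\tau_{\phi}\) and the error precision are made up for illustration and are not taken from the practical):

## Sketch: simulate data from Y_ij = X_ij * beta + phi_i + eps_ij
## (hypothetical parameter values, purely to illustrate the structure)
set.seed(1)
n.groups <- 10       # number of groups i
n.per.group <- 20    # observations j within each group
beta <- c(2, 0.5)    # intercept and slope
tau.phi <- 4         # precision of the random effects
tau.eps <- 1         # precision of the error term

group <- rep(1:n.groups, each = n.per.group)
x <- rnorm(n.groups * n.per.group)
phi <- rnorm(n.groups, mean = 0, sd = 1 / sqrt(tau.phi))      # group effects
eps <- rnorm(n.groups * n.per.group, sd = 1 / sqrt(tau.eps))  # error term

y <- beta[1] + beta[2] * x + phi[group] + eps
sim.data <- data.frame(y = y, x = x, group = factor(group))
summary(sim.data)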

3 Multilevel Modelling

Multilevel models are a particular type of mixed-effects models in which observations are nested within groups, so that group effects are modelled using random effects. A typical example is that of students nested within classes.

For the next example, the nlschools data set (in package MASS) will be used. This data set records data about students’ performance (in particular, a language score test) and other variables. The variables in this data set are lang (the language test score), IQ (verbal IQ), class (the ID of the class the student belongs to), GS (class size), SES (the socio-economic status of the student’s family) and COMB (whether the students are taught in a multi-grade class).

The data set can be loaded and summarised as follows:

library("MASS")data("nlschools")summary(nlschools)
##       lang             IQ            class            GS             SES       
##  Min.   : 9.00   Min.   : 4.00   15580  :  33   Min.   :10.00   Min.   :10.00  
##  1st Qu.:35.00   1st Qu.:10.50   5480   :  31   1st Qu.:23.00   1st Qu.:20.00  
##  Median :42.00   Median :12.00   15980  :  31   Median :27.00   Median :27.00  
##  Mean   :40.93   Mean   :11.83   16180  :  31   Mean   :26.51   Mean   :27.81  
##  3rd Qu.:48.00   3rd Qu.:13.00   18380  :  31   3rd Qu.:31.00   3rd Qu.:35.00  
##  Max.   :58.00   Max.   :18.00   5580   :  30   Max.   :39.00   Max.   :50.00  
##                                  (Other):2100                                  
##  COMB    
##  0:1658  
##  1: 629  

The model to fit will take lang as the response variable and include IQ, GS, SES and COMB as covariates (i.e., fixed effects). This model can easily be fit with BayesX (using the R2BayesX package) as follows:

library("R2BayesX")m1<-bayesx(lang~ IQ+ GS+  SES+ COMB,data = nlschools)summary(m1)
## Call:
## bayesx(formula = lang ~ IQ + GS + SES + COMB, data = nlschools)
##  
## Fixed effects estimation results:
## 
## Parametric coefficients:
##                Mean      Sd    2.5%     50%   97.5%
## (Intercept)  9.5928  1.0323  7.5866  9.5857 11.6194
## IQ           2.3940  0.0704  2.2528  2.3916  2.5362
## GS          -0.0249  0.0251 -0.0752 -0.0242  0.0228
## SES          0.1483  0.0142  0.1200  0.1485  0.1764
## COMB1       -1.6625  0.3222 -2.2983 -1.6689 -0.9859
##  
## Scale estimate:
##           Mean      Sd    2.5%     50%  97.5%
## Sigma2 48.0800  1.4591 45.3017 48.0974 50.846
##  
## N = 2287  burnin = 2000  method = MCMC  family = gaussian  
## iterations = 12000  step = 10

Note that the previous model only includes fixed effects. The data set includes class as the class ID to which each student belongs. Class effects can have an impact on the performance of the students, with students in the same class performing similarly in the language test.

Very conveniently, BayesX can include random effects in the model by adding a term to the right-hand side of the formula that defines the model. Specifically, the term to add is sx(class, bs = "re") (see the code below for the full model). This creates a random effect indexed over variable class of type re, i.e., it provides random effects that are independent and identically distributed following a normal distribution with zero mean and precision \(\tau\). Note the clear similarity with a residual term, although the residual acts at the observation level rather than the group level.

Before fitting the model, the between-class variability can be explored by means of boxplots:

boxplot(lang ~ class, data = nlschools, las = 2)

The code to fit the model with random effects is:

m2 <- bayesx(
  lang ~ IQ + GS + SES + COMB + sx(class, bs = "re"),
  data = nlschools
)
summary(m2)
## Call:
## bayesx(formula = lang ~ IQ + GS + SES + COMB + sx(class, bs = "re"), 
##     data = nlschools)
##  
## Fixed effects estimation results:
## 
## Parametric coefficients:
##                Mean      Sd    2.5%     50%   97.5%
## (Intercept) 10.6189  1.5007  7.6998 10.5900 13.5708
## IQ           2.2465  0.0738  2.0964  2.2459  2.3914
## GS          -0.0145  0.0480 -0.1074 -0.0149  0.0827
## SES          0.1648  0.0155  0.1347  0.1659  0.1965
## COMB1       -2.0122  0.5975 -3.2192 -1.9981 -0.8280
##  
## Random effects variances:
##                 Mean      Sd    2.5%     50%   97.5%     Min    Max
## sx(class):re  8.5587  1.4424  6.0735  8.4390 11.7344  5.0483 13.982
##  
## Scale estimate:
##           Mean      Sd    2.5%     50% 97.5%
## Sigma2 40.0581  1.2545 37.6186 40.0957 42.51
##  
## N = 2287  burnin = 2000  method = MCMC  family = gaussian  
## iterations = 12000  step = 10

You can compare this Bayesian fit with a non-Bayesian analysis, this time making use of the lmer() function in the package lme4 (here, lme means linear mixed effects). We present the code only for comparison, and we do not describe lmer() or lme4 more fully.

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
## 
##     lmList
m2lmer <- lmer(
  lang ~ IQ + GS + SES + COMB + (1 | class),  # Fit a separate intercept for each level of class
  data = nlschools
)
summary(m2lmer)
## Linear mixed model fit by REML ['lmerMod']
## Formula: lang ~ IQ + GS + SES + COMB + (1 | class)
##    Data: nlschools
## 
## REML criterion at convergence: 15132.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9588 -0.6488  0.0486  0.7204  3.0765 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept)  8.507   2.917   
##  Residual             39.998   6.324   
## Number of obs: 2287, groups:  class, 133
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 10.63524    1.50012   7.090
## IQ           2.24696    0.07133  31.502
## GS          -0.01562    0.04770  -0.327
## SES          0.16538    0.01479  11.185
## COMB1       -2.02161    0.60055  -3.366
## 
## Correlation of Fixed Effects:
##       (Intr) IQ     GS     SES   
## IQ    -0.484                     
## GS    -0.798 -0.002              
## SES   -0.062 -0.292 -0.057       
## COMB1 -0.168  0.033 -0.006  0.017

3.1 Exercises

4 Generalised Linear Mixed Models

Mixed effects models can also be considered within the context of generalised linear models. In this case, the linear predictor of observation \(i\), \(\eta_i\), can be defined as

\[\eta_i = X_{ij}\beta +\phi_i\]

Compared to the previous setting of linear mixed effects models, note that now the distribution of the response could be other than Gaussian and that observations are not necessarily nested within groups.
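
For instance, with a Poisson response and a log link, the linear predictor determines the mean of the response through \(\mu_i = \exp(\eta_i)\). A minimal simulation sketch (with made-up coefficients, not taken from the data analysed below) is:

## Sketch: simulate from a Poisson GLMM with a log link
## (hypothetical parameter values, only to illustrate the structure)
set.seed(1)
n.groups <- 10
n.per.group <- 20
group <- rep(1:n.groups, each = n.per.group)
x <- rnorm(n.groups * n.per.group)
phi <- rnorm(n.groups, mean = 0, sd = 0.5)   # group-level random effects

eta <- 1 + 0.8 * x + phi[group]              # linear predictor
y <- rpois(length(eta), lambda = exp(eta))   # Poisson response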

5 Poisson regression

In this practical we will use the North Carolina Sudden Infant Death Syndrome (SIDS) data set. It is available in the spData package and it can be loaded using:

library(spData)
data(nc.sids)
summary(nc.sids)
##     CNTY.ID         BIR74           SID74          NWBIR74      
##  Min.   :1825   Min.   :  248   Min.   : 0.00   Min.   :   1.0  
##  1st Qu.:1902   1st Qu.: 1077   1st Qu.: 2.00   1st Qu.: 190.0  
##  Median :1982   Median : 2180   Median : 4.00   Median : 697.5  
##  Mean   :1986   Mean   : 3300   Mean   : 6.67   Mean   :1051.0  
##  3rd Qu.:2067   3rd Qu.: 3936   3rd Qu.: 8.25   3rd Qu.:1168.5  
##  Max.   :2241   Max.   :21588   Max.   :44.00   Max.   :8027.0  
##      BIR79           SID79          NWBIR79             east      
##  Min.   :  319   Min.   : 0.00   Min.   :    3.0   Min.   : 19.0  
##  1st Qu.: 1336   1st Qu.: 2.00   1st Qu.:  250.5   1st Qu.:178.8  
##  Median : 2636   Median : 5.00   Median :  874.5   Median :285.0  
##  Mean   : 4224   Mean   : 8.36   Mean   : 1352.8   Mean   :271.3  
##  3rd Qu.: 4889   3rd Qu.:10.25   3rd Qu.: 1406.8   3rd Qu.:361.2  
##  Max.   :30757   Max.   :57.00   Max.   :11631.0   Max.   :482.0  
##      north             x                 y             lon        
##  Min.   :  6.0   Min.   :-328.04   Min.   :3757   Min.   :-84.08  
##  1st Qu.: 97.0   1st Qu.: -60.55   1st Qu.:3920   1st Qu.:-81.20  
##  Median :125.5   Median : 114.38   Median :3963   Median :-79.26  
##  Mean   :122.1   Mean   :  91.46   Mean   :3953   Mean   :-79.51  
##  3rd Qu.:151.5   3rd Qu.: 240.03   3rd Qu.:4000   3rd Qu.:-77.87  
##  Max.   :182.0   Max.   : 439.65   Max.   :4060   Max.   :-75.67  
##       lat             L.id           M.id     
##  Min.   :33.92   Min.   :1.00   Min.   :1.00  
##  1st Qu.:35.26   1st Qu.:1.00   1st Qu.:2.00  
##  Median :35.68   Median :2.00   Median :3.00  
##  Mean   :35.62   Mean   :2.12   Mean   :2.67  
##  3rd Qu.:36.05   3rd Qu.:3.00   3rd Qu.:3.25  
##  Max.   :36.52   Max.   :4.00   Max.   :4.00

A full description of the data set is provided in the associated manual page (check with ?nc.sids), but in this practical we will only consider these variables: SID74 (the number of SIDS deaths in 1974-78), BIR74 (the number of births in 1974-78) and NWBIR74 (the number of non-white births in 1974-78).

These variables are measured at the county level in North Carolina, of which there are 100.

Because SID74 records the number of SID deaths, the model is Poisson:

\[O_i \mid \mu_i \sim Po(\mu_i),\ i=1,\ldots, 100\]

Here, \(O_i\) represents the number of cases in county \(i\) and \(\mu_i\) the mean. In addition, the mean \(\mu_i\) will be written as \(\mu_i = E_i \theta_i\), where \(E_i\) is the expected number of cases and \(\theta_i\) the relative risk in county \(i\).

The relative risk \(\theta_i\) is often modelled, on the log-scale, to be equal to a linear predictor:

\[\log(\theta_i) = \beta_0 + \ldots\]

The expected number of cases is computed by multiplying the number of births in county \(i\) by the overall mortality rate

\[r = \frac{\sum_{i=1}^{100}O_i}{\sum_{i=1}^{100}B_i}\]

where \(B_i\) represents the total number of births in county \(i\). Hence, the expected number of cases in county \(i\) is \(E_i = r B_i\).

# Overall mortality rate
r74 <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)

# Expected cases
nc.sids$EXP74 <- r74 * nc.sids$BIR74
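
Since \(\mu_i = E_i \theta_i\), on the log scale the expected counts enter the model as a known offset, which is why log(nc.sids$EXP74) is passed as the offset argument to bayesx() later in this practical:

\[\log(\mu_i) = \log(E_i) + \log(\theta_i) = \log(E_i) + \beta_0 + \ldots\]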

A common measure of relative risk is the standardised mortality ratio (\(O_i / E_i\)):

nc.sids$SMR74 <- nc.sids$SID74 / nc.sids$EXP74

The distribution of the SMR can be summarised with a histogram:

hist(nc.sids$SMR74, xlab = "SMR")

Values above 1 indicate that the county has more observed deaths than expected and that there might be an increased risk in the area.
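
As a quick check of how many counties exceed this threshold, the observed SMRs can be tabulated (the resulting counts are not reproduced here):

# Number of counties with more observed SIDS deaths than expected (SMR > 1)
table(nc.sids$SMR74 > 1)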

As a covariate, we will compute the proportion of non-white births:

nc.sids$NWPROP74 <- nc.sids$NWBIR74 / nc.sids$BIR74

There is a clear relationship between the SMR and the proportion of non-white births in a county:

plot(nc.sids$NWPROP74, nc.sids$SMR74)

# Correlation
cor(nc.sids$NWPROP74, nc.sids$SMR74)
## [1] 0.5793901

A simple Poisson regression can be fit by using the following code:

m1nc <- bayesx(
  SID74 ~ 1 + NWPROP74,
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = nc.sids
)
summary(m1nc)
## Call:
## bayesx(formula = SID74 ~ 1 + NWPROP74, data = nc.sids, offset = log(nc.sids$EXP74), 
##     family = "poisson")
##  
## Fixed effects estimation results:
## 
## Parametric coefficients:
##                Mean      Sd    2.5%     50%   97.5%
## (Intercept) -0.6483  0.0897 -0.8319 -0.6488 -0.4787
## NWPROP74     1.8689  0.2154  1.4681  1.8771  2.2846
##  
## N = 100  burnin = 2000  method = MCMC  family = poisson  
## iterations = 12000  step = 10

Random effects can also be included to account for intrinsic differences between the counties:

# Index for random effects
nc.sids$ID <- seq_len(nrow(nc.sids))

# Model WITH covariate
m2nc <- bayesx(
  SID74 ~ 1 + NWPROP74 + sx(ID, bs = "re"),
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = as.data.frame(nc.sids)
)
summary(m2nc)
## Call:
## bayesx(formula = SID74 ~ 1 + NWPROP74 + sx(ID, bs = "re"), data = as.data.frame(nc.sids), 
##     offset = log(nc.sids$EXP74), family = "poisson")
##  
## Fixed effects estimation results:
## 
## Parametric coefficients:
##                Mean      Sd    2.5%     50%   97.5%
## (Intercept) -0.6551  0.1081 -0.8653 -0.6492 -0.4414
## NWPROP74     1.8929  0.2688  1.3505  1.8988  2.3863
##  
## Random effects variances:
##             Mean     Sd   2.5%    50%  97.5%    Min    Max
## sx(ID):re 0.0625 0.0307 0.0139 0.0593 0.1326 0.0040 0.1928
##  
## N = 100  burnin = 2000  method = MCMC  family = poisson  
## iterations = 12000  step = 10

We can see the fitted relationship with NWPROP74:

x.predict <- seq(0, 1, length = 1000)
y.predict <- exp(coef(m2nc)["(Intercept)", "Mean"] +
  coef(m2nc)["NWPROP74", "Mean"] * x.predict)

oldpar <- par(mfrow = c(1, 1))
plot(nc.sids$NWPROP74, nc.sids$SMR74)
lines(x.predict, y.predict)

par(oldpar)

The role of the covariate can be explored by fitting a model without it:

# Model WITHOUT covariate
m3nc <- bayesx(
  SID74 ~ 1 + sx(ID, bs = "re"),
  family = "poisson",
  offset = log(nc.sids$EXP74),
  data = as.data.frame(nc.sids)
)
summary(m3nc)
## Call:
## bayesx(formula = SID74 ~ 1 + sx(ID, bs = "re"), data = as.data.frame(nc.sids), 
##     offset = log(nc.sids$EXP74), family = "poisson")
##  
## Fixed effects estimation results:
## 
## Parametric coefficients:
##                Mean      Sd    2.5%     50%  97.5%
## (Intercept) -0.0331  0.0641 -0.1631 -0.0321 0.0879
##  
## Random effects variances:
##             Mean     Sd   2.5%    50%  97.5%    Min    Max
## sx(ID):re 0.1711 0.0527 0.0898 0.1649 0.3008 0.0530 0.4353
##  
## N = 100  burnin = 2000  method = MCMC  family = poisson  
## iterations = 12000  step = 10

Now, notice the decrease in the estimate of the precision of the random effects (i.e., the variance increases). This means that the values of the random effects are now larger than in the previous case, as the random effects pick up some of the effect that was previously explained by the covariate.

oldpar <- par(mfrow = c(1, 2))
boxplot(m2nc$effects$`sx(ID):re`$Mean, ylim = c(-1, 1), main = "With NWPROP74")
boxplot(m3nc$effects$`sx(ID):re`$Mean, ylim = c(-1, 1), main = "Without NWPROP74")

par(oldpar)
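
A simple numerical check of this point (a sketch based only on the posterior means of the random effects extracted above, not on the full posterior of the variance parameter) is to compare their spread across the two models:

# Spread of the posterior means of the random effects; the standard
# deviation should be larger in the model without NWPROP74
sd(m2nc$effects$`sx(ID):re`$Mean)
sd(m3nc$effects$`sx(ID):re`$Mean)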

6 Further Extensions

Random effects can also be defined so that they are not independent and identically distributed. Instead, spatial or temporal correlation can be considered when defining them. For example, in the North Carolina SIDS data set it is common to assume that two counties that are neighbours (i.e., share a boundary) will have similar relative risks. This can be taken into account in the model by assuming that the random effects are spatially autocorrelated. This is beyond the scope of this introductory course, but feel free to ask about this!!

7 Final Exercises (Optional!!)

We tend to use the final Practical session for catching up and to give you the opportunity to ask us any questions you may have. So please ask away!

If you would like some further exercises to practice, please look at Practical 8 for more material and examples, including the use of the non-MCMC package INLA.

