First, load in the necessary packages. In addition, if you arerunning the model locally on a multi-core machine, you can setoptions(mc.cores = 4). This ensures the different MCMCchains will be run in parallel, making estimation faster.
In this application, we will work with the bonapersona data(Bonapersona et al., 2019). The data and codebook can be foundhere. First,we read in the data and make sure all relevant variables are of thecorrect type.
#> name type n unique mean sd v#> 1 exp integer 734 411 222.644 111.88 NA#> 2 id integer 734 212 99.402 60.03 NA#> 3 author character 734 209 NA NA 0.990#> 4 year integer 734 21 2011.014 4.45 NA#> 5 journal character 734 69 NA NA 0.945#> 6 species factor 734 3 NA NA 0.433#> 7 strainGrouped factor 734 15 NA NA 0.762#> 8 origin factor 734 4 NA NA 0.617#> 9 sex factor 734 3 NA NA 0.373#> 10 ageWeek numeric 721 25 13.333 5.75 NA#> 11 model factor 734 6 NA NA 0.667#> 12 mTimeStart integer 734 12 2.357 2.51 NA#> 13 mTimeEnd integer 734 18 12.802 4.39 NA#> 14 mHoursAve numeric 734 33 54.076 47.73 NA#> 15 mCageGrouped factor 734 4 NA NA 0.485#> 16 mLDGrouped factor 734 5 NA NA 0.469#> 17 mRepetition factor 734 7 NA NA 0.446#> 18 mControlGrouped factor 734 4 NA NA 0.069#> 19 hit2Grouped factor 734 3 NA NA 0.493#> 20 testAuthorGrouped factor 734 27 NA NA 0.894#> 21 testLDGrouped factor 734 4 NA NA 0.295#> 22 varAuthorGrouped factor 734 47 NA NA 0.914#> 23 waterT factor 734 9 NA NA 0.125#> 24 waterTcate factor 734 5 NA NA 0.123#> 25 freezingType factor 734 10 NA NA 0.221#> 26 retentionGrouped factor 734 5 NA NA 0.520#> 27 directionGrouped factor 734 5 NA NA 0.648#> 28 effectSizeCorrection integer 5 2 1.000 0.00 NA#> 29 cut_nC integer 37 2 1.000 0.00 NA#> 30 n_notRetriev integer 5 2 1.000 0.00 NA#> 31 nC numeric 734 25 10.719 4.07 NA#> 32 meanC numeric 734 412 167.110 112.13 NA#> 33 sdC numeric 734 499 73.919 62.43 NA#> 34 seC numeric 734 209 23.314 19.61 NA#> 35 nE numeric 734 25 11.200 4.15 NA#> 36 meanE numeric 734 430 163.209 114.28 NA#> 37 sdE numeric 734 500 76.684 62.99 NA#> 38 seE numeric 734 206 23.554 19.54 NA#> 39 dataFrom factor 734 4 NA NA 0.288#> 40 seqGeneration factor 734 3 NA NA 0.487#> 41 baseline factor 734 4 NA NA 0.412#> 42 allocation factor 734 3 NA NA 0.170#> 43 housing factor 734 2 NA NA 0.000#> 44 blindExp factor 734 4 NA NA 0.429#> 45 control factor 734 4 NA NA 0.043#> 46 outAss factor 734 4 NA NA 0.456#> 47 outBlind factor 734 5 NA NA 0.654#> 48 incData factor 734 3 NA NA 0.219#> 49 waterTNum numeric 48 8 22.292 1.37 NA#> 50 each integer 734 734 367.500 212.03 NA#> 51 mTimeLength integer 734 19 10.446 5.61 NA#> 52 speciesStrainGrouped factor 734 16 NA NA 0.762#> 53 blindRand factor 734 3 NA NA 0.317#> 54 bias numeric 734 9 2.884 0.75 NA#> 55 tV factor 734 60 NA NA 0.929#> 56 anxiety integer 734 2 0.486 0.50 NA#> 57 sLearning numeric 734 2 0.277 0.45 NA#> 58 nsLearning numeric 734 2 0.143 0.35 NA#> 59 social integer 734 2 0.063 0.24 NA#> 60 multiply integer 734 2 -0.272 0.96 NA#> 61 noMeta numeric 734 2 0.031 0.17 NA#> 62 domain factor 734 6 NA NA 0.662#> 63 directionQual numeric 617 4 0.183 0.58 NA#> 64 yi numeric 734 726 0.243 1.34 NA#> 65 vi numeric 734 728 0.263 0.36 NAFor this application, we use a smaller selection of moderators thanin Bonapersona et al. (2019).
First, for simplicity, we run a two-level model ignoring the factthat certain effect sizes come from the same study.
We start with running a penalized meta-analysis using the lassoprior. Compared to the horseshoe prior, the lasso is easier to usebecause it has only two hyperparameters to set. However, the lightertails of the lasso can result in large coefficients being shrunken toomuch towards zero thereby leading to potentially more bias compared tothe regularized horseshoe prior.
For the lasso prior, we need to specify the degrees of freedomdf and thescale. Both default to 1. Thedegrees of freedom determines the chi-square prior that is specified forthe inverse-tuning parameter. Increasing the degrees of freedom willallow larger values for the inverse-tuning parameter, leading to lessshrinkage. Increasing the scale parameter will also result in lessshrinkage. The influence of these hyperparameters can be visualizedthrough the implemented shiny app, which can be called viashiny_prior().
fit_lasso<-brma(yi~ .,data = dat2l,vi ="vi",method ="lasso",prior =c(df =1,scale =1),mute_stan =FALSE)We can request the results using thesummary function.Before we interpret the results, we need to ensure that the MCMC chainshave converged to the posterior distribution. Two helpful diagnosticsprovided in the summary are the number of effective posterior samplesn_eff and the potential scale reduction factorRhat.n_eff is an estimate of the number ofindependent samples from the posterior. Ideally, the ration_eff to total samples is as close to 1 as possible.Rhat compares the between- and within-chain estimates andis ideally close to 1 (indicating the chains have mixed well). Shouldany values forn_eff orRhat be far from theseideal values, you can try increasing the number of iterations throughtheiter argument. By default, thebrmafunction runs four MCMC chains with 2000 iterations each, half of whichis discarded as burn-in. As a result, a total of 4000 iterations isavailable on which posterior summaries are based. If this does not help,non-convergence might indicate a problem with the modelspecification.
#> mean sd 2.5% 97.5% n_eff Rhat#> Intercept -20.7265 14.4136 -50.9609 3.6187 1657 1#> mTimeLength -0.0032 0.0048 -0.0143 0.0049 2070 1#> year 0.0104 0.0072 -0.0017 0.0254 1660 1#> modelLG 0.0855 0.1438 -0.1759 0.4025 2283 1#> modelLNB 0.1430 0.1059 -0.0270 0.3721 1210 1#> modelM 0.0213 0.0496 -0.0695 0.1329 2208 1#> modelMD 0.0229 0.0773 -0.1253 0.1856 2320 1#> ageWeek -0.0073 0.0053 -0.0187 0.0015 1364 1#> strainGroupedC57Bl6 -0.0189 0.0662 -0.1587 0.1117 2193 1#> strainGroupedCD1 -0.1158 0.1813 -0.5305 0.2004 2074 1#> strainGroupedDBA -0.0253 0.1625 -0.3563 0.3059 1591 1#> strainGroupedlisterHooded -0.0540 0.3135 -0.7133 0.5434 2062 1#> strainGroupedlongEvans 0.0712 0.1205 -0.1421 0.3397 2253 1#> strainGroupedlongEvansHooded 0.1790 0.2181 -0.1823 0.6557 2302 1#> strainGroupedNMRI 0.1200 0.2423 -0.3336 0.6518 2225 1#> strainGroupedNS 0.0289 0.1497 -0.2674 0.3448 2842 1#> strainGroupedother -0.0227 0.1263 -0.2832 0.2302 2079 1#> strainGroupedspragueDawley -0.0041 0.0615 -0.1343 0.1176 1921 1#> strainGroupedswissWebster 0.5557 0.4900 -0.2266 1.6393 2230 1#> strainGroupedwistar 0.0236 0.0567 -0.0807 0.1512 2078 1#> strainGroupedwistarKyoto 0.0295 0.2698 -0.5350 0.5813 2392 1#> bias -0.0086 0.0290 -0.0716 0.0473 2087 1#> speciesrat 0.1191 0.0877 -0.0203 0.3179 1380 1#> domainsLearning 0.0017 0.0501 -0.1012 0.1072 1986 1#> domainnsLearning 0.1159 0.0815 -0.0215 0.2892 1553 1#> domainsocial 0.1702 0.1183 -0.0249 0.4232 1600 1#> domainnoMeta -0.3615 0.1843 -0.7299 -0.0184 1212 1#> sexM 0.1205 0.0689 -0.0017 0.2605 1602 1#> tau2 0.4291 0.0394 0.3524 0.5098 1112 1If we are satisfied with the convergence, we can continue looking atthe posterior summary statistics. Thesummary functionprovides the posterior mean estimate for the effect of each moderator.Since Bayesian penalization does not automatically shrink estimatesexactly to zero, some additional criterion is needed to determine whichmoderators should be selected in the model. Currently, this is doneusing the 95% credible intervals, with a moderator being selected ifzero is excluded in this interval. In thesummary this isdenoted by an asterisk for that moderator. In this model, the onlysignificant moderator is the dummy variable for sex. All othercoefficients are not significant after being shrunken towards zero bythe lasso prior.
Also note the summary statistics fortau2, the(unexplained) residual between-studies heterogeneity. The 95% credibleinterval for this coefficient excludes zero, indicating that there is anon-zero amount of unexplained heterogeneity. It is customary to expressthis heterogeneity in terms of\(I^2\),the percentage of variation across studies that is due to heterogeneityrather than chance (Higgins and Thompson, 2002; Higgins et al., 2003).The helper functionI2() computes the posteriordistribution of\(I^2\) based on theMCMC draws oftau2:
#> mean sd 2.5% 25% 50% 75% 97.5%#> I2 68 2 64 67 68 69 72Next, we look into the regularized horseshoe prior. The horseshoeprior has five hyperparameters that can be set. Three parameters aredegrees of freedom parameters which influence the tails of thedistributions in the prior. Generally, it is not needed to specifydifferent values for these hyperparameters. Here, we focus instead onthe global scale parameter and the scale of the slab.
# use the default settingsfit_hs1<-brma(yi~ .,data = dat2l,vi ="vi",method ="hs",prior =c(df =1,df_global =1,df_slab =4,scale_global =1,scale_slab =1,relevant_pars =NULL),mute_stan =FALSE)# reduce the global scalefit_hs2<-brma(yi~ .,data = dat2l,vi ="vi",method ="hs",prior =c(df =1,df_global =1,df_slab =4,scale_global =0.1,scale_slab =1,relevant_pars =NULL),mute_stan =FALSE)# increase the scale of the slabfit_hs3<-brma(yi~ .,data = dat2l,vi ="vi",method ="hs",prior =c(df =1,df_global =1,df_slab =4,scale_global =1,scale_slab =5,relevant_pars =NULL),mute_stan =FALSE)Note that the horseshoe prior results in some divergent transitions.This can be an indication of non-convergence. However, these divergencesarise often when using the horseshoe prior and as long as there are nottoo many of them, the results can still beused.
Next, we plot the posterior mean estimates for a selection ofmoderators for the different priors.
make_plotdat<-function(fit, prior){ plotdat<-data.frame(fit$coefficients) plotdat$par<-rownames(plotdat) plotdat$Prior<- priorreturn(plotdat)}df0<-make_plotdat(fit_lasso,prior ="lasso")df1<-make_plotdat(fit_hs1,prior ="hs default")df2<-make_plotdat(fit_hs2,prior ="hs reduced global scale")df3<-make_plotdat(fit_hs3,prior ="hs increased slab scale")df<-rbind.data.frame(df0, df1, df2, df3)df<- df[!df$par%in%c("Intercept","tau2"), ]pd<-0.5ggplot(df,aes(x=mean,y=par,group = Prior))+geom_errorbar(aes(xmin=X2.5.,xmax=X97.5.,colour = Prior),width=.1,position =position_dodge(width = pd))+geom_point(aes(colour = Prior),position =position_dodge(width = pd))+geom_vline(xintercept =0)+theme_bw()+xlab("Posterior mean")+ylab("")We can see that, in general, the different priors give quite similarresults in this application. A notable exception is the estimate for thedummy variabletestAuthorGrouped_stepDownAvoidance which ismuch smaller for the lasso compared to the horseshoe specification. Thisindicates that the lasso can shrink large coefficients more towards zerowhereas the horseshoe is better at keeping them large.
Finally, we can also take into account the fact that some effectsizes might come from the same study by fitting a three-level model asfollows:
It is possible to override the default standardization, whichstandardizes all variables (including dummies). To do so, first manuallystandardize any variables that must be standardized. Then, in the calltobrma(), provide a named list with elementslist(center = ..., scale = ...). For variables that arenot to be standardized, usecenter = 0, scale = 1. This retains their original scale.For variables that are to be standardized, use their originalcenter andscale to restore the coefficientsto their original scale.
In the example below, we standardize a continuous predictor, but wedo not standardize the dummies:
moderators<-model.matrix(yi~ageWeek+ strainGrouped,data = datsel)[,-1]scale_age<-scale(moderators[,1])stdz<-list(center =c(attr(scale_age,"scaled:center"),rep(0,length(levels(datsel$strainGrouped))-1)),scale =c(attr(scale_age,"scaled:scale"),rep(1,length(levels(datsel$strainGrouped))-1)))moderators<-data.frame(datsel[c("yi","vi","author")], moderators)fit_std<-brma(yi~ .,data = moderators,vi ="vi",study ="author",method ="lasso",prior =c(df =1,scale =1),standardize = stdz,mute_stan =FALSE)Note that, in this example, onlyageWeek isstandardized; the remaining (dummy) variables are untouched. The objectstdz provides the original center and scale forageWeek, which allowsbrma() to properlyrescale the coefficient for this moderator. As the center and scale forall remaining (dummy) variables are 0 and 1, these coefficients arenot rescaled. See thepema paper for adiscussion of standardization and further references.