Movatterモバイル変換

This vignette illustrates the use ofmitml for thetreatment of missing data at Level 2. Specifically, the vignetteaddresses the following topics:

Example data

For purposes of this vignette, we make use of theleadership data set, which contains simulated data from 750employees in 50 groups including ratings on job satisfaction, leadershipstyle, and work load (Level 1) as well as cohesion (Level 2).

The package and the data set can be loaded as follows.

library(mitml)data(leadership)

In thesummary of the data, it becomes visible that allvariables are affected by missing data.

summary(leadership)

#      GRPID          JOBSAT             COHES            NEGLEAD          WLOAD    #  Min.   : 1.0   Min.   :-7.32934   Min.   :-3.4072   Min.   :-3.13213   low :416  #  1st Qu.:13.0   1st Qu.:-1.61932   1st Qu.:-0.4004   1st Qu.:-0.70299   high:248  #  Median :25.5   Median :-0.02637   Median : 0.2117   Median : 0.08027   NA's: 86  #  Mean   :25.5   Mean   :-0.03168   Mean   : 0.1722   Mean   : 0.04024             #  3rd Qu.:38.0   3rd Qu.: 1.64571   3rd Qu.: 1.1497   3rd Qu.: 0.79111             #  Max.   :50.0   Max.   :10.19227   Max.   : 2.5794   Max.   : 3.16116             #                 NA's   :69         NA's   :30        NA's   :92

The following data segment illustrates this fact, including caseswith missing data at Level 1 (e.g., job satisfaction) and 2 (e.g.,cohesion).

#    GRPID      JOBSAT     COHES     NEGLEAD WLOAD# 73     5 -1.72143400 0.9023198  0.83025589  high# 74     5          NA 0.9023198  0.15335056  high# 75     5 -0.09541178 0.9023198  0.21886272   low# 76     6  0.68626611        NA -0.38190591  high# 77     6          NA        NA          NA   low# 78     6 -1.86298201        NA -0.05351001  high

In the following, we will employ a two-level model to address missingdata at both levels simultaneously.

Specifying the imputation model

The specification of the two-level model, involves two components,one pertaining to the variables at each level of the sample (Goldstein,Carpenter, Kenward, & Levin, 2009; for further discussion, see alsoEnders, Mister, & Keller, 2016; Grund, Lüdtke, & Robitzsch, inpress).

Specifically, the imputation model is specified as a list with twocomponents, where the first component denotes the model for thevariables at Level 1, and the second component denotes the model for thevariables at Level 2.

For example, using theformula interface, an imputationmodel targeting all variables in the data set can be written asfollows.

fml<-list( JOBSAT+ NEGLEAD+ WLOAD~1+ (1|GRPID) ,# Level 1             COHES~1 )# Level 2

The first component of this list includes the three target variablesat Level 1 and a fixed (1) as well as a random intercept(1|GRPID). The second component includes the targetvariable at Level 2 with a fixed intercept (1).

From a statistical point of view, this specification corresponds tothe following model\[\begin{aligned}\mathbf{y}_{1ij} &= \boldsymbol\mu_{1} + \mathbf{u}_{1j} +\mathbf{e}_{ij} \\\mathbf{y}_{2j} &= \boldsymbol\mu_{2} + \mathbf{u}_{1j} \; ,\end{aligned}\] where\(\mathbf{y}_{1ij}\)denotes the target variables at Level 1,\(\mathbf{y}_{2j}\) the target variables atLevel 2, and the right-hand side of the model contains the fixedeffects, random effects, and residual terms as mentioned above.

Note that, even though the two components of the model appear to beseparated, they define a single (joint) model for all target variablesat both Level 1 and 2. Specifically, this model employs a two-levelcovariance structure, which allows for relations between variables atboth Level 1 (i.e., correlated residuals at Level 1) and 2 (i.e.,correlated random effects residuals at Level 2).

Generating imputations

Because the data contain missing values at both levels, imputationswill be generated withjomoImpute (and notpanImpute). Except for the specification of the two-levelmodel, the syntax is the same as in applications with missing data onlyat Level 1.

Here, we will run 5,000 burn-in iterations and generate 20imputations, each 250 iterations apart.

imp<-jomoImpute(leadership,formula = fml,n.burn =5000,n.iter =250,m =20)

By looking at thesummary, we can then review theimputation procedure and verify that the imputation model converged.

summary(imp)

# # Call:# # jomoImpute(data = leadership, formula = fml, n.burn = 5000, n.iter = 250, #     m = 20)# # Level 1:#  # Cluster variable:         GRPID # Target variables:         JOBSAT NEGLEAD WLOAD # Fixed effect predictors:  (Intercept) # Random effect predictors: (Intercept) # # Level 2:#                 # Target variables:         COHES # Fixed effect predictors:  (Intercept) # # Performed 5000 burn-in iterations, and generated 20 imputed data sets,# each 250 iterations apart. # # Potential scale reduction (Rhat, imputation phase):#  #          Min   25%  Mean Median   75%   Max# Beta:  1.001 1.002 1.004  1.004 1.006 1.009# Beta2: 1.001 1.001 1.001  1.001 1.001 1.001# Psi:   1.000 1.001 1.002  1.001 1.002 1.006# Sigma: 1.000 1.002 1.004  1.004 1.006 1.007# # Largest potential scale reduction:# Beta: [1,3], Beta2: [1,1], Psi: [1,1], Sigma: [2,1]# # Missing data per variable:#     GRPID JOBSAT NEGLEAD WLOAD COHES# MD% 0     9.2    12.3    11.5  4.0

Due to the greater complexity of the two-level model, the outputincludes more information than in applications with missing data only atLevel 1. For example, the output features the model specification forvariables at both Level 1 and 2. Furthermore, it provides convergencestatistics for the additional regression coefficients for the targetvariables at Level 2 (i.e.,Beta2).

Finally, it also becomes visible that the two-level model indeedallows for relations between target variables at Level 1 and 2. This canbe seen from the fact that the potential scale reduction factor (\(\hat{R}\)) for the covariance matrix atLevel 2 (Psi) was largest forPsi[4,3], whichis the covariance between cohesion and the random intercept of workload.

Completing the data

The completed data sets can then be extracted withmitmlComplete.

implist<-mitmlComplete(imp,"all")

When inspecting the completed data, it is easy to verify that theimputations for variables at Level 2 are constant within groups asintended, thus preserving the two-level structure of the data.

#    GRPID      JOBSAT     NEGLEAD WLOAD     COHES# 73     5 -1.72143400  0.83025589  high 0.9023198# 74     5  0.68223338  0.15335056  high 0.9023198# 75     5 -0.09541178  0.21886272   low 0.9023198# 76     6  0.68626611 -0.38190591  high 2.1086213# 77     6 -2.97953478 -1.05236552   low 2.1086213# 78     6 -1.86298201 -0.05351001  high 2.1086213

References

Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevelmultiple imputation: A review and evaluation of joint modeling andchained equations imputation.Psychological Methods,21, 222–240. doi: 10.1037/met0000063 (Link)

Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. A.(2009). Multilevel models with multivariate mixed response types.Statistical Modelling,9, 173–197. doi:10.1177/1471082X0800900301 (Link)

Grund, S., Lüdtke, O., & Robitzsch, A. (2018). Multipleimputation of missing data for multilevel models: Simulations andrecommendations.Organizational Research Methods,21(1), 111–149. doi: 10.1177/1094428117703686 (Link)

# Author: Simon Grund (simon.grund@uni-hamburg.de)# Date:   2023-03-08