An R package providing multiple imputation of covariance matrices in order to perform factor analysis.
mifa is an R package that implements multiple imputation of covariance matrices so that factor analysis can be performed on incomplete data. It works as follows:
1. Impute missing values multiple times using Multivariate Imputation with Chained Equations (MICE) from the mice package.
2. Combine the covariance matrices of the imputed data sets into a single covariance matrix using Rubin’s rules. [1]
3. Use the combined covariance matrix for exploratory factor analysis.
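The three steps can be sketched in base R. This is an illustration under stated assumptions, not mifa's implementation: `impute_once()` is a hypothetical, deliberately crude stand-in for a MICE imputation, the data are simulated, and the pooling shown is only the point estimate from Rubin's rules (the average of the per-imputation estimates).

```r
# Sketch only: a crude random-draw imputation stands in for MICE so that the
# example needs nothing beyond base R.
set.seed(1)

# Toy data: 200 rows, 4 correlated variables, about 10% of values missing.
n <- 200
latent <- rnorm(n)
dat <- data.frame(
  x1 = latent + rnorm(n),
  x2 = latent + rnorm(n),
  x3 = latent + rnorm(n),
  x4 = latent + rnorm(n)
)
dat_miss <- dat
dat_miss[matrix(runif(n * 4) < 0.1, n, 4)] <- NA

# Hypothetical stand-in for one imputation: fill each missing value with a
# random draw from the observed values of the same column.
impute_once <- function(d) {
  for (v in names(d)) {
    miss <- is.na(d[[v]])
    d[[v]][miss] <- sample(d[[v]][!miss], sum(miss), replace = TRUE)
  }
  d
}

# Step 1: create m imputed data sets.
m <- 5
imputed <- lapply(seq_len(m), function(i) impute_once(dat_miss))

# Step 2: average the m covariance matrices (Rubin's rules, point estimate).
cov_list <- lapply(imputed, cov)
cov_combined <- Reduce(`+`, cov_list) / m

# Step 3: cov_combined can now be passed to a factor analysis routine.
round(cov_combined, 2)
```

With the real package, `mifa()` handles the imputation and pooling, and the result is available as `mi$cov_combined`.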
mifa also provides two types of confidence intervals for the variance explained by different numbers of principal components: Fieller confidence intervals (parametric) for larger samples [2] and bootstrapped confidence intervals (nonparametric) for smaller samples. [3]
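To make the quantity concrete, here is a minimal base R sketch of the cumulative proportion of variance explained by the first k principal components, together with a plain percentile bootstrap interval. It is a simplification: the data are simulated and complete, so the imputation step is ignored, and `prop_var()` is a hypothetical helper rather than a function from mifa. How mifa computes its intervals internally may differ.

```r
set.seed(42)

# Cumulative proportion of variance explained by the first k principal
# components, computed from the eigenvalues of a covariance matrix.
prop_var <- function(S, k) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  (cumsum(ev) / sum(ev))[k]
}

# Toy complete data: 6 variables sharing one strong common component.
n <- 300
latent <- rnorm(n)
x <- sapply(1:6, function(j) latent + rnorm(n))

k <- 1:3
estimate <- prop_var(cov(x), k)

# Percentile bootstrap: resample rows, recompute, take empirical quantiles.
boot <- replicate(500, {
  idx <- sample(n, replace = TRUE)
  prop_var(cov(x[idx, ]), k)
})
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))

data.frame(n_pc = k, prop = estimate, lower = ci[1, ], upper = ci[2, ])
```

With incomplete data, the resampling would have to be combined with the imputation step, which is what the bootstrap reference above addresses.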
For more information about the method, see:
Nassiri, V., Lovik, A., Molenberghs, G., Verbeke, G. (2018). On using multiple imputation for exploratory factor analysis of incomplete data. Behavior Research Methods 50, 501–517. doi:10.3758/s13428-017-1013-4
Note: The paper was accompanied by an implementation in R, and this package emerged from it. The authors appear to have abandoned that repository, but it is still available online.
Install from CRAN with:

    install.packages("mifa")

Or install the development version from GitHub with:

    # install.packages("devtools")
    devtools::install_github("teebusch/mifa")

For this example we use the bfi data set from the psych package. It contains 2,800 subjects’ answers to 25 personality self-report items and 3 demographic variables (gender, education, and age). Each of the 25 personality questions is meant to tap into one of the “Big 5” personality factors, as indicated by their names: Openness, Conscientiousness, Agreeableness, Extraversion, Neuroticism. There are missing responses for most items. Instead of dropping the incomplete cases from the analysis, we will use mifa to impute them, and then perform a factor analysis on the imputed covariance matrix.
First, we use mifa() to impute the covariance matrix and get an idea of how many factors we should use. We use the cov_vars argument to tell mifa to use gender, education, and age for the imputations, but exclude them from the covariance matrix:

    library(mifa)
    library(psych)

    mi <- mifa(
      data     = bfi,
      cov_vars = -c(gender, education, age),
      n_pc     = 2:8,
      ci       = "fieller",
      print    = FALSE
    )
    mi
    #> Imputed covariance matrix of 25 variables
    #>
    #> Variable:   A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5
    #> N Imputed:  16 27 26 19 16 21 24 20 26 16 23 16 25  9 21 22 21 11 36 29 22  0 28 14 20
    #>
    #> Number of MICE imputations: 5
    #> Additional variables used for imputations:
    #> gender education age
    #>
    #> Cumulative proportion of variance explained by n principal components:
    #>
    #> n  prop  Fieller CI
    #> 2  0.33  [0.32, 0.34]
    #> 3  0.41  [0.40, 0.42]
    #> 4  0.48  [0.47, 0.49]
    #> 5  0.54  [0.53, 0.55]
    #> 6  0.59  [0.58, 0.59]
    #> 7  0.62  [0.61, 0.63]
    #> 8  0.66  [0.65, 0.66]

It looks like the first 5 principal components explain more than half of the variance in the responses, so we perform a factor analysis with 5 factors, using the fa() function from the psych package. We can get the imputed covariance matrix of our data from mi$cov_combined. From there on, it’s business as usual.

    fit <- fa(mi$cov_combined, n.obs = nrow(bfi), nfactors = 5)

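Fitting a factor model from a covariance matrix alone is not specific to psych. As a rough sketch with simulated data (the matrix `S` below is a stand-in for `mi$cov_combined`, not output from mifa), base R's `factanal()` accepts a covariance matrix through its `covmat` argument:

```r
set.seed(7)

# Toy data: 6 variables driven by 2 independent latent factors.
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b3 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# Stand-in for a pooled covariance matrix such as mi$cov_combined.
S <- cov(x)

# factanal() fits from the matrix alone; n.obs is used for the fit test.
fit_base <- factanal(covmat = S, factors = 2, n.obs = n)
print(fit_base$loadings, cutoff = 0.3)
```

Note that `factanal()` uses maximum likelihood estimation, so its loadings are not expected to match those of `psych::fa()` exactly.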
The factor diagram shows that the five factors correspond nicely to the 5 types of questions:

    fa.diagram(fit)

We can add the factor scores to the original data, in order to explore group differences. Because we need complete data to calculate factor scores, we first impute a single data set with mice:
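
    # m = 1: a single imputed data set is enough for computing scores
    data_imp <- mice::complete(mice::mice(bfi, 1, print = FALSE))

    # Factor scores for the 25 personality items (columns 1 to 25)
    fct_scores <- data.frame(factor.scores(data_imp[, 1:25], fit)$scores)

    # Keep gender and give the five factors readable names
    data_imp <- data.frame(
      Gender        = factor(data_imp$gender),
      Extraversion  = fct_scores$MR1,
      Neuroticism   = fct_scores$MR2,
      Conscientious = fct_scores$MR3,
      Openness      = fct_scores$MR4,
      Agreeableness = fct_scores$MR5
    )
    levels(data_imp$Gender) <- c("Male", "Female")
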
Then we can visualize the group differences:

    library(ggplot2)
    library(tidyr)

    data_imp2 <- tidyr::pivot_longer(data_imp, cols = -Gender, names_to = "factor")

    ggplot(data_imp2) +
      geom_density(aes(value, linetype = Gender)) +
      facet_wrap(~ factor, nrow = 2) +
      theme(legend.position = "inside", legend.position.inside = c(.9, .1))

Footnotes
1. Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. John Wiley & Sons.
2. Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological): 175-185.
3. Shao, J. & Sitter, R. R. (1996). Bootstrap for imputed survey data. Journal of the American Statistical Association 91.435: 1278-1288. doi:10.1080/01621459.1996.10476997