| Type: | Package |
| Title: | Bayesian Gaussian Graphical Models |
| Version: | 2.1.6 |
| Date: | 2025-12-01 |
| Author: | Donald Williams [aut], Joris Mulder [aut], Philippe Rast [aut, cre] |
| Maintainer: | Philippe Rast <rast.ph@gmail.com> |
| Description: | Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>. |
| Depends: | R (≥ 4.0.0) |
| License: | GPL-2 |
| Imports: | BFpack (≥ 1.2.3), GGally (≥ 1.4.0), ggplot2 (≥ 3.2.1),ggridges (≥ 0.5.1), grDevices, MASS (≥ 7.3-51.5), methods,mvnfast (≥ 0.2.5), network (≥ 1.15), reshape (≥ 0.8.8), Rcpp(≥ 1.0.4.6), Rdpack (≥ 0.11-1), sna (≥ 2.5), stats, utils, |
| Suggests: | abind (≥ 1.4-5), assortnet (≥ 0.12), networktools (≥1.3.0), mice (≥ 3.8.0), psych, knitr, rmarkdown, testthat (≥3.0.0) |
| Encoding: | UTF-8 |
| LazyData: | true |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.2 |
| LinkingTo: | Rcpp, RcppArmadillo, RcppDist, RcppProgress |
| RdMacros: | Rdpack |
| BugReports: | https://github.com/rast-lab/BGGM/issues |
| Config/testthat/edition: | 3 |
| URL: | https://rast-lab.github.io/BGGM/ |
| NeedsCompilation: | yes |
| Packaged: | 2025-12-02 04:28:01 UTC; philippe |
| Repository: | CRAN |
| Date/Publication: | 2025-12-02 07:40:02 UTC |
BGGM: Bayesian Gaussian Graphical Models
Description
TheR packageBGGM provides tools for making Bayesian inference inGaussian graphical models (GGM). The methods are organized around two general approaches forBayesian inference: (1) estimation (Williams 2018) and (2) hypothesis testing(Williams and Mulder 2019). The key distinction is that the former focuses on either theposterior or posterior predictive distribution, whereas the latter focuses on model comparisonwith the Bayes factor.
The methods inBGGM build upon existing algorithms that are well-known in the literature.The central contribution ofBGGM is to extend those approaches:
Bayesian estimation with the novel matrix-F prior distribution (Mulder and Pericchi 2018).
Estimation
estimate.
Bayesian hypothesis testing with the novel matrix-F prior distribution (Mulder and Pericchi 2018).
Comparing GGMs (Williams et al. 2020)
Partial correlation differences
ggm_compare_estimate.Posterior predictive check
ggm_compare_ppc.Exploratory hypothesis testing
ggm_compare_explore.Confirmatory hypothesis testing
ggm_compare_confirm.
Extending inference beyond the conditional (in)dependence structure
Predictability with Bayesian variance explained (Gelman et al. 2019)
predictability.Posterior uncertainty in the partial correlations
estimate.Custom Network Statistics
roll_your_own.
Furthermore, the computationally intensive tasks are written inc++ via theRpackageRcpp (Eddelbuettel et al. 2011) and thec++libraryArmadillo (Sanderson and Curtin 2016), there are plotting functionsfor each method, control variables can be included in the model, and there is support formissing valuesbggm_missing.
Supported Data Types:
Continuous: The continuous method was described in Williams and Mulder (2019).
Binary: The binary method builds directly upon in Talhouk et al. (2012),that, in turn, built upon the approaches of Lawrence et al. (2008) andWebb and Forster (2008) (to name a few).
Ordinal: Ordinal data requires sampling thresholds. There are two approach included inBGGM: (1)the customary approach described in in Albert and Chib (1993) (the default) andthe 'Cowles' algorithm described in in Cowles (1996).
Mixed: The mixed data (a combination of discrete and continuous) method was introducedin Hoff (2007). This is a semi-parametric copula model(i.e., a copula GGM) based on the ranked likelihood. Note that this can be used for dataconsisting entirely of ordinal data.
Additional Features:
The primary focus ofBGGM is Gaussian graphical modeling (the inverse covariance matrix).The residue is a suite of useful methods not explicitly for GGMs:
Bivariate correlations for binary (tetrachoric), ordinal (polychoric), mixed (rank based),and continuous (Pearson's) data
zero_order_cors.Multivariate regression for binary (probit), ordinal (probit),mixed (rank likelihood), and continous data (
estimate).Multiple regression for binary (probit), ordinal (probit),mixed (rank likelihood), and continuous data (e.g.,
coef.estimate).
Note on Conditional (In)dependence Models for Latent Data:
All of the data types (besides continuous) model latent data. That is, unobserved(latent) data is assumed to be Gaussian. For example, a tetrachoric correlation(binary data) is a special case of a polychoric correlation (ordinal data).Both capture relations between "theorized normally distributed continuouslatent variables" (Wikipedia).In both instances, the corresponding partial correlation between observed variables is conditionedon the remaining variables in thelatent space. This implies that interpretationis similar to continuous data, but with respect to latent variables. We refer interested usersto page 2364, section 2.2, in Webb and Forster (2008).
High Dimensional Data?
BGGM was built specifically for social-behavioral scientists. Of course,the methods can be used by all researchers. However, there is currentlynot supportfor high-dimensional data (i.e., more variables than observations) that are commonplace in the genetics literature. These data are rare in the social-behavioral sciences.In the future, support for high-dimensional data may be added toBGGM.
References
Albert JH, Chib S (1993).“Bayesian analysis of binary and polychotomous response data.”Journal of the American statistical Association,88(422), 669–679.
Cowles MK (1996).“Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models.”Statistics and Computing,6(2), 101–111.doi:10.1007/bf00162520.
Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, Chambers J, Bates D (2011).“Rcpp: Seamless R and C++ integration.”Journal of Statistical Software,40(8), 1–18.
Gelman A, Goodrich B, Gabry J, Vehtari A (2019).“R-squared for Bayesian Regression Models.”American Statistician,73(3), 307–309.ISSN 15372731.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Lawrence E, Bingham D, Liu C, Nair VN (2008).“Bayesian inference for multivariate ordinal data using parameter expansion.”Technometrics,50(2), 182–191.
Mulder J, Pericchi L (2018).“The Matrix-F Prior for Estimating and Testing Covariance Matrices.”Bayesian Analysis, 1–22.ISSN 19316690,doi:10.1214/17-BA1092.
Sanderson C, Curtin R (2016).“Armadillo: a template-based C++ library for linear algebra.”Journal of Open Source Software,1(2), 26.doi:10.21105/joss.00026.
Talhouk A, Doucet A, Murphy K (2012).“Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices.”Journal of Computational and Graphical Statistics,21(3), 739–757.
Webb EL, Forster JJ (2008).“Bayesian model determination for multivariate ordinal and binary data.”Computational statistics & data analysis,52(5), 2632–2649.doi:10.1016/j.csda.2007.09.008.
Williams DR (2018).“Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons.”arXiv.doi:10.31234/OSF.IO/X8DPR.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”Psychological Methods.doi:10.1037/met0000254.
Data: Sachs Network
Description
Protein expression in human immune system cells
Usage
data("Sachs")Format
A data frame containing 7466 cells (n = 7466) and flow cytometrymeasurements of 11 (p = 11) phosphorylated proteins and phospholipids
@referencesSachs, K., Gifford, D., Jaakkola, T., Sorger, P., & Lauffenburger, D. A. (2002).Bayesian network approach to cell signaling pathway modeling. Sci. STKE, 2002(148), pe38-pe38.
Examples
data("Sachs")Data: Autism and Obssesive Compulsive Disorder
Description
A correlation matrix with 17 variables in total (autsim: 9; OCD: 8).The sample size was 213.
Usage
data("asd_ocd")Format
A correlation matrix including 17 variables. These data were measured on a 4 level likert scale.
Details
Autism:
CICircumscribed interestsUPUnusual preoccupationsRORepetitive use of objects or interests in parts of objectsCRCompulsions and/or ritualsCIUnusual sensory interestsSMComplex mannerisms or stereotyped body movementsSUStereotyped utterances/delayed echolaliaNILNeologisms and/or idiosyncratic languageVRVerbal rituals
OCD
CDConcern with things touched due to dirt/bacteriaTBThoughts of doing something bad around othersCTContinual thoughts that do not go awayHPBelief that someone/higher power put reoccurring thoughts in their headCWContinual washingCChContinual checking CntCheckCCContinual counting/repeatingRDRepeatedly do things until it feels good or just right
References
Jones, P. J., Ma, R., & McNally, R. J. (2019). Bridge centrality:A network approachto understanding comorbidity. Multivariate behavioral research, 1-15.
Ruzzano, L., Borsboom, D., & Geurts, H. M. (2015).Repetitive behaviors in autism and obsessive-compulsivedisorder: New perspectives from a network analysis.Journal of Autism and Developmental Disorders, 45(1),192-202. doi:10.1007/s10803-014-2204-9
Examples
data("asd_ocd")# generate continuousY <- MASS::mvrnorm(n = 213, mu = rep(0, 17), Sigma = asd_ocd, empirical = TRUE)Data: 25 Personality items representing 5 factors
Description
This dataset and the corresponding documentation was taken from thepsych package. We refer users to thatpackage for further details (Revelle 2019).
Usage
data("bfi")Format
A data frame with 25 variables and 2800 observations (including missing values)
Details
A1Am indifferent to the feelings of others. (q_146)A2Inquire about others' well-being. (q_1162)A3Know how to comfort others. (q_1206)A4Love children. (q_1364)A5Make people feel at ease. (q_1419)C1Am exacting in my work. (q_124)C2Continue until everything is perfect. (q_530)C3Do things according to a plan. (q_619)C4Do things in a half-way manner. (q_626)C5Waste my time. (q_1949)E1Don't talk a lot. (q_712)E2Find it difficult to approach others. (q_901)E3Know how to captivate people. (q_1205)E4Make friends easily. (q_1410)E5Take charge. (q_1768)N1Get angry easily. (q_952)N2Get irritated easily. (q_974)N3Have frequent mood swings. (q_1099)N4Often feel blue. (q_1479)N5Panic easily. (q_1505)o1Am full of ideas. (q_128)o2Avoid difficult reading material.(q_316)o3Carry the conversation to a higher level. (q_492)o4Spend time reflecting on things. (q_1738)o5Will not probe deeply into a subject. (q_1964)genderMales = 1, Females =2education1 = HS, 2 = finished HS, 3 = some college, 4 = college graduate 5 = graduate degree
References
Revelle W (2019).psych: Procedures for Psychological, Psychometric, and Personality Research.Northwestern University, Evanston, Illinois.R package version 1.9.12,https://CRAN.R-project.org/package=psych.
GGM: Missing Data
Description
Estimation and exploratory hypothesis testing with missing data.
Usage
bggm_missing(x, iter = 2000, method = "estimate", ...)Arguments
x | An object of class |
iter | Number of iterations for each imputed dataset (posterior samples; defaults to 2000). |
method | Character string. Which method should be used (default set to |
... |
Value
An object of classestimate orexplore.
Note
Currently,BGGM is compatible with the packagemice for handlingthe missing data. This is accomplished by fitting a model for each imputed dataset(i.e., more than one to account for uncertainty in the imputation step) and then poolingthe estimates.
In a future version, an additional option will be added that allows forimputing the missing values during model fitting. This option will be incorporated directly intotheestimate orexplore functions, such thatbggm_missing willalways support missing data withmice.
Support:
There is limited support for missing data. As of version2.0.0, it is possible todetermine the graphical structure with eitherestimate orexplore, in additionto plotting the graph withplot.select. All data typesare currently supported.
Memory Warning:A model is fitted for each imputed dataset. This results in a potentially large object.
Examples
# note: iter = 250 for demonstrative purposes# need this packagelibrary(mice, warn.conflicts = FALSE)# dataY <- ptsd[,1:5]# matrix for indicesmat <- matrix(0, nrow = 221, ncol = 5)# indicesindices <- which(mat == 0, arr.ind = TRUE)# Introduce 50 NAsY[indices[sample(1:nrow(indices), 50),]] <- NA# imputex <- mice(Y, m = 5, print = FALSE)################################ copula ############################### rank based parital correlations# estimate the model fit_est <- bggm_missing(x, method = "estimate", type = "mixed", iter = 250, progress = FALSE, seed = 1234)# select edge setE <- select(fit_est)# plot Eplt_E <- plot(E)$pltplt_ECompute Posterior Distributions from Graph Search Results
Description
The 'bma_posterior' function samples posterior distributions of graph parameters (e.g., partial correlations or precision matrices) based on the graph structures sampled during a Bayesian graph search performed byggm_search.
Usage
bma_posterior(object, param = "pcor", iter = 5000, progress = TRUE)Arguments
object | A ggm_search object |
param | Compute BMA on either partial correlations "pcor" (default) or on precision matrix "Theta". |
iter | Number of samples to be drawn, defaults to 5,000 |
progress | Show progress bar, defaults to TRUE |
Details
This function incorporates uncertainty in both graph structure and parameter estimation, providing Bayesian Model Averaged (BMA) parameter estimates.
Use 'bma_posterior' when detailed posterior inference on graph parameters is needed, or to refine results obtained from 'ggm_search'.
Value
A list containing posterior samples and the Bayesian Model Averaged parameter estimates.
See Also
Compute Regression Parameters forestimate Objects
Description
There is a direct correspondence between the inverse covariance matrix andmultiple regression (Kwan 2014; Stephens 1998). This readily allowsfor converting the GGM parameters to regression coefficients. All data types are supported.
Usage
## S3 method for class 'estimate'coef(object, iter = NULL, progress = TRUE, ...)Arguments
object | An Object of class |
iter | Number of iterations (posterior samples; defaults to the number in the object). |
progress | Logical. Should a progress bar be included (defaults to |
... | Currently ignored. |
Value
An object of classcoef, containting two lists.
betasA list of lengthp, each containing ap - 1 byitermatrix ofposterior samplesobjectAn object of classestimate(the fitted model).
References
Kwan CC (2014).“A regression-based interpretation of the inverse of the sample covariance matrix.”Spreadsheets in Education,7(1), 4613.
Stephens G (1998).“On the Inverse of the Covariance Matrix in Portfolio Analysis.”The Journal of Finance,53(5), 1821–1827.
Examples
# note: iter = 250 for demonstrative purposes############################ example 1: binary ############################# dataY = matrix( rbinom(100, 1, .5), ncol=4)# fit modelfit <- estimate(Y, type = "binary", iter = 250, progress = TRUE)# summarize the partial correlationsreg <- coef(fit, progress = FALSE)# summarysumm <- summary(reg)summCompute Regression Parameters forexplore Objects
Description
There is a direct correspondence between the inverse covariance matrix andmultiple regression (Kwan 2014; Stephens 1998). This readily allowsfor converting the GGM parameters to regression coefficients. All data types are supported.
Usage
## S3 method for class 'explore'coef(object, iter = NULL, progress = TRUE, ...)Arguments
object | An Object of class |
iter | Number of iterations (posterior samples; defaults to the number in the object). |
progress | Logical. Should a progress bar be included (defaults to |
... | Currently ignored. |
Value
An object of classcoef, containting two lists.
betasA list of lengthp, each containing ap - 1 byitermatrix ofposterior samplesobjectAn object of classexplore(the fitted model).
References
Kwan CC (2014).“A regression-based interpretation of the inverse of the sample covariance matrix.”Spreadsheets in Education,7(1), 4613.
Stephens G (1998).“On the Inverse of the Covariance Matrix in Portfolio Analysis.”The Journal of Finance,53(5), 1821–1827.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- ptsd[,1:4]############################# example 1: ordinal ############################## fit model (note + 1, due to zeros)fit <- explore(Y + 1, type = "ordinal", iter = 250, progress = FALSE, seed = 1234)# summarize the partial correlationsreg <- coef(fit, progress = FALSE)# summarysumm <- summary(reg)summGGM: Confirmatory Hypothesis Testing
Description
Confirmatory hypothesis testing in GGMs. Hypotheses are expressed as equalityand/or ineqaulity contraints on the partial correlations of interest. Here the focus isnoton determining the graph (seeexplore) but testing specific hypotheses related tothe conditional (in)dependence structure. These methods were introduced inWilliams and Mulder (2019).
Usage
confirm( Y, hypothesis, prior_sd = 0.5, formula = NULL, type = "continuous", mixed_type = NULL, iter = 25000, progress = TRUE, impute = TRUE, seed = NULL, ...)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
hypothesis | Character string. The hypothesis (or hypotheses) to be tested. See details. |
prior_sd | Numeric. Scale of the prior distribution, approximately the standard deviationof a beta distribution (defaults to 0.5). |
formula | An object of class |
type | Character string. Which type of data forY ? The options include |
mixed_type | Numeric vector of lengthp. An indicator for which varibles should be treated as ranks.(1 for rank and 0 to assume normality). The default is currently (dev version) to treat all integer variablesas ranks when |
iter | Number of iterations (posterior samples; defaults to 25,000). |
progress | Logical. Should a progress bar be included (defaults to |
impute | Logicial. Should the missing values ( |
seed | An integer for the random seed. |
... | Currently ignored. |
Details
The hypotheses can be written either with the respective column names or numbers.For example,1--2 denotes the relation between the variables in column 1 and 2.Note that these must correspond to the upper triangular elements of the correlationmatrix. This is accomplished by ensuring that the first number is smaller than the second number.This also applies when using column names (i.e,, in reference to the column number).
One Hypothesis:
To test whether some relations are larger than others, while othersare expected to be equal, this can be writting as
hyp <- c(1--2 > 1--3 = 1--4 > 0),
where there is an addition additional contraint that all effects are expected to be positive.This is then compared to the complement.
More Than One Hypothesis:
The above hypothesis can also be compared to, say, a null model by using ";"to seperate the hypotheses, for example,
hyp <- c(1--2 > 1--3 = 1--4 > 0; 1--2 = 1--3 = 1--4 = 0).
Any number of hypotheses can be compared this way.
Using "&"
It is also possible to include&. This allows for testing one constraintandanother contraint as one hypothesis.
hyp <- c("A1--A2 > A1--A2 & A1--A3 = A1--A3")
Of course, it is then possible to include additional hypotheses by separating them with ";".Note also that the column names were used in this example (e.g.,A1--A2 is the relationbetween those nodes).
Testing Sums
It might also be interesting to test the sum of partial correlations. For example, that thesum of specific relations is larger than the sum of other relations. This can be written as
hyp <- c("A1--A2 + A1--A3 > A1--A4 + A1--A5; A1--A2 + A1--A3 = A1--A4 + A1--A5")
Potential Delays:
There is a chance for a potentially long delay from the time the progress bar finishesto when the function is done running. This occurs when the hypotheses require furthersampling to be tested, for example, when grouping relationsc("(A1--A2, A1--A3) > (A1--A4, A1--A5)". This is not an error.
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables (Hoff 2007). This is based on theranked likelihood which requires sampling the ranks for each variable (i.e., the data is not merelytransformed to ranks). This is computationally expensive when there are many levels. For example,with continuous data, there are as many ranks as data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise. This is accomplished by specifying an indicatorvector of lengthp. A one indicates to use the ranks, whereas a zero indicates to "ignore"that variable. By default all integer variables are handled as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Value
The returned object of classconfirm contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
out_hyp_probPosterior hypothesis probabilities.infoAn object of classBFfrom the R packageBFpack.
Note
"Default" Prior:
In Bayesian statistics, a default Bayes factor needs to have several properties. I referinterested users to section 2.2 in Dablander et al. (2020). InWilliams and Mulder (2019), some of these propteries were investigated (e.g.,model selection consistency). That said, we would not consider this a "default" or "automatic"Bayes factor and thus we encourage users to perform sensitivity analyses by varying the scale of the priordistribution.
Furthermore, it is important to note there is no "correct" prior and, also, there is no needto entertain the possibility of a "true" model. Rather, the Bayes factor can be interpreted aswhich hypothesis best (relative to each other) predicts the observed data(Section 3.2 in Kass and Raftery 1995).
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
References
Dablander F, Bergh Dvd, Ly A, Wagenmakers E (2020).“Default Bayes Factors for Testing the (In) equality of Several Population Variances.”arXiv preprint arXiv:2003.06278.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Kass RE, Raftery AE (1995).“Bayes Factors.”Journal of the American Statistical Association,90(430), 773–795.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Examples
# note: iter = 250 for demonstrative purposes############################# example 1: cheating ############################# Here a true hypothesis is tested,# which shows the method works nicely# (peeked at partials beforehand)# dataY <- BGGM::bfi[,1:10]hypothesis <- c("A1--A2 < A1--A3 < A1--A4 = A1--A5")# test cheattest_cheat <- confirm(Y = Y, type = "continuous", hypothesis = hypothesis, iter = 250, progress = FALSE)# print (probabilty of nearly 1 !)test_cheatConstrained Posterior Distribution
Description
Compute the posterior distributionwith off-diagonal elements of the precision matrix constrainedto zero.
Usage
constrained_posterior( object, adj, method = "direct", iter = 5000, progress = TRUE, ...)Arguments
object | An object of class |
adj | A |
method | Character string. Which method should be used ? Defaults tothe "direct sampler" (i.e., |
iter | Number of iterations (posterior samples; defaults to 5000). |
progress | Logical. Should a progress bar be included (defaults to |
... | Currently ignored. |
Value
An object of classcontrained, including
precision_meanThe posterior mean for the precision matrix.pcor_meanThe posterior mean for the precision matrix.precision_sampsA 3d array of dimensionpbypbyiterincluding the sampled precision matrices.pcor_sampsA 3d array of dimensionpbypbyiterincluding sampled partial correlations matrices.
References
Lenkoski A (2013).“A direct sampler for G-Wishart variates.”Stat,2(1), 119–128.
Examples
# dataY <- bfi[,1:10]# sample posteriorfit <- estimate(Y, iter = 100)# select graphsel <- select(fit)# constrained posteriorpost <- constrained_posterior(object = fit, adj = sel$adj, iter = 100, progress = FALSE)MCMC Convergence
Description
Monitor convergence of the MCMC algorithms.
Usage
convergence(object, param = NULL, type = "trace", print_names = FALSE)Arguments
object | An object of class |
param | Character string. Names of parameters for which to monitor MCMC convergence. |
type | Character string. Which type of convergence plot ? The currentoptions are |
print_names | Logical. Should the parameter names be printed (defaults to |
Value
A list ofggplot objects.
Note
An overview of MCMC diagnostics can be foundhere.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- ptsd[,1:5]############################### continuous ################################fit <- estimate(Y, iter = 250, progress = FALSE)# print names firstconvergence(fit, print_names = TRUE)# trace plotsconvergence(fit, type = "trace", param = c("B1--B2", "B1--B3"))[[1]]# acf plotsconvergence(fit, type = "acf", param = c("B1--B2", "B1--B3"))[[1]]Data: Contingencies of Self-Worth Scale (CSWS)
Description
A dataset containing items from the Contingencies of Self-Worth Scale (CSWS) scale. There are 35 variables and680 observations
Usage
data("csws")Format
A data frame with 35 variables and 680 observations (7 point Likert scale)
Details
1When I think I look attractive, I feel good about myself2My self-worth is based on God's love3I feel worthwhile when I perform better than others on a task or skill.4My self-esteem is unrelated to how I feel about the way my body looks.5Doing something I know is wrong makes me lose my self-respect6I don't care if other people have a negative opinion about me.7Knowing that my family members love me makes me feel good about myself.8I feel worthwhile when I have God's love.9I can’t respect myself if others don't respect me.10My self-worth is not influenced by the quality of my relationships with my family members.11Whenever I follow my moral principles, my sense of self-respect gets a boost.12Knowing that I am better than others on a task raises my self-esteem.13My opinion about myself isn't tied to how well I do in school.14I couldn't respect myself if I didn't live up to a moral code.15I don't care what other people think of me.16When my family members are proud of me, my sense of self-worth increases.17My self-esteem is influenced by how attractive I think my face or facial features are.18My self-esteem would suffer if I didn’t have God's love.19Doing well in school gives me a sense of selfrespect.20Doing better than others gives me a sense of self-respect.21My sense of self-worth suffers whenever I think I don't look good.22I feel better about myself when I know I'm doing well academically.23What others think of me has no effect on what I think about myself.24When I don’t feel loved by my family, my selfesteem goes down.25My self-worth is affected by how well I do when I am competing with others.26My self-esteem goes up when I feel that God loves me.27My self-esteem is influenced by my academic performance.28My self-esteem would suffer if I did something unethical.29It is important to my self-respect that I have a family that cares about me.30My self-esteem does not depend on whether or not I feel attractive.31When I think that I’m disobeying God, I feel bad about myself.32My self-worth is influenced by how well I do on competitive tasks.33I feel bad about myself whenever my academic performance is lacking.34My self-esteem depends on whether or not I follow my moral/ethical principles.35My self-esteem depends on the opinions others hold of me.gender"M" (male) or "F" (female)
Note
There are seven domains
FAMILY SUPPORT: items 7, 10, 16, 24, and 29.
COMPETITION: items 3, 12, 20, 25, and 32.
APPEARANCE: items 1, 4, 17, 21, and 30.
GOD'S LOVE: items 2, 8, 18, 26, and 31.
ACADEMIC COMPETENCE: items 13, 19, 22, 27, and 33.
VIRTUE: items 5, 11, 14, 28, and 34.
APPROVAL FROM OTHERS: items: 6, 9, 15, 23, and 35.
References
Briganti, G., Fried, E. I., & Linkowski, P. (2019). Network analysis of Contingencies of Self-WorthScale in 680 university students. Psychiatry research, 272, 252-257.
Examples
data("csws")Data: Depression and Anxiety (Time 1)
Description
A data frame containing 403 observations (n = 403) and 16 variables (p = 16) measured on the 4-pointlikert scale (depression: 9; anxiety: 7).
Usage
data("depression_anxiety_t1")Format
A data frame containing 403 observations (n = 7466) and 16 variables (p = 16) measured on the 4-pointlikert scale.
Details
Depression:
PHQ1Little interest or pleasure in doing things?PHQ2Feeling down, depressed, or hopeless?PHQ3Trouble falling or staying asleep, or sleeping too much?PHQ4Feeling tired or having little energy?PHQ5Poor appetite or overeating?PHQ6Feeling bad about yourself — or that you are a failure or have letyourself or your family down?PHQ7Trouble concentrating on things, such as reading the newspaper orwatching television?PHQ8Moving or speaking so slowly that other people could have noticed? Or sofidgety or restless that you have been moving a lot more than usual?PHQ9Thoughts that you would be better off dead, or thoughts of hurting yourselfin some way?
Anxiety
GAD1Feeling nervous, anxious, or on edgeGAD2Not being able to stop or control worryingGAD3Worrying too much about different thingsGAD4Trouble relaxingGAD5Being so restless that it's hard to sit stillGAD6Becoming easily annoyed or irritableGAD7Feeling afraid as if something awful might happen
References
Forbes, M. K., Baillie, A. J., & Schniering, C. A. (2016). A structural equation modelinganalysis of the relationships between depression,anxiety, and sexual problems over time.The Journal of Sex Research, 53(8), 942-954.
Forbes, M. K., Wright, A. G., Markon, K. E., & Krueger, R. F. (2019). Quantifying the reliability and replicability of psychopathology network characteristics.Multivariate behavioral research, 1-19.
Jones, P. J., Williams, D. R., & McNally, R. J. (2019). Sampling variability is not nonreplication:a Bayesian reanalysis of Forbes, Wright, Markon, & Krueger.
Examples
data("depression_anxiety_t1")labels<- c("interest", "down", "sleep", "tired", "appetite", "selfest", "concen", "psychmtr", "suicid", "nervous", "unctrworry", "worrylot", "relax", "restless", "irritable", "awful")Data: Depression and Anxiety (Time 2)
Description
A data frame containing 403 observations (n = 403) and 16 variables (p = 16) measured on the 4-pointlikert scale (depression: 9; anxiety: 7).
Usage
data("depression_anxiety_t2")Format
A data frame containing 403 observations (n = 7466) and 16 variables (p = 16) measured on the 4-pointlikert scale.
Details
Depression:
PHQ1Little interest or pleasure in doing things?PHQ2Feeling down, depressed, or hopeless?PHQ3Trouble falling or staying asleep, or sleeping too much?PHQ4Feeling tired or having little energy?PHQ5Poor appetite or overeating?PHQ6Feeling bad about yourself — or that you are a failure or have letyourself or your family down?PHQ7Trouble concentrating on things, such as reading the newspaper orwatching television?PHQ8Moving or speaking so slowly that other people could have noticed? Or sofidgety or restless that you have been moving a lot more than usual?PHQ9Thoughts that you would be better off dead, or thoughts of hurting yourselfin some way?
Anxiety
GAD1Feeling nervous, anxious, or on edgeGAD2Not being able to stop or control worryingGAD3Worrying too much about different thingsGAD4Trouble relaxingGAD5Being so restless that it's hard to sit stillGAD6Becoming easily annoyed or irritableGAD7Feeling afraid as if something awful might happen
References
Forbes, M. K., Baillie, A. J., & Schniering, C. A. (2016). A structural equation modelinganalysis of the relationships between depression,anxiety, and sexual problems over time.The Journal of Sex Research, 53(8), 942-954.
Forbes, M. K., Wright, A. G., Markon, K. E., & Krueger, R. F. (2019). Quantifying the reliability and replicability of psychopathology network characteristics.Multivariate behavioral research, 1-19.
Jones, P. J., Williams, D. R., & McNally, R. J. (2019). Sampling variability is not nonreplication:a Bayesian reanalysis of Forbes, Wright, Markon, & Krueger.
Examples
data("depression_anxiety_t2")labels<- c("interest", "down", "sleep", "tired", "appetite", "selfest", "concen", "psychmtr", "suicid", "nervous", "unctrworry", "worrylot", "relax", "restless", "irritable", "awful")GGM: Estimation
Description
Estimate the conditional (in)dependence with either an analytic solution or efficientlysampling from the posterior distribution. These methods were introduced in Williams (2018).The graph is selected withselect.estimate and then plotted withplot.select.
Usage
estimate( Y, formula = NULL, type = "continuous", mixed_type = NULL, analytic = FALSE, prior_sd = sqrt(1/3), iter = 5000, impute = FALSE, progress = TRUE, seed = NULL, ...)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
formula | An object of class |
type | Character string. Which type of data for |
mixed_type | Numeric vector. An indicator of lengthp for which variables should be treated as ranks.(1 for rank and 0 to assume normality). The default is currently to treat all integer variables as rankswhen |
analytic | Logical. Should the analytic solution be computed (default is |
prior_sd | Scale of the prior distribution, approximately the standard deviation of a beta distribution(defaults to sqrt(1/3)). |
iter | Number of iterations (posterior samples; defaults to 5000). |
impute | Logical. Should the missing values ( |
progress | Logical. Should a progress bar be included (defaults to |
seed | An integer for the random seed. |
... | Currently ignored. |
Details
The default is to draw samples from the posterior distribution (analytic = FALSE). The samples arerequired for computing edge differences (seeggm_compare_estimate), Bayesian R2 introduced inGelman et al. (2019) (seepredictability), etc. If the goal isto *only* determine the non-zero effects, this can be accomplished by settinganalytic = TRUE.This is particularly useful when a fast solution is needed (see the examples inggm_compare_ppc)
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables. This is based on the ranked likelihood which requires samplingthe ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationallyexpensive when there are many levels. For example, with continuous data, there are as many ranksas data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise (Hoff 2007). This isaccomplished by specifying an indicator vector of lengthp. A one indicates to use the ranks,whereas a zero indicates to "ignore" that variable. By default all integer variables are treated as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Imputing Missing Values:
Missing values are imputed with the approach described in Hoff (2009).The basic idea is to impute the missing values with the respective posterior pedictive distribution,given the observed data, as the model is being estimated. Note that the default isTRUE,but this ignored when there are no missing values. If set toFALSE, and there are missingvalues, list-wise deletion is performed withna.omit.
Value
The returned object of classestimate contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
pcor_matPartial correltion matrix (posterior mean).post_sampAn object containing the posterior samples.
Note
Posterior Uncertainty:
A key feature ofBGGM is that there is a posterior distribution for each partial correlation.This readily allows for visiualizing uncertainty in the estimates. This feature workswith all data types and is accomplished by plotting the summary of theestimate object(i.e.,plot(summary(fit))). Several examples are provided below.
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
References
Gelman A, Goodrich B, Gabry J, Vehtari A (2019).“R-squared for Bayesian Regression Models.”American Statistician,73(3), 307–309.ISSN 15372731.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Hoff PD (2009).A first course in Bayesian statistical methods, volume 580.Springer.
Williams DR (2018).“Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons.”arXiv.doi:10.31234/OSF.IO/X8DPR.
Examples
# note: iter = 250 for demonstrative purposes############################################ example 1: continuous and ordinal ############################################# dataY <- ptsd# continuous# fit modelfit <- estimate(Y, type = "continuous", iter = 250)# summarize the partial correlationssumm <- summary(fit)# plot the summaryplt_summ <- plot(summary(fit))# select the graphE <- select(fit)# plot the selected graphplt_E <- plot(select(fit))# ordinal# fit model (note + 1, due to zeros)fit <- estimate(Y + 1, type = "ordinal", iter = 250)# summarize the partial correlationssumm <- summary(fit)# plot the summaryplt <- plot(summary(fit))# select the graphE <- select(fit)# plot the selected graphplt_E <- plot(select(fit))#################################### example 2: analytic solution ##################################### (only continuous)# dataY <- ptsd# fit modelfit <- estimate(Y, analytic = TRUE)# summarize the partial correlationssumm <- summary(fit)# plot summaryplt_summ <- plot(summary(fit))# select graphE <- select(fit)# plot the selected graphplt_E <- plot(select(fit))GGM: Exploratory Hypothesis Testing
Description
Learn the conditional (in)dependence structure with the Bayes factor using the matrix-Fprior distribution (Mulder and Pericchi 2018). These methods were introduced inWilliams and Mulder (2019). The graph is selected withselect.explore andthen plotted withplot.select.
Usage
explore( Y, formula = NULL, type = "continuous", mixed_type = NULL, analytic = FALSE, prior_sd = 0.5, iter = 5000, progress = TRUE, impute = FALSE, seed = NULL, ...)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
formula | An object of class |
type | Character string. Which type of data for |
mixed_type | Numeric vector. An indicator of length p for which varibles should be treated as ranks.(1 for rank and 0 to assume normality). The default is to treat all integer variables as rankswhen |
analytic | Logical. Should the analytic solution be computed (default is |
prior_sd | Scale of the prior distribution, approximately the standard deviationof a beta distribution (defaults to 0.5). |
iter | Number of iterations (posterior samples; defaults to 5000). |
progress | Logical. Should a progress bar be included (defaults to |
impute | Logicial. Should the missing values ( |
seed | An integer for the random seed. |
... | Currently ignored (leave empty). |
Details
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables. This is based on the ranked likelihood which requires samplingthe ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationallyexpensive when there are many levels. For example, with continuous data, there are as many ranksas data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise. This is accomplished by specifying an indicatorvector of lengthp. A one indicates to use the ranks, whereas a zero indicates to "ignore"that variable. By default all integer variables are handled as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Imputing Missing Values:
Missing values are imputed with the approach described in Hoff (2009).The basic idea is to impute the missing values with the respective posterior pedictive distribution,given the observed data, as the model is being estimated. Note that the default isTRUE,but this ignored when there are no missing values. If set toFALSE, and there are missingvalues, list-wise deletion is performed withna.omit.
Value
The returned object of classexplore contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
pcor_matpartial correltion matrix (posterior mean).post_sampan object containing the posterior samples.
Note
Posterior Uncertainty:
A key feature ofBGGM is that there is a posterior distribution for each partial correlation.This readily allows for visiualizing uncertainty in the estimates. This feature workswith all data types and is accomplished by plotting the summary of theexplore object(i.e.,plot(summary(fit))). Note that in contrast toestimate (credible intervals),the posterior standard deviation is plotted forexplore objects.
"Default" Prior:
In Bayesian statistics, a default Bayes factor needs to have several properties. I referinterested users to section 2.2 in Dablander et al. (2020). InWilliams and Mulder (2019), some of these propteries were investigated includingmodel selection consistency. That said, we would not consider this a "default" (or "automatic")Bayes factor and thus we encourage users to perform sensitivity analyses by varyingthe scale of the prior distribution.
Furthermore, it is important to note there is no "correct" prior and, also, there is no needto entertain the possibility of a "true" model. Rather, the Bayes factor can be interpreted aswhich hypothesis best (relative to each other) predicts the observed data(Section 3.2 in Kass and Raftery 1995).
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
References
Dablander F, Bergh Dvd, Ly A, Wagenmakers E (2020).“Default Bayes Factors for Testing the (In) equality of Several Population Variances.”arXiv preprint arXiv:2003.06278.
Hoff PD (2009).A first course in Bayesian statistical methods, volume 580.Springer.
Kass RE, Raftery AE (1995).“Bayes Factors.”Journal of the American Statistical Association,90(430), 773–795.
Mulder J, Pericchi L (2018).“The Matrix-F Prior for Estimating and Testing Covariance Matrices.”Bayesian Analysis, 1–22.ISSN 19316690,doi:10.1214/17-BA1092.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Examples
# note: iter = 250 for demonstrative purposes############################## example 1: binary ###############################Y <- women_math[1:500,]# fit modelfit <- explore(Y, type = "binary", iter = 250, progress = FALSE)# summarize the partial correlationssumm <- summary(fit)# plot the summaryplt_summ <- plot(summary(fit))# select the graphE <- select(fit)# plot the selected graphplt_E <- plot(E)plt_E$plt_altFisher Z Transformation
Description
Tranform correlations to Fisher's Z
Usage
fisher_r_to_z(r)Arguments
r | correlation (can be a vector) |
Value
Fisher Z transformed correlation(s)
Examples
fisher_r_to_z(0.5)Fisher Z Back Transformation
Description
Back tranform Fisher's Z to correlations
Usage
fisher_z_to_r(z)Arguments
z | Fisher Z |
Value
Correlation (s) (backtransformed)
Examples
fisher_z_to_r(0.5)Simulate a Partial Correlation Matrix
Description
Simulate a Partial Correlation Matrix
Usage
gen_net(p = 20, edge_prob = 0.3, lb = 0.05, ub = 0.3)Arguments
p | number of variables (nodes) |
edge_prob | connectivity |
lb | lower bound for the partial correlations |
ub | upper bound for the partial correlations |
Value
A list containing the following:
pcor: Partial correlation matrix, encodingthe conditional (in)dependence structure.
cors: Correlation matrix.
adj: Adjacency matrix.
trys: Number of attempts to obtain apositive definite matrix.
Note
The function checks for a valid matrix (positive definite),but sometimes this will still fail. For example, forlargerp, to have large partial correlations thisrequires a sparse GGM(accomplished by settingedge_probto a small value).
Examples
true_net <- gen_net(p = 10)Generate Ordinal and Binary data
Description
Generate Multivariate Ordinal and Binary data.
Usage
gen_ordinal(n, p, levels = 2, cor_mat, empirical = FALSE)Arguments
n | Number of observations (n). |
p | Number of variables (p). |
levels | Number of categories (defaults to 2; binary data). |
cor_mat | Ap byp matrix including the true correlation structure. |
empirical | Logical. If true, |
Value
An byp data matrix.
Note
In order to allow users to enjoy the functionality ofBGGM, we had to make minor changes to the functionrmvord_naivfrom theR packageorddata (Leisch et al. 2010). All rights to, and credit for, the functionrmvord_naivbelong to the authors of that package.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.A copy of the GNU General Public License is available online.
References
Leisch F, Kaiser AWS, Hornik K (2010).orddata: Generation of Artificial Ordinal and Binary Data.R package version 0.1/r4,https://R-Forge.R-project.org/projects/orddata/.
Examples
######################################### example 1 ############################################main <- ptsd_cor1[1:5,1:5]p <- ncol(main)pcors <- -(cov2cor(solve(main)) -diag(p))diag(pcors) <- 1pcors <- ifelse(abs(pcors) < 0.05, 0, pcors)inv <- -pcorsdiag(inv) <- 1cors <- cov2cor( solve(inv))# example dataY <- BGGM::gen_ordinal(n = 500, p = 5, levels = 2, cor_mat = cors, empirical = FALSE)######################################### example 2 ############################################# empirical = TRUEY <- gen_ordinal(n = 500, p = 16, levels = 5, cor_mat = ptsd_cor1, empirical = TRUE)GGM Compare: Confirmatory Hypothesis Testing
Description
Confirmatory hypothesis testing for comparing GGMs. Hypotheses are expressed as equalityand/or ineqaulity contraints on the partial correlations of interest. Here the focus isnoton determining the graph (seeexplore) but testing specific hypotheses related tothe conditional (in)dependence structure. These methods were introduced inWilliams and Mulder (2019) and in Williams et al. (2020)
Usage
ggm_compare_confirm( ..., hypothesis, formula = NULL, type = "continuous", mixed_type = NULL, prior_sd = 0.5, iter = 25000, impute = TRUE, progress = TRUE, seed = NULL)Arguments
... | At least two matrices (or data frame) of dimensionsn (observations) byp (nodes). |
hypothesis | Character string. The hypothesis (or hypotheses) to be tested. See notes for futher details. |
formula | an object of class |
type | Character string. Which type of data for |
mixed_type | numeric vector. An indicator of length p for which varibles should be treated as ranks.(1 for rank and 0 to assume normality). The default is currently (dev version) to treat all integer variablesas ranks when |
prior_sd | Numeric. The scale of the prior distribution (centered at zero),in reference to a beta distribtuion (defaults to 0.5). |
iter | Number of iterations (posterior samples; defaults to 25,000). |
impute | Logicial. Should the missing values ( |
progress | Logical. Should a progress bar be included (defaults to |
seed | An integer for the random seed. |
Details
The hypotheses can be written either with the respective column names or numbers.For example,g1_1--2 denotes the relation between the variables in column 1 and 2 for group 1.Theg1_ is required and the only difference fromconfirm (one group).Note that these must correspond to the upper triangular elements of the correlationmatrix. This is accomplished by ensuring that the first number is smaller than the second number.This also applies when using column names (i.e,, in reference to the column number).
One Hypothesis:
To test whether a relation in larger in one group, while both are expectedto be positive, this can be written as
hyp <- c(g1_1--2 > g2_1--2 > 0)
This is then compared to the complement.
More Than One Hypothesis:
The above hypothesis can also be compared to, say, a null model by using ";"to seperate the hypotheses, for example,
hyp <- c(g1_1--2 > g2_1--2 > 0; g1_1--2 = g2_1--2 = 0).
Any number of hypotheses can be compared this way.
Using "&"
It is also possible to include&. This allows for testing one constraintandanother contraint as one hypothesis.
hyp <- c("g1_A1--A2 > g2_A1--A2 & g1_A1--A3 = g2_A1--A3")
Of course, it is then possible to include additional hypotheses by separating them with ";".
Testing Sums
It might also be interesting to test the sum of partial correlations. For example, that thesum of specific relations in one group is larger than the sum in another group.
hyp <- c("g1_A1--A2 + g1_A1--A3 > g2_A1--A2 + g2_A1--A3; g1_A1--A2 + g1_A1--A3 = g2_A1--A2 + g2_A1--A3")
Potential Delays:
There is a chance for a potentially long delay from the time the progress bar finishesto when the function is done running. This occurs when the hypotheses require furthersampling to be tested, for example, when grouping relationsc("(g1_A1--A2, g2_A2--A3) > (g2_A1--A2, g2_A2--A3)".This is not an error.
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables (Hoff 2007). This is based on theranked likelihood which requires sampling the ranks for each variable (i.e., the data is not merelytransformed to ranks). This is computationally expensive when there are many levels. For example,with continuous data, there are as many ranks as data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise. This is accomplished by specifying an indicatorvector of lengthp. A one indicates to use the ranks, whereas a zero indicates to "ignore"that variable. By default all integer variables are handled as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Imputing Missing Values:
Missing values are imputed with the approach described in Hoff (2009).The basic idea is to impute the missing values with the respective posterior pedictive distribution,given the observed data, as the model is being estimated. Note that the default isTRUE,but this ignored when there are no missing values. If set toFALSE, and there are missingvalues, list-wise deletion is performed withna.omit.
Value
The returned object of classconfirm contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
out_hyp_probPosterior hypothesis probabilities.infoAn object of classBFfrom the R packageBFpack(Mulder et al. 2019)
Note
"Default" Prior:
In Bayesian statistics, a default Bayes factor needs to have several properties. I referinterested users to section 2.2 in Dablander et al. (2020). InWilliams and Mulder (2019), some of these propteries were investigated (e.g.,model selection consistency). That said, we would not consider this a "default" or "automatic"Bayes factor and thus we encourage users to perform sensitivity analyses by varying the scale ofthe prior distribution (prior_sd).
Furthermore, it is important to note there is no "correct" prior and, also, there is no needto entertain the possibility of a "true" model. Rather, the Bayes factor can be interpreted aswhich hypothesis best (relative to each other) predicts the observed data(Section 3.2 in Kass and Raftery 1995).
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
References
Dablander F, Bergh Dvd, Ly A, Wagenmakers E (2020).“Default Bayes Factors for Testing the (In) equality of Several Population Variances.”arXiv preprint arXiv:2003.06278.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Hoff PD (2009).A first course in Bayesian statistical methods, volume 580.Springer.
Kass RE, Raftery AE (1995).“Bayes Factors.”Journal of the American Statistical Association,90(430), 773–795.
Mulder J, Gu X, Olsson-Collentine A, Tomarken A, Böing-Messing F, Hoijtink H, Meijerink M, Williams DR, Menke J, Fox J, others (2019).“BFpack: Flexible Bayes Factor Testing of Scientific Theories in R.”arXiv preprint arXiv:1911.07728.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”Psychological Methods.doi:10.1037/met0000254.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi################################### example 1: continuous #################################### malesYmale <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]# femalesYfemale <- subset(Y, gender == 2, select = -c(education, gender))[,1:5] # exhaustive hypothesis <- c("g1_A1--A2 > g2_A1--A2; g1_A1--A2 < g2_A1--A2; g1_A1--A2 = g2_A1--A2")# test hyptest <- ggm_compare_confirm(Ymale, Yfemale, hypothesis = hypothesis, iter = 250, progress = FALSE)# print (evidence not strong)test############################################# example 2: sensitivity to prior ############################################## continued from example 1# decrease prior SDtest <- ggm_compare_confirm(Ymale, Yfemale, prior_sd = 0.1, hypothesis = hypothesis, iter = 250, progress = FALSE)# printtest# indecrease prior SDtest <- ggm_compare_confirm(Ymale, Yfemale, prior_sd = 0.28, hypothesis = hypothesis, iter = 250, progress = FALSE)# printtest#################################### example 3: mixed data #####################################hypothesis <- c("g1_A1--A2 > g2_A1--A2; g1_A1--A2 < g2_A1--A2; g1_A1--A2 = g2_A1--A2")# test (1000 for example)test <- ggm_compare_confirm(Ymale, Yfemale, type = "mixed", hypothesis = hypothesis, iter = 250, progress = FALSE)# printtest################################### example 4: control #################################### control for education# dataY <- bfi# malesYmale <- subset(Y, gender == 1, select = -c(gender))[,c(1:5, 26)]# femalesYfemale <- subset(Y, gender == 2, select = -c(gender))[,c(1:5, 26)]# testtest <- ggm_compare_confirm(Ymale, Yfemale, formula = ~ education, hypothesis = hypothesis, iter = 250, progress = FALSE)# printtest########################################## example 5: many relations ########################################### dataY <- bfihypothesis <- c("g1_A1--A2 > g2_A1--A2 & g1_A1--A3 = g2_A1--A3; g1_A1--A2 = g2_A1--A2 & g1_A1--A3 = g2_A1--A3; g1_A1--A2 = g2_A1--A2 = g1_A1--A3 = g2_A1--A3")Ymale <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]# femalesYfemale <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]test <- ggm_compare_confirm(Ymale, Yfemale, hypothesis = hypothesis, iter = 250, progress = FALSE)# printtestGGM Compare: Estimate
Description
Compare partial correlations that are estimated from any number of groups. This method works forcontinuous, binary, ordinal, and mixed data (a combination of categorical and continuous variables).The approach (i.e., a difference between posterior distributions) wasdescribed in Williams (2018).
Usage
ggm_compare_estimate( ..., formula = NULL, type = "continuous", mixed_type = NULL, analytic = FALSE, prior_sd = sqrt(1/3), iter = 5000, impute = TRUE, progress = TRUE, seed = 1)Arguments
... | Matrices (or data frames) of dimensionsn (observations) byp (variables).Requires at least two. |
formula | An object of class |
type | Character string. Which type of data forY ? The options include |
mixed_type | Numeric vector. An indicator of lengthp for which varibles should be treated as ranks.(1 for rank and 0 to use the 'empirical' or observed distribution). The default is currently to treat all integer variablesas ranks when |
analytic | Logical. Should the analytic solution be computed (default is |
prior_sd | The scale of the prior distribution (centered at zero), in reference to a beta distribtuion (defaults to sqrt(1/3)). See note for further details. |
iter | Number of iterations (posterior samples; defaults to 5000). |
impute | Logicial. Should the missing values ( |
progress | Logical. Should a progress bar be included (defaults to |
seed | An integer for the random seed. |
Details
This function can be used to compare the partial correlations for any number of groups.This is accomplished with pairwise comparisons for each relation. In the case of three groups,for example, group 1 and group 2 are compared, then group 1 and group 3 are compared, and thengroup 2 and group 3 are compared. There is a full distibution for each difference that can besummarized (i.e.,summary.ggm_compare_estimate) and then visualized(i.e.,plot.summary.ggm_compare_estimate). The graph of difference is selected withselect.ggm_compare_estimate).
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables. This is based on the ranked likelihood which requires samplingthe ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationallyexpensive when there are many levels. For example, with continuous data, there are as many ranksas data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise. This is accomplished by specifying an indicatorvector of lengthp. A one indicates to use the ranks, whereas a zero indicates to "ignore"that variable. By default all integer variables are handled as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Imputing Missing Values:
Missing values are imputed with the approach described in Hoff (2009).The basic idea is to impute the missing values with the respective posterior pedictive distribution,given the observed data, as the model is being estimated. Note that the default isTRUE,but this ignored when there are no missing values. If set toFALSE, and there are missingvalues, list-wise deletion is performed withna.omit.
Value
A list of classggm_compare_estimate containing:
pcor_diffspartial correlation differences (posterior distribution)pnumber of variableinfolist containing information about each group (e.g., sample size, etc.)iternumber of posterior samplescallmatch.call
Note
Mixed Data:
The mixed data approach was introduced in Hoff (2007)(our paper describing an extension to Bayesian hypothesis testing if forthcoming).This is a semi-paramateric copula model based on the ranked likelihood. This is computationallyexpensive when treating continuous data as ranks. The current default is to treat only integer data as ranks.This should of course be adjusted for continous data that is skewed. This can be accomplished with theargumentmixed_type. A1 in the numeric vector of lengthpindicates to treat thatrespective node as a rank (corresponding to the column number) and a zero indicates to use the observed(or "emprical") data.
It is also important to note thattype = "mixed" is not restricted to mixed data (containing a combination ofcategorical and continuous): all the nodes can be ordinal or continuous (but again this will take some time).
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
Additional GGM Compare Methods
Bayesian hypothesis testing is implemented inggm_compare_explore andggm_compare_confirm (Williams and Mulder 2019). The latter allows for confirmatoryhypothesis testing. An approach based on a posterior predictive check is implemented inggm_compare_ppc(Williams et al. 2020). This provides a 'global' test for comparing the entire GGM and a 'nodewise'test for comparing each variable in the network Williams (2018).
References
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Hoff PD (2009).A first course in Bayesian statistical methods, volume 580.Springer.
Williams DR (2018).“Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons.”arXiv.doi:10.31234/OSF.IO/X8DPR.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”Psychological Methods.doi:10.1037/met0000254.
Examples
# note: iter = 250 for demonstrative purposes# data: Remove missings for "ordinal"Y <- bfi[complete.cases(bfi),]# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]# fit modelfit <- ggm_compare_estimate(Ymale, Yfemale, type = "ordinal", iter = 250, progress = FALSE)############################## example 2: analytic ############################### only continuous# fit modelfit <- ggm_compare_estimate(Ymale, Yfemale, analytic = TRUE)# summarysumm <- summary(fit)# plot summaryplt_summ <- plot(summary(fit))# selectE <- select(fit)# plot selectplt_E <- plot(select(fit))GGM Compare: Exploratory Hypothesis Testing
Description
Compare Gaussian graphical models with exploratory hypothesis testing using the matrix-F priordistribution (Mulder and Pericchi 2018). A test for each partial correlation in the model for any numberof groups. This provides evidence for the null hypothesis of no difference and the alternative hypothesisof difference. With more than two groups, the test is forall groups simultaneously (i.e., the relationis the same or different in all groups). This method was introduced in Williams et al. (2020).For confirmatory hypothesis testing seeconfirm_groups.
Usage
ggm_compare_explore( ..., formula = NULL, type = "continuous", mixed_type = NULL, analytic = FALSE, prior_sd = sqrt(1/3), iter = 5000, progress = TRUE, seed = 1)Arguments
... | At least two matrices (or data frame) of dimensionsn (observations) byp (variables). |
formula | An object of class |
type | Character string. Which type of data for |
mixed_type | Numeric vector. An indicator of length p for which varibles should be treated as ranks.(1 for rank and 0 to assume normality). The default is currently (dev version) to treat all integer variablesas ranks when |
analytic | logical. Should the analytic solution be computed (default is |
prior_sd | Numeric. The scale of the prior distribution (centered at zero), in reference to a beta distribtuion.The 'default' is sqrt(1/3) for a flat prior. See note for further details. |
iter | number of iterations (posterior samples; defaults to 5000). |
progress | Logical. Should a progress bar be included (defaults to |
seed | An integer for the random seed. |
Details
Controlling for Variables:
When controlling for variables, it is assumed thatY includesonlythe nodes in the GGM and the control variables. Internally,only the predictorsthat are included informula are removed fromY. This is not behavior of, say,lm, but was adopted to ensure users do not have to write out each variable thatshould be included in the GGM. An example is provided below.
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables. This is based on the ranked likelihood which requires samplingthe ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationallyexpensive when there are many levels. For example, with continuous data, there are as many ranksas data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise. This is accomplished by specifying an indicatorvector of lengthp. A one indicates to use the ranks, whereas a zero indicates to "ignore"that variable. By default all integer variables are handled as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sdbelow sqrt(1/3) (i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Value
The returned object of classggm_compare_explore contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
BF_01Ap byp matrix includingthe Bayes factor for the null hypothesis.pcor_diffAp byp matrix includingthe difference in partial correlations (only for two groups).sampA list containing the fitted models (of classexplore) for each group.
Note
"Default" Prior:
In Bayesian statistics, a default Bayes factor needs to have several properties. I referinterested users to section 2.2 in Dablander et al. (2020). InWilliams and Mulder (2019), some of these propteries were investigated, suchmodel selection consistency. That said, we would not consider this a "default" Bayes factor andthus we encourage users to perform sensitivity analyses by varying the scale of the priordistribution.
Furthermore, it is important to note there is no "correct" prior and, also, there is no needto entertain the possibility of a "true" model. Rather, the Bayes factor can be interpreted aswhich hypothesis best (relative to each other) predicts the observed data(Section 3.2 in Kass and Raftery 1995).
Interpretation of Conditional (In)dependence Models for Latent Data:
SeeBGGM-package for details about interpreting GGMs based on latent data(i.e, all data types besides"continuous")
References
Dablander F, Bergh Dvd, Ly A, Wagenmakers E (2020).“Default Bayes Factors for Testing the (In) equality of Several Population Variances.”arXiv preprint arXiv:2003.06278.
Kass RE, Raftery AE (1995).“Bayes Factors.”Journal of the American Statistical Association,90(430), 773–795.
Mulder J, Pericchi L (2018).“The Matrix-F Prior for Estimating and Testing Covariance Matrices.”Bayesian Analysis, 1–22.ISSN 19316690,doi:10.1214/17-BA1092.
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”Psychological Methods.doi:10.1037/met0000254.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi[complete.cases(bfi),]# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]############################# example 1: ordinal ############################## fit modelfit <- ggm_compare_explore(Ymale, Yfemale, type = "ordinal", iter = 250, progress = FALSE)# summarysumm <- summary(fit)# edge setE <- select(fit)GGM Compare: Posterior Predictive Check
Description
Compare GGMs with a posterior predicitve check (Gelman et al. 1996).This method was introduced in Williams et al. (2020). Currently,there is aglobal (the entire GGM) and anodewise test. The defaultis to compare GGMs with respect to the posterior predictive distribution of KullbackLeibler divergence and the sum of squared errors. It is also possible to compare theGGMs with a user defined test-statistic.
Usage
ggm_compare_ppc( ..., test = "global", iter = 5000, FUN = NULL, custom_obs = NULL, loss = TRUE, progress = TRUE)Arguments
... | At least two matrices (or data frames) of dimensionsn (observations) byp (variables). |
test | Which test should be performed (defaults to |
iter | Number of replicated datasets used to construct the predictivie distribution(defaults to 5000). |
FUN | An optional function for comparing GGMs that returns a number. SeeDetails. |
custom_obs | Number corresponding to the observed score for comparing the GGMs. This isrequired if a function is provided in |
loss | Logical. If a function is provided, is the measure a "loss function"(i.e., a large score is bad thing). This determines how thep-valueis computed. SeeDetails. |
progress | Logical. Should a progress bar be included (defaults to |
Details
TheFUN argument allows for a user defined test-statisic (the measure used to compare the GGMs).The function must include only two agruments, each of which corresponds to a dataset. For example,f <- function(Yg1, Yg2), where each Y is dataset of dimensionsn byp. Thegroups are then compare within the function, returning a single number. An example is provided below.
Further, when using a custom function care must be taken when specifying the argumentloss.We recommended to visualize the results withplot to ensure thep-value was computedin the right direction.
Value
The returned object of classggm_compare_ppc contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
test = "global"
ppp_jsdposterior predictivep-values (JSD).ppp_sseposterior predictivep-values (SSE).predictive_jsdlist containing the posterior predictive distributions (JSD).predictive_sselist containing the posterior predictive distributions (SSE).obs_jsdlist containing the observed error (JSD).obs_sselist containing the observed error (SSE).
test = "nodewise"
ppp_jsdposterior predictivep-values (JSD).predictive_jsdlist containing the posterior predictive distributions (JSD).obs_jsdlist containing the observed error (JSD).
FUN = f()
ppp_customposterior predictivep-values (custom).predictive_customposterior predictive distributions (custom).obs_customobserved error (custom).
Note
Interpretation:
The primary test-statistic is symmetric KL-divergence that is termed Jensen-Shannon divergence (JSD).This is in essence a likelihood ratio that provides the "distance" between two multivariate normaldistributions. The basic idea is to (1) compute the posterior predictive distribution, assuming group equality(the null model). This provides the error that we would expect to see under the null model; (2) computeJSD for the observed groups; and (3) compare the observed JSD to the posterior predictive distribution,from which a posterior predictivep-value is computed.
For theglobal check, the sum of squared error is also provided.This is computed from the partial correlation matrices and it is analagousto the strength test in van Borkulo et al. (2017). Thenodewisetest compares the posterior predictive distribution for each node. This is based on the correspondencebetween the inverse covariance matrix and multiple regresssion (Kwan 2014; Stephens 1998).
If the null model isnot rejected, note that this doesnot provide evidence for equality!Further, if the null model is rejected, this means that the assumption of group equality is not tenable–thegroups are different.
Alternative Methods:
There are several methods inBGGM for comparing groups. Seeggm_compare_estimate (posterior differences for thepartial correlations),ggm_compare_explore (exploratory hypothesis testing),andggm_compare_confirm (confirmatory hypothesis testing).
References
Gelman A, Meng X, Stern H (1996).“Posterior predictive assessment of model fitness via realized discrepancies.”Statistica sinica, 733–760.
Kwan CC (2014).“A regression-based interpretation of the inverse of the sample covariance matrix.”Spreadsheets in Education,7(1), 4613.
Stephens G (1998).“On the Inverse of the Covariance Matrix in Portfolio Analysis.”The Journal of Finance,53(5), 1821–1827.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”Psychological Methods.doi:10.1037/met0000254.
van Borkulo CD, Boschloo L, Kossakowski J, Tio P, Schoevers RA, Borsboom D, Waldorp LJ (2017).“Comparing network structures on three aspects: A permutation test.”Manuscript submitted for publication.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi###################################### global ########################################## malesYm <- subset(Y, gender == 1, select = - c(gender, education))# femalesYf <- subset(Y, gender == 2, select = - c(gender, education))global_test <- ggm_compare_ppc(Ym, Yf, iter = 250)global_test################################### custom function #################################### example 1# maximum difference van Borkulo et al. (2017)f <- function(Yg1, Yg2){# remove NAx <- na.omit(Yg1)y <- na.omit(Yg2)# nodesp <- ncol(Yg1)# identity matrixI_p <- diag(p)# partial correlationspcor_1 <- -(cov2cor(solve(cor(x))) - I_p)pcor_2 <- -(cov2cor(solve(cor(y))) - I_p)# max differencemax(abs((pcor_1[upper.tri(I_p)] - pcor_2[upper.tri(I_p)])))}# observed differenceobs <- f(Ym, Yf)global_max <- ggm_compare_ppc(Ym, Yf, iter = 250, FUN = f, custom_obs = obs, progress = FALSE)global_max# example 2# Hamming distance (squared error for adjacency)f <- function(Yg1, Yg2){# remove NAx <- na.omit(Yg1)y <- na.omit(Yg2)# nodesp <- ncol(x)# identity matrixI_p <- diag(p)fit1 <- estimate(x, analytic = TRUE)fit2 <- estimate(y, analytic = TRUE)sel1 <- select(fit1)sel2 <- select(fit2)sum((sel1$adj[upper.tri(I_p)] - sel2$adj[upper.tri(I_p)])^2)}# observed differenceobs <- f(Ym, Yf)global_hd <- ggm_compare_ppc(Ym, Yf, iter = 250, FUN = f, custom_obs = obs, progress = FALSE)global_hd##################################### nodewise #######################################nodewise <- ggm_compare_ppc(Ym, Yf, iter = 250, test = "nodewise")nodewisePerform Bayesian Graph Search and Optional Model Averaging
Description
The 'ggm_search' function performs a Bayesian graph search to identify the most probable graph structure (MAP solution) using the Metropolis-Hastings algorithm. It also computes an optional Bayesian Model Averaged (BMA) solution across the graph structures sampled during the search.
Usage
ggm_search( x, n = NULL, method = "mc3", prior_prob = 0.3, iter = 5000, stop_early = 1000, bma_mean = TRUE, seed = NULL, progress = TRUE, ...)Arguments
x | Data, either raw data or covariance matrix |
n | For x = covariance matrix, provide number of observations |
method | mc3 defaults to MH sampling |
prior_prob | Prior prbability of sparseness. |
iter | Number of iterations#@param burn_in Burn in. Defaults to iter/2 |
stop_early | Default to 1000. Stop MH algorithm if proposals keep being rejected (stopping by default after 1000 rejections). |
bma_mean | Compute Bayesian Model Averaged solution |
seed | Set seed. Current default is to set R's random seed. |
progress | Show progress bar, defaults to TRUE |
... | Not currently in use |
Details
This function is ideal for exploring the graph space and obtaining an initial estimate of the graph structure or adjacency matrix.
To refine the results or compute posterior distributions of graph parameters (e.g., partial correlations), use thebma_posterior function, which builds on the output of 'ggm_search' to account for parameter uncertainty.
Value
A list containing the MAP graph structure, BMA solution (if specified),and posterior probabilities of the sampled graphs.
Author(s)
Donny Williams and Philippe Rast
See Also
Data: 1994 General Social Survey
Description
A data frame containing 1002 rows and 7 variables measured on various scales,including binary and ordered cateogrical (with varying numbers of categories).There are also missing values in each variable
IncIncome of the respondent in 1000s of dollars, binned into 21 ordered categories.DEGHighest degree ever obtained (none, HS, Associates, Bachelors, or Graduate)CHILDNumber of children ever had.PINCFinancial status of respondent's parents when respondent was 16 (on a 5-point scale).PDEGMaximum of mother's and father's highest degreePCHILDNumber of siblings of the respondent plus oneAGEAge of the respondent in years.
Usage
data("gss")Format
A data frame containing 1190 observations (n = 1190) and 6 variables (p = 6) measured on the binary scale(Fowlkes et al. 1988). The variable descriptions were copied fromsection 4, Hoff (2007)
References
Fowlkes EB, Freeny AE, Landwehr JM (1988).“Evaluating logistic models for large contingency tables.”Journal of the American Statistical Association,83(403), 611–622.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Examples
data("gss")Data: ifit Intensive Longitudinal Data
Description
A data frame containing 8 variables and nearly 200 observations. There aretwo subjects, each of which provided data every data for over 90 days. Six variables are fromthe PANAS scale (positive and negative affect), the daily number of steps, and the subject id.
idSubject idinteresteddisinterestedexcitedupsetstrongstressedstepssteps recorded by a fit bit
Usage
data("ifit")Format
A data frame containing 197 observations and 8 variables. The data have been used in(O'Laughlin et al. 2020) and (Williams et al. 2019)
References
O'Laughlin KD, Liu S, Ferrer E (2020).“Use of Composites in Analysis of Individual Time Series: Implications for Person-Specific Dynamic Parameters.”Multivariate Behavioral Research, 1–18.
Williams DR, Liu S, Martin SR, Rast P (2019).“Bayesian Multivariate Mixed-Effects Location Scale Modeling of Longitudinal Relations among Affective Traits, States, and Physical Activity.”PsyArXiv.doi:10.31234/osf.io/4kfjp.
Examples
data("ifit")Obtain Imputed Datasets
Description
Impute missing values, assuming a multivariate normal distribution, with the posteriorpredictive distribution. For binary, ordinal, and mixed (a combination of discrete and continuous)data, the values are first imputed for the latent data and then converted to the original scale.
Usage
impute_data( Y, type = "continuous", lambda = NULL, mixed_type = NULL, iter = 1000, progress = TRUE)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
type | Character string. Which type of data for |
lambda | Numeric. A regularization parameter, which defaults to p + 2. A larger value resultsin more shrinkage. |
mixed_type | Numeric vector. An indicator of lengthp for which variables should be treated as ranks.(1 for rank and 0 to assume the observed marginal distribution).The default is currently to treat all integer variables as ranks when |
iter | Number of iterations (posterior samples; defaults to 1000). |
progress | Logical. Should a progress bar be included (defaults to |
Details
Missing values are imputed with the approach described in Hoff (2009).The basic idea is to impute the missing values with the respective posterior pedictive distribution,given the observed data, as the model is being estimated. Note that the default isTRUE,but this ignored when there are no missing values. If set toFALSE, and there are missingvalues, list-wise deletion is performed withna.omit.
Value
An object of classmvn_imputation:
imputed_datasetsAn array including the imputed datasets.
References
Hoff PD (2009).A first course in Bayesian statistical methods, volume 580.Springer.
Examples
# obsn <- 5000# n missingn_missing <- 1000# variablesp <- 16# dataY <- MASS::mvrnorm(n, rep(0, p), ptsd_cor1)# for checkingYmain <- Y# all possible indicesindices <- which(matrix(0, n, p) == 0, arr.ind = TRUE)# random sample of 1000 missing valuesna_indices <- indices[sample(5:nrow(indices), size = n_missing, replace = FALSE),]# fill with NAY[na_indices] <- NA# missing = 1Y_miss <- ifelse(is.na(Y), 1, 0)# true values (to check)true <- unlist(sapply(1:p, function(x) Ymain[which(Y_miss[,x] == 1),x] ))# imputefit_missing <- impute_data(Y, progress = FALSE, iter = 250)# imputefit_missing <- impute_data(Y, progress = TRUE, iter = 250)Data: Interpersonal Reactivity Index (IRI)
Description
A dataset containing items from the Interpersonal Reactivity Index (IRI; an empathy measure). There are 28 variables and1973 observations
Usage
data("iri")Format
A data frame with 28 variables and 1973 observations (5 point Likert scale)
Details
1I daydream and fantasize, with some regularity, about things that might happen to me.2I often have tender, concerned feelings for people less fortunate than me.3I sometimes find it difficult to see things from the "other guy's" point of view.4Sometimes I don't feel very sorry for other people when they are having problems.5I really get involved with the feelings of the characters in a novel.6In emergency situations, I feel apprehensive and ill-at-ease.7I am usually objective when I watch a movie or play, and I don't often get completely caught up in it.8I try to look at everybody's side of a disagreement before I make a decision.9When I see someone being taken advantage of, I feel kind of protective towards them.10I sometimes feel helpless when I am in the middle of a very emotional situation.11I sometimes try to understand my friends betterby imagining how things look from their perspective12Becoming extremely involved in a good book or movie is somewhat rare for me.13When I see someone get hurt, I tend to remain calm.14Other people's misfortunes do not usually disturb me a great deal.15If I'm sure I'm right about something, I don't waste muchtime listening to other people's arguments.16After seeing a play or movie, I have felt as though I were one of the characters.17Being in a tense emotional situation scares me.18When I see someone being treated unfairly,I sometimes don't feel very much pity for them.19I am usually pretty effective in dealing with emergencies.20I am often quite touched by things that I see happen.21I believe that there are two sides to every question and try to look at them both.22I would describe myself as a pretty soft-hearted person.23When I watch a good movie, I can very easily put myself inthe place of a leading character24I tend to lose control during emergencies.25When I'm upset at someone, I usually try to "put myself in his shoes" for a while.26When I am reading an interesting story or novel, I imagine how I would feel if theevents in the story were happening to me.27When I see someone who badly needs help in an emergency, I go to pieces.28Before criticizing somebody, I try to imagine how I would feel if I were in their place.gender"M" (male) or "F" (female)
Note
There are four domains
Fantasy: items 1, 5, 7, 12, 16, 23, 26
Perspective taking: items 3, 8, 11, 15, 21, 25, 28
Empathic concern: items 2, 4, 9, 14, 18, 20, 22
Personal distress: items 6, 10, 13, 17, 19, 24, 27,
References
Briganti, G., Kempenaers, C., Braun, S., Fried, E. I., & Linkowski, P. (2018). Network analysis ofempathy items from the interpersonal reactivity index in 1973young adults. Psychiatry research, 265, 87-92.
Examples
data("iri")Maximum A Posteriori Precision Matrix
Description
Maximum A Posteriori Precision Matrix
Usage
map(Y)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
Value
An object of classmap, including the precision matrix,partial correlation matrix, and regression parameters.
Examples
Y <- BGGM::bfi[, 1:5]# mapmap <- map(Y)mapExtract the Partial Correlation Matrix
Description
Extract the partial correlation matrix (posterior mean)fromestimate,explore,ggm_compare_estimate,andggm_compare_explore objects. It is also possible to extract thepartial correlation differences forggm_compare_estimate andggm_compare_explore objects.
Usage
pcor_mat(object, difference = FALSE, ...)Arguments
object | A model estimated withBGGM. All classes are supported, assumingthere is matrix to be extracted. |
difference | Logical. Should the difference be returned (defaults to |
... | Currently ignored. |
Value
The estimated partial correlation matrix.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- ptsd[,1:5] + 1# ordinalfit <- estimate(Y, type = "ordinal", iter = 250, progress = FALSE)pcor_mat(fit)Partial Correlation Sum
Description
Compute and test partial correlation sums either within or between GGMs(e.g., different groups), resulting in a posterior distribution.
Usage
pcor_sum(..., iter = NULL, relations)Arguments
... | An object of class |
iter | Number of iterations (posterior samples; defaults to the number in the object). |
relations | Character string. Which partial correlations should be summed? |
Details
Some care must be taken when writing the string forpartial_sum. Below are several examples
Just a Sum:Perhaps a sum is of interest, and not necessarily the difference of two sums. This can be written as
partial_sum <- c("A1--A2 + A1--A3 + A1--A4")
which will sum those relations.
Comparing Sums:When comparing sums, each must be seperated by ";". For example,
partial_sum <- c("A1--A2 + A1--A3; A1--A2 + A1--A4")
which will sum both and compute the difference. Note that there cannot be more than two sums, suchthatc("A1--A2 + A1--A3; A1--A2 + A1--A4; A1--A2 + A1--A5") will result in an error.
Comparing Groups:
When more than one fitted object is suppled toobject it is assumed that the groupsshould be compared for the same sum. Hence, in this case, only the sum needs to be written.
partial_sum <- c("A1--A2 + A1--A3 + A1--A4")
The above results in that sum being computed for each group and then compared.
Value
An object of classposterior_sum, including the sum and possibly the difference fortwo sums.
Examples
# dataY <- bfi# malesY_males <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]# femalesY_females <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]# malesfit_males <- estimate(Y_males, seed = 1, progress = FALSE)# fit femalesfit_females <- estimate(Y_females, seed = 2, progress = FALSE)sums <- pcor_sum(fit_males, fit_females, relations = "A1--A2 + A1--A3")# printsums# plot differenceplot(sums)[[3]]Compute Correlations from the Partial Correlations
Description
Convert the partial correlation matrices into correlation matrices. To our knowledge,this is the only Bayesianimplementation inR that can estiamte Pearson's, tetrachoric (binary), polychoric(ordinal with more than two cateogries), and rank based correlation coefficients.
Usage
pcor_to_cor(object, iter = NULL)Arguments
object | An object of class |
iter | numeric. How many iterations (i.e., posterior samples) should be used ?The default uses all of the samples, but note that this can take a longtime with large matrices. |
Value
RAn array including the correlation matrices(of dimensionsp byp byiter)R_meanPosterior mean of the correlations (of dimensionsp byp)
Note
The 'default' prior distributions are specified for partial correlations in particular. Thismeans that the implied prior distribution will not be the same for the correlations.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- BGGM::ptsd############################### continuous ################################# estimate the modelfit <- estimate(Y, iter = 250, progress = FALSE)# compute correlationscors <- pcor_to_cor(fit)############################### ordinal ################################### first level must be 1 !Y <- Y + 1# estimate the modelfit <- estimate(Y, type = "ordinal", iter = 250, progress = FALSE)# compute correlationscors <- pcor_to_cor(fit)################################ mixed ################################ rank based correlations# estimate the modelfit <- estimate(Y, type = "mixed", iter = 250, progress = FALSE)# compute correlationscors <- pcor_to_cor(fit)Plotconfirm objects
Description
Plot the posterior hypothesis probabilities as a pie chart, witheach slice corresponding the probability of a given hypothesis.
Usage
## S3 method for class 'confirm'plot(x, ...)Arguments
x | An object of class |
... | Currently ignored. |
Value
Aggplot object.
Examples
########################################## example 1: many relations ########################################### dataY <- bfihypothesis <- c("g1_A1--A2 > g2_A1--A2 & g1_A1--A3 = g2_A1--A3; g1_A1--A2 = g2_A1--A2 & g1_A1--A3 = g2_A1--A3; g1_A1--A2 = g2_A1--A2 = g1_A1--A3 = g2_A1--A3")Ymale <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]# femalesYfemale <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]test <- ggm_compare_confirm(Ymale, Yfemale, hypothesis = hypothesis, iter = 250, progress = FALSE)# plotplot(test)Plotggm_compare_ppc Objects
Description
Plot the predictive check withggridges
Usage
## S3 method for class 'ggm_compare_ppc'plot( x, critical = 0.05, col_noncritical = "#84e184A0", col_critical = "red", point_size = 2, ...)Arguments
x | An object of class |
critical | Numeric. The 'significance' level(defaults to |
col_noncritical | Character string. Fill color for the non-critical region(defaults to |
col_critical | Character string. Fill color for the critical region(defaults to |
point_size | Numeric. The point size for the observed score(defaults to |
... | Currently ignored. |
Value
An object (or list of objects) of classggplot.
Note
Seeggridges formany examples.
See Also
Examples
# dataY <- bfi###################################### global ########################################## malesYm <- subset(Y, gender == 1, select = - c(gender, education))# femalesYf <- subset(Y, gender == 2, select = - c(gender, education))global_test <- ggm_compare_ppc(Ym, Yf, iter = 250, progress = FALSE)plot(global_test)Plotpcor_sum Object
Description
Plotpcor_sum Object
Usage
## S3 method for class 'pcor_sum'plot(x, fill = "#CC79A7", ...)Arguments
x | An object of class |
fill | Character string. What fill for the histogram(defaults to colorblind "pink")? |
... | Currently ignored. |
Value
A list ofggplot objects
Note
Examples:
See Also
pcor_sum
Plotpredictability Objects
Description
Plotpredictability Objects
Usage
## S3 method for class 'predictability'plot( x, type = "error_bar", cred = 0.95, alpha = 0.5, scale = 1, width = 0, size = 1, color = "blue", ...)Arguments
x | An object of class |
type | Character string. Which type of plot ? The optionsare |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
alpha | Numeric. Transparancey of the ridges |
scale | Numeric. This controls the overlap of densitiesfor |
width | Numeric. The width of error bar ends (defaults to |
size | Numeric. The size for the points (defaults to |
color | Character string. What color for the point ( |
... | Currently ignored. |
Value
An object of classggplot.
Examples
Y <- ptsd[,1:5]fit <- explore(Y, iter = 250, progress = FALSE)r2 <- predictability(fit, iter = 250, progress = FALSE)plot(r2)Plotroll_your_own Objects
Description
Plotroll_your_own Objects
Usage
## S3 method for class 'roll_your_own'plot(x, fill = "#CC79A7", alpha = 0.5, ...)Arguments
x | An object of class |
fill | Character string specifying the color for the ridges. |
alpha | Numeric. Transparancey of the ridges |
... | Currently ignored |
Value
An object of classggplot
Examples
########################################## example 1: assortment ############################################ assortmentlibrary(assortnet)Y <- BGGM::bfi[,1:10]membership <- c(rep("a", 5), rep("c", 5))# fit modelfit <- estimate(Y = Y, iter = 250, progress = FALSE)# membershipmembership <- c(rep("a", 5), rep("c", 5))# define functionf <- function(x,...){ assortment.discrete(x, ...)$r}net_stat <- roll_your_own(object = fit, FUN = f, types = membership, weighted = TRUE, SE = FALSE, M = 1, progress = FALSE)# plotplot(net_stat)Network Plot forselect Objects
Description
Visualize the conditional (in)dependence structure.
Usage
## S3 method for class 'select'plot( x, layout = "circle", pos_col = "#009E73", neg_col = "#D55E00", node_size = 10, edge_magnify = 1, groups = NULL, palette = "Set3", ...)Arguments
x | An object of class |
layout | Character string. Which graph layout (defaults is |
pos_col | Character string. Color for the positive edges (defaults to |
neg_col | Character string. Color for the negative edges (defaults to |
node_size | Numeric. The size of the nodes (defaults to |
edge_magnify | Numeric. A value that is multiplied by the edge weights. This increases (> 1) ordecrease (< 1) the line widths (defaults to 1). |
groups | A character string of lengthp (the number of nodes in the model).This indicates groups of nodes that should be the same color(e.g., "clusters" or "communities"). |
palette | A character string sepcifying the palette for the |
... | Additional options passed toggnet2 |
Value
An object (or list of objects) of classggplotthat can then be further customized.
Note
A more extensive example of a custom plot isprovidedhere
Examples
############################ example 1: one ggm ############################ dataY <- bfi[,1:25]# estimatefit <- estimate(Y, iter = 250, progress = FALSE)# "communities"comm <- substring(colnames(Y), 1, 1)# edge setE <- select(fit)# plot edge setplt_E <- plot(E, edge_magnify = 5, palette = "Set1", groups = comm)################################ example 2: ggm compare ################################ compare males vs. females# dataY <- bfi[,1:26]Ym <- subset(Y, gender == 1, select = -gender)Yf <- subset(Y, gender == 2, select = -gender)# estimatefit <- ggm_compare_estimate(Ym, Yf, iter = 250, progress = FALSE)# "communities"comm <- substring(colnames(Ym), 1, 1)# edge setE <- select(fit)# plot edge setplt_E <- plot(E, edge_magnify = 5, palette = "Set1", groups = comm)Plotsummary.estimate Objects
Description
Visualize the posterior distributions for each partial correlation.
Usage
## S3 method for class 'summary.estimate'plot(x, color = "black", size = 2, width = 0, ...)Arguments
x | An object of class |
color | Character string. The color for the error bars.(defaults to |
size | Numeric. The size for the points (defaults to |
width | Numeric. The width of error bar ends (defaults to |
... | Currently ignored |
Value
Aggplot object.
See Also
Examples
# dataY <- ptsd[,1:5]fit <- estimate(Y, iter = 250, progress = FALSE)plot(summary(fit))Plotsummary.explore Objects
Description
Visualize the posterior distributions for each partial correlation.
Usage
## S3 method for class 'summary.explore'plot(x, color = "black", size = 2, width = 0, ...)Arguments
x | An object of class |
color | Character string. The color for the error bars.(defaults to |
size | Numeric. The size for the points (defaults to |
width | Numeric. The width of error bar ends (defaults to |
... | Currently ignored |
Value
Aggplot object
See Also
Examples
# note: iter = 250 for demonstrative purposesY <- ptsd[,1:5]fit <- explore(Y, iter = 250, progress = FALSE)plt <- plot(summary(fit))pltPlotsummary.ggm_compare_estimate Objects
Description
Visualize the posterior distribution differences.
Usage
## S3 method for class 'summary.ggm_compare_estimate'plot(x, color = "black", size = 2, width = 0, ...)Arguments
x | An object of class |
color | Character string. The color of the points(defaults to |
size | Numeric. The size of the points (defaults to 2). |
width | Numeric. The width of error bar ends (defaults to |
... | Currently ignored. |
Value
An object of classggplot
See Also
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi[complete.cases(bfi),]# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]# fit modelfit <- ggm_compare_estimate(Ymale, Yfemale, type = "ordinal", iter = 250, prior_sd = 0.25, progress = FALSE)plot(summary(fit))Plotsummary.ggm_compare_explore Objects
Description
Visualize the posterior hypothesis probabilities.
Usage
## S3 method for class 'summary.ggm_compare_explore'plot(x, size = 2, color = "black", ...)Arguments
x | An object of class |
size | Numeric. The size of the points (defaults to 2). |
color | Character string. The color of the points(defaults to |
... | Currently ignored. |
Value
Aggplot object
See Also
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi[complete.cases(bfi),]# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]############################# example 1: ordinal ############################## fit modelfit <- ggm_compare_explore(Ymale, Yfemale, type = "ordinal", iter = 250, progress = FALSE)# summarysumm <- summary(fit)plot(summ)Plotsummary.select.explore Objects
Description
Visualize the posterior hypothesis probabilities.
Usage
## S3 method for class 'summary.select.explore'plot(x, size = 2, color = "black", ...)Arguments
x | An object of class |
size | Numeric. The size for the points (defaults to 2). |
color | Character string. The Color for the points |
... | Currently ignored |
Value
Aggplot object
Examples
# dataY <- bfi[,1:10]# fit modelfit <- explore(Y, iter = 250, progress = FALSE)# edge setE <- select(fit, alternative = "exhaustive")plot(summary(E))Plotsummary.var_estimate Objects
Description
Visualize the posterior distributions of each partial correlation andregression coefficient.
Usage
## S3 method for class 'summary.var_estimate'plot(x, color = "black", size = 2, width = 0, param = "all", order = TRUE, ...)Arguments
x | An object of class |
color | Character string. The color for the error bars.(defaults to |
size | Numeric. The size for the points (defaults to |
width | Numeric. The width of error bar ends (defaults to |
param | Character string. Which parameters should be plotted ? The optionsare |
order | Logical. Should the relations be ordered by size (defaults to |
... | Currently ignored |
Value
A list ofggplot objects.
Examples
# dataY <- subset(ifit, id == 1)[,-1]# fit model with alias (var_estimate also works)fit <- var_estimate(Y, progress = FALSE)plts <- plot(summary(fit))plts$pcor_pltPlot: Prior Distribution
Description
Visualize the implied prior distribution for the partial correlations. This isparticularly useful for the Bayesian hypothesis testing methods.
Usage
plot_prior(prior_sd = 0.5, iter = 5000)Arguments
prior_sd | Scale of the prior distribution, approximately the standard deviationof a beta distribution (defaults to 0.5). |
iter | Number of iterations (prior samples; defaults to 5000). |
Value
Aggplot object.
Examples
# note: iter = 250 for demonstrative purposesplot_prior(prior_sd = 0.25, iter = 250)Posterior Predictive Distribution
Description
Draw samples from the posterior predictive distribution.
Usage
posterior_predict(object, iter = 1000, progress = TRUE)Arguments
object | An object of class |
iter | Numeric. Number of samples from the predictive distribution |
progress | Logical. Should a progress bar be included (defaults to |
Value
A 3D array containing the predicted datasets
Note
Currently only implemented fortype = "mixed",type = "ordinal",andtype = "binary". Note the term mixed is confusing, in that it canbe used with only, say, ordinal data. In this case, reestimate the model withtype = "mixed"until all data types are supported.
Examples
Y <- gssfit <- estimate(as.matrix(Y), impute = TRUE, iter = 150, type = "mixed")yrep <- posterior_predict(fit, iter = 100)Extract Posterior Samples
Description
Extract posterior samples for all parameters.
Usage
posterior_samples(object, ...)Arguments
object | an object of class |
... | currently ignored. |
Value
A matrix of posterior samples for the partial correlation. Note that if controlling forvariables (e.g., formula~ age), the matrix also includes the coefficients from eachmultivariate regression.
Examples
# note: iter = 250 for demonstrative purposes########################################### example 1: control with formula ############################################ (the following works with all data types)# controlling for genderY <- bfi# to control for only gender# (remove education)Y <- subset(Y, select = - education)# fit modelfit <- estimate(Y, formula = ~ gender, iter = 250)# note regression coefficientssamps <- posterior_samples(fit)hist(samps[,1])Precision Matrix Posterior Distribution
Description
Transform the sampled correlation matrices toprecision matrices (i.e., inverse covariance matrices).
Usage
precision(object, progress = TRUE)Arguments
object | An object of class |
progress | Logical. Should a progress bar be included (defaults to |
Value
precision_meanThe mean of the precision matrix (pbypmatrix).precision3d array of dimensionspbypbyiterincludingunconstrained (i.e., from th full graph)precision matrices.
Note
The estimated precision matrix is the inverse of thecorrelation matrix.
Examples
# dataY <- ptsd# fit modelfit <- estimate(Y)# precision matrixTheta <- precision(fit)Model Predictions forestimate Objects
Description
Model Predictions forestimate Objects
Usage
## S3 method for class 'estimate'predict( object, newdata = NULL, summary = TRUE, cred = 0.95, iter = NULL, progress = TRUE, ...)Arguments
object | object of class |
newdata | an optional data frame for obtaining predictions (e.g., on test data) |
summary | summarize the posterior samples (defaults to |
cred | credible interval used for summarizing |
iter | number of posterior samples (defaults to all in the object). |
progress | Logical. Should a progress bar be included (defaults to |
... | currently ignored |
Value
summary = TRUE: 3D array of dimensions n (observations),4 (posterior summary),p (number of nodes).summary = FALSE:list containing predictions for each variable
Examples
# # dataY <- ptsdfit <- estimate(Y, iter = 250, progress = FALSE)pred <- predict(fit, progress = FALSE)Model Predictions forexplore Objects
Description
Model Predictions forexplore Objects
Usage
## S3 method for class 'explore'predict( object, newdata = NULL, summary = TRUE, cred = 0.95, iter = NULL, progress = TRUE, ...)Arguments
object | object of class |
newdata | an optional data frame for obtaining predictions (e.g., on test data) |
summary | summarize the posterior samples (defaults to |
cred | credible interval used for summarizing |
iter | number of posterior samples (defaults to all in the object). |
progress | Logical. Should a progress bar be included (defaults to |
... | currently ignored |
Value
summary = TRUE: 3D array of dimensions n (observations),4 (posterior summary),p (number of nodes).summary = FALSE:list containing predictions for each variable
Examples
# dataY <- ptsd# fit modelfit <- explore(Y, iter = 250, progress = FALSE)# predictpred <- predict(fit, progress = FALSE)Model Predictions forvar_estimate Objects
Description
Model Predictions forvar_estimate Objects
Usage
## S3 method for class 'var_estimate'predict(object, summary = TRUE, cred = 0.95, iter = NULL, progress = TRUE, ...)Arguments
object | object of class |
summary | summarize the posterior samples (defaults to |
cred | credible interval used for summarizing |
iter | number of posterior samples (defaults to all in the object). |
progress | Logical. Should a progress bar be included (defaults to |
... | Currently ignored |
Value
The predicted values for each regression model.
Examples
# dataY <- subset(ifit, id == 1)[,-1]# fit model with alias (var_estimate also works)fit <- var_estimate(Y, progress = FALSE)# fitted valuespred <- predict(fit, progress = FALSE)# predicted values (1st outcome)pred[,,1]Predictability: Bayesian Variance Explained (R2)
Description
Compute nodewise predictability or Bayesian variance explained (R2 Gelman et al. 2019).In the context of GGMs, this method was described in Williams (2018).
Usage
predictability( object, select = FALSE, cred = 0.95, BF_cut = 3, iter = NULL, progress = TRUE, ...)Arguments
object | object of class |
select | logical. Should the graph be selected ? The default is currently |
cred | numeric. credible interval between 0 and 1 (default is 0.95) that is used for selecting the graph. |
BF_cut | numeric. evidentiary threshold (default is 3). |
iter | interger. iterations (posterior samples) used for computing R2. |
progress | Logical. Should a progress bar be included (defaults to |
... | currently ignored. |
Value
An object of classesbayes_R2 andmetric, including
scoresA list containing the posterior samples of R2. The is one elementfor each node.
Note
Binary and Ordinal Data:
R2 is computed from the latent data.
Mixed Data:
The mixed data approach is somewhat ad-hoc see for example p. 277 in Hoff (2007). Thisis becaue uncertainty in the ranks is not incorporated, which means that variance explained is computed fromthe 'empirical'CDF.
Model Selection:
Currently the default to include all nodes in the model when computing R2. This can be changed (i.e.,select = TRUE), whichthen sets those edges not detected to zero. This is accomplished by subsetting the correlation matrix according to each neighborhoodof relations.
References
Gelman A, Goodrich B, Gabry J, Vehtari A (2019).“R-squared for Bayesian Regression Models.”American Statistician,73(3), 307–309.ISSN 15372731.
Hoff PD (2007).“Extending the rank likelihood for semiparametric copula estimation.”The Annals of Applied Statistics,1(1), 265–283.doi:10.1214/07-AOAS107.
Williams DR (2018).“Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons.”arXiv.doi:10.31234/OSF.IO/X8DPR.
Examples
# dataY <- ptsd[,1:5]fit <- estimate(Y, iter = 250, progress = FALSE)r2 <- predictability(fit, select = TRUE, iter = 250, progress = FALSE)# summaryr2Predicted Probabilities
Description
Compute the predicted probabilities for discrete data, with the possibilityof conditional predictive probabilities (i.e., at fixed values of other nodes)
Usage
predicted_probability(object, outcome, Y, ...)Arguments
object | An object of class |
outcome | Character string. Node for which the probabilities are computed. |
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables).This must include the column names. |
... | Compute conditional probabilities by specifying a column name in |
Value
A list containing a matrix with the computed probabilities(a row for each predictive sample and a column for each category).
Note
There are no checks that the conditional probability exists, i.e., supposeyou wish to condition on, say, B3 = 2 and B4 = 1, yet there is no instance inwhich B3 is 2 AND B4 is 1. This will result in an uninformative error.
Examples
Y <- ptsdfit <- estimate(as.matrix(Y), iter = 150, type = "mixed")pred <- posterior_predict(fit, iter = 100)prob <- predicted_probability(pred, Y = Y, outcome = "B3", B4 = 0, B5 = 0)Print method forBGGM objects
Description
Mainly used to avoid a plethora of different printfunctions that overcrowded the documentation in previous versionsofBGGM.
Usage
## S3 method for class 'BGGM'print(x, ...)Arguments
x | An object of class |
... | currently ignored |
Prior Belief Gaussian Graphical Model
Description
Incorporate prior information into the estimation of theconditional dependence structure. This prior information is expressed asthe prior odds that each relation should be included in the graph.
Usage
prior_belief_ggm(Y, prior_ggm, post_odds_cut = 3, ...)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables/nodes). |
prior_ggm | Matrix of dimensionsp byp, encoding the priorodds for including each relation in the graph (see ' |
post_odds_cut | Numeric. Threshold for including an edge (defaults to 3).Note |
... | Additional arguments passed to |
Details
Technically, the prior odds is not for including an edge in the graph,but for (H1)/p(H0), where H1 captures the hypothesized edge size and H0 is thenull model (see Williams2019_bf). Accordingly, setting anentry inprior_ggm to, say, 10, encodes a prior belief that H1 is 10 timesmore likely than H0. Further, setting an entry inprior_ggm to 1 resultsin equal prior odds (the default inselect.explore).
Value
An object including:
adj: Adjacency matrix
post_prob: Posterior probability for thealternative hypothesis.
Examples
# Assume perfect prior information# synthetic ggmp <- 20main <- gen_net()# prior odds 10:1, assuming graph is knownprior_ggm <- ifelse(main$adj == 1, 10, 1)# generate datay <- MASS::mvrnorm(n = 200, mu = rep(0, 20), Sigma = main$cors)# prior estprior_est <- prior_belief_ggm(Y = y, prior_ggm = prior_ggm, progress = FALSE)# check scoresBGGM:::performance(Estimate = prior_est$adj, True = main$adj)# default in BGGMdefault_est <- select(explore(y, progress = FALSE))# check scoresBGGM:::performance(Estimate = default_est$Adj_10, True = main$adj)Prior Belief Graphical VAR
Description
Prior Belief Graphical VAR
Usage
prior_belief_var( Y, prior_temporal = NULL, post_odds_cut = 3, est_ggm = TRUE, prior_ggm = NULL, progress = TRUE, ...)Arguments
Y | Matrix (or data frame) of dimensionsn(observations) byp (variables/nodes). |
prior_temporal | Matrix of dimensionsp byp,encoding the prior odds for including each relationin the temporal graph (see ' |
post_odds_cut | Numeric. Threshold for including an edge (defaults to 3).Note |
est_ggm | Logical. Should the contemporaneous network be estimated(defaults to |
prior_ggm | Matrix of dimensionsp byp, encoding the priorodds for including each relation in the graph(see ' |
progress | Logical. Should a progress bar be included(defaults to |
... | Additional arguments passed to |
Details
Technically, the prior odds is not for including an edge in the graph,but for (H1)/p(H0), where H1 captures the hypothesized edge size and H0 is thenull model (see Williams2019_bf). Accordingly, setting anentry inprior_ggm to, say, 10, encodes a prior belief that H1 is 10 timesmore likely than H0. Further, setting an entry inprior_ggm orprior_var to 1 results in equal prior odds(the default inselect.explore).
Value
An object including (est_ggm = FALSE):
adj: Adjacency matrix
post_prob: Posterior probability for thealternative hypothesis.
An object including (est_ggm = TRUE):
adj_temporal: Adjacency matrix for the temporal network.
post_prob_temporal: Posterior probability for thealternative hypothesis (temporal edge)
adj_ggm: Adjacency matrix for the contemporaneousnetwork (ggm).
post_prob_ggm: Posterior probability for thealternative hypothesis (contemporaneous edge)
Note
The returned matrices are formatted with the rows indicatingthe outcome and the columns the predictor. Hence, adj_temporal[1,4] is the temporalrelation of node 4 predicting node 1. This follows the convention of thevars package (i.e.,Acoef).
Further, in order to compute the Bayes factor the data isstandardized (mean = 0 and standard deviation = 1).
Examples
# affect data from 1 person# (real data)y <- na.omit(subset(ifit, id == 1)[,2:7])p <- ncol(y)# random prior graph# (dont do this in practice!!)prior_var = matrix(sample(c(1,10), size = p^2, replace = TRUE), nrow = p, ncol = p)# fit modelfit <- prior_belief_var(y, prior_temporal = prior_var, post_odds_cut = 3)Data: Post-Traumatic Stress Disorder
Description
A dataset containing items that measure Post-traumatic stress disorder symptoms (Armour et al. 2017).There are 20 variables (p) and 221 observations (n).
Usage
data("ptsd")Format
A dataframe with 221 rows and 20 variables
Details
Intrusive Thoughts
Nightmares
Flashbacks
Emotional cue reactivity
Psychological cue reactivity
Avoidance of thoughts
Avoidance of reminders
Trauma-related amnesia
Negative beliefs
Negative trauma-related emotions
Loss of interest
Detachment
Restricted affect
Irritability/anger
Self-destructive/reckless behavior
Hypervigilance
Exaggerated startle response
Difficulty concentrating
Sleep disturbance
References
Armour C, Fried EI, Deserno MK, Tsai J, Pietrzak RH (2017).“A network analysis of DSM-5 posttraumatic stress disorder symptoms and correlates in US military veterans.”Journal of anxiety disorders,45, 49–59.doi:10.31234/osf.io/p69m7.
Data: Post-Traumatic Stress Disorder (Sample # 1)
Description
A correlation matrix that includes 16 variables. The correlation matrix was estimated from 526individuals (Fried et al. 2018).
Format
A correlation matrix with 16 variables
Details
Intrusive Thoughts
Nightmares
Flashbacks
Physiological/psychological reactivity
Avoidance of thoughts
Avoidance of situations
Amnesia
Disinterest in activities
Feeling detached
Emotional numbing
Foreshortened future
Sleep problems
Irritability
Concentration problems
Hypervigilance
Startle response
References
Fried EI, Eidhof MB, Palic S, Costantini G, Huisman-van Dijk HM, Bockting CL, Engelhard I, Armour C, Nielsen AB, Karstoft K (2018).“Replicability and generalizability of posttraumatic stress disorder (PTSD) networks: a cross-cultural multisite study of PTSD symptoms in four trauma patient samples.”Clinical Psychological Science,6(3), 335–351.
Examples
data(ptsd_cor1)Y <- MASS::mvrnorm(n = 526, mu = rep(0, 16), Sigma = ptsd_cor1, empirical = TRUE)Data: Post-Traumatic Stress Disorder (Sample # 2)
Description
A correlation matrix that includes 16 variables. The correlation matrixwas estimated from 365 individuals (Fried et al. 2018).
Format
A correlation matrix with 16 variables
Details
Intrusive Thoughts
Nightmares
Flashbacks
Physiological/psychological reactivity
Avoidance of thoughts
Avoidance of situations
Amnesia
Disinterest in activities
Feeling detached
Emotional numbing
Foreshortened future
Sleep problems
Irritability
Concentration problems
Hypervigilance
Startle response
References
Fried EI, Eidhof MB, Palic S, Costantini G, Huisman-van Dijk HM, Bockting CL, Engelhard I, Armour C, Nielsen AB, Karstoft K (2018).“Replicability and generalizability of posttraumatic stress disorder (PTSD) networks: a cross-cultural multisite study of PTSD symptoms in four trauma patient samples.”Clinical Psychological Science,6(3), 335–351.
Examples
data(ptsd_cor2)Y <- MASS::mvrnorm(n = 365, mu = rep(0, 16), Sigma = ptsd_cor2, empirical = TRUE)Data: Post-Traumatic Stress Disorder (Sample # 3)
Description
A correlation matrix that includes 16 variables. The correlation matrixwas estimated from 926 individuals (Fried et al. 2018).
Format
A correlation matrix with 16 variables
Details
Intrusive Thoughts
Nightmares
Flashbacks
Physiological/psychological reactivity
Avoidance of thoughts
Avoidance of situations
Amnesia
Disinterest in activities
Feeling detached
Emotional numbing
Foreshortened future
Sleep problems
Irritability
Concentration problems
Hypervigilance
Startle response
References
Fried EI, Eidhof MB, Palic S, Costantini G, Huisman-van Dijk HM, Bockting CL, Engelhard I, Armour C, Nielsen AB, Karstoft K (2018).“Replicability and generalizability of posttraumatic stress disorder (PTSD) networks: a cross-cultural multisite study of PTSD symptoms in four trauma patient samples.”Clinical Psychological Science,6(3), 335–351.
Examples
data(ptsd_cor3)Y <- MASS::mvrnorm(n = 926, mu = rep(0, 16), Sigma = ptsd_cor3, empirical = TRUE)Data: Post-Traumatic Stress Disorder (Sample # 4)
Description
A correlation matrix that includes 16 variables. The correlation matrixwas estimated from 965 individuals (Fried et al. 2018).
Format
A correlation matrix with 16 variables
Details
Intrusive Thoughts
Nightmares
Flashbacks
Physiological/psychological reactivity
Avoidance of thoughts
Avoidance of situations
Amnesia
Disinterest in activities
Feeling detached
Emotional numbing
Foreshortened future
Sleep problems
Irritability
Concentration problems
Hypervigilance
Startle response
References
Fried EI, Eidhof MB, Palic S, Costantini G, Huisman-van Dijk HM, Bockting CL, Engelhard I, Armour C, Nielsen AB, Karstoft K (2018).“Replicability and generalizability of posttraumatic stress disorder (PTSD) networks: a cross-cultural multisite study of PTSD symptoms in four trauma patient samples.”Clinical Psychological Science,6(3), 335–351.
Examples
data(ptsd_cor4)Y <- MASS::mvrnorm(n = 965, mu = rep(0, 16), Sigma = ptsd_cor4, empirical = TRUE)Summarary Method for Multivariate or Univarate Regression
Description
Summarary Method for Multivariate or Univarate Regression
Usage
regression_summary(object, cred = 0.95, ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored |
Value
A list of lengthp including thesummaries for each regression.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfiY <- subset(Y, select = c("A1", "A2", "gender", "education"))fit_mv_ordinal <- estimate(Y, formula = ~ gender + as.factor(education), type = "continuous", iter = 250, progress = TRUE)regression_summary(fit_mv_ordinal)Compute Custom Network Statistics
Description
This function allows for computing custom network statistics forweighted adjacency matrices (partial correlations). The statistics are computed foreach of the sampled matrices, resulting in a distribution.
Usage
roll_your_own( object, FUN, iter = NULL, select = FALSE, cred = 0.95, progress = TRUE, ...)Arguments
object | An object of class |
FUN | A custom function for computing the statistic. The first argument must bea partial correlation matrix. |
iter | Number of iterations (posterior samples; defaults to the number in the object). |
select | Logical. Should the graph be selected ? The default is currently |
cred | Numeric. Credible interval between 0 and 1 (default is 0.95) that is used for selecting the graph. |
progress | Logical. Should a progress bar be included (defaults to |
... | Arguments passed to the function. |
Details
The user has complete control of this function. Hence, care must be taken as to whatFUNreturns and in what format. The function should return a single number (one for the entire GGM)or a vector (one for each node). This ensures that the print andplot.roll_your_ownwill work.
Whenselect = TRUE, the graph is selected and then the network statistics are computed based onthe weigthed adjacency matrix. This is accomplished internally by multiplying each of the sampledpartial correlation matrices by the adjacency matrix.
Value
An object defined byFUN.
Examples
########################################## example 1: assortment ############################################ assortmentlibrary(assortnet)Y <- BGGM::bfi[,1:10]membership <- c(rep("a", 5), rep("c", 5))# fit modelfit <- estimate(Y = Y, iter = 250, progress = FALSE)# membershipmembership <- c(rep("a", 5), rep("c", 5))# define functionf <- function(x,...){ assortment.discrete(x, ...)$r}net_stat <- roll_your_own(object = fit, FUN = f, types = membership, weighted = TRUE, SE = FALSE, M = 1, progress = FALSE)# printnet_stat################################################## example 2: expected influence #################################################### expected influence from this packagelibrary(networktools)# dataY <- depression# fit modelfit <- estimate(Y = Y, iter = 250)# define functionf <- function(x,...){ expectedInf(x,...)$step1}# computenet_stat <- roll_your_own(object = fit, FUN = f, progress = FALSE)########################################## example 3: mixed data & bridge ############################################ bridge from this packagelibrary(networktools)# dataY <- ptsd[,1:7]fit <- estimate(Y, type = "mixed", iter = 250)# clusterscommunities <- substring(colnames(Y), 1, 1)# function is slowf <- function(x, ...){ bridge(x, ...)$`Bridge Strength`}net_stat <- roll_your_own(fit, FUN = f, select = TRUE, communities = communities, progress = FALSE)Data: Resilience Scale of Adults (RSA)
Description
A dataset containing items from the Resilience Scale of Adults (RSA). There are 33 items and675 observations
Usage
data("rsa")Format
A data frame with 28 variables and 1973 observations (5 point Likert scale)
Details
1My plans for the future are2When something unforeseen happens3My family understanding of what is important in life is4I feel that my future looks5My goals6I can discuss personal issues with7I feel8I enjoy being9Those who are good at encouraging are10The bonds among my friends11My personal problems12When a family member experiences a crisis/emergency13My family is characterised by14To be flexible in social settings15I get support from16In difficult periods my family17My judgements and decisions18New friendships are something19When needed, I have20I am at my best when I21Meeting new people is22When I am with others23When I start on new things/projects24Facing other people, our family acts25Belief in myself26For me, thinking of good topics of conversation is27My close friends/family members28I am good at29In my family, we like to30Rules and regular routines31In difficult periods I have a tendency to32My goals for the future are33Events in my life that I cannot influencegender"M" (male) or "F" (female)
Note
There are 6 domains
Planned future: items 1, 4, 5, 32
Perception of self: items 2, 11, 17, 25, 31, 33
Family cohesion: items 3, 7, 13, 16, 24, 29
Social resources: items 6, 9, 10, 12, 15, 19, 27
Social Competence: items 8, 14, 18, 21, 22, 26,
Structured style: items 23, 28, 30
References
Briganti, G., & Linkowski, P. (2019). Item and domain network structures of the ResilienceScale for Adults in 675 university students. Epidemiology and psychiatric sciences, 1-9.
Examples
data("rsa")S3select method
Description
S3 select method
Usage
select(object, ...)Arguments
object | object of class |
... | not currently used |
Value
select works with the following methods:
Graph Selection forestimate Objects
Description
Provides the selected graph based on credible intervals forthe partial correlations that did not contain zero(Williams 2018).
Usage
## S3 method for class 'estimate'select(object, cred = 0.95, alternative = "two.sided", ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for selecting the graph(defaults to 0.95; must be between 0 and 1). |
alternative | A character string specifying the alternative hypothesis. Itmust be one of "two.sided" (default), "greater" or "less".See note for futher details. |
... | Currently ignored. |
Details
This package was built for the social-behavioral sciences in particular. In these applications, there isstrong theory that expectsall effects to be positive. This is known as a "positive manifold" andthis notion has a rich tradition in psychometrics. Hence, this can be incorporated into the graph withalternative = "greater". This results in the estimated structure including only positive edges.
Value
The returned object of classselect.estimate contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
pcor_adjSelected partial correlation matrix (weighted adjacency).adjAdjacency matrix for the selected edgesobjectAn object of classestimate(the fitted model).
References
Williams DR (2018).“Bayesian Estimation for Gaussian Graphical Models: Structure Learning, Predictability, and Network Comparisons.”arXiv.doi:10.31234/OSF.IO/X8DPR.
See Also
estimate andggm_compare_estimate for several examples.
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi[,1:10]# estimatefit <- estimate(Y, iter = 250, progress = FALSE)# select edge setE <- select(fit)Graph selection forexplore Objects
Description
Provides the selected graph based on the Bayes factor(Williams and Mulder 2019).
Usage
## S3 method for class 'explore'select(object, BF_cut = 3, alternative = "two.sided", ...)Arguments
object | An object of class |
BF_cut | Numeric. Threshold for including an edge (defaults to 3). |
alternative | A character string specifying the alternative hypothesis. Itmust be one of "two.sided" (default), "greater", "less",or "exhaustive". See note for further details. |
... | Currently ignored. |
Details
Exhaustive provides the posterior hypothesis probabilities fora positive, negative, or null relation (see Table 3 in Williams and Mulder 2019).
Value
The returned object of classselect.explore contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
alternative = "two.sided"
pcor_mat_zeroSelected partial correlation matrix (weighted adjacency).pcor_matPartial correlation matrix (posterior mean).Adj_10Adjacency matrix for the selected edges.Adj_01Adjacency matrix for which there wasevidence for the null hypothesis.
alternative = "greater" and"less"
pcor_mat_zeroSelected partial correlation matrix (weighted adjacency).pcor_matPartial correlation matrix (posterior mean).Adj_20Adjacency matrix for the selected edges.Adj_02Adjacency matrix for which there wasevidence for the null hypothesis (see note).
alternative = "exhaustive"
post_probA data frame that included the posterior hypothesis probabilities.neg_matAdjacency matrix for which there was evidence for negative edges.pos_matAdjacency matrix for which there was evidence for positive edges.neg_matAdjacency matrix for which there wasevidence for the null hypothesis (see note).pcor_matPartial correlation matrix (posterior mean). The weighted adjacencymatrices can be computed by multiplyingpcor_matwith an adjacency matrix.
Note
Care must be taken with the optionsalternative = "less" andalternative = "greater". This is because the full parameter space is not included,such, foralternative = "greater", there can be evidence for the "null" whenthe relation is negative. This inference is correct: the null model better predictedthe data than the positive model. But note this is relative and doesnotprovide absolute evidence for the null hypothesis.
References
Williams DR, Mulder J (2019).“Bayesian Hypothesis Testing for Gaussian Graphical Models: Conditional Independence and Order Constraints.”PsyArXiv.doi:10.31234/osf.io/ypxd8.
See Also
explore andggm_compare_explore for several examples.
Examples
#################### example 1 ##################### dataY <- bfi[,1:10]# fit modelfit <- explore(Y, progress = FALSE)# edge setE <- select(fit, alternative = "exhaustive")Graph Selection forggm_compare_estimate Objects
Description
Provides the selected graph (of differences) based on credible intervals forthe partial correlations that did not contain zero(Williams 2018).
Usage
## S3 method for class 'ggm_compare_estimate'select(object, cred = 0.95, ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for selecting the graph(defaults to 0.95; must be between 0 and 1). |
... | not currently used |
Value
The returned object of classselect.ggm_compare_estimate contains a lot of information thatis used for printing and plotting the results. For users ofBGGM, the followingare the useful objects:
mean_diffA list of matrices for each group comparsion (partial correlation differences).pcor_adjA list of weighted adjacency matrices for each group comparsion.adjA list of adjacency matrices for each group comparsion.
Examples
# note: iter = 250 for demonstrative purposes##################### example 1: ###################### dataY <- bfi# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))Yfemale <- subset(Y, gender == 2, select = -c(gender, education))# fit modelfit <- ggm_compare_estimate(Ymale, Yfemale, type = "continuous", iter = 250, progress = FALSE)E <- select(fit)Graph selection forggm_compare_explore Objects
Description
Provides the selected graph (of differences) based on the Bayes factor(Williams et al. 2020).
Usage
## S3 method for class 'ggm_compare_explore'select(object, BF_cut = 3, ...)Arguments
object | An object of class |
BF_cut | Numeric. Threshold for including an edge (defaults to 3). |
... | Currently ignored. |
Value
The returned object of classselect.ggm_compare_explore containsa lot of information that is used for printing and plotting the results.For users ofBGGM, the following are the useful objects:
adj_10Adjacency matrix for which there was evidence for a difference.adj_10Adjacency matrix for which there was evidence for a null relationpcor_mat_10Selected partial correlation matrix (weighted adjacency; only for two groups).
See Also
explore andggm_compare_explore for several examples.
Examples
##################### example 1: ###################### dataY <- bfi# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]# fit modelfit <- ggm_compare_explore(Ymale, Yfemale, iter = 250, type = "continuous", progress = FALSE)E <- select(fit, post_prob = 0.50)Graph Selection forvar.estimate Object
Description
Graph Selection forvar.estimate Object
Usage
## S3 method for class 'var_estimate'select(object, cred = 0.95, alternative = "two.sided", ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for selecting the graph(defaults to 0.95; must be between 0 and 1). |
alternative | A character string specifying the alternative hypothesis. Itmust be one of "two.sided" (default), "greater" or "less".See note for futher details. |
... | Currently ignored. |
Value
An object of classselect.var_estimate, including
pcor_adj Adjacency matrix for the partial correlations.
beta_adj Adjacency matrix for the regression coefficients.
pcor_weighted_adj Weighted adjacency matrix for the partial correlations.
beta_weighted_adj Weighted adjacency matrix for the regression coefficients.
pcor_muPartial correlation matrix (posterior mean).beta_muA matrix including the regression coefficients (posterior mean).
Examples
# dataY <- subset(ifit, id == 1)[,-1]# fit model with alias (var_estimate also works)fit <- var_estimate(Y, progress = FALSE)# select graphsselect(fit, cred = 0.95)Summarizecoef Objects
Description
Summarize regression parameters with the posterior mean,standard deviation, and credible interval.
Usage
## S3 method for class 'coef'summary(object, cred = 0.95, ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored |
Value
A list of lengthp including thesummaries for each multiple regression.
Note
Seecoef.estimate andcoef.explore for examples.
Summary method forestimate.default objects
Description
Summarize the posterior distribution of each partial correlationwith the posterior mean and standard deviation.
Usage
## S3 method for class 'estimate'summary(object, col_names = TRUE, cred = 0.95, ...)Arguments
object | An object of class |
col_names | Logical. Should the summary include the column names (default is |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored. |
Value
A dataframe containing the summarized posterior distributions.
See Also
Examples
# dataY <- ptsd[,1:5]fit <- estimate(Y, iter = 250, progress = FALSE)summary(fit)Summary Method forexplore.default Objects
Description
Summarize the posterior distribution for each partial correlationwith the posterior mean and standard deviation.
Usage
## S3 method for class 'explore'summary(object, col_names = TRUE, ...)Arguments
object | An object of class |
col_names | Logical. Should the summary include the column names (default is |
... | Currently ignored |
Value
A dataframe containing the summarized posterior distributions.
See Also
Examples
# note: iter = 250 for demonstrative purposesY <- ptsd[,1:5]fit <- explore(Y, iter = 250, progress = FALSE)summ <- summary(fit)summSummary method forggm_compare_estimate objects
Description
Summarize the posterior distribution of each partial correlationdifference with the posterior mean and standard deviation.
Usage
## S3 method for class 'ggm_compare_estimate'summary(object, col_names = TRUE, cred = 0.95, ...)Arguments
object | An object of class |
col_names | Logical. Should the summary include the column names (default is |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored. |
Value
A list containing the summarized posterior distributions.
See Also
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:5]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:5]# fit modelfit <- ggm_compare_estimate(Ymale, Yfemale, type = "continuous", iter = 250, progress = FALSE)summary(fit)Summary Method forggm_compare_explore Objects
Description
Summarize the posterior hypothesis probabilities
Usage
## S3 method for class 'ggm_compare_explore'summary(object, col_names = TRUE, ...)Arguments
object | An object of class |
col_names | Logical. Should the summary include the column names (default is |
... | Currently ignored. |
Value
An object of classsummary.ggm_compare_explore
See Also
Examples
# note: iter = 250 for demonstrative purposes# dataY <- bfi[complete.cases(bfi),]# males and femalesYmale <- subset(Y, gender == 1, select = -c(gender, education))[,1:10]Yfemale <- subset(Y, gender == 2, select = -c(gender, education))[,1:10]############################# example 1: ordinal ############################## fit modelfit <- ggm_compare_explore(Ymale, Yfemale, type = "ordinal", iter = 250, progress = FALSE)# summarysumm <- summary(fit)summSummary Method forpredictability Objects
Description
Summary Method forpredictability Objects
Usage
## S3 method for class 'predictability'summary(object, cred = 0.95, ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored |
Examples
Y <- ptsd[,1:5]fit <- explore(Y, iter = 250, progress = FALSE)r2 <- predictability(fit, iter = 250, progress = FALSE)summary(r2)Summary Method forselect.explore Objects
Description
Summary Method forselect.explore Objects
Usage
## S3 method for class 'select.explore'summary(object, col_names = TRUE, ...)Arguments
object | object of class |
col_names | Logical. |
... | Currently ignored. |
Value
a data frame including the posterior mean, standard deviation,and posterior hypothesis probabilities for each relation.
Examples
# dataY <- bfi[,1:10]# fit modelfit <- explore(Y, iter = 250, progress = FALSE)# edge setE <- select(fit, alternative = "exhaustive")summary(E)Summary Method forvar_estimate Objects
Description
Summarize the posterior distribution of each partial correlationand regression coefficient with the posterior mean, standard deviation, andcredible intervals.
Usage
## S3 method for class 'var_estimate'summary(object, cred = 0.95, ...)Arguments
object | An object of class |
cred | Numeric. The credible interval width for summarizing the posteriordistributions (defaults to 0.95; must be between 0 and 1). |
... | Currently ignored. |
Value
A dataframe containing the summarized posterior distributions,including both the partial correlations and the regression coefficients.
pcor_resultsA data frame including the summarized partial correlationsbeta_resultsA list containing the summarized regression coefficients (onedata frame for each outcome)
See Also
Examples
# dataY <- subset(ifit, id == 1)[,-1]# fit model with alias (var_estimate also works)fit <- var_estimate(Y, progress = FALSE)# summary ('pcor')print(summary(fit, cred = 0.95),param = "pcor",)# summary ('beta')print(summary(fit, cred = 0.95),param = "beta",)Data: Toronto Alexithymia Scale (TAS)
Description
A dataset containing items from the Toronto Alexithymia Scale (TAS). There are 20 variables and1925 observations
Usage
data("tas")Format
A data frame with 20 variables and 1925 observations (5 point Likert scale)
Details
1I am often confused about what emotion I am feeling2It is difficult for me to find the right words for my feelings3I have physical sensations that even doctors don’t understand4I am able to describe my feelings easily5I prefer to analyze problems rather than just describe them6When I am upset, I don’t know if I am sad, frightened, or angry7I am often puzzled by sensations in my body8I prefer just to let things happen rather than to understand why they turned out that way9I have feelings that I can’t quite identify10Being in touch with emotions is essential11I find it hard to describe how I feel about people12People tell me to describe my feelings more13I don’t know what’s going on inside me14I often don’t know why I am angry15I prefer talking to people about their daily activities rather than their feelings16I prefer to watch “light” entertainment shows rather than psychological dramas17It is difficult for me to reveal my innermost feelings, even to close friends18I can feel close to someone, even in moments of silence19I find examination of my feelings useful in solving personal problems20Looking for hidden meanings in movies or plays distracts from their enjoymentgender"M" (male) or "F" (female)
Note
There are three domains
Difficulty identifying feelings: items 1, 3, 6, 7, 9, 13, 14
Difficulty describing feelings: items 2, 4, 11, 12, 17
Externally oriented thinking: items 10, 15, 16, 18, 19
References
Briganti, G., & Linkowski, P. (2019). Network approach to items and domains fromthe Toronto Alexithymia Scale. Psychological reports.
Examples
data("tas")VAR: Estimation
Description
Estimate VAR(1) models by efficiently sampling from the posterior distribution. Thisprovides two graphical structures: (1) a network of undirected relations (the GGM, controlling for thelagged predictors) and (2) a network of directed relations (the lagged coefficients). Note thatin the graphical modeling literature, this model is also known as a time series chain graphical model(Abegaz and Wit 2013).
Usage
var_estimate( Y, rho_sd = sqrt(1/3), beta_sd = 1, iter = 5000, progress = TRUE, seed = NULL, ...)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
rho_sd | Numeric. Scale of the prior distribution for the partial correlations,approximately the standard deviation of a beta distribution(defaults to sqrt(1/3) as this results to delta = 2, and a uniform distribution across the partial correlations). |
beta_sd | Numeric. Standard deviation of the prior distribution for the regression coefficients(defaults to 1). The prior is by default centered at zero and follows a normal distribution(Equation 9, Sinay and Hsu 2014) |
iter | Number of iterations (posterior samples; defaults to 5000). |
progress | Logical. Should a progress bar be included (defaults to |
seed | An integer for the random seed (defaults to 1). |
... | Currently ignored. |
Details
Each time series inY is standardized (mean = 0; standard deviation = 1).
Value
An object of classvar_estimate containing a lot of information that isused for printing and plotting the results. For users ofBGGM, the following are theuseful objects:
beta_muA matrix including the regression coefficients (posterior mean).pcor_muPartial correlation matrix (posterior mean).fitA list including the posterior samples.
Note
Regularization:
A Bayesian ridge regression can be fitted by decreasingbeta_sd(e.g.,beta_sd = 0.25). This could be advantageous for forecasting(out-of-sample prediction) in particular.
References
Abegaz F, Wit E (2013).“Sparse time series chain graphical models for reconstructing genetic networks.”Biostatistics,14(3), 586–599.doi:10.1093/biostatistics/kxt005.
Sinay MS, Hsu JS (2014).“Bayesian inference of a multivariate regression model.”Journal of Probability and Statistics,2014.
Examples
# dataY <- subset(ifit, id == 1)[,-1]# use alias (var_estimate also works)fit <- var_estimate(Y, progress = FALSE)fitExtract the Weighted Adjacency Matrix
Description
Extract the weighted adjacency matrix (posterior mean) fromestimate,explore,ggm_compare_estimate,andggm_compare_explore objects.
Usage
weighted_adj_mat(object, ...)Arguments
object | A model estimated withBGGM. All classes are supported, assumingthere is matrix to be extracted. |
... | Currently ignored. |
Value
The weighted adjacency matrix (partial correlation matrix with zeros).
Examples
# note: iter = 250 for demonstrative purposesY <- bfi[,1:5]# estimatefit <- estimate(Y, iter = 250, progress = FALSE)# select graphE <- select(fit)# extract weighted adj matrixweighted_adj_mat(E)Data: Women and Mathematics
Description
A data frame containing 1190 observations (n = 1190) and 6 variables (p = 6) measured on the binary scale.
Usage
data("women_math")Format
A data frame containing 1190 observations (n = 1190) and 6 variables (p = 6) measured on the binary scale(Fowlkes et al. 1988). These data have been analyzed in Tarantola (2004)and in (Madigan and Raftery 1994). The variable descriptions were copied from (section 5.2 )(section 5.2, Talhouk et al. 2012)
Details
1Lecture attendance (attend/did not attend)2Gender (male/female)3School type (urban/suburban)4“I will be needing Mathematics in my future work” (agree/disagree)5Subject preference (math/science vs. liberal arts)6Future plans (college/job)
References
Fowlkes EB, Freeny AE, Landwehr JM (1988).“Evaluating logistic models for large contingency tables.”Journal of the American Statistical Association,83(403), 611–622.
Madigan D, Raftery AE (1994).“Model selection and accounting for model uncertainty in graphical models using Occam's window.”Journal of the American Statistical Association,89(428), 1535–1546.
Talhouk A, Doucet A, Murphy K (2012).“Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices.”Journal of Computational and Graphical Statistics,21(3), 739–757.
Tarantola C (2004).“MCMC model determination for discrete graphical models.”Statistical Modelling,4(1), 39–61.doi:10.1191/1471082x04st063oa.
Examples
data("women_math")Zero-Order Correlations
Description
Estimate zero-order correlations for any type of data. Note zero-order refers to the fact thatno variables are controlled for (i.e., bivariate correlations). To our knowledge, this is the only Bayesianimplementation inR that can estiamte Pearson's, tetrachoric (binary), polychoric(ordinal with more than two cateogries), and rank based correlation coefficients.
Usage
zero_order_cors( Y, type = "continuous", iter = 5000, mixed_type = NULL, progress = TRUE)Arguments
Y | Matrix (or data frame) of dimensionsn (observations) byp (variables). |
type | Character string. Which type of data for |
iter | Number of iterations (posterior samples; defaults to 5000). |
mixed_type | Numeric vector. An indicator of length p for which varibles should be treated as ranks.(1 for rank and 0 to assume normality). The default is currently to treat all integer variables as rankswhen |
progress | Logical. Should a progress bar be included (defaults to |
Details
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data includingonlycontinuous oronly discrete variables. This is based on the ranked likelihood which requires samplingthe ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationallyexpensive when there are many levels. For example, with continuous data, there are as many ranksas data points!
The optionmixed_type allows the user to determine which variable should be treated as ranksand the "emprical" distribution is used otherwise (Hoff 2007). This isaccomplished by specifying an indicator vector of lengthp. A one indicates to use the ranks,whereas a zero indicates to "ignore" that variable. By default all integer variables are treated as ranks.
Dealing with Errors:
An error is most likely to arise whentype = "ordinal". The are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data is heavily skewed.This can result in an ill-defined matrix. If this occurs, we recommend to first trydecreasing
prior_sd(i.e., a more informative prior). If that does not work, thenchange the data type totype = mixedwhich then estimates a copula GGM(this method can be used for data containingonly ordinal variable). This shouldwork without a problem.The second is due to how the ordinal data are categorized. For example, if the error statesthat the index is out of bounds, this indicates that the first category is a zero. This is not allowed, asthe first category must be one. This is addressed by adding one (e.g.,
Y + 1) to the data matrix.
Value
RAn array including the correlation matrices(of dimensionsp byp byiter)R_meanPosterior mean of the correlations (of dimensionsp byp)
Examples
# note: iter = 250 for demonstrative purposesY <- ptsd[,1:3]######################################## example 1: Pearson's #####################################fit <- zero_order_cors(Y, type = "continuous", iter = 250, progress = FALSE)####################################### example 2: polychoric #####################################fit <- zero_order_cors(Y+1, type = "ordinal", iter = 250, progress = FALSE)################################ example 3: rank ################################fit <- zero_order_cors(Y+1, type = "mixed", iter = 250, progress = FALSE)############################## example 4: tetrachoric ############################### binary dataY <- women_math[,1:3]fit <- zero_order_cors(Y, type = "binary", iter = 250, progress = FALSE)