An Introduction to rgm

This vignette guides users through simulating data and then running an experiment with the RGM package. We start by demonstrating how to simulate data, which is a crucial preliminary step. We then show how to run the estimation process on the simulated data. The focus is on interpreting the plots generated from the experiment, which provide insight into the performance and accuracy of the RGM package.

Simulating Data

The first step in the analysis is to simulate data, which forms the basis for our subsequent experiment. The RGM package offers functionality for simulating data from a random graphical model, mimicking real-world network structures that show similarities across a number of environments.

For the simulation of this experiment, and using the terminology of human microbiota systems, we consider \(B=13\) distinct body sites, each representing a different environment, and \(p=87\) microbes identified as Operational Taxonomic Units (OTUs). For each environment \(k\), where \(k=1,\ldots,B\), let \(\mathbf{Y}^{(k)} = (Y^{(k)}_1, \ldots, Y^{(k)}_p)\) denote the \(p\)-dimensional random vector of OTU abundances. The relationship among these OTUs within each environment is modeled using the following Gaussian graphical model (the implementation allows for discrete marginal distributions, but for simplicity we do not consider these in the simulation):

\[\begin{equation}
\mathbf{Y}^{(k)} \mid G^{(k)} \sim \mathcal{N}_{p}\big(\mathbf{0}, (\boldsymbol{\Omega}^{(k)})^{-1}\big),
\end{equation}\]

with \(\boldsymbol{\Omega}^{(k)}\) the precision matrix (inverse covariance matrix) associated with environment \(k\).
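As a small illustration of the Gaussian graphical model above, the following base R sketch draws data for a single environment from a sparse precision matrix and reads off the corresponding graph. It does not use the RGM package; the dimensions, the chain-shaped precision matrix and all variable names are assumptions chosen only for illustration.

```r
# Illustrative sketch (base R only); all values and names are hypothetical.
set.seed(1)
p <- 5    # number of variables (OTUs)
n <- 200  # number of observations

# Sparse, positive definite precision matrix for a chain graph 1-2-3-4-5
Omega <- diag(p)
for (j in 1:(p - 1)) {
  Omega[j, j + 1] <- 0.4
  Omega[j + 1, j] <- 0.4
}

# The conditional independence graph is the off-diagonal non-zero pattern
G <- (Omega != 0) & !diag(p)

# Sample rows of Y from N_p(0, Sigma), where Sigma is the inverse of Omega
Sigma <- solve(Omega)
R <- chol(Sigma)                        # upper triangular, Sigma = t(R) %*% R
Y <- matrix(rnorm(n * p), n, p) %*% R   # each row is one independent draw

n_edges <- sum(G[upper.tri(G)])         # number of edges in the graph
```

A zero in position \((j_1, j_2)\) of the precision matrix corresponds to a missing edge, that is, conditional independence of the two variables given all the others.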

We denote by \(G^{(k)}\) the conditional independence graph for environment \(k\). This is given by the non-zero pattern in \(\boldsymbol{\Omega}^{(k)}\). The collection of graphs \(G = \{G^{(k)}\}_k\) across all environments is then assumed to be distributed according to a random graph model. For the simulation, we consider the following latent probit model

\[\begin{equation}\label{eq:latentprobit}
P\big(G_{j_1,j_2}^{(k)}=1 \mid G_{j_1,j_2}^{(-k)}, \Theta, w\big) = \Phi\Big(\alpha_k + w_{j_1,j_2}\beta + \mathbf{c}_k^t \sum_{k' \ne k} \mathbf{c}_{k'} 1_{\{G_{j_1,j_2}^{(k')}=1\}}\Big), \tag{1}
\end{equation}\]

with environment-specific intercepts \(\alpha_k\), one edge covariate \(w\) and a 2D latent space for the environments. This is, in general, the model implemented in the package, but multiple edge covariates can also be considered.
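To make Equation 1 concrete, the sketch below evaluates the probit edge probability for one edge in one environment. It is a direct transcription of the formula in base R, not code from the RGM package; the function name `edge_prob` and all parameter values are hypothetical.

```r
# Hypothetical parameter values, chosen only for illustration
alpha <- c(-1.5, -1.0, -2.0)        # environment-specific intercepts (B = 3)
beta  <- 0.5                        # coefficient of the edge covariate
w     <- 0.8                        # covariate value for the edge (j1, j2)
C     <- rbind(c(1.0, 0.0),         # row k holds the 2D latent position c_k
               c(0.8, 0.2),
               c(-0.5, 1.0))

# Probability that edge (j1, j2) is present in environment k, given the
# indicators g of that same edge in all environments (g[k] itself is ignored)
edge_prob <- function(k, g, alpha, beta, w, C) {
  others <- setdiff(seq_along(g), k)
  # Sum of latent positions of the other environments that contain the edge
  s <- colSums(C[others, , drop = FALSE] * g[others])
  pnorm(alpha[k] + w * beta + sum(C[k, ] * s))
}

# Edge present in environment 2 only, evaluated for environment 1
p_with    <- edge_prob(1, g = c(0, 1, 0), alpha, beta, w, C)
# Edge absent everywhere else
p_without <- edge_prob(1, g = c(0, 0, 0), alpha, beta, w, C)
```

With these values the latent positions of environments 1 and 2 have a positive inner product, so the presence of the edge in environment 2 raises its probability in environment 1 (`p_with` exceeds `p_without`).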

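The call below simulates data from this model. The text above describes a setting with \(p=87\), \(B=13\) and \(n=346\) observations for each environment; the call as shown uses a smaller configuration (\(p=27\), \(B=5\), \(n=146\)):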

# Running the simulation with specific parameters
a <- rgm:::sim.rgm(p = 27, B = 5, n = 146, mcmc_iter = 100, seed = 1234)

Results

We proceed with the diagnostic plots. Here a is the simulated data from above and res holds the output of the estimation step:

ps <- rgm:::post_processing_rgm(simulated_data = a, results = res)

We first examine the convergence of \(\beta\) across the MCMC iterations:

ps$beta_convergence

Beta Convergence Plot

The simulation study results are based on the analysis of the last 2500 MCMC iterations. We compare the true probit probabilities, as derived from Equation 1 using the true values of the parameters, with those calculated using the mean posterior estimates of the parameters \(\boldsymbol{\alpha}\), \(\beta\), and \(\mathbf{c}\).

ps$rgm_recovery

RGM Recovery Plot

Next, Receiver Operating Characteristic (ROC) curves compare the recovered graphs with the true graphs for each of the 13 environments. These curves are generated by varying the threshold on the inferred posterior edge probabilities. Each of the 13 environments is shown in a different color.

ps$roc_plot

ROC Plot
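The thresholding behind an ROC curve can be sketched in base R as follows. This is not how the package builds its plot; the edge indicators and "posterior edge probabilities" are synthetic stand-ins, and `roc_points` is a hypothetical helper written for this illustration.

```r
set.seed(2)
# Synthetic true edge indicators for one environment (about 20% edges)
truth <- rbinom(200, 1, 0.2)
# Synthetic stand-in for posterior edge probabilities: higher for true edges
prob <- plogis(rnorm(200, mean = ifelse(truth == 1, 1.5, -1.5)))

# For each threshold, declare an edge when prob >= threshold and record the
# true positive rate and false positive rate
roc_points <- function(truth, prob) {
  thr <- sort(unique(c(0, prob, 1)), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(prob[truth == 1] >= t))
  fpr <- sapply(thr, function(t) mean(prob[truth == 0] >= t))
  data.frame(threshold = thr, fpr = fpr, tpr = tpr)
}

roc <- roc_points(truth, prob)

# Area under the curve by the trapezoidal rule
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

plot(roc$fpr, roc$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line for random guessing
```

A curve that rises quickly towards the top-left corner indicates that true edges receive systematically higher posterior probabilities than non-edges.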

The following plot compares the true values of \(\alpha_k\) with the mean posterior estimates. The good agreement between the two shows that the procedure is able to recover the right level of sparsity of the inferred graphs in each environment.

ps$estimation_of_alpha

Estimation of Alpha

In the next plot, we visualize the posterior distribution of \(\beta\) and compare the mean posterior estimate with the true value via vertical lines.

ps$posterior_distribution

Posterior Distribution of Beta
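The kind of summary shown in such a plot can be sketched in base R from a vector of MCMC draws. The draws below are synthetic stand-ins rather than output of the RGM sampler, and the true value, burn-in length and variable names are assumptions made for illustration.

```r
set.seed(3)
beta_true <- 0.5                              # hypothetical true value
# Synthetic stand-in for 1000 MCMC draws of beta
draws <- beta_true + rnorm(1000, sd = 0.1)

burn_in <- 200                                # iterations discarded as burn-in
kept <- draws[-seq_len(burn_in)]              # draws used for inference

beta_hat <- mean(kept)                        # mean posterior estimate
ci <- quantile(kept, c(0.025, 0.975))         # 95% credible interval

# Histogram of the retained draws with the estimate and the true value
hist(kept, breaks = 30, main = "Posterior distribution of beta",
     xlab = expression(beta))
abline(v = beta_hat, col = "blue", lty = 1)   # mean posterior estimate
abline(v = beta_true, col = "red", lty = 2)   # true value
```

When the true value falls well inside the bulk of the histogram and close to the posterior mean, the parameter is being recovered well.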

The following heatmap visualizes the posterior edge probabilities for each edge and each environment, with rows and columns rearranged via hierarchical clustering to make patterns easier to identify. Blue corresponds to probabilities close to 1, and red to probabilities close to 0.

ps$edge_prob

Edge Probability Heatmap

The heatmap shows the sparsity level of each graph as well as the similarities between environments in terms of network structure.
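The reordering by hierarchical clustering can be sketched in base R as below. The matrix of probabilities is random and serves only as a stand-in for posterior edge probabilities; the package's own plot may use different clustering settings, so the distance, linkage and palette here are assumptions.

```r
set.seed(4)
n_edges <- 30
B <- 5
# Synthetic stand-in: one row per edge, one column per environment
P <- matrix(runif(n_edges * B), n_edges, B,
            dimnames = list(paste0("edge", 1:n_edges), paste0("env", 1:B)))

# Cluster edges (rows) and environments (columns) separately,
# using Euclidean distance and complete linkage (the hclust defaults)
row_ord <- hclust(dist(P))$order
col_ord <- hclust(dist(t(P)))$order
P_ord <- P[row_ord, col_ord]

# Red for probabilities near 0, blue for probabilities near 1
pal <- colorRampPalette(c("red", "white", "blue"))(50)
image(t(P_ord), col = pal, zlim = c(0, 1), axes = FALSE,
      xlab = "Environment", ylab = "Edge")
```

After reordering, edges with similar probability profiles sit next to each other, and environments with similar network structures appear as adjacent columns.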
