Movatterモバイル変換

The sparse marginal epistasis (SME) test performs a genome-widesearch for SNPs involved in genetic interactions while conditioning oninformation derived from functional genomic data.

By examining one SNP at a time, SME fits a linear mixed model to testfor marginal epistasis. It explicitly models the combined additiveeffects from all SNPs a marginal epistatic effect that representspairwise interactions involving a test SNP.

The key to the SME formulation is that the interaction between thetest SNP and other SNPs may be masked depending on additionalinformation. Masking interactions that do not contribute to traitvariance both maximizes the power of the inference as well as realizesthe computational efficiency needed to analyze human Biobank scaledata.

This pages presents a toy example to show case what insights theimplementation of the test provides.

Sparse Marginal Epistasis test (SME) schematic manhattan plot

Figure 1. SME performs a genome-wide search for SNPsinvolved in genetic interactions while conditioning on informationderived from functional genomic data. SME has improved power to detectmarginal epistasis and runs 10x to 90x faster than state-of-the-artmethods.

Run SME with PLINK data

library(smer)library(dplyr)library(ggplot2)

Data Requirements and File Formats

SME is implemented in R and requires your genetic data to be in PLINKformat, which consists of three files:

.bim: Contains SNP information
.bed: Contains genetic data
.fam: Contains sample information

Note:sme() cannot handle missinggenotypes. Prior to runningsme(), use PLINK to removevariants with missing genotypes or impute them.

Additionally, your phenotype (trait) data should be in a separatefile following PLINK’s phenotype format.

Thesme() function includes parameters that let youcontrol memory usage and computational resources. For detailed guidanceon optimizing these settings for your system, please see our tutorialHowTo Optimize the Memory Requirements of SME.

Specifying SNPs for Analysis

When selecting which SNPs to analyze, you must provide theirpositions as1-based indices. These indices correspondto the row numbers in your.bim file, where the first SNPis index 1, the second is index 2, and so on.

For complete details about all function parameters, please refer tothe documentation of thesme() function.

# File inputsplink_file<-"path/to/plink/file"pheno_file<-"path/to/pheno/file"mask_file<-"path/to/mask/file"# Parameter inputschun_ksize<-10n_randvecs<-10n_blocks<-10n_threads<-5# 1-based Indices of SNPs to be analyzedn_snps<-100snp_indices<-1:n_snpssme_result<-sme(  plink_file,  pheno_file,  mask_file,  snp_indices,  chunk_size,  n_randvecs,  n_blocks,  n_threads)

This analysis uses simulated data for demonstration purposes. Wesimulated synthetic phenotypes from 5000 synthetic genotypes with 6000SNPs. If you would like to learn how to simulate data, please refer toour tutorialHowTo Simulate Traits.

Understanding the Results

When you call thesme() function, it returns a listcontaining multiple elements. The most important one is calledsummary, which contains the main analysis results. Theseresults are formatted as tidy data, making them compatible with popularR packages likeggplot2 anddplyr for furtheranalysis and visualization.

Visualizing Genomic Associations

We use Manhattan plots to visualize genome-wide analyses because theyeffectively highlight strong associations between genetic variants andtraits. In this case, the Manhattan plot specifically shows statisticalepistasis (interactions between genes). For reference, we’ve marked thetrue causal SNPs (Single Nucleotide Polymorphisms) in green on the plot- these are the genetic variants we included in our simulation as havingreal effects.

sme_result$summary%>%ggplot(aes(x = index,y =-log10(p),color = true_gxg_snp))+geom_point()+xlab("Position")+labs(color ="Epistatic SNP")

Manhattan plot to illustrate the sparse marginal epistasis test

Understanding Variance Components and Effect Sizes

SME estimates how much of the total trait variation can be explainedby genetic interactions. In this simulation, we set the PhenotypicVariance Explained (PVE) to 5% for each SNP involved in epistaticinteractions.

The plot below shows how well our method recovered these effects. Itdisplays two distributions:

The estimated PVE for SNPs we know are truly involved in epistaticinteractions
The estimated PVE for SNPs that have no real epistatic effects

The dashed line marks the true 5% PVE level we used in thesimulation, allowing you to see how accurately the method estimated theactual effect sizes.

sme_result$summary%>%ggplot(aes(x = true_gxg_snp,y = pve,fill = true_gxg_snp))+geom_boxplot()+geom_hline(yintercept =0.05,color ="grey40",linetype ="dashed")+annotate("text",x =0.8,y =0.055,label ="True per SNP epistatic PVE",color ="black")+xlab("Epistatic SNP")+ylab("Phenotypic Variance Explained")+theme(legend.position ="none")

PVE plot to illustrate the sparse marginal epistasis test

Narrow Sense Heritability Estimates

The SME method uses a linear mixed model to separate differentsources of trait variation. One key component it estimates is narrowsense heritability (\(h^2\)), whichmeasures how much trait variation can be explained by additive geneticeffects.

The plot below breaks down the estimated sources of traitvariation:

“grm”: Shows narrow sense heritability (\(h^2\))
“gxg”: Shows variance due to gene-by-gene interactions
“error”: Shows unexplained variance, such as environmentaleffects

In our simulation, we set the true narrow sense heritability to 30%,shown as a dashed line in the plot. This reference line helps youevaluate how accurately SME estimated the genetic components of traitvariation.

sme_result$vc_estimate%>%ggplot(aes(x = component,y = vc_estimate,fill = component))+geom_boxplot()+geom_hline(yintercept =0.3,color ="grey40",linetype ="dashed")+annotate("text",x =0.7,y =0.33,label ="'True ' * h^2",color ="black",parse =TRUE)+xlab("Component")+ylab("Variance Component Estimate")+theme(legend.position ="none")

h2 plot to illustrate the sparse marginal epistasis test

The estimate of the narrow-sense heritability\(h^2\) is much less variable because it isalways informed by the same genetic relatedness matrix. In this smalldata example it overestimates the heritability but it is unbiased ingeneral.

SessionInfo

sessionInfo()#> R version 4.5.1 (2025-06-13)#> Platform: aarch64-apple-darwin24.5.0#> Running under: macOS Sequoia 15.6.1#>#> Matrix products: default#> BLAS:   /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib#> LAPACK: /opt/homebrew/Cellar/r/4.5.1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1#>#> locale:#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#>#> time zone: Europe/Berlin#> tzcode source: internal#>#> attached base packages:#> [1] stats     graphics  grDevices utils     datasets  methods   base#>#> other attached packages:#> [1] ggplot2_3.5.2 dplyr_1.1.4   smer_0.0.2#>#> loaded via a namespace (and not attached):#>  [1] gtable_0.3.6        jsonlite_2.0.0      compiler_4.5.1#>  [4] tidyselect_1.2.1    Rcpp_1.1.0          FMStable_0.1-4#>  [7] parallel_4.5.1      tidyr_1.3.1         jquerylib_0.1.4#> [10] scales_1.4.0        yaml_2.3.10         fastmap_1.2.0#> [13] harmonicmeanp_3.0.1 R6_2.6.1            labeling_0.4.3#> [16] generics_0.1.4      knitr_1.50          genio_1.1.2#> [19] iterators_1.0.14    backports_1.5.0     checkmate_2.3.2#> [22] tibble_3.3.0        RColorBrewer_1.1-3  bslib_0.9.0#> [25] pillar_1.11.0       rlang_1.1.6         cachem_1.1.0#> [28] xfun_0.52           sass_0.4.10         cli_3.6.5#> [31] withr_3.0.2         magrittr_2.0.3      grid_4.5.1#> [34] digest_0.6.37       mvMAPIT_2.0.3       foreach_1.5.2#> [37] mvtnorm_1.3-3       lifecycle_1.0.4     CompQuadForm_1.4.4#> [40] vctrs_0.6.5         evaluate_1.0.4      glue_1.8.0#> [43] farver_2.1.2        codetools_0.2-20    purrr_1.1.0#> [46] rmarkdown_2.29      tools_4.5.1         pkgconfig_2.0.3#> [49] htmltools_0.5.8.1