The sparse marginal epistasis (SME) test performs a genome-widesearch for SNPs involved in genetic interactions while conditioning oninformation derived from functional genomic data.
By examining one SNP at a time, SME fits a linear mixed model to testfor marginal epistasis. It explicitly models the combined additiveeffects from all SNPs a marginal epistatic effect that representspairwise interactions involving a test SNP.
The key to the SME formulation is that the interaction between thetest SNP and other SNPs may be masked depending on additionalinformation. Masking interactions that do not contribute to traitvariance both maximizes the power of the inference as well as realizesthe computational efficiency needed to analyze human Biobank scaledata.
This pages presents a toy example to show case what insights theimplementation of the test provides.
Figure 1. SME performs a genome-wide search for SNPsinvolved in genetic interactions while conditioning on informationderived from functional genomic data. SME has improved power to detectmarginal epistasis and runs 10x to 90x faster than state-of-the-artmethods.
SME is implemented in R and requires your genetic data to be in PLINKformat, which consists of three files:
.bim: Contains SNP information.bed: Contains genetic data.fam: Contains sample informationNote:sme() cannot handle missinggenotypes. Prior to runningsme(), use PLINK to removevariants with missing genotypes or impute them.
Additionally, your phenotype (trait) data should be in a separatefile following PLINK’s phenotype format.
Thesme() function includes parameters that let youcontrol memory usage and computational resources. For detailed guidanceon optimizing these settings for your system, please see our tutorialHowTo Optimize the Memory Requirements of SME.
When selecting which SNPs to analyze, you must provide theirpositions as1-based indices. These indices correspondto the row numbers in your.bim file, where the first SNPis index 1, the second is index 2, and so on.
For complete details about all function parameters, please refer tothe documentation of thesme() function.
# File inputsplink_file<-"path/to/plink/file"pheno_file<-"path/to/pheno/file"mask_file<-"path/to/mask/file"# Parameter inputschun_ksize<-10n_randvecs<-10n_blocks<-10n_threads<-5# 1-based Indices of SNPs to be analyzedn_snps<-100snp_indices<-1:n_snpssme_result<-sme( plink_file, pheno_file, mask_file, snp_indices, chunk_size, n_randvecs, n_blocks, n_threads)This analysis uses simulated data for demonstration purposes. Wesimulated synthetic phenotypes from 5000 synthetic genotypes with 6000SNPs. If you would like to learn how to simulate data, please refer toour tutorialHowTo Simulate Traits.
When you call thesme() function, it returns a listcontaining multiple elements. The most important one is calledsummary, which contains the main analysis results. Theseresults are formatted as tidy data, making them compatible with popularR packages likeggplot2 anddplyr for furtheranalysis and visualization.
We use Manhattan plots to visualize genome-wide analyses because theyeffectively highlight strong associations between genetic variants andtraits. In this case, the Manhattan plot specifically shows statisticalepistasis (interactions between genes). For reference, we’ve marked thetrue causal SNPs (Single Nucleotide Polymorphisms) in green on the plot- these are the genetic variants we included in our simulation as havingreal effects.
sme_result$summary%>%ggplot(aes(x = index,y =-log10(p),color = true_gxg_snp))+geom_point()+xlab("Position")+labs(color ="Epistatic SNP")SME estimates how much of the total trait variation can be explainedby genetic interactions. In this simulation, we set the PhenotypicVariance Explained (PVE) to 5% for each SNP involved in epistaticinteractions.
The plot below shows how well our method recovered these effects. Itdisplays two distributions:
The dashed line marks the true 5% PVE level we used in thesimulation, allowing you to see how accurately the method estimated theactual effect sizes.
sme_result$summary%>%ggplot(aes(x = true_gxg_snp,y = pve,fill = true_gxg_snp))+geom_boxplot()+geom_hline(yintercept =0.05,color ="grey40",linetype ="dashed")+annotate("text",x =0.8,y =0.055,label ="True per SNP epistatic PVE",color ="black")+xlab("Epistatic SNP")+ylab("Phenotypic Variance Explained")+theme(legend.position ="none")The SME method uses a linear mixed model to separate differentsources of trait variation. One key component it estimates is narrowsense heritability (\(h^2\)), whichmeasures how much trait variation can be explained by additive geneticeffects.
The plot below breaks down the estimated sources of traitvariation:
In our simulation, we set the true narrow sense heritability to 30%,shown as a dashed line in the plot. This reference line helps youevaluate how accurately SME estimated the genetic components of traitvariation.
sme_result$vc_estimate%>%ggplot(aes(x = component,y = vc_estimate,fill = component))+geom_boxplot()+geom_hline(yintercept =0.3,color ="grey40",linetype ="dashed")+annotate("text",x =0.7,y =0.33,label ="'True ' * h^2",color ="black",parse =TRUE)+xlab("Component")+ylab("Variance Component Estimate")+theme(legend.position ="none")The estimate of the narrow-sense heritability\(h^2\) is much less variable because it isalways informed by the same genetic relatedness matrix. In this smalldata example it overestimates the heritability but it is unbiased ingeneral.
sessionInfo()#> R version 4.5.1 (2025-06-13)#> Platform: aarch64-apple-darwin24.5.0#> Running under: macOS Sequoia 15.6.1#>#> Matrix products: default#> BLAS: /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib#> LAPACK: /opt/homebrew/Cellar/r/4.5.1/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.1#>#> locale:#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#>#> time zone: Europe/Berlin#> tzcode source: internal#>#> attached base packages:#> [1] stats graphics grDevices utils datasets methods base#>#> other attached packages:#> [1] ggplot2_3.5.2 dplyr_1.1.4 smer_0.0.2#>#> loaded via a namespace (and not attached):#> [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.1#> [4] tidyselect_1.2.1 Rcpp_1.1.0 FMStable_0.1-4#> [7] parallel_4.5.1 tidyr_1.3.1 jquerylib_0.1.4#> [10] scales_1.4.0 yaml_2.3.10 fastmap_1.2.0#> [13] harmonicmeanp_3.0.1 R6_2.6.1 labeling_0.4.3#> [16] generics_0.1.4 knitr_1.50 genio_1.1.2#> [19] iterators_1.0.14 backports_1.5.0 checkmate_2.3.2#> [22] tibble_3.3.0 RColorBrewer_1.1-3 bslib_0.9.0#> [25] pillar_1.11.0 rlang_1.1.6 cachem_1.1.0#> [28] xfun_0.52 sass_0.4.10 cli_3.6.5#> [31] withr_3.0.2 magrittr_2.0.3 grid_4.5.1#> [34] digest_0.6.37 mvMAPIT_2.0.3 foreach_1.5.2#> [37] mvtnorm_1.3-3 lifecycle_1.0.4 CompQuadForm_1.4.4#> [40] vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0#> [43] farver_2.1.2 codetools_0.2-20 purrr_1.1.0#> [46] rmarkdown_2.29 tools_4.5.1 pkgconfig_2.0.3#> [49] htmltools_0.5.8.1