Movatterモバイル変換


[0]ホーム

URL:


Title:Sparse Marginal Epistasis Test
Version:0.0.2
URL:https://github.com/lcrawlab/sme,https://lcrawlab.github.io/sme/
BugReports:https://github.com/lcrawlab/sme/issues
Description:The Sparse Marginal Epistasis Test is a computationally efficient genetics method which detects statistical epistasis in complex traits; see Stamp et al. (2025, <doi:10.1101/2025.01.11.632557>) for details.
License:MIT + file LICENSE
Encoding:UTF-8
RoxygenNote:7.3.2
LinkingTo:BH, HighFive, Rcpp, RcppEigen, Rhdf5lib, testthat
Imports:dplyr, genio, logging, mvMAPIT, Rcpp, tidyr
Suggests:GenomicRanges, ggplot2, knitr, rmarkdown, testthat (≥3.0.0), xml2
Config/testthat/edition:3
SystemRequirements:GNU make
VignetteBuilder:knitr
Depends:R (≥ 4.4.0)
LazyData:true
biocViews:GenomeWideAssociation, Epistasis, Genetics, SNP,LinearMixedModel
NeedsCompilation:yes
Packaged:2025-08-22 12:25:10 UTC; jds
Author:Julian StampORCID iD [cre, aut], Lorin CrawfordORCID iD [aut], sriramlab [cph] (Author of included mailman algorithm)
Maintainer:Julian Stamp <julian.d.stamp@gmail.com>
Repository:CRAN
Date/Publication:2025-08-22 12:50:02 UTC

Estimate Memory Requirements for SME Routine

Description

This function provides an approximate estimate of the memory requirements(in gigabytes) for running the Sparse Marginal Epistasis (SME) routinebased on input parameters such as the number of samples, SNPs, and other configurations.

Usage

approximate_memory_requirements(  n_samples,  n_snps,  n_blocks,  n_randvecs,  chunksize)

Arguments

n_samples

Integer. The number of samples in the dataset.

n_snps

Integer. The total number of SNPs in the dataset.

n_blocks

Integer. The number of genotype blocks used to partition SNPs.Affects the size of encoded genotype segments.

n_randvecs

Integer. The number of random vectors used for stochastictrace estimation. Affects memory for operations involving random vectors.

chunksize

Integer. The number of focal SNPs processed per chunk.

Details

The function calculates memory usage by summing the contributions fromvarious components used in the SME routine, including:

The estimated memory requirement is derived from the data dimensionsand operational needs, and it provides a guideline for configuring resourcesfor the analysis.

Value

Numeric. The approximate memory requirement (in gigabytes) for theSME routine.

Examples

n_samples <- 1e5n_snps <- 1e6n_blocks <- 100n_randvecs <- 100chunksize <- 10approximate_memory_requirements(n_samples,                                n_snps,                                n_blocks,                                n_randvecs,                                chunksize)

Create an HDF5 File

Description

This function creates a new, empty HDF5 file at the specified location.

Usage

create_hdf5_file(hdf5_file)

Arguments

hdf5_file

A character string specifying the path and name of the HDF5file to be created.

Value

No return value; the function creates the HDF5 file at the specifiedlocation.

Examples

# Create an empty HDF5 filehdf5_file <- tempfile()create_hdf5_file(hdf5_file)

Simulated Dataset for Genome-Wide Interaction Analysis

Description

getting_started is a simulated dataset created to demonstrate the use ofthesme() function for genome-wide interaction analyses. It containsresults from a simulated analysis involving additive genetic effects andgene-by-gene (GxG) interactions.

Usage

data("getting_started")

Format

A list with results fromsme(), including the following components:

summary

A data frame summarizing the analysis results, includingp-values for SNP associations (p).

pve

A data frame containing the per SNP variance componentestimates normalized to phenotypic variance explained (PVE).

vc

A data frame containing the per SNP variance componentestimates.

gxg_snps

A vector containing the indices of the SNPs assigned tohave epistatic interactions in the trait simulations.

Details

The dataset was generated as follows:

Key Parameters

Source

data-raw/getting_started.R

See Also

sme

Examples

data("getting_started")head(getting_started$summary)

Read Dataset from an HDF5 File

Description

This function reads a dataset from an existing HDF5 file.

Usage

read_hdf5_dataset(file_name, dataset_name)

Arguments

file_name

A character string specifying the path to the HDF5 file.

dataset_name

A character string specifying the name of the datasetwithin the HDF5 file to read.

Value

The content of the dataset from the HDF5 file, typically in the formof an R object.

Examples

data_to_write <- 1:10# Create an empty HDF5 filehdf5_file <- tempfile()create_hdf5_file(hdf5_file)# Write new data to a dataset in the HDF5 filewrite_hdf5_dataset(hdf5_file, "group/dataset", data_to_write)# Read a dataset from an HDF5 filehdf5_data <- read_hdf5_dataset(hdf5_file, "group/dataset")print(hdf5_data)

Simulate Quantitative Traits from PLINK Genotypes

Description

This function simulates a quantitative trait based on additive and epistaticgenetic effects using genotype data from a PLINK dataset. The simulated traitis saved to a specified output file in a phenotype format compatible withPLINK.

Usage

simulate_traits(  plink_file,  output_file,  additive_heritability,  gxg_heritability,  additive_indices,  gxg_indices_1,  gxg_indices_2,  log_level = "WARNING")

Arguments

plink_file

Character. Path to the PLINK dataset (without fileextension). The function will append.bed,.bim, and.fam extensionsas needed.

output_file

Character. Path to the output file where the simulatedtrait will be saved.

additive_heritability

Numeric. A value between 0 and 1 specifying theproportion of trait variance due to additive genetic effects.

gxg_heritability

Numeric. A value between 0 and 1 specifying theproportion of trait variance due to gene-by-gene (epistatic) interactions.The sum ofadditive_heritability andgxg_heritability must not exceed 1.

additive_indices

Integer vector. Indices of SNPs contributing toadditive genetic effects.

gxg_indices_1

Integer vector. Indices of SNPs in the first group forepistatic interactions.

gxg_indices_2

Integer vector. Indices of SNPs in the second group forepistatic interactions.

log_level

Character. Logging level for messages(e.g., "DEBUG", "INFO", "WARNING"). Default is "WARNING".

Details

The function uses the following components to simulate the trait:

The output file is in PLINK-compatible phenotype format with three columns:Family ID (FID), Individual ID (IID), and the simulated trait (TRAIT).

Value

None. The simulated trait is written to the specifiedoutput_file.

Examples

plink_file <- gsub("\\.bed", "", system.file("testdata", "test.bed", package = "smer"))out_file <- tempfile()additive_heritability <- 0.3gxg_heritability <- 0.1additive_snps <- sort(sample(1:100, 50, replace = FALSE))gxg_group_1 <- sort(sample(additive_snps, 10, replace = FALSE))gxg_group_2 <- sort(sample(setdiff(additive_snps, gxg_group_1), 10, replace = FALSE))n_samples <- 200simulate_traits(  plink_file,  out_file,  additive_heritability,  gxg_heritability,  additive_snps,  gxg_group_1,  gxg_group_2)from_file <- read.table(out_file, header = TRUE)head(from_file)

Sparse Marginal Epistasis Test (SME)

Description

SME fits a linear mixed model in order to test for marginal epistasis. Itconcentrates the scans for epistasis to regions of the genome that have knownfunctional enrichment for a trait of interest.

Usage

sme(  plink_file,  pheno_file,  mask_file = NULL,  gxg_indices = NULL,  chunk_size = NULL,  n_randvecs = 10,  n_blocks = 100,  n_threads = 1,  gxg_h5_group = "gxg",  ld_h5_group = "ld",  rand_seed = -1,  log_level = "WARNING")

Arguments

plink_file

Character. File path to the PLINK dataset(without *.bed extension).The function will append.bim,.bed, and.fam extensionsautomatically.The genotype data must not have any missing genotypes. Use PLINK to removevariants with missing genotypes or impute them.

pheno_file

Character. File path to a phenotype file in PLINK format.The file should contain exactly one phenotype column.

mask_file

Character or NULL. File path to an HDF5 file specifyingper-SNP masks for gene-by-gene interaction tests. This file informs whichSNPs are tested for marginal epistasis. Defaults toNULL, indicating nomasking. Masking impacts the scaling of memory and time.

gxg_indices

Integer vector or NULL. List of indices corresponding toSNPs to test for marginal epistasis.IfNULL, all SNPs in the dataset will be tested.These indices are1-based.

chunk_size

Integer or NULL. Number of SNPs processed per chunk.This influences memoryusage and can be leftNULL to automatically determine the chunk sizebased ongxg_indices and number of threads.

n_randvecs

Integer. Number of random vectors used for stochastic traceestimation.Higher values yield more accurate estimates but increase computationalcost. Default is 10.

n_blocks

Integer. Number of blocks into which SNPs are divided forprocessing.This parameter affects memory requirements. Default is 100.

n_threads

Integer. Number of threads for OpenMP parallel processing.Default is 1.

gxg_h5_group

Character. Name of the HDF5 group within the mask filecontaining gene-by-geneinteraction masks. SNPs in this group will be included in the gene-by-geneinteractions. Defaults to "gxg".

ld_h5_group

Character. Name of the HDF5 group within the mask filecontaining linkage disequilibriummasks. SNPs in this group are excluded from analysis. Defaults to "ld".

rand_seed

Integer. Seed for random vector generation. If-1, no seedis set. Default is -1.

log_level

Character. Logging level for messages. Must be in uppercase(e.g., "DEBUG", "INFO", "WARNING", "ERROR"). Default is "WARNING".

Details

This function integrates PLINK-formatted genotype and phenotype data toperform marginal epistasis tests on a set of SNPs. Using stochastic traceestimation, the method computes variance components for gene-by-geneinteraction and genetic relatedness using the MQS estimator. The process isparallelized using OpenMP whenn_threads > 1.

The memory requirements and computation time scaling can be optimized throughthe parameterschunk_size,n_randvecs, andn_blocks.

Mask Format Requirements

The mask file format is an HDF5 file used for storing index data forthe masking process. This format supports data retrieval by index.Below are the required groups and datasets within the HDF5 file:

The required group names can be configured as input parameters.The defaults are described below.

Important: All indices in the mask file data arezero-based, matching the zero-based indices of the PLINK.bim file.

Value

A list containing:

References

Stamp, J., Pattillo Smith, S., Weinreich, D., & Crawford, L. (2025).Sparse modeling of interactions enables fast detection of genome-wideepistasis in biobank-scale studies. bioRxiv, 2025.01.11.632557.

Stamp, J., DenAdel, A., Weinreich, D., & Crawford, L. (2023).Leveraging the genetic correlation between traits improves the detection ofepistasis in genome-wide association studies.G3: Genes, Genomes, Genetics, 13(8), jkad118.

Crawford, L., Zeng, P., Mukherjee, S., & Zhou, X. (2017).Detecting epistasis with the marginal epistasis test in genetic mappingstudies of quantitative traits. PLoS genetics, 13(7), e1006869.

Examples

plink_file <- gsub("\\.bed", "", system.file("testdata", "test.bed", package="smer"))pheno_file <- system.file("testdata", "test_h2_0.5.pheno", package="smer")mask_file <- ""# Parameter inputschunk_size <- 10n_randvecs <- 10n_blocks <- 10n_threads <- 1# 1-based Indices of SNPs to be analyzedn_snps <- 100snp_indices <- 1:n_snpssme_result <- sme(  plink_file,  pheno_file,  mask_file,  snp_indices,  chunk_size,  n_randvecs,  n_blocks,  n_threads)head(sme_result$summary)

Write Data to an HDF5 Dataset

Description

This function writes new data to an existing HDF5 file. If the datasetalready exists, it will be replaced with the new data.

Usage

write_hdf5_dataset(file_name, dataset_name, new_data)

Arguments

file_name

A character string specifying the path to the HDF5 file.

dataset_name

A character string specifying the name of the datasetto be written in the HDF5 file.

new_data

The new data to write to the dataset. The data must becompatible with the dataset's structure.

Value

No return value; the function modifies the specified dataset in theHDF5 file.

Examples

data_to_write <- 1:10# Create an empty HDF5 filehdf5_file <- tempfile()create_hdf5_file(hdf5_file)# Write new data to a dataset in the HDF5 filewrite_hdf5_dataset(hdf5_file, "group/dataset", data_to_write)

[8]ページ先頭

©2009-2025 Movatter.jp