Multigranular Analysis of Regulatory Variants on the Epigenomic Landscape
fuxialexander/marvel
MARVEL: Multigranular Analysis of Regulatory Variants on the Epigenomic Landscape.
MARVEL is a pipeline for noncoding regulatory variants analysis using whole-genome sequencing data and cell-type specific epigenomic profiles. The workflow of MARVEL can be summarized using the following figure:
Figure 1 Schematic overview of MARVEL. (a) Epigenomic data of the relevant cell type (hNC in the case of HSCR) are integrated with a gene annotation set to identify the active regulatory elements relevant to the phenotype of interest. (b) In each regulatory element, the functional significance of genetic variants is evaluated by their perturbation of TF sequence motifs. (c) Since the perturbation effects of multiple genetic variants may not add up linearly, they are considered together to reconstruct the sample-specific sequences, based on which the overall change of TF motif match scores is determined. (d) For motifs with multiple appearances within the same regulatory element, their match scores are aggregated to give a single score. (e) At a higher level, if a gene involves multiple regulatory elements, the aggregated match scores of a motif in the different elements can be further aggregated into a single score. This is done in the gene-based analysis. (f-g) The aggregated match score matrix of all the motifs for a regulatory element/gene is used as the input of an association test, which selects a subset of the most informative motif features (f) and compares a model involving both these selected features and the covariates with a null model that involves only the covariates using a likelihood ratio (LR) test (g). (h) The regulatory elements and genes identified to be significantly associated with the phenotype can be further studied by other downstream analyses, such as gene set enrichment and single-cell expression analyses. (i) TFs with recurrently perturbed match scores in different regulatory elements are collected to infer a network that highlights the phenotype-associated perturbations.
Please note that (h) and (i) are not included in this repository at the current stage, but can be obtained easily using the results produced by MARVEL and Cytoscape.
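To make step (c) concrete, here is a minimal toy sketch of why variants are applied together before scoring. It is not MARVEL's implementation: the 4-position motif, the uniform background, the "best window" log-odds score, and the helper names (`window_score`, `best_match_score`, `apply_snvs`) are all assumptions chosen for illustration.

```python
import math

# Toy 4-position motif preferring "ACGT" (hypothetical, not a real TF motif).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform base frequency, an assumption


def window_score(window):
    """Log-odds score of one motif-length window against the PWM."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def best_match_score(seq):
    """Highest window score over the sequence (one possible match score)."""
    width = len(PWM)
    return max(window_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def apply_snvs(ref, snvs):
    """Return a copy of ref with each (0-based position, alt base) SNV applied."""
    seq = list(ref)
    for pos, alt in snvs:
        seq[pos] = alt
    return "".join(seq)


# Toy reference: a decent site "ACGA" at the start, a weak site "ATTT" later.
ref = "ACGAGGATTTGG"
snv_a = (7, "C")  # ATTT -> ACTT
snv_b = (8, "G")  # ATTT -> ATGT

score_ref = best_match_score(ref)
delta_a = best_match_score(apply_snvs(ref, [snv_a])) - score_ref
delta_b = best_match_score(apply_snvs(ref, [snv_b])) - score_ref
delta_both = best_match_score(apply_snvs(ref, [snv_a, snv_b])) - score_ref

print(f"variant a alone: {delta_a:+.3f}")
print(f"variant b alone: {delta_b:+.3f}")
print(f"both variants:   {delta_both:+.3f}")
```

In this toy case each variant alone leaves the best match score essentially unchanged, because the existing "ACGA" site still scores at least as well, while the two variants together create a perfect "ACGT" site and raise the score. Summing the single-variant effects would miss that, which is the motivation for scoring the reconstructed sample-specific sequence.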
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
i. Install nextflow
curl -s https://get.nextflow.io | bash
and add it to your PATH. The reason is that the normal release has a bug in conda integration.
ii. Install one of docker, singularity or conda
iii. Clone the repo and test it on a minimal dataset
Notice: the test.vcf.gz and pheno_covar.txt files were temporarily removed as they were made from real genomics data.
Basically:
For test.vcf.gz: you can use bcftools to select variants in a small region to produce a VCF with genotypes of multiple samples
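bcftools remains the suggested tool for this. Purely to illustrate what the region selection amounts to, the following minimal sketch filters uncompressed VCF text lines in plain Python. The helper name `select_region` and the toy records are hypothetical, and the sketch does not compress or index its output, so it is not a drop-in replacement for bcftools.

```python
def select_region(vcf_lines, chrom, start, end):
    """Keep header lines plus records on chrom with start <= POS <= end (1-based)."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            kept.append(line)
    return kept


# Toy multi-sample VCF content (made-up records, two samples S1 and S2).
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n",
    "chr1\t5000\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t1/1\n",
    "chr2\t150\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/1\n",
]

subset = select_region(toy_vcf, "chr1", 1, 1000)
print("".join(subset), end="")
```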
For pheno_covar.txt:
- It's a TSV file
- First column is sample name (in the same order as in the VCF file)
- Second column is y/phenotype in 0, 1 coding
- Third or later columns are covariates (all numeric)
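The rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of MARVEL: the function name `check_pheno_covar` is hypothetical, and whether the file carries a header row is not stated here, so that is left as the assumed `has_header` switch.

```python
def check_pheno_covar(tsv_lines, vcf_samples, has_header=False):
    """Return a list of problems found; an empty list means the rows look valid."""
    rows = [line.rstrip("\n").split("\t") for line in tsv_lines if line.strip()]
    if has_header:  # assumption: skip one header row only if the caller says so
        rows = rows[1:]
    problems = []
    # Column 1: sample names, in the same order as in the VCF.
    if [row[0] for row in rows] != list(vcf_samples):
        problems.append("sample names differ from the VCF sample order")
    for row in rows:
        # Column 2: phenotype in 0/1 coding.
        if len(row) < 2 or row[1] not in ("0", "1"):
            problems.append(f"{row[0]}: phenotype must be 0 or 1")
        # Column 3 onward: numeric covariates.
        for value in row[2:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"{row[0]}: covariate {value!r} is not numeric")
    return problems


# Toy data (made-up samples and covariates).
vcf_samples = ["S1", "S2", "S3"]
good = ["S1\t0\t34\t1.5\n", "S2\t1\t29\t0.2\n", "S3\t1\t41\t-0.7\n"]
bad = ["S2\t1\t29\t0.2\n", "S1\t2\t34\tmale\n", "S3\t1\t41\t-0.7\n"]

print(check_pheno_covar(good, vcf_samples))
for problem in check_pheno_covar(bad, vcf_samples):
    print(problem)
```

The sample order to compare against can be taken from the `#CHROM` header line of the VCF, whose columns after `FORMAT` are the sample names.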
git clone https://github.com/fuxialexander/marvel.git
cd marvel
nextflow main.nf -profile test,<docker/singularity/conda> -resume
iv. Look into nextflow.config and test/test.conf and modify them to start running your own analysis!
nextflow main.nf -profile <docker/singularity/conda> -resume
See the usage docs for all of the available options when running the pipeline.
The marvel pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
MARVEL is implemented using a boilerplate created by the nf-core team (https://nf-co.re/).
If you use MARVEL for your analysis, please cite it as: Fu AX, Lui KN, Tang CS, Ng RK, Lai FP, Lau ST, Li Z, Garcia-Barcelo MM, Sham PC, Tam PK, Ngan ES, Yip KY. Whole-genome analysis of noncoding genetic variations identifies multiscale regulatory element perturbations associated with Hirschsprung disease. Genome Res. 2020 Nov;30(11):1618-1632. doi: 10.1101/gr.264473.120.