Detect elevations and gaps in read coverage on metagenome contigs or assembled genomes
jlmaier12/ProActive
ProActive automatically detects regions of gapped and elevated read coverage using a 2D pattern-matching algorithm. ProActive detects, characterizes, and visualizes read coverage patterns in both genomes and metagenomes. Optionally, users may provide gene annotations associated with their genome or metagenome in the form of a .gff file. In this case, ProActive will generate an additional output table containing the gene annotations found within the detected regions of gapped and elevated read coverage. Additionally, users can search for gene annotations of interest in the output read coverage plots.
Visualizing read coverage data is important because gaps and elevations in coverage can indicate a variety of biological and non-biological scenarios, for example:
- Elevations and gaps in read coverage may be caused by some types of structural variants. Deletions can cause gaps, while duplications can cause elevations in read coverage [1].
- Highly active and/or abundant mobile genetic elements, such as transposable elements [2] and prophages [3], can create elevations in read coverage at their respective integration sites.
- Genetic regions with high mutation rates and/or high variability within the population can generate gaps in read coverage [4].
- Poor-quality sequencing reads and chimeric reference sequences may cause gaps and elevations in read coverage.
Since the cause of gaps and elevations in read coverage can be ambiguous, ProActive is best used as a screening method to identify genetic regions for further investigation with other tools!
References:
- Tattini L., D’Aurizio R., & Magi A. (2015). Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Frontiers in Bioengineering and Biotechnology, 3, 92. https://doi.org/10.3389/fbioe.2015.00092
- Kleiner M., Bushnell B., Sanderson K.E. et al. (2020). Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome 8, 158. https://doi.org/10.1186/s40168-020-00935-5
- Kieft K., Anantharaman K. (2022). Deciphering Active Prophages from Metagenomes. mSystems 7:e00084-22. https://doi.org/10.1128/msystems.00084-22
- Fogarty E., Moore R. (2019). Visualizing contig coverages to better understand microbial population structure. https://merenlab.org/2019/11/25/visualizing-coverages/
ProActive detects read coverage patterns using a pattern-matching algorithm that operates on pileup files. A pileup file is a file format in which each row summarizes the ‘pileup’ of reads at a specific genomic location. Pileup files can be used to generate a rolling mean of read coverages and associated base pair positions, which reduces data size while preserving read coverage patterns. ProActive requires that input pileup files be generated using a 100 bp window/bin size.
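To make the binning idea concrete, here is a minimal base R sketch that collapses per-base coverage into non-overlapping 100 bp windows. The `binCoverage` helper and the simulated depths are hypothetical and only illustrate the concept. They are not part of ProActive, which expects pileup files that were already binned during or after read mapping.

```r
# Hypothetical helper (not part of ProActive): average per-base coverage
# in fixed-size, non-overlapping windows.
binCoverage <- function(depth, binSize = 100) {
  # Assign each base position to a window index (1, 2, 3, ...)
  bin <- ceiling(seq_along(depth) / binSize)
  data.frame(
    # Last base pair position covered by each window
    position = as.vector(tapply(seq_along(depth), bin, max)),
    # Mean read coverage within each window
    coverage = as.vector(tapply(depth, bin, mean))
  )
}

# Simulated per-base depths for a 1,000 bp contig with an elevated region
depth <- c(rep(10, 400), rep(50, 200), rep(10, 400))
binned <- binCoverage(depth)
print(binned)
```

The 1,000 per-base values shrink to 10 rows, yet the elevated region between 400 and 600 bp is still clearly visible in the binned coverage.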
Pileup files can be generated by mapping sequencing reads to a metagenome or genome fasta. Read mapping should be performed using a high minimum identity (0.97 or higher) and random mapping of ambiguous reads. The pileup files needed for ProActive are generated using the .bam files produced during read mapping. Some read mappers, like BBMap, allow for the generation of pileup files in the bbmap.sh command with use of the bincov output with the covbinsize=100 parameter/argument. Otherwise, BBMap’s pileup.sh can convert .bam files produced by any read mapper to pileup files compatible with ProActive using the bincov output with binsize=100.
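As a rough sketch, the two routes described above could be assembled from R as shown here. All file names are placeholders, and the `minid` and `ambiguous` settings are assumptions about how the recommended mapping options translate to BBMap arguments, so check them against the BBMap documentation for your installed version. Reading or writing .bam files with BBMap may also require samtools to be available. The commands are only printed, not executed.

```r
# Placeholder file names (assumptions); replace with your own paths.

# Option 1: map reads and write the binned coverage file in one bbmap.sh call
bbmapArgs <- c(
  "ref=assembly.fasta",
  "in=reads.fastq.gz",
  "out=mapped.bam",
  "minid=0.97",        # assumed flag for a high minimum identity
  "ambiguous=random",  # assumed flag for random mapping of ambiguous reads
  "bincov=pileup.txt", # binned coverage output
  "covbinsize=100"     # 100 bp bins, as required by ProActive
)

# Option 2: convert a .bam file from any read mapper with pileup.sh
pileupArgs <- c("in=mapped.bam", "bincov=pileup.txt", "binsize=100")

# Print the commands for inspection
cat("bbmap.sh", bbmapArgs, "\n")
cat("pileup.sh", pileupArgs, "\n")

# Uncomment to run if BBMap is installed and on your PATH:
# system2("bbmap.sh", args = bbmapArgs)
# system2("pileup.sh", args = pileupArgs)
```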
NOTE: For detailed information on input file format, please see the vignette. Users may also use the ‘sampleMetagenomePileup’ and ‘sampleGenomePileup’ files that come pre-loaded with ProActive as a reference.
ProActive optionally accepts a .gff file as input. The .gff file must be associated with the same metagenome or genome used to create your pileup file. The .gff file should be a TSV and should follow the same general format described here.
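Before passing annotations to ProActive, it can help to confirm that the file parses as a tab-separated table. The sketch below reads two made-up annotation rows using the nine columns of the general GFF layout. The column names and example rows are assumptions for illustration, and the exact columns ProActive requires are documented in the vignette.

```r
# Two made-up annotation rows in a tab-separated, 9-column GFF layout
gffText <- paste(
  "contig_1\tProdigal\tCDS\t120\t980\t.\t+\t0\tID=gene_1;product=ABC transporter",
  "contig_1\tProdigal\tCDS\t1500\t2300\t.\t-\t0\tID=gene_2;product=hypothetical protein",
  sep = "\n"
)

# For a real file, replace `text = gffText` with `file = "path/to/annotations.gff"`
gff <- read.delim(
  text = gffText,
  header = FALSE,
  comment.char = "#",  # skip GFF header/comment lines
  quote = "",          # attribute strings may contain quote characters
  stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Basic sanity checks: nine columns and start <= end for every feature
stopifnot(ncol(gff) == 9, all(gff$start <= gff$end))
print(gff[, c("seqid", "type", "start", "end", "strand")])
```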
Install ProActive from CRAN with:

    install.packages("ProActive")
    library(ProActive)

Install the development version of ProActive from GitHub with:

    if (!require("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("jlmaier12/ProActive")
    library(ProActive)

Run ProActive on the pre-loaded sample datasets:

    library(ProActive)

    ## Metagenome mode
    MetagenomeProActive <- ProActiveDetect(
      pileup = sampleMetagenomePileup,
      mode = "metagenome",
      gffTSV = sampleMetagenomegffTSV
    )
    #> Preparing input file for pattern-matching...
    #> Starting pattern-matching...
    #> A quarter of the way done with pattern-matching
    #> Half of the way done with pattern-matching
    #> Almost done with pattern-matching!
    #> Summarizing pattern-matching results
    #> Finding gene predictions in elevated or gapped regions of read coverage...
    #> Finalizing output
    #> Execution time: 2.09secs
    #> 0 contigs were filtered out based on low read coverage
    #> 0 contigs were filtered out based on length (< minContigLength)
    #>
    #> Elevation       Gap NoPattern
    #>         3         3         1

    MetagenomePlots <- plotProActiveResults(
      pileup = sampleMetagenomePileup,
      ProActiveResults = MetagenomeProActive
    )

    MetagenomeGeneMatches <- geneAnnotationSearch(
      ProActiveResults = MetagenomeProActive,
      pileup = sampleMetagenomePileup,
      gffTSV = sampleMetagenomegffTSV,
      geneOrProduct = "product",
      keyWords = c("transport", "chemotaxis")
    )
    #> Cleaning gff file...
    #> Cleaning pileup file...
    #> Searching for matching annotations...
    #> 3 contigs/chunks have gene annotations that match one or more of the provided keyWords

    ## Genome mode
    GenomeProActive <- ProActiveDetect(
      pileup = sampleGenomePileup,
      mode = "genome",
      gffTSV = sampleGenomegffTSV
    )
    #> Preparing input file for pattern-matching...
    #> Starting pattern-matching...
    #> A quarter of the way done with pattern-matching
    #> Half of the way done with pattern-matching
    #> Almost done with pattern-matching!
    #> Summarizing pattern-matching results
    #> Finding gene predictions in elevated or gapped regions of read coverage...
    #> Finalizing output
    #> Execution time: 29.7secs
    #> 0 contigs were filtered out based on low read coverage
    #> 0 contigs were filtered out based on length (< minContigLength)
    #>
    #> Elevation       Gap NoPattern
    #>        25         3        21

    GenomePlots <- plotProActiveResults(
      pileup = sampleGenomePileup,
      ProActiveResults = GenomeProActive
    )

    GenomeGeneMatches <- geneAnnotationSearch(
      ProActiveResults = GenomeProActive,
      pileup = sampleGenomePileup,
      gffTSV = sampleGenomegffTSV,
      geneOrProduct = "product",
      keyWords = c("ribosomal"),
      inGapOrElev = TRUE,
      bpRange = 5000
    )
    #> Cleaning gff file...
    #> Cleaning pileup file...
    #> Searching for matching annotations...
    #> 8 contigs/chunks have gene annotations that match one or more of the provided keyWords