Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13883)
Abstract
Metagenomic profiling from sequencing data aims to disentangle a microbial sample at lower taxonomic ranks, such as species and strains. Deep taxonomic profiling, which involves accurate estimation of strain-level abundances, enables precise quantification of microbial composition, and this plays a crucial role in many downstream analyses. Existing tools focus primarily on strain/subspecies identification and limit abundance estimation to the species level. Quantifying the abundances of the identified strains is challenging and remains largely unaddressed by existing approaches. We propose a novel algorithm, MAGE (Microbial Abundance GaugE), for accurately identifying constituent strains and quantifying strain-level relative abundances. For accurate profiling, MAGE uses read mapping information and performs a novel local search-based profiling guided by a constrained optimization based on maximum likelihood estimation. Unlike existing approaches, which often rely on strain-specific markers and homology information for deep profiling, MAGE works solely with read mapping information, that is, the set of target strains from the reference collection to which each read maps. As part of MAGE, we provide an alignment-free, k-mer-based read mapper that uses a compact and comprehensive index constructed using the FM-index and the R-index. We use several evaluation metrics to validate the quality of abundance estimates. In experiments on a variety of datasets, MAGE exhibited superior performance compared with existing tools across a wide range of performance metrics. (Supplementary material available at https://doi.org/10.5281/zenodo.7746145.)
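The abstract frames strain-level abundance estimation as maximum likelihood over read mapping information, where each read contributes only the set of reference strains it maps to. As a rough illustration of that kind of objective, the sketch below assumes a simplified mixture model of our own: each read comes from a strain chosen in proportion to its relative abundance, and every strain in the read's mapping set explains it equally well, with no genome-length correction. The sketch maximizes this likelihood with the standard expectation-maximization (EM) update used by RNA-seq quantifiers such as kallisto (Bray et al., in the references). It is *not* MAGE's local search or its constraints. It also includes the Hellinger distance (listed in the references) as one plausible way to compare an estimated profile against a known one, although this abstract does not list the paper's exact metrics. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List


def log_likelihood(read_sets: List[FrozenSet[str]], abundance: Dict[str, float]) -> float:
    """Log-likelihood of the reads under a simplified mixture model.

    Assumption (not from the paper): each read comes from a strain chosen with
    probability equal to its relative abundance, and every strain in the read's
    mapping set explains the read equally well (no genome-length correction).
    """
    return sum(math.log(sum(abundance.get(s, 0.0) for s in rs)) for rs in read_sets)


def estimate_abundances(read_sets: List[FrozenSet[str]], max_iter: int = 500,
                        tol: float = 1e-10) -> Dict[str, float]:
    """Maximize the log-likelihood above with EM updates (not MAGE's local search)."""
    # Reads with identical mapping sets are interchangeable, so count each set once.
    classes = Counter(frozenset(rs) for rs in read_sets if rs)
    if not classes:
        return {}
    strains = sorted(set().union(*classes))
    total = sum(classes.values())
    alpha = {s: 1.0 / len(strains) for s in strains}  # uniform start on the simplex
    for _ in range(max_iter):
        # E-step: split each class's reads among its strains in proportion to alpha.
        expected = dict.fromkeys(strains, 0.0)
        for rs, count in classes.items():
            denom = sum(alpha[s] for s in rs)
            for s in rs:
                expected[s] += count * alpha[s] / denom
        # M-step: new abundances are the expected read fractions (non-negative, sum to 1).
        new_alpha = {s: expected[s] / total for s in strains}
        delta = max(abs(new_alpha[s] - alpha[s]) for s in strains)
        alpha = new_alpha
        if delta < tol:
            break
    return alpha


def hellinger(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Hellinger distance between two abundance profiles (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return math.sqrt(0.5 * sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
                               for k in keys))


if __name__ == "__main__":
    # Toy sample: 60 reads unique to A, 20 unique to B, 20 shared by A and B.
    reads = ([frozenset({"A"})] * 60 + [frozenset({"B"})] * 20
             + [frozenset({"A", "B"})] * 20)
    est = estimate_abundances(reads)
    print({s: round(a, 3) for s, a in est.items()})  # {'A': 0.75, 'B': 0.25}
    print(round(hellinger(est, {"A": 0.75, "B": 0.25}), 6))  # 0.0
```

In the toy sample, the 20 ambiguous reads are split 3:1 in favor of strain A, matching the ratio of its uniquely mapped reads. EM is used here only because it is the simplest well-known way to maximize this mixture likelihood. According to the abstract, MAGE instead performs a local search guided by a constrained optimization.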
References
Alizon, S., de Roode, J.C., Michalakis, Y.: Multiple infections and the evolution of virulence. Ecol. Lett. 16(4), 556–567 (2013)
Anyansi, C., Straub, T.J., Manson, A.L., Earl, A.M., Abeel, T.: Computational methods for strain-level microbial detection in colony and metagenome sequencing data. Front. Microbiol. 11, 1925 (2020)
Balmer, O., Tanner, M.: Prevalence and implications of multiple-strain infections. Lancet Infect. Dis. 11(11), 868–878 (2011)
Beghini, F., et al.: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021)
Bray, N.L., Pimentel, H., Melsted, P., Pachter, L.: Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34(5), 525–527 (2016)
Centrifuge. https://ccb.jhu.edu/software/centrifuge/
Da Silva, K., Pons, N., Berland, M., Oñate, F.P., Almeida, M., Peterlongo, P.: StrainFLAIR: Strain-level profiling of metagenomic samples using variation graphs. PeerJ 9, e11884 (2021)
van Dijk, L.R., et al.: StrainGE: A toolkit to track and characterize low-abundance strains in complex microbial communities. Genome Biol. 23(1), 1–27 (2022)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE (2000)
Freitas, T.A.K., Li, P.E., Scholz, M.B., Chain, P.S.: Accurate read-based metagenome characterization using a hierarchical suite of unique signatures. Nucl. Acids Res. 43(10), e69 (2015)
Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1459–1477. SIAM (2018)
Hamady, M., Knight, R.: Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res. 19(7), 1141–1152 (2009)
Huang, W., Li, L., Myers, J.R., Marth, G.T.: ART: A next-generation sequencing read simulator. Bioinformatics 28(4), 593–594 (2012)
Kim, D., Song, L., Breitwieser, F.P., Salzberg, S.L.: Centrifuge: Rapid and sensitive classification of metagenomic sequences. Genome Res. 26(12), 1721–1729 (2016)
Kuhnle, A., Mun, T., Boucher, C., Gagie, T., Langmead, B., Manzini, G.: Efficient construction of a complete index for pan-genomics read alignment. J. Comput. Biol. 27(4), 500–513 (2020)
Li, H.: WGSIM: Simulating sequence reads from a reference genome. https://github.com/lh3/wgsim (2011)
Li, H., et al.: The sequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–2079 (2009)
Lu, J., Breitwieser, F.P., Thielen, P., Salzberg, S.L.: Bracken: Estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017)
McIntyre, A.B., et al.: Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 18(1), 1–19 (2017)
McIver, L.J., et al.: bioBakery: A meta'omic analysis environment. Bioinformatics 34(7), 1235–1237 (2018)
MetaPhlAn2. https://github.com/biobakery/MetaPhlAn2
Neelakanta, G., Sultana, H.: The use of metagenomic approaches to analyze changes in microbial communities. Microbiol. Insights 6, MBI-S10819 (2013)
Nikulin, M.S., et al.: Hellinger distance. Encyclopedia Math. 78 (2001)
O'Leary, N.A., et al.: Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucl. Acids Res. 44(D1), D733–D745 (2016)
Petri, M.: FM-Index: Compressed full-text index. https://github.com/mpetri/FM-Index (2015)
Roberts, A., Pachter, L.: Streaming fragment assignment for real-time analysis of sequencing experiments. Nat. Methods 10(1), 71–73 (2013)
Roosaare, M., et al.: StrainSeeker: Fast identification of bacterial strains from raw sequencing reads using user-provided guide trees. PeerJ 5, e3353 (2017)
Scholz, M., et al.: Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nat. Methods 13(5), 435–438 (2016)
Simon, H.Y., Siddle, K.J., Park, D.J., Sabeti, P.C.: Benchmarking metagenomics tools for taxonomic classification. Cell 178(4), 779–794 (2019)
Sims, G.E., Jun, S.R., Wu, G.A., Kim, S.H.: Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. 106(8), 2677–2682 (2009)
Truong, D.T., et al.: MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12(10), 902–903 (2015)
Wood, D.E., Lu, J., Langmead, B.: Improved metagenomic analysis with Kraken 2. Genome Biol. 20(1), 1–13 (2019)
Wood, D.E., Salzberg, S.L.: Kraken: Ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15(3), 1–12 (2014)
Walia, V., Saipradeep, V.G., Srinivasan, R., Sivadasan, N.: Supplementary Materials: MAGE (2023). https://doi.org/10.5281/zenodo.7746145
Author information
Authors and Affiliations
TCS Research, Hyderabad, India
Vidushi Walia, V. G. Saipradeep, Rajgopal Srinivasan & Naveen Sivadasan
Corresponding author
Correspondence to Naveen Sivadasan.
Editor information
Editors and Affiliations
Freie Universität Berlin, Berlin, Germany
Katharina Jahn
Comenius University, Bratislava, Slovakia
Tomáš Vinař
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Walia, V., Saipradeep, V.G., Srinivasan, R., Sivadasan, N. (2023). MAGE: Strain Level Profiling of Metagenome Samples. In: Jahn, K., Vinař, T. (eds) Comparative Genomics. RECOMB-CG 2023. Lecture Notes in Computer Science, vol 13883. Springer, Cham. https://doi.org/10.1007/978-3-031-36911-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36910-0
Online ISBN: 978-3-031-36911-7
eBook Packages: Computer Science, Computer Science (R0)