Effects of GC bias in next-generation-sequencing data on de novo genome assembly
- PMID: 23638157
- PMCID: PMC3639258
- DOI: 10.1371/journal.pone.0062856
Abstract
Next-generation sequencing (NGS) has revolutionized genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges for de novo genome assembly. Among these challenges, GC bias in NGS data is known to hamper genome assembly, but it is not clear to what extent GC bias affects assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. At strong GC bias, the assembly fragmentation can be explained by the low read coverage in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data therefore rescues the fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC content. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These findings provide guidance toward better de novo genome assembly in the presence of GC bias.
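The coverage argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the linear bias model, the function names (`gc_fraction`, `windowed_gc`, `relative_coverage`, `depth_needed`), and the parameters `optimum` and `strength` are all assumptions made for illustration. The sketch only shows why, under such a model, the total depth needed to lift the worst-covered region to a target depth depends on how GC content is distributed across the genome.

```python
import math
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int) -> List[float]:
    """GC fraction of consecutive non-overlapping windows of a fixed size."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


def relative_coverage(gc: float, optimum: float = 0.5,
                      strength: float = 0.0) -> float:
    """Hypothetical linear bias model (an assumption, not from the paper):
    the sampling weight drops linearly with distance from the optimum GC
    fraction and is floored at zero. strength=0 means no GC bias."""
    return max(0.0, 1.0 - strength * abs(gc - optimum))


def depth_needed(window_gcs: List[float], min_depth: float,
                 strength: float) -> float:
    """Mean genome-wide depth required so that the worst-covered window
    still reaches min_depth, assuming reads are distributed across
    windows in proportion to relative_coverage."""
    factors = [relative_coverage(g, strength=strength) for g in window_gcs]
    worst = min(factors)
    if worst == 0.0:
        return math.inf  # some window is never sampled under this model
    mean_factor = sum(factors) / len(factors)
    return min_depth * mean_factor / worst


if __name__ == "__main__":
    toy_genome = "ATATATATGCGCGCGCATGCATGC"  # made-up sequence
    gcs = windowed_gc(toy_genome, 8)
    for s in (0.0, 0.5, 1.0):
        print(f"strength={s}: mean depth needed = {depth_needed(gcs, 10.0, s):.2f}")
```

With `strength=0.0` every window is sampled equally, so the required mean depth equals the target depth. As `strength` grows, windows far from the optimum GC fraction receive fewer reads and the required mean depth rises, which is the qualitative pattern the abstract describes.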