* Corresponding authors

a Department of Civil and Environmental Engineering, Yonsei University, Yonsei-ro 50, Seodaemoon-gu, Seoul, Republic of Korea
E-mail: parkj@yonsei.ac.kr
Tel: +82-2-2123-5798

Abstract

Next-generation sequencing (NGS) is a popular method for assessing the molecular diversity of microbial communities without cultivation, for identifying polymorphisms in populations, and for comparing genomes and transcriptomes. However, sequence-specific errors (SSEs) by NGS systems can result in genome mis-assembly, overestimation of diversity in microbial community analyses, and false polymorphism discovery. SSEs can be particularly problematic due to rich microbial biodiversity and genomes containing frequent repeats. In this study, SSEs in public data from all popular NGS systems were discovered using a Markov chain model and hotspots for sequence errors were identified. Deletion errors were frequently preceded by homopolymers in non-Illumina NGS systems, such as GS FLX+. Substitution errors were often related to high GC contents and long G/C homopolymers in Illumina sequencing systems such as HiSeq. After removal of long G/C homopolymers in HiSeq, the average lengths of contigs and average SNP quality increased. SSEs were selectively removed from our mock community data by quality filtering, and a bias against specific microbes was identified. Our findings provide a scientific basis for filtering poor-quality reads, correcting deletion errors, preventing genome mis-assembly, and accurately assessing microbial community compositions and polymorphisms.
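The abstract links substitution errors in Illumina systems to high GC content and long G/C homopolymers, and reports improved assembly after such reads were removed. As an illustration only, a minimal sketch of flagging reads on those two features might look like the following. It is not the authors' Markov chain model, and the function names and the thresholds `max_run` and `max_gc` are hypothetical values chosen for the example, not values taken from the paper.

```python
from itertools import groupby


def gc_content(seq):
    """Return the fraction of G and C bases in a read (0.0 for an empty read)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_homopolymer(seq):
    """Return the length of the longest run of identical G or C bases."""
    longest = 0
    for base, run in groupby(seq.upper()):
        if base in "GC":
            longest = max(longest, sum(1 for _ in run))
    return longest


def flag_read(seq, max_run=6, max_gc=0.7):
    """Flag a read as a possible error hotspot.

    max_run and max_gc are assumed, illustrative thresholds;
    they are not reported in the source article.
    """
    return longest_gc_homopolymer(seq) >= max_run or gc_content(seq) > max_gc
```

In practice the thresholds would need to be tuned per platform and per dataset, since the abstract also notes that quality filtering can bias results against specific microbes.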

Graphical abstract: Characterization of sequence-specific errors in various next-generation sequencing systems

Article information

Article type: Paper
Submitted: 05 Nov 2015
Accepted: 04 Jan 2016
First published: 21 Jan 2016

Mol. BioSyst., 2016, 12, 914-922

Characterization of sequence-specific errors in various next-generation sequencing systems

S. Shin and J. Park, Mol. BioSyst., 2016, 12, 914, DOI: 10.1039/C5MB00750J
