A single-molecule long-read survey of the human transcriptome
Nature Biotechnology volume 31, pages 1009–1014 (2013)
An Erratum to this article was published on 10 March 2014.
This article has been updated.
Abstract
Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing cannot describe an entire RNA molecule from its 5′ end to its 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules, more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes at the single-molecule level.
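The comparison behind the ">10% of the alignments" figure, asking whether an alignment's intron structure is already annotated, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes each read alignment and each annotated transcript has already been reduced to exon blocks on a known chromosome and strand, and it treats an intron structure as "annotated" only if the read's complete, ordered intron chain matches an annotated chain exactly. The helper names `introns_from_exons` and `fraction_unannotated` and the toy coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: an intron is (chromosome, strand, start, end),
# and a transcript's intron chain is the ordered tuple of its introns.
Intron = Tuple[str, str, int, int]
IntronChain = Tuple[Intron, ...]


def introns_from_exons(chrom: str, strand: str,
                       exons: List[Tuple[int, int]]) -> IntronChain:
    """Derive the intron chain from non-overlapping, 1-based inclusive
    (start, end) exon blocks: each intron spans the gap between neighbours."""
    exons = sorted(exons)
    return tuple(
        (chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    )


def fraction_unannotated(read_chains: Iterable[IntronChain],
                         annotated_chains: Iterable[IntronChain]) -> float:
    """Fraction of spliced read alignments whose full intron chain does not
    exactly match any annotated chain. Unspliced reads (empty chains) are ignored."""
    known = set(annotated_chains)
    spliced = [chain for chain in read_chains if chain]
    if not spliced:
        return 0.0
    novel = sum(1 for chain in spliced if chain not in known)
    return novel / len(spliced)


# Toy example: one annotated three-exon transcript and two reads.
annotation = [introns_from_exons("chr1", "+", [(100, 200), (300, 400), (500, 600)])]
reads = [
    introns_from_exons("chr1", "+", [(100, 200), (300, 400), (500, 600)]),  # same introns
    introns_from_exons("chr1", "+", [(100, 200), (500, 600)]),  # skips the middle exon
]
print(fraction_unannotated(reads, annotation))  # prints 0.5
```

Exact whole-chain matching is the strictest possible rule. Because longer molecules tend to lose 5′ sequence, a truncated read that lacks its first intron would be counted as unannotated here, so a real analysis would likely need to handle partial chains separately.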
Accession codes
Change history
25 November 2013
In the version of this article initially published, the accession code for data was left out. The error has been corrected in the HTML and PDF versions of the article.
Acknowledgements
We thank J. Eid and L. Hickey at Pacific Biosciences for providing alignment statistics of reads to reference genomes. We thank J. Kelley, C. Araya, D. Phanstiel, S. Shringarpure and M. Sikora at Stanford as well as J. Korlach at Pacific Biosciences for comments on this manuscript. We also thank T. Daley and A. Smith at USC for advice on modeling library complexity. This work was supported by US National Institutes of Health grants 5P01GM099130-02, 5U54HG00699602-02 and 5U01HL107393-03, and by the US National Institutes of Health training grant 5 T32 HD07149.
Author information
Donald Sharon and Hagen Tilgner: These authors contributed equally to this work.
Authors and Affiliations
Department of Genetics, Stanford University, Stanford, California, USA
Donald Sharon, Hagen Tilgner, Fabian Grubert & Michael Snyder
Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
Donald Sharon
Contributions
All authors proposed the project. D.S. devised and performed experiments and wrote the first version of the introduction. H.T. devised and performed analysis, prepared figures and wrote the first version of results and discussion. All authors discussed experiments and analysis and collaborated on the final version.
Corresponding author
Correspondence to Michael Snyder.
Ethics declarations
Competing interests
M.S. is on the scientific advisory board of Personalis and GenapSys. All other authors declare no competing interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–6 (PDF 904 kb)
About this article
Cite this article
Sharon, D., Tilgner, H., Grubert, F. et al. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31, 1009–1014 (2013). https://doi.org/10.1038/nbt.2705