Nature Biotechnology
  • Article
  • Published:

A single-molecule long-read survey of the human transcriptome

Nature Biotechnology volume 31, pages 1009–1014 (2013)

An erratum to this article was published on 10 March 2014.

This article has been updated.

Abstract

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing cannot describe an entire RNA molecule from its 5′ to its 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules, more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ~14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.
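Comparing alignments against GENCODE comes down to deciding whether a read's intron structure is already annotated. The following is a minimal sketch of one way to make that call. It is not the authors' pipeline. It assumes each aligned read has already been reduced to 1-based, closed exon blocks on a single chromosome and strand, with every gap between blocks being a true intron. The function names (`intron_chain`, `is_contiguous_subchain`, `classify`) and the toy coordinates are hypothetical. Because longer molecules tend to lose 5′ sequence while keeping their intron structure, the sketch counts a read as annotated when its intron chain occurs as a contiguous run inside some annotated chain, rather than requiring an exact full-length match.

```python
from typing import List, Sequence, Tuple

Exon = Tuple[int, int]    # (start, end), 1-based closed genomic coordinates (assumed convention)
Intron = Tuple[int, int]  # (first intronic base, last intronic base)
Chain = Tuple[Intron, ...]


def intron_chain(exons: Sequence[Exon]) -> Chain:
    """Ordered introns implied by the gaps between consecutive aligned exon blocks.

    Assumes every gap is a true intron (small alignment deletions already merged).
    """
    blocks = sorted(exons)
    return tuple(
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
    )


def is_contiguous_subchain(query: Chain, ref: Chain) -> bool:
    """True if `query` appears as an unbroken run of introns inside `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(read_exons: Sequence[Exon], annotated: Sequence[Chain]) -> str:
    """Label a read 'unspliced', 'annotated' or 'unannotated' by its intron chain."""
    chain = intron_chain(read_exons)
    if not chain:
        return "unspliced"
    if any(is_contiguous_subchain(chain, ref) for ref in annotated):
        return "annotated"
    return "unannotated"


# Hypothetical toy data: one annotated three-exon transcript on the + strand.
annotated_chains: List[Chain] = [intron_chain([(100, 200), (300, 400), (500, 600)])]
reads = {
    "read1": [(350, 400), (500, 580)],  # 5'-truncated copy: keeps only the last annotated intron
    "read2": [(150, 200), (500, 600)],  # skips the middle exon, giving a novel intron
    "read3": [(120, 380)],              # single block, no intron
}
labels = {name: classify(exons, annotated_chains) for name, exons in reads.items()}
spliced = [label for label in labels.values() if label != "unspliced"]
frac_unannotated = spliced.count("unannotated") / len(spliced)
print(labels, frac_unannotated)
# {'read1': 'annotated', 'read2': 'unannotated', 'read3': 'unspliced'} 0.5
```

The contiguous-run test is a deliberately lenient choice. It tolerates reads missing terminal exons, but it would also accept a read that happens to reproduce part of an annotated chain while differing elsewhere. A stricter variant could additionally require the read's terminal exon boundaries to fall within annotated exons.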

Figure 1: Completeness of cDNA molecules.
Figure 2: Assessment of completeness of CCS reads in controlled environments.
Figure 3: Exon-intron structure of molecules.
Figure 4: Analysis of unannotated transcripts.

Accession codes

Primary accessions

European Nucleotide Archive

Change history

  • 25 November 2013

    In the version of this article initially published, the accession code for data was left out. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

We thank J. Eid and L. Hickey at Pacific Biosciences for providing alignment statistics of reads to reference genomes. We thank J. Kelley, C. Araya, D. Phanstiel, S. Shringarpure and M. Sikora at Stanford, as well as J. Korlach at Pacific Biosciences, for comments on this manuscript. We would also like to thank T. Daley and A. Smith at USC for advice on modeling library complexity. This work was supported by US National Institutes of Health grants 5P01GM099130-02, 5U54HG00699602-02 and 5U01HL107393-03, and by the US National Institutes of Health training grant 5 T32 HD07149.

Author information

Author notes
  1. Donald Sharon and Hagen Tilgner: These authors contributed equally to this work.

Authors and Affiliations

  1. Department of Genetics, Stanford University, Stanford, California, USA

    Donald Sharon, Hagen Tilgner, Fabian Grubert & Michael Snyder

  2. Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA

    Donald Sharon

Contributions

All authors proposed the project. D.S. devised and performed experiments and wrote the first version of the introduction. H.T. devised and performed analysis, prepared figures and wrote the first version of results and discussion. All authors discussed experiments and analysis and collaborated on the final version.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

M.S. is on the scientific advisory board of Personalis and GenapSys. All other authors declare no competing interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 (PDF 904 kb)

About this article

Cite this article

Sharon, D., Tilgner, H., Grubert, F. et al. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31, 1009–1014 (2013). https://doi.org/10.1038/nbt.2705
