Nature
  • Perspective
  • Published:

The status of the human gene catalogue

Nature volume 622, pages 41–47 (2023)

Abstract

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes, so that the human gene catalogue can be used in clinical settings.

Fig. 1: A major challenge for gene annotation is how to capture the diversity of gene products deriving from each gene locus.
Fig. 2: Predicted and observed human gene counts over time.

Acknowledgements

We thank the staff at the Banbury Center at Cold Spring Harbor Laboratory and the Cold Spring Harbor Laboratory Corporate Sponsor Program for supporting a workshop that all authors of this work attended. This work was supported in part by the US National Institutes of Health (NIH) under grants R01-HG006677 (to M.P., S.L.S. and A.V.), R01-MH123567 (to M.P. and S.L.S.), R35-GM130151 (to S.L.S.), U41-HG007234 (to A.F.) and U24-HG007234 (to R.G. and S.C.-S.); the Wellcome Trust under grant WT222155/Z/20/Z (to A.F.); the European Molecular Biology Laboratory (to A.F.); the US National Science Foundation under grant DBI-1759518 (to M.P.); the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under grant T2EDK-00391 (to A.G.H.); Science Foundation Ireland through Future Research Leaders award 18/FRL/6194 and the Irish Research Council through Consolidator Laureate award (IRCLA/2022/2500; to R.J.); the National Center for Biotechnology Information of the National Library of Medicine, NIH (to T.D.M., K.D.P. and S.P.); the National Health and Medical Research Council (NHMRC) APP1186371 (to C.A.W.); the Center for Genomic Medicine at the University of Utah Health, and the H.A. & Edna Benning Foundation (to M.Y.); the Spanish Ministry of Science and Innovation to the EMBL partnership, Centro de Excelencia Severo Ochoa and CERCA Programme/Generalitat de Catalunya (to R.G. and S.C.-S.); the RIKEN Center for Integrative Medical Sciences (to P.C. and H.T.); and Human Technopole (to P.C.).

Author information

Authors and Affiliations

  1. INSPER Institute of Education and Research, Sao Paulo, Brazil

    Paulo Amaral

  2. Centre for Genomic Regulation (CRG), Barcelona, Spain

    Silvia Carbonell-Sala & Roderic Guigo

  3. Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA

    Francisco M. De La Vega

  4. Tempus Labs, Chicago, IL, USA

    Francisco M. De La Vega

  5. Nature Genetics, San Francisco, CA, USA

    Tiago Faial

  6. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK

    Adam Frankish

  7. Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA

    Thomas Gingeras

  8. Universitat Pompeu Fabra (UPF), Barcelona, Spain

    Roderic Guigo

  9. Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK

    Jennifer L. Harrow

  10. Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece

    Artemis G. Hatzigeorgiou

  11. Hellenic Pasteur Institute, Athens, Greece

    Artemis G. Hatzigeorgiou

  12. School of Biology and Environmental Science, University College Dublin, Dublin, Ireland

    Rory Johnson

  13. Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland

    Rory Johnson

  14. Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

    Rory Johnson

  15. Department for BioMedical Research, University of Bern, Bern, Switzerland

    Rory Johnson

  16. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

    Terence D. Murphy, Kim D. Pruitt & Shashikant Pujar

  17. Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA

    Mihaela Pertea, Ales Varabyou & Steven L. Salzberg

  18. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA

    Mihaela Pertea & Steven L. Salzberg

  19. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

    Mihaela Pertea, Ales Varabyou & Steven L. Salzberg

  20. Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan

    Hazuki Takahashi & Piero Carninci

  21. Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel

    Igor Ulitsky

  22. Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel

    Igor Ulitsky

  23. Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia

    Christine A. Wells

  24. Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA

    Mark Yandell

  25. Human Technopole, Milan, Italy

    Piero Carninci

  26. Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA

    Steven L. Salzberg

Authors
  1. Paulo Amaral
  2. Silvia Carbonell-Sala
  3. Francisco M. De La Vega
  4. Tiago Faial
  5. Adam Frankish
  6. Thomas Gingeras
  7. Roderic Guigo
  8. Jennifer L. Harrow
  9. Artemis G. Hatzigeorgiou
  10. Rory Johnson
  11. Terence D. Murphy
  12. Mihaela Pertea
  13. Kim D. Pruitt
  14. Shashikant Pujar
  15. Hazuki Takahashi
  16. Igor Ulitsky
  17. Ales Varabyou
  18. Christine A. Wells
  19. Mark Yandell
  20. Piero Carninci
  21. Steven L. Salzberg

Contributions

P.A., S.C.-S., F.M.D.L.V., T.F., A.F., T.G., R.G., J.L.H., A.G.H., R.J., T.D.M., M.P., K.D.P., S.P., H.T., I.U., A.V., C.A.W., M.Y., P.C. and S.L.S. participated in discussions at a Banbury Conference at Cold Spring Harbor Laboratory, providing the source material for this paper. All authors contributed to writing, editing and reviewing the paper.

Corresponding authors

Correspondence to Piero Carninci or Steven L. Salzberg.

Ethics declarations

Competing interests

T.F. is chief editor at Nature Genetics, a Nature Portfolio journal. T.F. was not involved in the editorial handling of this Nature paper (journals within the Nature Portfolio are editorially independent). The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Tuuli Lappalainen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Amaral, P., Carbonell-Sala, S., De La Vega, F.M. et al. The status of the human gene catalogue. Nature 622, 41–47 (2023). https://doi.org/10.1038/s41586-023-06490-x

Comments

Commenting on this article is now closed.

  1. Laurence A. Moran

    The best definition of a molecular gene is a DNA sequence that's transcribed to produce a functional product. (There are exceptions.) I emphasize "function," as do the authors. A DNA sequence that's merely transcribed isn't a gene, by definition, until the product has been identified with a biological function. We know that a large number of long transcripts are very likely to be spurious non-functional transcripts transcribed from junk DNA.

    The problem with this paper is that the authors identify somewhere between 18,000 and 96,000 lncRNA "genes" when they must know that only a tiny subset of these transcripts have a known function and very few are conserved. (Conservation is a reasonable proxy for function.) The authors should have made a much more serious attempt to identify the small number of transcripts that are known to be functional instead of contributing to the myth that there are more noncoding genes than protein-coding genes.

  2. Laurence A. Moran

    Many very well known experts were predicting about 30,000 genes back in 1970. The final number may turn out to be about 25,000 so those experts were fairly accurate.

    It's time to give those experts the credit they deserve instead of emphasizing Gilbert's back-of-the-envelope estimate of 100,000 that wasn't based on evidence - indeed, it ignored all the evidence that was available at the time.
