- Analysis
Promoter prediction analysis on the whole human genome
Nature Biotechnology volume 22, pages 1467–1473 (2004)
Abstract
Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.
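The two measures named in the abstract can be illustrated with a minimal sketch. It uses the standard definitions, sensitivity = TP / (TP + FN) and positive predictive value = TP / (TP + FP); the function names and the counts below are hypothetical and are not taken from the article, whose exact counting protocol is not given in this text.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true promoters that were predicted: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predictions that are true promoters: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0


def exceeds_both(tp: int, fp: int, fn: int, threshold: float = 0.65) -> bool:
    """True only if sensitivity AND positive predictive value exceed the threshold."""
    return (sensitivity(tp, fn) > threshold
            and positive_predictive_value(tp, fp) > threshold)


# Hypothetical counts for illustration only (not results from the article).
tp, fp, fn = 600, 400, 900
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")
print(f"both > 65%  = {exceeds_both(tp, fp, fn)}")
```

Requiring both measures to clear the threshold at once matters because each can be raised in isolation: predicting more sites tends to raise sensitivity while lowering the positive predictive value, and predicting fewer does the reverse.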
Acknowledgements
We are grateful to Riu Yamashita and Kenta Nakai for assisting in constructing and maintaining DBTSS.
Author information
Authors and Affiliations
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Vladimir B Bajic & Sin Lam Tan
Human Genome Center, University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo, 108-8639, Japan
Yutaka Suzuki & Sumio Sugano
Corresponding author
Correspondence to Vladimir B Bajic.
Ethics declarations
Competing interests
The employer of Vladimir B. Bajic and Sin Lam Tan has licensed Dragon Promoter Finder and Dragon Gene Start Finder to Biobase, Germany. Vladimir B. Bajic receives royalty for these two programs.
Supplementary information
Supplementary Figure 1
Distribution of clustered predictions for seven analyzed PPPs.
Supplementary Table 1
Results of promoter prediction on human chromosomes 21 and 22.
Supplementary Table 2
Results of promoter prediction on human chromosomes 4, 21 and 22.
Supplementary Table 3
Results of promoter prediction on HG for different distance criteria.
About this article
Cite this article
Bajic, V., Tan, S., Suzuki, Y. et al. Promoter prediction analysis on the whole human genome. Nat Biotechnol 22, 1467–1473 (2004). https://doi.org/10.1038/nbt1032