Ab initio identification of putative human transcription factor binding sites by comparative genomics
- PMID: 15865625
- PMCID: PMC1097714
- DOI: 10.1186/1471-2105-6-110
Abstract
Background: Understanding the transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. Transcription factors play a central role in this mechanism: they typically bind to specific, short DNA sequence motifs, usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method integrates several elements: human-mouse comparison, statistical analysis of genomic sequences, and the concept of coregulation. We apply it to a complete scan of the human genome.
Results: Using the catalogue of conserved upstream sequences collected in the CORG database, we construct sets of genes sharing the same overrepresented motif (a short DNA sequence) in their upstream regions in both human and mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets, looking for two types of evidence of coregulation. First, we analyze the Gene Ontology annotation of the genes in each set, searching for statistically significant common annotations. Second, we analyze the expression profiles of the genes in each set as measured by microarray experiments, searching for evidence of coexpression. The sets that pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing these sets are thus good candidates for the binding sites of the transcription factors involved in such regulation. In this way we find various known motifs and also some new candidate binding sites.
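The set construction described above can be illustrated with a minimal Python sketch. The abstract does not specify the statistical test or thresholds, so the binomial tail test against a pooled background frequency, the `alpha` cutoff, the dictionary-based inputs, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product
from math import comb


def all_motifs(min_len=5, max_len=8):
    """Yield every DNA word with length between min_len and max_len."""
    for k in range(min_len, max_len + 1):
        for letters in product("ACGT", repeat=k):
            yield "".join(letters)


def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_genes(upstream, motif, alpha):
    """Genes whose upstream sequence holds more copies of motif than expected.

    ASSUMPTION: the background frequency is pooled over all upstream
    sequences and significance is a binomial tail probability below alpha.
    """
    positions = {g: max(len(s) - len(motif) + 1, 0) for g, s in upstream.items()}
    hits = {g: count_occurrences(s, motif) for g, s in upstream.items()}
    total_positions, total_hits = sum(positions.values()), sum(hits.values())
    if total_positions == 0 or total_hits == 0:
        return set()
    background = total_hits / total_positions
    return {
        g
        for g in upstream
        if hits[g] > 0 and binomial_tail(positions[g], hits[g], background) < alpha
    }


def conserved_motif_set(human, mouse, orthologs, motif, alpha=0.01):
    """Human genes overrepresented for motif whose mouse ortholog is too."""
    human_set = overrepresented_genes(human, motif, alpha)
    mouse_set = overrepresented_genes(mouse, motif, alpha)
    return {h for h in human_set if orthologs.get(h) in mouse_set}
```

In a full scan, `conserved_motif_set` would be called once per word produced by `all_motifs()`, and only the resulting gene sets would be passed on to the coregulation filters.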
Conclusion: We have discussed a new integrated algorithm for the ab initio identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, and different types of evidence of coregulation. Applied to a full scan of the human genome, the method gives satisfactory results.
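The first coregulation filter, the search for statistically significant common annotations within a gene set, might be sketched as follows. The abstract does not name the test, so the one-sided hypergeometric test, the `alpha` threshold, the absence of any multiple-testing correction, and the function names here are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, K of which carry a term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enriched_terms(gene_set, annotations, alpha=0.01):
    """Return {term: p_value} for terms enriched in gene_set.

    annotations maps each annotated gene to its set of terms; the annotated
    genes form the background universe (ASSUMPTION, not from the abstract).
    """
    universe = set(annotations)
    members = set(gene_set) & universe
    # Invert the mapping: term -> genes carrying that term.
    term_genes = {}
    for gene, terms in annotations.items():
        for term in terms:
            term_genes.setdefault(term, set()).add(gene)
    enriched = {}
    for term, genes in term_genes.items():
        overlap = len(genes & members)
        if overlap == 0:
            continue
        p = hypergeom_tail(len(universe), len(genes), len(members), overlap)
        if p < alpha:
            enriched[term] = p
    return enriched
```

Because one such test is run per term and per motif, a real analysis would also need some control for multiple testing; that step is deliberately left out of this sketch.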