Review
Scientifica (Cairo). 2012:2012:917540.
doi: 10.6064/2012/917540. Epub 2012 Oct 23.

Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction

Xuhua Xia. Scientifica (Cairo). 2012.

Abstract

Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
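As a rough illustration of the PWM and PWMS concepts named in the abstract, the sketch below builds a log-odds matrix from a few aligned sites and scans a sequence with it. The aligned sites, the uniform background frequencies, and the pseudocount scheme (a total pseudocount spread in proportion to background frequency) are all assumptions chosen for the example; the function names `build_pwm`, `pwm_score`, and `scan` are hypothetical and do not come from the paper or from DAMBE.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, background, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned sites.

    `pseudocount` is the total pseudocount per column, split among
    nucleotides in proportion to background frequency (an assumption
    made for this sketch, not necessarily the paper's scheme).
    """
    width = len(sites[0])
    total = len(sites) + pseudocount
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        column = {}
        for nuc in ALPHABET:
            p = (counts[nuc] + pseudocount * background[nuc]) / total
            column[nuc] = math.log2(p / background[nuc])
        pwm.append(column)
    return pwm


def pwm_score(pwm, segment):
    """PWMS of one segment: the sum of the per-site log-odds."""
    return sum(column[nuc] for column, nuc in zip(pwm, segment))


def scan(pwm, sequence):
    """Score every window of the PWM's width along the sequence."""
    width = len(pwm)
    return [
        (start, pwm_score(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


# Toy aligned sites (hypothetical) and a uniform background (assumed).
sites = ["TAATAAC", "TAATAAC", "TACTAAC", "TAATAAT"]
background = {nuc: 0.25 for nuc in ALPHABET}
pwm = build_pwm(sites, background)

best_start, best_score = max(scan(pwm, "GGGTAATAACGGG"), key=lambda hit: hit[1])
print(best_start, round(best_score, 3))
```

A high PWMS alone says nothing about significance; that is the gap the tests reviewed in the paper are meant to fill.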


Figures

Figure 1
What the Gibbs sampler does. The intron sequences in the top panel are the input to the Gibbs sampler. The bottom panel shows part of the output, with the identified motif (i.e., TAATAAC, in red) shared among the sequences. Output from DAMBE [53, 54]. The input intron sequence file (YeastAllIntron.fas), in FASTA format, is in the DAMBE installation directory.
Figure 2
The erythroid sequences [85] used to illustrate the Gibbs sampler algorithm, with the 3′-end trimmed to a maximum length of 50 bases to fit the page.
Figure 3
PWMS from random sequences approximately follow a normal distribution, based on 1000 random sequences of length 17 drawn from a pool of nucleotides with frequencies of A, C, G, and T equal to 0.3279, 0.1915, 0.2043, and 0.2763, respectively. The distribution has a mean of 0.068884 and a standard deviation of 0.314714254.
Figure 4
Extreme value distribution as specified in (17), with μ = 0.068884, σ = 0.314714254, and N = 984.
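Equation (17) is not reproduced on this page, so the following is only a sketch under a stated assumption: that the extreme value distribution in question describes the largest of N independent, approximately normal PWMS values, whose cumulative distribution is Φ((x − μ)/σ) raised to the power N. The function names are hypothetical; the parameter values are those given in the Figure 4 caption.

```python
from statistics import NormalDist


def max_score_cdf(x, mu, sigma, n):
    """P(largest of n independent normal scores <= x).

    Assumes the scores are independent and normal with mean `mu` and
    standard deviation `sigma` (an assumption of this sketch).
    """
    return NormalDist(mu, sigma).cdf(x) ** n


def max_score_pvalue(x, mu, sigma, n):
    """P(largest of n scores > x): chance that the best hit reaches x."""
    return 1.0 - max_score_cdf(x, mu, sigma, n)


# Parameter values taken from the Figure 4 caption.
MU, SIGMA, N = 0.068884, 0.314714254, 984

for score in (1.0, 1.2, 1.4, 1.6):
    print(score, max_score_pvalue(score, MU, SIGMA, N))
```

The point of the exponent N is that the best-scoring window among many is expected to look extreme by chance alone, so a single-score normal tail probability would overstate significance.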

References

    1. Ptashne M. A Genetic Switch: Gene Control and Phage Lambda. Cambridge, Mass, USA: Cell Press and Blackwell Scientific; 1986.
    2. Staden R. Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research. 1984;12(1):505–519.
    3. Stormo GD, Schneider TD, Gold L. Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Research. 1986;14(16):6661–6679.
    4. Hertz GZ, Hartzell GW III, Stormo GD. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Computer Applications in the Biosciences. 1990;6(2):81–92.
    5. Claverie JM, Audic S. The statistical significance of nucleotide position-weight matrix matches. Computer Applications in the Biosciences. 1996;12(5):431–439.
