Mol Biol Evol. 2013 May;30(5):1196-205.
doi: 10.1093/molbev/mst030. Epub 2013 Feb 18.

FUBAR: a fast, unconstrained Bayesian approximation for inferring selection

Ben Murrell et al. Mol Biol Evol. 2013 May.

Abstract

Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection--an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).
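
The averaging idea in the abstract can be illustrated with a minimal sketch. It is not the HyPhy implementation: the three-point grid `GRID`, the made-up likelihood table `LIK`, the Dirichlet concentration `conc`, and the proposal width `step` are all assumptions chosen for illustration, and the final probabilities use posterior mean weights for brevity. The likelihood table stays fixed, a simple Metropolis sampler explores the weights assigned to the predefined site classes, and each site's probability of positive selection is the weighted share of grid points where the nonsynonymous rate exceeds the synonymous rate.

```python
import math
import random

# Hypothetical grid of (alpha, beta) = (synonymous, nonsynonymous) rate pairs.
# A real analysis would use far more points than three.
GRID = [(1.0, 0.2), (1.0, 1.0), (1.0, 3.0)]

# Made-up conditional likelihoods LIK[site][grid point]. In a real analysis these
# would come from the phylogenetic likelihood, computed once before sampling.
LIK = [
    [0.90, 0.30, 0.01],
    [0.85, 0.35, 0.02],
    [0.80, 0.40, 0.02],
    [0.90, 0.25, 0.01],
    [0.30, 0.50, 0.30],
    [0.01, 0.20, 0.90],
]


def log_target(weights, lik, conc):
    """Unnormalized log posterior of the grid weights (Dirichlet prior assumed)."""
    log_prior = sum((conc - 1.0) * math.log(w) for w in weights)
    log_lik = sum(math.log(sum(w * l for w, l in zip(weights, row))) for row in lik)
    return log_prior + log_lik


def sample_mean_weights(lik, n_iter=4000, conc=0.5, step=0.2, seed=1):
    """Metropolis sampler on the simplex; returns post-burn-in mean weights."""
    rng = random.Random(seed)
    k = len(lik[0])
    w = [1.0 / k] * k
    current = log_target(w, lik, conc)
    totals = [0.0] * k
    kept = 0
    for it in range(n_iter):
        # Symmetric proposal: shift a little mass from one grid point to another.
        i, j = rng.sample(range(k), 2)
        delta = rng.uniform(-step, step)
        proposal = list(w)
        proposal[i] += delta
        proposal[j] -= delta
        if min(proposal[i], proposal[j]) > 0.0:
            cand = log_target(proposal, lik, conc)
            if rng.random() < math.exp(min(0.0, cand - current)):
                w, current = proposal, cand
        if it >= n_iter // 2:  # discard the first half as burn-in
            totals = [t + x for t, x in zip(totals, w)]
            kept += 1
    return [t / kept for t in totals]


def prob_positive(lik, weights, grid):
    """Per-site posterior probability that beta > alpha."""
    probs = []
    for row in lik:
        joint = [w * l for w, l in zip(weights, row)]
        pos = sum(x for x, (alpha, beta) in zip(joint, grid) if beta > alpha)
        probs.append(pos / sum(joint))
    return probs


weights = sample_mean_weights(LIK)
posterior = prob_positive(LIK, weights, GRID)
for site, p in enumerate(posterior):
    print(f"site {site}: P(beta > alpha) = {p:.3f}")
```

Because the sampler never touches `LIK`, each iteration costs only a few multiplications per site. That is the property the abstract credits for the speed advantage.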

Figures

Fig. 1.
The synonymous and nonsynonymous rates (formula image) are continuous model parameters that vary from one site to another, illustrated by a hypothetical distribution in (A). Typical random effects models, as exemplified by Dual REL (Kosakovsky Pond and Muse 2005) in (B) use a small number of discrete categories to approximate this continuous distribution, allowing the location (represented by the green bars) and the probability mass of the discrete points to vary; a change in the location of a point necessitates a re-evaluation of the phylogenetic likelihood function. FUBAR, in (C), uses a much denser grid of values chosen a priori, relying on the grid density to circumvent the need to move the parameter locations, and on MCMC to sample the weights assigned to each point. Without the need for movable grid lines, FUBAR needs to compute the conditional likelihood associated with each point only once, eliminating the bottleneck hindering traditional random effects models. Note that the uniform grid spacing depicted here is stylized. As the uncertainty in the selection parameters grows with their magnitude, FUBAR uses larger spacing for larger values (see text for details).
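
The fixed-grid design described in this caption might be sketched as follows. The grid size, the maximum rate, and the cubic spacing rule in `rate_grid` are hypothetical choices made only to show spacing that widens with magnitude. `toy_site_likelihood` is a stand-in for the phylogenetic likelihood, not a real model. The point is that the expensive function is evaluated exactly once per site and grid point.

```python
import itertools
import math


def rate_grid(n_points=20, max_rate=50.0, power=3.0):
    """Rates in [0, max_rate] whose spacing grows with the rate (assumed rule)."""
    return [max_rate * (i / (n_points - 1)) ** power for i in range(n_points)]


def build_likelihood_table(sites, grid, site_likelihood):
    """Evaluate the (expensive) likelihood once per (site, grid point) pair."""
    return [[site_likelihood(site, alpha, beta) for alpha, beta in grid]
            for site in sites]


# Cartesian product of synonymous (alpha) and nonsynonymous (beta) rate values.
alphas = rate_grid()
betas = rate_grid()
grid = list(itertools.product(alphas, betas))

calls = {"n": 0}  # counts likelihood evaluations


def toy_site_likelihood(site, alpha, beta):
    """Stand-in likelihood: highest when beta / alpha matches the site's ratio."""
    calls["n"] += 1
    return math.exp(-abs(beta - site * alpha))


sites = [0.2, 1.0, 3.0]  # hypothetical per-site beta/alpha ratios
table = build_likelihood_table(sites, grid, toy_site_likelihood)
print(len(grid), "grid points,", calls["n"], "likelihood evaluations")
```

After `table` is built, any number of weight-sampling iterations can reuse it without calling the likelihood function again.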
Fig. 2.
Execution times for FEL and FUBAR as a function of the number of codon sites (top) and number of taxa (bottom).
Fig. 3.
Site-specific inference under misspecified models. (Top) 100 log-likelihood curves as functions of [formula image] for a set of simulated sites (see text for description). Vertical lines indicate the value [formula image] under which the sites were simulated, along with the values for the neutral and positive selection site categories ([formula image] and [formula image], respectively) used by the M2a model in PAML. The value of the positive selection site category does not match that under which the sites were simulated, due to the presence of other sites under stronger positive selection. The only evidence considered by M2a when classifying a site into the neutral or positive selection category is the value of the likelihood function at [formula image] and the value at [formula image]. With the peaks of the likelihood functions between these options, the model becomes overconfident, assigning strong evidence either for positive selection (exemplified by the blue curve) or against it (exemplified by the red curve), even when this conclusion is incorrect. (Bottom) Histograms of posterior site-specific probabilities of positive selection calculated for sites simulated under a true [formula image]. M2a (left) confidently identifies positive selection in nearly half of these sites, but also incorrectly declares strong evidence against positive selection in half. FUBAR (right) detects most of the sites, and does not claim strong evidence for incorrect conclusions.
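
The overconfidence described in this caption can be reproduced with a toy calculation. Every number below is an assumption, not a value from the paper: the likelihood curve is a made-up function peaked at ω = 1.5, the two categories sit at ω = 1.0 and ω = 2.9 with equal prior weight, and the dense grid carries uniform weights rather than fitted ones. A classifier that sees only the two category values reports strong evidence against positive selection, while averaging over many grid points puts most of the posterior mass above ω = 1.

```python
import math


def log_lik(omega, peak=1.5, sharpness=20.0):
    """Toy log-likelihood curve, peaked at `peak` on a log(omega) scale."""
    return -sharpness * (math.log(omega) - math.log(peak)) ** 2


def two_category_posterior(omega_neutral=1.0, omega_positive=2.9):
    """P(positive class) when only two omega values are considered."""
    l_neu = math.exp(log_lik(omega_neutral))
    l_pos = math.exp(log_lik(omega_positive))
    return l_pos / (l_neu + l_pos)


def dense_grid_posterior(n_points=200, lo=0.05, hi=10.0):
    """P(omega > 1) when averaging over a log-spaced grid with uniform weights."""
    step = (math.log(hi) - math.log(lo)) / (n_points - 1)
    omegas = [math.exp(math.log(lo) + i * step) for i in range(n_points)]
    liks = [math.exp(log_lik(w)) for w in omegas]
    return sum(l for w, l in zip(omegas, liks) if w > 1.0) / sum(liks)


p_two = two_category_posterior()
p_dense = dense_grid_posterior()
print(f"two categories: P(positive)  = {p_two:.4f}")
print(f"dense grid:     P(omega > 1) = {p_dense:.4f}")
```

Moving the hypothetical positive category closer to the peak (for example `omega_positive=2.0`) flips the two-category answer toward strong support for positive selection, which mirrors the blue and red curves the caption describes.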
Fig. 4.
Influenza hemagglutinin analysis. (Top) The H3 phylogeny with 3,142 coding sequences. (Middle) The smoothed histogram of [formula image] across H3, with the greatest density at mild purifying selection [formula image], and fewer sites under positive selection [formula image]. The notches depict sites with posteriors greater than 0.9 for positive (red) or purifying (blue) selection. (Bottom) The inferred [formula image] values mapped to the HA protein (PDB 3ZTJ; Corti et al. 2011), displayed from two viewpoints. Red regions with stronger diversifying selection are likely involved in immune escape. These primarily occur on the “head” of the protein, with mostly purifying selection on the membrane proximal stem. See text for further detail.

References

    1. Anisimova M, Kosiol C. Investigating protein-coding sequence evolution with probabilistic codon substitution models. Mol Biol Evol. 2009;26:255–271.
    2. Bush RM, Fitch WM, Bender CA, Cox NJ. Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol Biol Evol. 1999;16:1457–1465.
    3. Cadar D, Cságola A, Kiss T, Tuboly T. Capsid protein evolution and comparative phylogeny of novel porcine parvoviruses. Mol Phylogenet Evol. 2013;66:243–253.
    4. Caton AJ, Brownlee GG, Yewdell JW, Gerhard W. The antigenic structure of the influenza virus A/PR/8/34 hemagglutinin (H1 subtype). Cell. 1982;31:417–427.
    5. Corti D, Voss J, Gamblin SJ, et al. (23 co-authors) A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza A hemagglutinins. Science. 2011;333:850–856.