2016 May 4;12(5):e1004842.
doi: 10.1371/journal.pcbi.1004842. eCollection 2016 May.

Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes

Jerome Kelleher et al. PLoS Comput Biol.

Abstract

A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Example oriented trees.
From left-to-right, these trees are defined by the sequences 〈5, 4, 4, 5, 0〉, 〈4, 4, 4, 0〉 and 〈4, 4, 5, 5, 0〉, respectively.
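The sequences in Fig 1 can be read as parent arrays. The following is a minimal Python sketch, assuming the convention that the j-th entry of a sequence gives the parent of node j and that 0 marks the root. The helper names are illustrative and are not taken from the paper or its software.

```python
def children_from_parents(pi):
    """Group nodes by parent for a 1-indexed parent sequence.

    Assumption: pi[j - 1] is the parent of node j, and 0 marks the root.
    """
    children = {}
    for node, parent in enumerate(pi, start=1):
        if parent != 0:
            children.setdefault(parent, []).append(node)
    return children


def path_to_root(pi, node):
    """Follow parent pointers from `node` up to the root."""
    path = [node]
    while pi[node - 1] != 0:
        node = pi[node - 1]
        path.append(node)
    return path


# The three sequences quoted in the Fig 1 caption.
left_tree = [5, 4, 4, 5, 0]
middle_tree = [4, 4, 4, 0]
right_tree = [4, 4, 5, 5, 0]

print(children_from_parents(left_tree))
print(path_to_root(left_tree, 2))
```

Under this reading, each node stores only a pointer to its parent, so walking from a leaf to the root costs one array lookup per step.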
Fig 2. The mean number of recombination events in Hudson’s algorithm over 100 replicates for varying sequence length and sample size.
In the left panel we fix n = 1000 and vary the sequence length. Shown in dots is a quadratic fitted to these data, which has a leading coefficient of 8.4 × 10^−3. In the right panel we fix the sequence length at 50 megabases and vary the sample size.
Fig 3. Comparison of the average running time over 100 replicates for various coalescent simulators with varying sequence length and sample size.
msms [34] is the most efficient published simulator based on Hudson’s algorithm that can output genealogies. MaCS [14] is a popular SMC-based simulator, and scrm [16] is the most efficient sequential simulator currently available. Both MaCS and scrm were run in SMC′ mode. Two results are shown for msprime: one outputting Newick trees and another outputting the native HDF5-based format.
Fig 4. Coalescence records and corresponding marginal trees.
The x-axis represents genomic coordinates, and the y-axis represents time (with the present at the top). Each line segment in the top section of the figure represents a coalescence record; e.g., the first segment corresponds to the coalescence record (2, 10, 5, (3, 4), 0.071). The lower section of the figure shows the corresponding trees in pictorial and sparse tree form. We have omitted commas and brackets from this sequence representation for compactness.
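To make the relationship between records and marginal trees concrete, here is a minimal Python sketch. It assumes each record has the form (left, right, node, children, time) and that the genomic interval is half-open, [left, right). The field names are hypothetical labels. Only the first record is quoted from the caption; the remaining records are invented for illustration and are not the data shown in the figure.

```python
from typing import Dict, List, NamedTuple, Tuple


class CoalescenceRecord(NamedTuple):
    left: float                 # start of the genomic interval (inclusive, assumed)
    right: float                # end of the genomic interval (exclusive, assumed)
    node: int                   # parent node created by the coalescence
    children: Tuple[int, ...]   # nodes that coalesce into `node`
    time: float                 # time of the coalescence


RECORDS: List[CoalescenceRecord] = [
    CoalescenceRecord(2, 10, 5, (3, 4), 0.071),  # quoted in the Fig 4 caption
    # The records below are made up for this sketch.
    CoalescenceRecord(0, 2, 6, (1, 3), 0.090),
    CoalescenceRecord(2, 10, 6, (1, 5), 0.090),
    CoalescenceRecord(0, 2, 7, (2, 4), 0.170),
    CoalescenceRecord(2, 10, 7, (2, 6), 0.170),
    CoalescenceRecord(0, 2, 8, (6, 7), 0.253),
]


def parents_at(records: List[CoalescenceRecord], position: float) -> Dict[int, int]:
    """Return the child -> parent map of the marginal tree covering `position`."""
    parent: Dict[int, int] = {}
    for rec in records:
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent[child] = rec.node
    return parent


print(parents_at(RECORDS, 1))  # tree covering [0, 2)
print(parents_at(RECORDS, 5))  # tree covering [2, 10)
```

In this sketch a node that is unchanged between adjacent trees could be described by a single record spanning both intervals, which is the kind of shared structure the abstract refers to.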
Fig 5. A prune and regraft not involving the root requires three records.
(i) We begin with two subtrees rooted at x and y, and we wish to prune the subtree rooted at b and graft it in the branch joining e to y. (ii) We remove the assignments (a, b) → α, (α, c) → x and (d, e) → y. After this operation, the subtrees a, …, e are disconnected from the main tree. The main trunk of the tree rooted at z is unaffected, as are the subtrees below a, …, e. (iii) We add the records (a, c) → x, (b, e) → β and (d, β) → y, completing the transition.
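The three-record transition in Fig 5 can be sketched in Python as follows. The dictionary representation (a frozenset of children mapped to their parent) and the function names are assumptions made for this illustration. The assignment (x, y) → z is also an assumption, added so the example has the root z that the caption mentions.

```python
def apply_transition(assignments, removed, added):
    """Return a copy of `assignments` with `removed` deleted and `added` inserted.

    `assignments` maps frozenset(children) -> parent.
    """
    updated = dict(assignments)
    for children in removed:
        del updated[frozenset(children)]
    for children, parent in added:
        updated[frozenset(children)] = parent
    return updated


def parent_map(assignments):
    """Flatten the assignments into a child -> parent dictionary."""
    return {
        child: parent
        for children, parent in assignments.items()
        for child in children
    }


# (i) Starting tree; (x, y) -> z is assumed for this sketch.
before = {
    frozenset(("a", "b")): "alpha",
    frozenset(("alpha", "c")): "x",
    frozenset(("d", "e")): "y",
    frozenset(("x", "y")): "z",
}

# (ii) Remove three assignments, (iii) add three records.
after = apply_transition(
    before,
    removed=[("a", "b"), ("alpha", "c"), ("d", "e")],
    added=[(("a", "c"), "x"), (("b", "e"), "beta"), (("d", "beta"), "y")],
)

print(parent_map(after))
```

Only the three assignments touched by the move change; the assignment at the root z is carried over untouched.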

References

    1. Kingman JFC. The coalescent. Stoch Proc Appl. 1982;13(3):235–248. doi: 10.1016/0304-4149(82)90011-4
    2. Hudson RR. Testing the constant-rate neutral allele model with protein sequence data. Evolution. 1983;37(1):203–217. doi: 10.2307/2408186
    3. Wakeley J. Coalescent theory: an introduction. Englewood, Colorado: Roberts and Company; 2008.
    4. Hudson RR. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology. 1990;7:1–44.
    5. Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983;23:183–201. doi: 10.1016/0040-5809(83)90013-8
