Coalescent theory

From Wikipedia, the free encyclopedia
Model for tracing the history of genetic variation

Coalescent theory is a model of how alleles sampled from a population may have originated from a common ancestor. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure, meaning that each variant is equally likely to have been passed from one generation to the next. The model looks backward in time, merging alleles into a single ancestral copy according to a random process in coalescence events. Under this model, the expected time between successive coalescence events increases going back in time, because fewer lineages remain and so fewer pairs are available to coalesce (the times also have wide variance). Variance in the model comes both from the random passing of alleles from one generation to the next and from the random occurrence of mutations in these alleles.

The mathematical theory of the coalescent was developed independently by several groups in the early 1980s as a natural extension of classical population genetics theory and models,[1][2][3][4] but can be primarily attributed to John Kingman.[5] Later advances extended coalescent theory to include recombination, selection, overlapping generations, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis.

The model can be used to generate many theoretical genealogies; observed data can then be compared with these simulations to test assumptions about the demographic history of a population. Coalescent theory can also be used to make inferences about population genetic parameters, such as migration, population size and recombination.

Theory


Time to coalescence


Consider a single gene locus sampled from two haploid individuals in a population. The ancestry of this sample is traced backwards in time to the point where these two lineages coalesce in their most recent common ancestor (MRCA). Coalescent theory seeks to estimate the expectation of this time period and its variance.

The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parental DNA sequence. In a population with a constant effective population size with $2N_e$ copies of each locus, there are $2N_e$ "potential parents" in the previous generation. Under a random mating model, the probability that two alleles originate from the same parental copy is thus $1/(2N_e)$ and, correspondingly, the probability that they do not coalesce is $1 - 1/(2N_e)$.

At each successive preceding generation the two lineages either coalesce or do not, so the number of generations back to coalescence is geometrically distributed: the probability that coalescence happens exactly $t$ generations back is the probability of non-coalescence in the $t - 1$ preceding generations multiplied by the probability of coalescence at the generation of interest:

$$P_c(t) = \left(1 - \frac{1}{2N_e}\right)^{t-1} \left(\frac{1}{2N_e}\right).$$
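As a quick numerical check of the geometric formula, the sketch below evaluates $P_c(t)$ for an assumed population with $2N_e = 200$ gene copies (an illustrative value, not one from the text) and confirms that the probabilities sum to one and that the mean waiting time is $2N_e$ generations. The function name `p_coalesce` is hypothetical.

```python
def p_coalesce(t: int, two_ne: int) -> float:
    """Probability that two lineages first coalesce exactly t generations back,
    in a population with two_ne = 2*N_e gene copies (geometric distribution)."""
    if t < 1:
        raise ValueError("t must be a positive number of generations")
    c = 1.0 / two_ne                      # per-generation coalescence probability 1/(2N_e)
    return (1.0 - c) ** (t - 1) * c       # t-1 failures, then one success

two_ne = 200                              # assumed example: N_e = 100
ts = range(1, 20_001)                     # truncate far out in the (negligible) tail
probs = [p_coalesce(t, two_ne) for t in ts]
total = sum(probs)
mean = sum(t * p for t, p in zip(ts, probs))
print(f"sum of probabilities = {total:.6f}, mean waiting time = {mean:.3f} generations")
```

The truncation at 20,000 generations is harmless here, since the remaining tail mass is $(1 - 1/200)^{20000} \approx e^{-100}$.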

For sufficiently large values of $N_e$, this distribution is well approximated by the continuously defined exponential distribution

$$P_c(t) = \frac{1}{2N_e} e^{-\frac{t-1}{2N_e}}.$$

This is mathematically convenient, as this exponential distribution has both its expected value and its standard deviation equal to $2N_e$. Therefore, although the expected time to coalescence is $2N_e$, actual coalescence times vary widely. Note that coalescence time is measured as the number of generations back to the coalescence event, not as calendar time, though the latter can be estimated by multiplying the number of generations (for example, its expectation $2N_e$) by the average time between generations. The above calculations apply equally to a diploid population of effective size $N_e$: for a non-recombining segment of DNA, each chromosome can be treated as equivalent to an independent haploid individual, and in the absence of inbreeding, sister chromosomes in a single individual are no more closely related than two chromosomes randomly sampled from the population. Some effectively haploid DNA elements, such as mitochondrial DNA, however, are passed on by only one sex and therefore have one quarter the effective size of the equivalent diploid population ($N_e/2$ rather than $2N_e$ gene copies).
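The wide spread of coalescence times can be seen by simulating the process directly. The sketch below traces two gene copies backward in an idealized random-mating (Wright–Fisher) population, letting each lineage pick its parental copy uniformly at random every generation, and records when they first pick the same one. The population size ($2N_e = 40$), replicate count, seed and function name are illustrative assumptions.

```python
import random
import statistics

def wf_pair_coalescence_time(two_ne: int, rng: random.Random) -> int:
    """Trace two gene copies backward in a Wright-Fisher population with
    two_ne = 2*N_e copies; return the generation in which they first
    descend from the same parental copy."""
    t = 0
    while True:
        t += 1
        # Each lineage independently picks one of the 2N_e parental copies.
        if rng.randrange(two_ne) == rng.randrange(two_ne):
            return t

rng = random.Random(1)      # fixed seed for reproducibility
two_ne = 40                 # assumed example: N_e = 20
times = [wf_pair_coalescence_time(two_ne, rng) for _ in range(10_000)]
mean_t = statistics.fmean(times)
sd_t = statistics.stdev(times)
print(f"mean = {mean_t:.1f}, sd = {sd_t:.1f} (exponential approximation: both {two_ne})")
```

Both the sample mean and the sample standard deviation come out close to $2N_e$. For the exact geometric distribution the standard deviation is $2N_e\sqrt{1 - 1/(2N_e)}$, about 39.5 here, which the exponential approximation rounds up to $2N_e$.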

The mathematical object one formally obtains by letting $N_e$ go to infinity, with time measured in units of $2N_e$ generations, is known as the Kingman coalescent.[1]
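For a sample of $n$ lineages, the standard Kingman coalescent treats each of the $k(k-1)/2$ pairs among $k$ remaining lineages as coalescing at rate $1/(2N_e)$ per generation. The waiting time with $k$ lineages is therefore exponential with mean $4N_e/(k(k-1))$, and the expected time to the MRCA is $4N_e(1 - 1/n)$. The sketch below shows that the waits lengthen as lineages are lost going back in time. The values $2N_e = 1000$ and $n = 5$ and both function names are assumptions for illustration.

```python
import random
import statistics

def expected_waits(n: int, two_ne: float) -> dict[int, float]:
    """Expected generations spent with k lineages, for k = n down to 2."""
    # k lineages form k*(k-1)/2 pairs, each coalescing at rate 1/(2N_e) per generation.
    return {k: two_ne / (k * (k - 1) / 2) for k in range(n, 1, -1)}

def simulate_tmrca(n: int, two_ne: float, rng: random.Random) -> float:
    """Draw one time (in generations) to the MRCA of n sampled lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = (k * (k - 1) / 2) / two_ne   # total coalescence rate with k lineages
        t += rng.expovariate(rate)          # exponential waiting time
    return t

two_ne = 1000.0   # assumed example value of 2N_e
n = 5             # assumed sample size
waits = expected_waits(n, two_ne)
for k, w in waits.items():
    print(f"{k} lineages: expected wait {w:.1f} generations")

rng = random.Random(42)
sims = [simulate_tmrca(n, two_ne, rng) for _ in range(20_000)]
print(f"simulated mean TMRCA {statistics.fmean(sims):.0f}, "
      f"theory {2 * two_ne * (1 - 1 / n):.0f}")
```

With these values the expected wait grows from 100 generations while five lineages remain to 1,000 generations for the last two, so the final coalescence accounts for most of the expected 1,600 generations to the MRCA.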

Neutral variation


Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift and mutation. This value is termed the mean heterozygosity, represented as $\bar{H}$. Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence); in other words, it is the probability that, tracing two lineages back, a mutation occurs before they coalesce. The probability that the event is a mutation is the probability of a mutation in either of the two lineages, $2\mu$, where $\mu$ is the mutation rate per generation. Thus the mean heterozygosity is equal to

$$\bar{H} = \frac{2\mu}{2\mu + \frac{1}{2N_e}} = \frac{4N_e\mu}{1 + 4N_e\mu} = \frac{\theta}{1 + \theta},$$

where $\theta = 4N_e\mu$ is the population-scaled mutation rate.

For $4N_e\mu \gg 1$, the vast majority of allele pairs have at least one difference in nucleotide sequence.
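The "race" between mutation and coalescence behind this formula can be checked by simulation. The sketch below uses the continuous-time approximation: the first mutation on either lineage arrives at rate $2\mu$, coalescence at rate $1/(2N_e)$, and any mutation is assumed to make the two copies differ. The parameter values ($N_e = 10{,}000$, $\mu = 2.5 \times 10^{-5}$, so $\theta = 1$), the seed and the function names are illustrative assumptions.

```python
import random

def mean_heterozygosity(ne: float, mu: float) -> float:
    """H-bar = theta / (1 + theta), with theta = 4 * N_e * mu."""
    theta = 4.0 * ne * mu
    return theta / (1.0 + theta)

def simulated_heterozygosity(ne: float, mu: float, reps: int,
                             rng: random.Random) -> float:
    """Fraction of pairs in which a mutation (rate 2*mu over both lineages)
    occurs before the pair coalesces (rate 1/(2*N_e))."""
    differ = 0
    for _ in range(reps):
        t_mutation = rng.expovariate(2.0 * mu)
        t_coalescence = rng.expovariate(1.0 / (2.0 * ne))
        if t_mutation < t_coalescence:
            differ += 1
    return differ / reps

ne, mu = 10_000, 2.5e-5          # assumed illustrative values (theta = 1)
rng = random.Random(7)
h_theory = mean_heterozygosity(ne, mu)
h_sim = simulated_heterozygosity(ne, mu, 100_000, rng)
print(f"theory {h_theory:.3f}, simulation {h_sim:.3f}")
```

With $\theta = 1$ both values are close to 0.5. Raising $\mu$ or $N_e$ pushes $\bar{H}$ towards 1, in line with the $4N_e\mu \gg 1$ case above.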

Extensions


There are numerous extensions to the coalescent model, such as the Λ-coalescent, which allows for multiple mergers (multifurcations), in which more than two lineages coalesce in a single event.[6]

Graphical representation


Coalescents can be visualised using dendrograms, which show the relationship of branches of the population to each other. The point where two branches meet indicates a coalescent event.
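A dendrogram can also be written in text form. The sketch below simulates a Kingman-coalescent genealogy for five sampled lineages and prints it as a Newick string. Each parenthesised pair is one coalescent event, and the numbers are branch lengths in units of $2N_e$ generations. The choice of Newick output, the labels, the seed and the function name are assumptions for illustration.

```python
import random

def random_coalescent_newick(labels: list[str], rng: random.Random) -> str:
    """Simulate a Kingman-coalescent genealogy for the sample labels and
    return it as a Newick string (time in units of 2N_e generations)."""
    # Each active lineage is (newick_subtree, time_of_its_root_node).
    lineages = [(name, 0.0) for name in labels]
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)      # wait until the next coalescence
        i, j = sorted(rng.sample(range(k), 2))     # choose a random pair to merge
        (left, t_left), (right, t_right) = lineages[i], lineages[j]
        # Branch length = time of this coalescence minus time of the child node.
        merged = f"({left}:{t - t_left:.3f},{right}:{t - t_right:.3f})"
        del lineages[j]                            # j > i, so index i stays valid
        lineages[i] = (merged, t)
    return lineages[0][0] + ";"

tree = random_coalescent_newick(["A", "B", "C", "D", "E"], random.Random(3))
print(tree)
```

The printed string can be pasted into most tree viewers that accept Newick input to draw the corresponding dendrogram.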

Applications


Disease gene mapping


The utility of coalescent theory in the mapping of disease is slowly gaining more appreciation; although the application of the theory is still in its infancy, there are a number of researchers who are actively developing algorithms for the analysis of human genetic data that utilise coalescent theory.[7][8][9]

A considerable number of human diseases can be attributed to genetics, from simple Mendelian diseases like sickle-cell anemia and cystic fibrosis to more complicated maladies like cancers and mental illnesses. The latter are polygenic diseases, controlled by multiple genes that may lie on different chromosomes, but diseases that are precipitated by a single abnormality are relatively simple to pinpoint and trace, although not so simple that this has been achieved for all diseases. To understand these diseases and their processes, it is immensely useful to know where the responsible genes are located on chromosomes and how they have been inherited through the generations of a family, which coalescent analysis can help establish.[2]

Genetic diseases are passed from one generation to another just like other genes. While any gene may be shuffled from one chromosome to another during homologous recombination, it is unlikely to be moved alone: neighbouring genes tend to move with it. Thus, other genes that are close enough to the disease gene to be linked to it can be used to trace it.[2]
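Why only closely linked markers are useful can be illustrated with a simple model. Assume each meiosis independently separates a marker from the disease mutation with probability $r$ (the recombination fraction). The chance that they are still on the same chromosome after $g$ generations is then $(1 - r)^g$. This model and the function name are illustrative assumptions, not the method of the cited studies.

```python
def prob_still_linked(r: float, generations: int) -> float:
    """Probability that no crossover has separated a marker from the disease
    mutation over the given number of generations, assuming each meiosis
    recombines them independently with probability r (the recombination fraction)."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - r) ** generations

# Markers at increasing genetic distance from the mutation, 100 generations on.
for r in (0.001, 0.01, 0.1):
    print(f"r = {r}: P(still linked) = {prob_still_linked(r, 100):.3g}")
```

Over 100 generations a tightly linked marker ($r = 0.001$) stays with the mutation about 90% of the time, while a loosely linked one ($r = 0.1$) almost never does.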

Complex diseases have a genetic basis even though they do not follow Mendelian inheritance models; they may have relatively high occurrence in populations and severe health effects. Such diseases may have incomplete penetrance and tend to be polygenic, which complicates their study. These traits may arise from many small mutations, which together have a severe and deleterious effect on the health of the individual.[3]

Linkage mapping methods, including coalescent theory, can be applied to these diseases, since they use family pedigrees to determine which markers accompany a disease and how it is inherited. At the very least, this method helps narrow down the portion, or portions, of the genome on which the deleterious mutations may occur. Complications in these approaches include epistatic effects, the polygenic nature of the mutations, and environmental factors. That said, genes whose effects are additive carry a fixed risk of developing the disease, and when they exist in a disease genotype, they can be used to predict risk and map the gene.[3] Both the regular coalescent and the shattered coalescent (which allows that multiple mutations may have occurred in the founding event, and that the disease may occasionally be triggered by environmental factors) have been used to study disease genes.[2]

Studies have been carried out correlating disease occurrence in fraternal and identical twins, and the results of these studies can be used to inform coalescent modeling. Since identical twins share all of their genome, whereas fraternal twins share on average only half of their segregating genetic variation, the difference in correlation between identical and fraternal twins can be used to work out whether a disease is heritable and, if so, how strongly.[3]
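One classic way to turn the two twin correlations into a number is Falconer's approximation, $h^2 \approx 2(r_{MZ} - r_{DZ})$. The factor of 2 reflects the difference between sharing all and, on average, half of the segregating variation. It assumes additive genetic effects and equally shared environments for both kinds of twins. It is shown here as a standard illustration, not necessarily the method used in the cited work. The correlations below are hypothetical and are taken to be on a continuous (liability) scale.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation h^2 ~= 2 * (r_MZ - r_DZ).

    Assumes additive genetic effects, that identical (MZ) twins share all and
    fraternal (DZ) twins on average half of the segregating additive variation,
    and that both kinds of twin pairs share their environment to the same extent.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)

# Hypothetical twin correlations, not data from the studies cited above.
print(round(falconer_heritability(0.8, 0.5), 2))   # prints 0.6
```

Equal correlations in identical and fraternal twins give an estimate of zero, which points to shared environment rather than genes.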

The genomic distribution of heterozygosity


The human single-nucleotide polymorphism (SNP) map has revealed large regional variations in heterozygosity, more so than can be explained on the basis of (Poisson-distributed) random chance.[10] In part, these variations could be explained on the basis of assessment methods, the availability of genomic sequences, and possibly the standard coalescent population genetic model. Population genetic processes could have a major influence on this variation: some loci presumably would have comparatively recent common ancestors, others might have much older genealogies, and so the regional accumulation of SNPs over time could be quite different. The local density of SNPs along chromosomes appears to cluster in accordance with a variance to mean power law and to obey the Tweedie compound Poisson distribution.[11] In this model the regional variations in the SNP map would be explained by the accumulation of multiple small genomic segments through recombination, where the mean number of SNPs per segment would be gamma distributed in proportion to a gamma distributed time to the most recent common ancestor for each segment.[12]
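The effect of variable genealogy ages on SNP counts can be illustrated with a deliberately simpler model than the Tweedie compound Poisson model cited above. Each genomic window is given a gamma-distributed relative TMRCA (mean 1), and its SNP count is drawn from a Poisson distribution whose mean scales with that TMRCA. The sketch compares the variance-to-mean ratio of such counts with the Poisson case, where every window has the same TMRCA. The window count, mean of 10 SNPs per window, gamma shape of 2, seed and function names are all illustrative assumptions.

```python
import math
import random
import statistics

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def snp_counts(n_windows: int, mean_snps: float, gamma_shape: float,
               rng: random.Random, vary_tmrca: bool = True) -> list[int]:
    """SNP counts per window; the expected count scales with the window's TMRCA."""
    counts = []
    for _ in range(n_windows):
        # Relative TMRCA: gamma with mean 1 (shape k, scale 1/k), or fixed at 1.
        rel_tmrca = rng.gammavariate(gamma_shape, 1.0 / gamma_shape) if vary_tmrca else 1.0
        counts.append(poisson(mean_snps * rel_tmrca, rng))
    return counts

def variance_to_mean(xs: list[int]) -> float:
    return statistics.variance(xs) / statistics.fmean(xs)

rng = random.Random(11)
fixed = snp_counts(5_000, 10.0, 2.0, rng, vary_tmrca=False)
varied = snp_counts(5_000, 10.0, 2.0, rng, vary_tmrca=True)
vmr_fixed = variance_to_mean(fixed)
vmr_varied = variance_to_mean(varied)
print(f"variance/mean, fixed TMRCA: {vmr_fixed:.2f}")
print(f"variance/mean, gamma TMRCA: {vmr_varied:.2f}")
```

With a fixed TMRCA the ratio stays near 1, as expected for Poisson counts. Gamma-distributed TMRCAs raise it to roughly 6 for these settings, so regional heterogeneity in genealogy ages alone is enough to produce overdispersed SNP densities.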

History


Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher–Wright (or Wright–Fisher) model for large populations. It was discovered independently by several researchers in the 1980s.[13][14][15][16]

Software


A large body of software exists for both simulating data sets under the coalescent process as well as inferring parameters such as population size and migration rates from genetic data.

  • BEAST and BEAST 2 – Bayesian inference package via MCMC with a wide range of coalescent models, including the use of temporally sampled sequences.[17]
  • BPP – software package for inferring phylogeny and divergence times among populations under a multispecies coalescent process.
  • CoaSim – software for simulating genetic data under the coalescent model.
  • DIYABC – a user-friendly approach toABC for inference on population history using molecular markers.[18]
  • DendroPy – a Python library for phylogenetic computing, with classes and methods for simulating pure (unconstrained) coalescent trees as well as constrained coalescent trees under the multispecies coalescent model (i.e., "gene trees in species trees").
  • GeneRecon – software for fine-scale linkage disequilibrium mapping of disease genes using coalescent theory, based on a Bayesian MCMC framework.
  • genetree – software for estimation of population genetics parameters using coalescent theory and simulation (the R package "popgen"). See also Oxford Mathematical Genetics and Bioinformatics Group.
  • GENOME – rapid coalescent-based whole-genome simulation[19]
  • IBDSim – a computer package for the simulation of genotypic data under general isolation by distance models.[20]
  • IMa – implements the same Isolation with Migration model as the earlier program IM, but does so using a new method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log likelihood ratio tests of nested demographic models. IMa is based on a method described in Hey and Nielsen (2007 PNAS 104:2785–2790). IMa is faster than IM and provides access to the joint posterior density function, and it can be used for most (but not all) of the situations and options that IM can be used for.
  • Lamarc – software for estimation of rates of population growth, migration, and recombination.
  • Migraine – a program which implements coalescent algorithms for a maximum likelihood analysis (using Importance Sampling algorithms) of genetic data with a focus on spatially structured populations.[21]
  • Migrate – maximum likelihood and Bayesian inference of migration rates under the n-coalescent. The inference is implemented using MCMC.
  • MaCS – Markovian Coalescent Simulator – simulates genealogies spatially across chromosomes as a Markovian process. Similar to the SMC algorithm of McVean and Cardin, and supports all demographic scenarios found in Hudson's ms.
  • ms & msHOT – Richard Hudson's original program for generating samples under neutral models[22] and an extension which allows recombination hotspots.[23]
  • msms – an extended version of ms that includes selective sweeps.[24]
  • msprime – a fast and scalable ms-compatible simulator, allowing demographic simulations, producing compact output files for thousands or millions of genomes.
  • PhyloCoalSimulations – a Julia package to simulate gene trees under the coalescent along a phylogenetic network / admixture graph. The model allows for possible correlated inheritance at reticulations, which represent introgression, gene flow or hybridization events.
  • Recodon and NetRecodon – software to simulate coding sequences with inter/intracodon recombination, migration, growth rate and longitudinal sampling.[25][26]
  • CoalEvol and SGWE – software to simulate nucleotide, coding and amino acid sequences under the coalescent with demographics, recombination, population structure with migration and longitudinal sampling.[27]
  • SARG – structure Ancestral Recombination Graph by Magnus Nordborg
  • simcoal2 – software to simulate genetic data under the coalescent model with complex demography and recombination
  • TreesimJ – forward simulation software allowing sampling of genealogies and data sets under diverse selective and demographic models.

References

  1. Etheridge, Alison (2011-01-07). Some Mathematical Models from Population Genetics: École D'Été de Probabilités de Saint-Flour XXXIX-2009. Springer Science & Business Media. ISBN 978-3-642-16631-0.
  2. Morris, A., Whittaker, J., & Balding, D. (2002). Fine-Scale Mapping of Disease Loci via Shattered Coalescent Modeling of Genealogies. The American Journal of Human Genetics, 70(3), 686–707. doi:10.1086/339271
  3. Rannala, B. (2001). Finding genes influencing susceptibility to complex diseases in the post-genome era. American Journal of Pharmacogenomics, 1(3), 203–221.

Sources


Articles

  • Arenas, M. and Posada, D. (2014) Simulation of Genome-Wide Evolution under Heterogeneous Substitution Models and Complex Multispecies Coalescent Histories. Molecular Biology and Evolution 31(5): 1295–1301
  • Arenas, M. and Posada, D. (2007) Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 8: 458
  • Arenas, M. and Posada, D. (2010) Coalescent simulation of intracodon recombination. Genetics 184(2): 429–437
  • Browning, S.R. (2006) Multilocus association mapping using variable-length Markov chains. American Journal of Human Genetics 78: 903–913
  • Cornuet, J.-M., Pudlo, P., Veyssier, J., Dehne-Garcia, A., Gautier, M., Leblois, R., Marin, J.-M., Estoup, A. (2014) DIYABC v2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data. Bioinformatics 30: 1187–1189
  • Degnan, J.H. and Salter, L.A. (2005) Gene tree distributions under the coalescent process. Evolution 59(1): 24–37
  • Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29: 401–421
  • Drummond, A., Suchard, M.A., Xie, D., Rambaut, A. (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29(8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
  • Ewing, G. and Hermisson, J. (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 15
  • Hellenthal, G., Stephens, M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics AOP
  • Hudson, R.R. (1983a) Testing the Constant-Rate Neutral Allele Model with Protein Sequence Data. Evolution 37(1): 203–217. doi:10.2307/2408186. ISSN 1558-5646. JSTOR 2408186. PMID 28568026.
  • Hudson, R.R. (1983b) Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23: 183–201.
  • Hudson, R.R. (1991) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7: 1–44
  • Hudson, R.R. (2002) Generating samples under a Wright–Fisher neutral model. Bioinformatics 18: 337–338
  • Kendal, W.S. (2003) An exponential dispersion model for the distribution of human single nucleotide polymorphisms. Molecular Biology and Evolution 20: 579–590
  • Hein, J., Schierup, M., Wiuf, C. (2004) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press. ISBN 978-0-19-852996-5
  • Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. Genetics 120: 819–829
  • Kingman, J.F.C. (1982) On the Genealogy of Large Populations. Journal of Applied Probability 19: 27–43. CiteSeerX 10.1.1.552.1429. doi:10.2307/3213548. ISSN 0021-9002. JSTOR 3213548. S2CID 125055288.
  • Kingman, J.F.C. (2000) Origins of the coalescent 1974–1982. Genetics 156: 1461–1463
  • Leblois, R., Estoup, A. and Rousset, F. (2009) IBDSim: a computer program to simulate genotypic data under isolation by distance. Molecular Ecology Resources 9: 107–109
  • Liang, L., Zöllner, S., Abecasis, G.R. (2007) GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics 23: 1565–1567
  • Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P.J.M., Madsen, J.N., Schauser, L. (2005) CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models. BMC Bioinformatics 6: 252
  • Möhle, M., Sagitov, S. (2001) A classification of coalescent processes for haploid exchangeable population models. The Annals of Probability 29: 1547–1562
  • Morris, A.P., Whittaker, J.C., Balding, D.J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. American Journal of Human Genetics 70: 686–707
  • Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection. Genetics 145: 519–534
  • Pitman, J. (1999) Coalescents with multiple collisions. The Annals of Probability 27: 1870–1902
  • Harding, Rosalind M. (1998) New phylogenies: an introductory look at the coalescent. pp. 15–22 in Harvey, P.H., Brown, A.J.L., Smith, J.M., Nee, S. New uses for new phylogenies. Oxford University Press. ISBN 0198549849
  • Rosenberg, N.A., Nordborg, M. (2002) Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. Nature Reviews Genetics 3: 380–390
  • Sagitov, S. (1999) The general coalescent with asynchronous mergers of ancestral lines. Journal of Applied Probability 36: 1116–1125
  • Schweinsberg, J. (2000) Coalescents with simultaneous multiple collisions. Electronic Journal of Probability 5: 1–50
  • Slatkin, M. (2001) Simulating genealogies of selected alleles in populations of variable size. Genetic Research 145: 519–534
  • Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in finite populations. Genetics 105: 437–460
  • Tavaré, S., Balding, D.J., Griffiths, R.C. & Donnelly, P. (1997) Inferring coalescent times from DNA sequence data. Genetics 145: 505–518.
  • The International SNP Map Working Group (2001) A map of human genome variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
  • Zöllner, S. and Pritchard, J.K. (2005) Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci. Genetics 169: 1071–1092
  • Rousset, F. and Leblois, R. (2007) Likelihood and Approximate Likelihood Analyses of Genetic Structure in a Linear Habitat: Performance and Robustness to Model Mis-Specification. Molecular Biology and Evolution 24: 2730–2745

Retrieved from "https://en.wikipedia.org/w/index.php?title=Coalescent_theory&oldid=1263293144"