
Least squares inference in phylogeny

From Wikipedia, the free encyclopedia

Generation of phylogenetic trees based on an observed matrix of pairwise genetic distances

Least squares inference in phylogeny generates a phylogenetic tree based on an observed matrix of pairwise genetic distances and, optionally, a weight matrix. The goal is to find a tree whose path lengths fit the observed distances as closely as possible.

Ordinary and weighted least squares


The discrepancy between the observed pairwise distances $D_{ij}$ and the distances $T_{ij}$ over a phylogenetic tree (i.e. the sum of the branch lengths in the path from leaf $i$ to leaf $j$) is measured by

$$S=\sum_{ij} w_{ij}(D_{ij}-T_{ij})^{2}$$

where the weights $w_{ij}$ depend on the least squares method used. Least squares distance tree construction aims to find the tree (topology and branch lengths) with minimal S. This is a non-trivial problem: it involves searching the discrete space of unrooted binary tree topologies, whose size grows faster than exponentially in the number of leaves. For n leaves there are 1 • 3 • 5 • ... • (2n-5) different topologies, so exhaustive enumeration quickly becomes infeasible as the number of leaves grows, and heuristic search methods are used to find a reasonably good topology. For a given topology, evaluating S (which includes computing the optimal branch lengths) is a linear least squares problem.

There are several ways to weight the squared errors $(D_{ij}-T_{ij})^{2}$, depending on the knowledge and assumptions about the variances of the observed distances. When nothing is known about the errors, or if they are assumed to be independently distributed with equal variance for all observed distances, all the weights $w_{ij}$ are set to one. This leads to an ordinary least squares estimate.

In the weighted least squares case, the errors are assumed to be independent (or their correlations are not known). Given independent errors, each weight should ideally be set to the inverse of the variance of the corresponding distance estimate. The variances may not be known, but they can sometimes be modeled as a function of the distance estimates. In the Fitch and Margoliash method [1], for instance, the variances are assumed to be proportional to the squared distances, which gives $w_{ij} = 1/D_{ij}^{2}$.
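The following Python sketch illustrates the points above under stated assumptions. It counts unrooted binary topologies as 1 • 3 • 5 • ... • (2n-5) (three for a quartet), and it shows that, for a fixed topology, fitting branch lengths is a linear least squares problem: each leaf pair contributes one row of a 0/1 design matrix marking the edges on its path, and the weighted normal equations are solved directly. The edge-list representation, the internal node names, and all function names are hypothetical; the solver is plain Gaussian elimination chosen only to avoid dependencies, and no non-negativity constraint is placed on branch lengths, so non-additive data can yield negative lengths.

```python
from itertools import combinations


def count_unrooted_topologies(n):
    """Unrooted binary topologies on n >= 3 labeled leaves: 1 * 3 * 5 * ... * (2n - 5)."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def path_edge_sets(edges, leaves):
    """Map each unordered leaf pair (i, j) to the indices of the edges on its path."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    paths = {}
    for a, i in enumerate(leaves):
        parent = {i: None}  # node -> (previous node, edge index) on a DFS from leaf i
        stack = [i]
        while stack:
            node = stack.pop()
            for nbr, idx in adj[node]:
                if nbr not in parent:
                    parent[nbr] = (node, idx)
                    stack.append(nbr)
        for j in leaves[a + 1:]:  # walk back from each later leaf j to leaf i
            on_path, node = set(), j
            while parent[node] is not None:
                node, idx = parent[node]
                on_path.add(idx)
            paths[(i, j)] = on_path
    return paths


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[r]] for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_branch_lengths(edges, leaves, D, w):
    """Branch lengths b minimizing S for a fixed topology, via (A^T W A) b = A^T W d.

    Row p of the 0/1 design matrix A marks the edges on the path of leaf pair p,
    and W = diag(w). No non-negativity constraint is imposed.
    """
    pairs = list(combinations(leaves, 2))
    paths = path_edge_sets(edges, leaves)
    m = len(edges)
    A = [[1.0 if e in paths[p] else 0.0 for e in range(m)] for p in pairs]
    AtWA = [[sum(w[p] * A[r][e] * A[r][f] for r, p in enumerate(pairs))
             for f in range(m)] for e in range(m)]
    AtWd = [sum(w[p] * A[r][e] * D[p] for r, p in enumerate(pairs)) for e in range(m)]
    return solve(AtWA, AtWd)


def least_squares_residual(edges, leaves, D, w, lengths):
    """S = sum over unordered leaf pairs of w_ij * (D_ij - T_ij)^2."""
    paths = path_edge_sets(edges, leaves)
    return sum(w[p] * (D[p] - sum(lengths[e] for e in paths[p])) ** 2
               for p in combinations(leaves, 2))


# Example: the quartet ((A,B),(C,D)), with hypothetical internal nodes u and v.
leaves = ["A", "B", "C", "D"]
edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
D = {("A", "B"): 3.0, ("A", "C"): 3.0, ("A", "D"): 2.5,
     ("B", "C"): 4.0, ("B", "D"): 3.5, ("C", "D"): 2.5}  # additive on this tree
ones = {p: 1.0 for p in D}  # ordinary least squares
ols = fit_branch_lengths(edges, leaves, D, ones)
fm = fit_branch_lengths(edges, leaves, D, {p: 1.0 / D[p] ** 2 for p in D})
D_noisy = dict(D)
D_noisy[("A", "B")], D_noisy[("C", "D")] = 3.2, 2.3
true_lengths = [1.0, 2.0, 0.5, 1.5, 1.0]
s_true = least_squares_residual(edges, leaves, D_noisy, ones, true_lengths)  # about 0.08
s_fit = least_squares_residual(edges, leaves, D_noisy, ones,
                               fit_branch_lengths(edges, leaves, D_noisy, ones))
```

Because the example distances are exactly additive, both the unit weights (`ols`) and the Fitch and Margoliash style weights $1/D_{ij}^{2}$ (`fm`) recover the generating lengths [1.0, 2.0, 0.5, 1.5, 1.0] up to rounding. For the perturbed matrix, the fitted lengths give an S no larger than the generating lengths do, as any least squares fit must. The sum runs over unordered pairs; summing over ordered pairs would only double S without changing the minimizer.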

Generalized least squares


The ordinary and weighted least squares methods described above assume independent distance estimates. If the distances are derived from genomic data, their estimates covary, because evolutionary events on internal branches (of the true tree) can push several distances up or down at the same time. The resulting covariances can be taken into account using the method of generalized least squares, i.e. by minimizing the following quantity

$$\sum_{ij,kl} w_{ij,kl}(D_{ij}-T_{ij})(D_{kl}-T_{kl})$$

where $w_{ij,kl}$ are the entries of the inverse of the covariance matrix of the distance estimates.
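To make the generalized criterion concrete, the following Python sketch evaluates it for given observed and tree distances, with `W` standing for the inverse covariance matrix whose entries are the $w_{ij,kl}$. The pair ordering, the function name, and the matrix values are assumptions made for illustration; the off-diagonal entries are arbitrary numbers chosen to keep `W` symmetric positive definite, not values derived from a model of sequence evolution. With a diagonal `W` the double sum collapses to the weighted least squares criterion S; off-diagonal entries add cross terms between the residuals of different pairs.

```python
from itertools import combinations


def gls_residual(D, T, W, leaves):
    """Generalized least squares criterion over unordered leaf pairs.

    W is the inverse covariance matrix of the distance estimates, with rows and
    columns ordered like combinations(leaves, 2); W[a][b] plays the role of w_{ij,kl}.
    """
    pairs = list(combinations(leaves, 2))
    r = [D[p] - T[p] for p in pairs]  # residuals D_ij - T_ij
    return sum(W[a][b] * r[a] * r[b]
               for a in range(len(pairs)) for b in range(len(pairs)))


# Example with three leaves; pairs are ordered (A,B), (A,C), (B,C).
leaves = ["A", "B", "C"]
D = {("A", "B"): 2.0, ("A", "C"): 3.0, ("B", "C"): 4.0}
T = {("A", "B"): 1.5, ("A", "C"): 3.0, ("B", "C"): 4.5}
W_diag = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_corr = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
s_diag = gls_residual(D, T, W_diag, leaves)  # 0.5, the same as S with unit weights
s_corr = gls_residual(D, T, W_corr, leaves)  # 0.25: the cross terms contribute -0.25
```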

Computational complexity


Finding the tree and branch lengths minimizing the least squares residual is an NP-complete problem.[2] However, for a given tree topology, the optimal branch lengths can be determined in $O(n^{2})$ time for ordinary least squares, $O(n^{3})$ time for weighted least squares, and $O(n^{4})$ time for generalized least squares (given the inverse of the covariance matrix).[3]

External links


PHYLIP, a freely distributed phylogenetic analysis package containing an implementation of the weighted least squares method
PAUP, a similar package available for purchase
Darwin, a programming environment with a library of functions for statistics, numerics, sequence and phylogenetic analysis

References


[1] Fitch WM, Margoliash E (1967). Construction of phylogenetic trees. Science 155: 279-284.
[2] Day WHE (1987). Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology 49(4): 461-467. ISSN 0092-8240. doi:10.1016/S0092-8240(87)80007-1.
[3] Bryant D, Waddell P (1998). Rapid evaluation of least-squares and minimum-evolution criteria on phylogenetic trees. Molecular Biology and Evolution 15(10): 1346.


Retrieved from "https://en.wikipedia.org/w/index.php?title=Least_squares_inference_in_phylogeny&oldid=1021986251"
