Calculate distances between phylogenetic trees in R
ms609/TreeDist
'TreeDist' is an R package that implements a suite of metrics that quantify the topological distance between pairs of unweighted phylogenetic trees. It also includes a simple 'Shiny' application to allow the visualization of distance-based tree spaces, and functions to calculate the information content of trees and splits.
'TreeDist' primarily employs metrics in the category of 'generalized Robinson–Foulds distances': they are based on comparing splits (bipartitions) between trees, and thus reflect the relationship data within trees, with no reference to branch lengths.
The Robinson-Foulds distance simply tallies the number of non-trivial splits (sometimes inaccurately termed clades, nodes or edges) that occur in one tree but not the other: any split without a perfectly identical counterpart contributes one point to the distance score, however similar or different the closest split in the other tree may be. By overlooking potential similarities between almost-identical splits, this conservative approach has undesirable properties.
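
The tallying described above can be sketched in base R. This is only an illustration: the string encoding of splits and the variable names are hypothetical, and it is not how 'TreeDist' represents splits internally.

```r
# Hypothetical string encoding of the non-trivial splits of two six-leaf trees.
# Tree 1: (a, (b, (c, (d, (e, f)))));  Tree 2: (((a, c), b), (d, (e, f)));
splits1 <- c("ab|cdef", "abc|def", "abcd|ef")
splits2 <- c("ac|bdef", "abc|def", "abcd|ef")

# Each split found in one tree but not the other adds one point.
onlyIn1 <- setdiff(splits1, splits2)
onlyIn2 <- setdiff(splits2, splits1)
rfDistance <- length(onlyIn1) + length(onlyIn2)
rfDistance # 2
```

The unmatched splits `ab|cdef` and `ac|bdef` differ only in the position of one leaf, yet each counts as a full mismatch; this is the behaviour that generalized metrics aim to soften.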
'Generalized' RF metrics generate matchings that pair splits in one tree with similar splits in the other. Each pair of splits is assigned a similarity score; the sum of these scores in the optimal matching then quantifies the similarity between two trees.
Different ways of calculating the similarity between a pair of splits lead to different tree distance metrics, implemented in the functions below:
- `MutualClusteringInfo()`, `SharedPhylogeneticInfo()`: Smith (2020) scores matchings based on the amount of information that one partition contains about the other. The Mutual Phylogenetic Information assigns zero similarity to split pairs that cannot both exist on a single tree; the Mutual Clustering Information metric is more forgiving, and exhibits more desirable behaviour; it is the recommended metric for tree comparison. (Its complement, `ClusteringInfoDistance()`, returns a tree distance.)
- Nye et al. (2006) score matchings according to the size of the largest split that is consistent with both of them, normalized against the Jaccard index. This approach is extended by Böcker et al. (2013) with the Jaccard-Robinson-Foulds metric (function `JaccardRobinsonFoulds()`).
- Bogdanowicz and Giaro (2012) and Lin et al. (2012) independently proposed counting the number of 'mismatched' leaves in a pair of splits. `MatchingSplitInfoDistance()` provides an information-based equivalent (Smith 2020).
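
As a minimal sketch of how these functions might be called, the example below compares two eight-leaf trees. It assumes that the 'TreeTools' package is installed and uses its `BalancedTree()` and `PectinateTree()` helpers purely to obtain example trees; the variable names are illustrative.

```r
library("TreeTools") # Assumed available; supplies the example trees
library("TreeDist")

# Two example trees on the same eight leaves
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)

# Classic Robinson-Foulds distance, for reference
RobinsonFoulds(tree1, tree2)

# Information-theoretic similarities (Smith 2020)
MutualClusteringInfo(tree1, tree2)
SharedPhylogeneticInfo(tree1, tree2)

# Recommended tree distance, here scaled to the range 0 to 1
normalizedDistance <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)

# Other generalized Robinson-Foulds distances
JaccardRobinsonFoulds(tree1, tree2)
MatchingSplitInfoDistance(tree1, tree2)
```

The similarity functions return larger values for more similar trees, whereas the distance functions return zero for identical trees.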
The package also implements the variation of the path distance proposed by Kendall and Colijn (2016) (function `KendallColijn()`), approximations of the Nearest-Neighbour Interchange (NNI) distance (function `NNIDist()`; following Li et al. (1996)), and calculates the size (function `MASTSize()`) and information content (function `MASTInfo()`) of the Maximum Agreement Subtree.
For an implementation of the Tree Bisection and Reconnection (TBR) distance, see the package 'TBRDist'.
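
A brief sketch of these additional functions, again assuming that 'TreeTools' is installed to supply two rooted, binary example trees (the variable names are illustrative):

```r
library("TreeTools") # Assumed available; supplies the example trees
library("TreeDist")

tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)

# Kendall-Colijn distance between two rooted trees
kcDistance <- KendallColijn(tree1, tree2)

# Approximate NNI distance: the result is a set of bounds, not a single value
nniBounds <- NNIDist(tree1, tree2)

# Size and information content of the maximum agreement subtree
mastLeaves <- MASTSize(tree1, tree2)
mastInformation <- MASTInfo(tree1, tree2)
```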
Install and load the library from CRAN as follows:

    install.packages('TreeDist')
    library('TreeDist')

You can install the development version of the package with:

    if(!require("curl")) install.packages("curl")
    if(!require("remotes")) install.packages("remotes")
    remotes::install_github("ms609/TreeDist")

Construct tree spaces and readily visualize projected landscapes, avoiding common analytical pitfalls (Smith, 2022), using the inbuilt graphical user interface (Shiny GUI):
TreeDist::MapTrees()
Serious analysts should consult the vignette for a command-line interface.
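
A rough outline of a command-line workflow is sketched below. It is not a substitute for the vignette: the choice of trees (generated with `as.phylo()` from 'TreeTools', which is assumed to be installed), the use of classical multidimensional scaling via `cmdscale()`, and the two-dimensional projection are all illustrative assumptions.

```r
library("TreeTools") # Assumed available; used here to generate example trees
library("TreeDist")

# Twenty arbitrary eight-leaf trees, generated from integer identifiers
trees <- as.phylo(0:19, nTip = 8)

# Passing a single list of trees computes all pairwise distances
distances <- ClusteringInfoDistance(trees)

# Project the distances into two dimensions by classical scaling
mapping <- cmdscale(distances, k = 2)

# Each point represents one tree
plot(mapping, asp = 1, xlab = "Axis 1", ylab = "Axis 2", pch = 16)
```

A two-dimensional projection can badly distort the original distances, which is one of the pitfalls that Smith (2022) discusses; the quality of any mapping should be checked before it is interpreted.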
Other R packages implementing tree distance functions include:
- 'ape':
    - `cophenetic.phylo()`: Cophenetic distance
    - `dist.topo()`: Path (topological) distance, Robinson-Foulds distance.
- 'phangorn':
    - `treedist()`: Path, Robinson-Foulds and approximate SPR distances.
- 'Quartet': Triplet and Quartet distances, using the tqDist algorithm.
- 'TBRDist': TBR and SPR distances on unrooted trees, using the 'uspr' C library.
- 'treespace': Kendall-Colijn distance and tree space visualizations.
- 'distory' (unmaintained): Geodesic distance

Böcker, S. et al. (2013) The Generalized Robinson-Foulds metric. Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, 8126, 156–69.

Bogdanowicz, D. and Giaro, K. (2012) Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9, 150–160.

Kendall, M. and Colijn, C. (2016) Mapping phylogenetic trees to reveal distinct patterns of evolution. Molecular Biology and Evolution, 33, 2735–2743.

Li, M., Tromp, J. and Zhang, L.-X. (1996) Some notes on the nearest neighbour interchange distance. Computing and Combinatorics, Goos, G., Hartmanis, J., Leeuwen, J., Cai, J.-Y., and Wong, C. K., eds. Springer, Berlin. 343–351.

Nye, T.M.W. et al. (2006) A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics, 22, 117–119.

Smith, M.R. (2020) Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees. Bioinformatics, 36, 5007–5013.

Smith, M.R. (2022) Robust analysis of phylogenetic tree space. Systematic Biology, 71, 1255–1270.

Please note that the 'TreeDist' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

