| Version: | 5.8-1 |
| Date: | 2024-12-10 |
| Title: | Analyses of Phylogenetics and Evolution |
| Depends: | R (≥ 3.2.0) |
| Suggests: | gee, expm, igraph, phangorn, xml2 |
| Imports: | nlme, lattice, graphics, methods, stats, utils, parallel, Rcpp(≥ 0.12.0), digest |
| LinkingTo: | Rcpp |
| ZipData: | no |
| Description: | Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R. |
| License: | GPL-2 or GPL-3 |
| URL: | https://github.com/emmanuelparadis/ape |
| BugReports: | https://github.com/emmanuelparadis/ape/issues |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2024-12-10 17:29:50 UTC; paradis |
| Author: | Emmanuel Paradis |
| Maintainer: | Emmanuel Paradis <Emmanuel.Paradis@ird.fr> |
| Repository: | CRAN |
| Date/Publication: | 2024-12-16 00:00:02 UTC |
Analyses of Phylogenetics and Evolution
Description
ape provides functions for reading, writing, manipulating, analysing, and simulating phylogenetic trees and DNA sequences, computing DNA distances, translating into AA sequences, estimating trees with distance-based methods, and a range of methods for comparative analyses and analysis of diversification. Functionalities are also provided for programming new phylogenetic methods.
The complete list of functions can be displayed with library(help = ape).
More information on ape can be found at https://emmanuelparadis.github.io.
Author(s)
Emmanuel Paradis, Ben Bolker, Julien Claude, Hoa Sien Cuong, Richard Desper, Benoit Durand, Julien Dutheil, Olivier Gascuel, Christoph Heibl, Daniel Lawson, Vincent Lefort, Pierre Legendre, Jim Lemon, Yvonnick Noel, Johan Nylander, Rainer Opgen-Rhein, Andrei-Alin Popescu, Klaus Schliep, Korbinian Strimmer, Damien de Vienne
Maintainer: Emmanuel Paradis <Emmanuel.Paradis@ird.fr>
References
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.
Paradis, E., Claude, J. and Strimmer, K. (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289–290.
Popescu, A.-A., Huber, K. T. and Paradis, E. (2012) ape 3.0: new tools for distance based phylogenetics and evolutionary analysis in R. Bioinformatics, 28, 1536–1537.
Paradis, E. and Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics, 35, 526–528.
Amino Acid Sequences
Description
These functions help to create and manipulate AA sequences.
Usage
## S3 method for class 'AAbin'
print(x, ...)
## S3 method for class 'AAbin'
x[i, j, drop = FALSE]
## S3 method for class 'AAbin'
c(..., recursive = FALSE)
## S3 method for class 'AAbin'
rbind(...)
## S3 method for class 'AAbin'
cbind(..., check.names = TRUE, fill.with.Xs = FALSE, quiet = FALSE)
## S3 method for class 'AAbin'
as.character(x, ...)
## S3 method for class 'AAbin'
labels(object, ...)
## S3 method for class 'AAbin'
image(x, what, col, bg = "white", xlab = "", ylab = "", show.labels = TRUE, cex.lab = 1, legend = TRUE, grid = FALSE, show.aa = FALSE, aa.cex = 1, aa.font = 1, aa.col = "black", scheme = "Ape_AA", ...)
as.AAbin(x, ...)
## S3 method for class 'character'
as.AAbin(x, ...)
## S3 method for class 'list'
as.AAbin(x, ...)
## S3 method for class 'AAString'
as.AAbin(x, ...)
## S3 method for class 'AAStringSet'
as.AAbin(x, ...)
## S3 method for class 'AAMultipleAlignment'
as.AAbin(x, ...)
## S3 method for class 'AAbin'
as.list(x, ...)
## S3 method for class 'AAbin'
as.matrix(x, ...)
## S3 method for class 'AAbin'
as.phyDat(x, ...)
dist.aa(x, pairwise.deletion = FALSE, scaled = FALSE)
AAsubst(x)
Arguments
x, object | an object of class |
i, j | indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects). |
drop | logical; if |
recursive | logical; whether to go down lists and concatenate its elements. |
check.names | a logical specifying whether to check the rownames before binding the columns (see details). |
fill.with.Xs | a logical indicating whether to keep all possible individuals as indicated by the rownames, possibly filling the missing data with insertion gaps (ignored if |
quiet | a logical to switch off warning messages when some rows are dropped. |
what | a vector of characters specifying the amino acids to visualize. Currently, the only possible choice is to show the three categories hydrophobic, small, and hydrophilic. |
col | a vector of colours. If missing, this is set to “red”, “yellow” and “blue”. |
bg | the colour used for AA codes not among |
xlab | the label for the x-axis; none by default. |
ylab | the same for the y-axis. Note that by default, the labels of the sequences are printed on the y-axis (see next option). |
show.labels | a logical controlling whether the sequence labels are printed ( |
cex.lab | a single numeric controlling the size of the sequence labels. Use |
legend | a logical controlling whether the legend is plotted ( |
grid | a logical controlling whether to draw a grid ( |
show.aa | a logical controlling whether to show the AA symbols ( |
aa.cex, aa.font, aa.col | control the aspect of the AA symbols (ignored if the previous is |
scheme | a predefined color scheme. For amino acids, the options are "Ape_AA", "Zappo_AA", "Clustal" and "Hydrophobicity"; for nucleotides, "Ape_NT" and "RY_NT". |
pairwise.deletion | a logical indicating whether to delete the sites with missing data in a pairwise way. The default is to delete, for all sequences, the sites with at least one missing value. |
scaled | a logical value specifying whether to scale the number of AA differences by the sequence length. |
... | further arguments to be passed to or from other methods. |
Details
These functions help to manipulate amino acid sequences of class "AAbin". These objects are stored in vectors, matrices, or lists which can be manipulated with the usual [ operator.
There is a conversion function to and from characters.
The function dist.aa computes the number of AA differences between each pair of sequences in a matrix; this can be scaled by the sequence length. See the function dist.ml in phangorn for evolutionary distances with AA sequences.
The function AAsubst returns the indices of the polymorphic sites (similar to seg.sites for DNA sequences; see examples below).
The two functions cbind.AAbin and rbind.AAbin work in the same way as the similar methods for the class "DNAbin": see cbind.DNAbin for more explanations about their respective behaviours.
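As a rough illustration of what dist.aa counts, the base-R sketch below tallies pairwise differences in a plain character matrix. The helper aa_diff and the toy sequences are hypothetical and not part of ape; the sketch ignores missing data, so it has no equivalent of pairwise.deletion.

```r
# Hypothetical helper (not an ape function): number of differing sites
# between each pair of rows of a character matrix, optionally divided
# by the sequence length.
aa_diff <- function(x, scaled = FALSE) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(x[i, ] != x[j, ])  # count mismatching columns
    }
  }
  if (scaled) d <- d / ncol(x)          # proportion of differing sites
  as.dist(d)
}

# Toy alignment of three sequences and three sites (made-up data)
toy <- rbind(s1 = c("M", "K", "V"),
             s2 = c("M", "R", "V"),
             s3 = c("L", "R", "A"))
d_raw    <- aa_diff(toy)
d_scaled <- aa_diff(toy, scaled = TRUE)
```

With real data, the object passed to dist.aa would be of class "AAbin" rather than a character matrix.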
Value
an object of class "AAbin", "character", "dist", or "numeric", depending on the function.
Author(s)
Emmanuel Paradis, Franz Krah
See Also
Examples
data(woodmouse)
AA <- trans(woodmouse, 2)
seg.sites(woodmouse)
AAsubst(AA)
Tree Estimation Based on an Improved Version of the NJ Algorithm
Description
This function performs the BIONJ algorithm of Gascuel (1997).
Usage
bionj(X)
Arguments
X | a distance matrix; may be an object of class |
Value
an object of class "phylo".
Author(s)
original C code by Hoa Sien Cuong and Olivier Gascuel; adapted and ported to R by Vincent Lefort <vincent.lefort@lirmm.fr>
References
Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14, 685–695.
See Also
nj, fastme, mvr, bionjs, SDM, dist.dna
Examples
### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13, 10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12, 5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- bionj(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- bionj(dist.dna(woodmouse))
plot(trw)
Congruence among distance matrices
Description
Function CADM.global computes and tests the coefficient of concordance among several distance matrices through a permutation test.
Function CADM.post carries out a posteriori permutation tests of the contributions of individual distance matrices to the overall concordance of the group.
Use in phylogenetic analysis: to identify congruence among distance matrices (D) representing different genes or different types of data. Congruent D matrices correspond to data tables that can be used together in a combined phylogenetic or other type of multivariate analysis.
Usage
CADM.global(Dmat, nmat, n, nperm=99, make.sym=TRUE, weights=NULL, silent=FALSE)
CADM.post(Dmat, nmat, n, nperm=99, make.sym=TRUE, weights=NULL, mult="holm", mantel=FALSE, silent=FALSE)
Arguments
Dmat | A text file listing the distance matrices one after the other, with or without blank lines in-between. Each matrix is in the form of a square distance matrix with 0's on the diagonal. |
nmat | Number of distance matrices in file Dmat. |
n | Number of objects in each distance matrix. All matrices must have the same number of objects. |
nperm | Number of permutations for the tests of significance. |
make.sym | TRUE: turn asymmetric matrices into symmetric matrices by averaging the two triangular portions. FALSE: analyse asymmetric matrices as they are. |
weights | A vector of positive weights for the distance matrices. Example: weights = c(1,2,3). NULL (default): all matrices have same weight in the calculation of W. |
mult | Method for correcting P-values in multiple testing. The methods are "holm" (default), "sidak", and "bonferroni". The Bonferroni correction is overly conservative; it is not recommended. It is included to allow comparisons with the other methods. |
mantel | TRUE: Mantel statistics will be computed from ranked distances, as well as permutational P-values. FALSE (default): Mantel statistics and tests will not be computed. |
silent | TRUE: informative messages will not be printed, but stopping messages will. Option useful for simulation work. FALSE: informative messages will be printed. |
Details
Dmat must contain two or more distance matrices, listed one after the other, all of the same size, and corresponding to the same objects in the same order. Raw data tables can be transformed into distance matrices before comparison with other such distance matrices, or with data that have been obtained as distance matrices, e.g. serological or DNA hybridization data. The distances will be transformed to ranks before computation of the coefficient of concordance and other statistics.
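The rank-based concordance described above can be sketched in base R. The function kendall_w below is a hypothetical helper, not the code used by CADM.global: it assumes symmetric matrices, no tied distances (so no tie correction), and equal weights. It uses the textbook expressions W = 12 S / (m^2 (n^3 - n)) and Chi2 = m (n - 1) W, where m is the number of matrices and n is the number of ranked distances.

```r
# Hypothetical helper: Kendall's W among the rank-transformed
# lower-triangle distances of several matrices of identical size.
kendall_w <- function(mats) {
  # one column of ranks per distance matrix
  ranks <- sapply(mats, function(m) rank(m[lower.tri(m)]))
  n_dist <- nrow(ranks)             # number of ranked distances
  m <- ncol(ranks)                  # number of matrices
  row_sums <- rowSums(ranks)
  S <- sum((row_sums - mean(row_sums))^2)
  W <- 12 * S / (m^2 * (n_dist^3 - n_dist))
  c(W = W, Chi2 = m * (n_dist - 1) * W)
}

# Made-up data: three copies of the same distance matrix are
# perfectly concordant, so W should equal 1.
set.seed(1)
pts <- matrix(rnorm(10), 5, 2)
d <- as.matrix(dist(pts))
res_same <- kendall_w(list(d, d, d))
```

In CADM.global the significance of W is assessed by permutation, which this sketch does not attempt.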
CADM.global tests the global null hypothesis that all matrices are incongruent. If the global null is rejected, function CADM.post can be used to identify the concordant (H0 rejected) and discordant matrices (H0 not rejected) in the group. If a distance matrix has a negative value for the Mantel.mean statistic, that matrix clearly does not belong to the group. Remove that matrix (if there is more than one, first remove the matrix that has the most strongly negative value for Mantel.mean) and run the analysis again.
The corrections used for multiple testing are applied to the list of P-values (P) produced in the a posteriori tests; they take into account the number of tests (k) carried out simultaneously (number of matrices, parameter nmat).
The Holm correction is computed after ordering the P-values in a list with the smallest value to the left. Compute adjusted P-values as:
P_{corr} = (k-i+1)*P
where i is the position in the ordered list. Final step: from left to right, if an adjusted P_{corr} in the ordered list is smaller than the one occurring at its left, make the smaller one equal to the larger one.
The Sidak correction is:
P_{corr} = 1 - (1 - P)^k
The Bonferroni correction is:
P_{corr} = k*P
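The three corrections can be written out in a few lines of base R. This is a minimal sketch with hypothetical function names, not the code inside CADM.post. Capping adjusted values at 1 is an added assumption, made so that the results can be compared with stats::p.adjust.

```r
# Holm: order the P-values, multiply by (k - i + 1), then enforce
# monotonicity from left to right with a running maximum.
holm_adjust <- function(p) {
  k <- length(p)
  ord <- order(p)                        # smallest P-value first
  adj <- (k - seq_len(k) + 1) * p[ord]
  adj <- pmin(cummax(adj), 1)            # assumption: cap at 1
  out <- numeric(k)
  out[ord] <- adj                        # back to the original order
  out
}

# Sidak: 1 - (1 - P)^k
sidak_adjust <- function(p) 1 - (1 - p)^length(p)

# Bonferroni: k * P (capped at 1, same assumption as above)
bonferroni_adjust <- function(p) pmin(length(p) * p, 1)

p_raw  <- c(0.04, 0.01, 0.03, 0.20)      # made-up P-values
p_holm <- holm_adjust(p_raw)
p_sid  <- sidak_adjust(p_raw)
p_bonf <- bonferroni_adjust(p_raw)
```

For the Holm and Bonferroni cases, p.adjust(p_raw, "holm") and p.adjust(p_raw, "bonferroni") give the same values and are the more usual choice in practice.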
Value
CADM.global produces a small table containing the W, Chi2, and Prob.perm statistics described in the following list. CADM.post produces a table stored in element A_posteriori_tests, containing Mantel.mean, Prob, and Corrected.prob statistics in rows; the columns correspond to the k distance matrices under study, labeled Dmat.1 to Dmat.k. If parameter mantel is TRUE, tables of Mantel statistics and P-values are computed among the matrices.
W | Kendall's coefficient of concordance, W (Kendall and Babington Smith 1939; see also Legendre 2010). |
Chi2 | Friedman's chi-square statistic (Friedman 1937) used in the permutation test of W. |
Prob.perm | Permutational probability. |
Mantel.mean | Mean of the Mantel correlations, computed on rank-transformed distances, between the distance matrix under test and all the other matrices in the study. |
Prob | Permutational probabilities, uncorrected. |
Corrected.prob | Permutational probabilities corrected using the method selected in parameter |
Mantel.cor | Matrix of Mantel correlations, computed on rank-transformed distances, among the distance matrices. |
Mantel.prob | One-tailed P-values associated with the Mantel correlations of the previous table. The probabilities are computed in the right-hand tail. H0 is tested against the alternative one-tailed hypothesis that the Mantel correlation under test is positive. No correction is made for multiple testing. |
Author(s)
Pierre Legendre, Universite de Montreal
References
Campbell, V., Legendre, P. and Lapointe, F.-J. (2009) Assessing congruence among ultrametric distance matrices. Journal of Classification, 26, 103–117.
Campbell, V., Legendre, P. and Lapointe, F.-J. (2011) The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis. BMC Evolutionary Biology, 11, 64.
Friedman, M. (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Kendall, M. G. and Babington Smith, B. (1939) The problem of m rankings. Annals of Mathematical Statistics, 10, 275–287.
Lapointe, F.-J., Kirsch, J. A. W. and Hutcheon, J. M. (1999) Total evidence, consensus, and bat phylogeny: a distance-based approach. Molecular Phylogenetics and Evolution, 11, 55–66.
Legendre, P. (2010) Coefficient of concordance. Pp. 164–169 in: Encyclopedia of Research Design, Vol. 1. N. J. Salkind, ed. SAGE Publications, Inc., Los Angeles.
Legendre, P. and Lapointe, F.-J. (2004) Assessing congruence among distance matrices: single malt Scotch whiskies revisited. Australian and New Zealand Journal of Statistics, 46, 615–629.
Legendre, P. and Lapointe, F.-J. (2005) Congruence entre matrices de distance. Pp. 178–181 in: Makarenkov, V., G. Cucumel et F.-J. Lapointe [eds] Comptes rendus des 12emes Rencontres de la Societe Francophone de Classification, Montreal, 30 mai - 1er juin 2005.
Siegel, S. and Castellan, N. J., Jr. (1988) Nonparametric statistics for the behavioral sciences. 2nd edition. New York: McGraw-Hill.
Examples
# Examples 1 and 2: 5 genetic distance matrices computed from simulated DNA
# sequences representing 50 taxa having evolved along additive trees with
# identical evolutionary parameters (GTR + Gamma + I). Distance matrices were
# computed from the DNA sequence matrices using a p distance corrected with the
# same parameters as those used to simulate the DNA sequences. See Campbell et
# al. (2009) for details.
# Example 1: five independent additive trees. Data provided by V. Campbell.
data(mat5Mrand)
res.global <- CADM.global(mat5Mrand, 5, 50)
# Example 2: three partly similar trees, two independent trees.
# Data provided by V. Campbell.
data(mat5M3ID)
res.global <- CADM.global(mat5M3ID, 5, 50)
res.post <- CADM.post(mat5M3ID, 5, 50, mantel=TRUE)
# Example 3: three matrices respectively representing Serological
# (asymmetric), DNA hybridization (asymmetric) and Anatomical (symmetric)
# distances among 9 families. Data from Lapointe et al. (1999).
data(mat3)
res.global <- CADM.global(mat3, 3, 9, nperm=999)
res.post <- CADM.post(mat3, 3, 9, nperm=999, mantel=TRUE)
# Example 4, showing how to bind two D matrices (cophenetic matrices
# in this example) into a file using rbind(), then run the global test.
a <- rtree(5)
b <- rtree(5)
A <- cophenetic(a)
B <- cophenetic(b)
x <- rownames(A)
B <- B[x, x]
M <- rbind(A, B)
CADM.global(M, 2, 5)
Manipulate DNA Sequences in Bit-Level Format
Description
These functions help to manipulate DNA sequences coded in the bit-level coding scheme.
Usage
## S3 method for class 'DNAbin'
print(x, printlen = 6, digits = 3, ...)
## S3 method for class 'DNAbin'
rbind(...)
## S3 method for class 'DNAbin'
cbind(..., check.names = TRUE, fill.with.gaps = FALSE, quiet = FALSE)
## S3 method for class 'DNAbin'
x[i, j, drop = FALSE]
## S3 method for class 'DNAbin'
as.matrix(x, ...)
## S3 method for class 'DNAbin'
c(..., recursive = FALSE)
## S3 method for class 'DNAbin'
as.list(x, ...)
## S3 method for class 'DNAbin'
labels(object, ...)
Arguments
x,object | an object of class |
... | either further arguments to be passed to or from other methods in the case of |
printlen | the number of labels to print (6 by default). |
digits | the number of digits to print (3 by default). |
check.names | a logical specifying whether to check the rownames before binding the columns (see details). |
fill.with.gaps | a logical indicating whether to keep all possible individuals as indicated by the rownames, possibly filling the missing data with insertion gaps (ignored if |
quiet | a logical to switch off warning messages when some rows are dropped. |
i, j | indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects). |
drop | logical; if |
recursive | for compatibility with the generic (unused). |
Details
These are all ‘methods’ of generic functions which are here applied to DNA sequences stored as objects of class "DNAbin". They are used in the same way as the standard R functions to manipulate vectors, matrices, and lists. Additionally, the operators [[ and $ may be used to extract a vector from a list. Note that the default of drop is not the same as that of the generic operator: this is to avoid dropping rownames when selecting a single sequence.
These functions are provided to manipulate easily DNA sequences coded with the bit-level coding scheme. The latter allows much faster comparisons of sequences, as well as storing them in less memory compared to the format used before ape 1.10.
For cbind, the default behaviour is to keep only individuals (as indicated by the rownames) for which there are no missing data. If fill.with.gaps = TRUE, a ‘complete’ matrix is returned, possibly with insertion gaps as missing data. If check.names = TRUE (the default), the rownames of each matrix are checked, and the rows are reordered if necessary (if some rownames are duplicated, an error is returned). If check.names = FALSE, the matrices must all have the same number of rows, and are simply bound; the rownames of the first matrix are used. See the examples.
as.matrix may be used to convert DNA sequences (of the same length) stored in a list into a matrix while keeping the names and the class. as.list does the reverse operation.
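The row-matching rule of cbind described above can be mimicked on plain character matrices. The sketch below is only an illustration in base R: bind_by_rowname is a hypothetical helper, it handles two matrices only, and it does not reproduce the warnings or the check.names = FALSE branch of cbind.DNAbin.

```r
# Hypothetical helper: bind the columns of two character matrices by
# matching rownames. By default only the rows present in both are
# kept; with fill.with.gaps = TRUE all rows are kept and absent
# sequences are padded with "-".
bind_by_rowname <- function(a, b, fill.with.gaps = FALSE) {
  keep <- if (fill.with.gaps) {
    union(rownames(a), rownames(b))
  } else {
    intersect(rownames(a), rownames(b))
  }
  pad <- function(m) {
    out <- matrix("-", length(keep), ncol(m), dimnames = list(keep, NULL))
    present <- intersect(keep, rownames(m))
    out[present, ] <- m[present, , drop = FALSE]
    out
  }
  cbind(pad(a), pad(b))
}

# Made-up data: only s2 is shared between the two blocks
a <- rbind(s1 = c("A", "C"), s2 = c("G", "T"))
b <- rbind(s2 = c("A", "A"), s3 = c("C", "C"))
shared <- bind_by_rowname(a, b)                         # one row: s2
filled <- bind_by_rowname(a, b, fill.with.gaps = TRUE)  # three rows
```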
Value
an object of class "DNAbin" in the case of rbind, cbind, and [.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2007) A Bit-Level Coding Scheme for Nucleotides. https://emmanuelparadis.github.io/misc/BitLevelCodingScheme_20April2007.pdf
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.
See Also
as.DNAbin, read.dna, read.GenBank, write.dna, image.DNAbin, AAbin
The corresponding generic functions are documented in the package base.
Examples
data(woodmouse)
woodmouse
print(woodmouse, 15, 6)
print(woodmouse[1:5, 1:300], 15, 6)
### Just to show how distances could be influenced by sampling:
dist.dna(woodmouse[1:2, ])
dist.dna(woodmouse[1:3, ])
### cbind and its options:
x <- woodmouse[1:2, 1:5]
y <- woodmouse[2:4, 6:10]
as.character(cbind(x, y)) # gives warning
as.character(cbind(x, y, fill.with.gaps = TRUE))
## Not run:
as.character(cbind(x, y, check.names = FALSE)) # gives an error
## End(Not run)
Recode Blocks of Indels
Description
This function scans a set of aligned DNA sequences and returns a matrix with information on the locations and lengths of alignment gaps.
Usage
DNAbin2indel(x)
Arguments
x | an object of class |
Details
The output matrix has the same dimensions as the input one. Each cell holds either the length of the gap, where an alignment gap starts, or zero. The rownames are kept.
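The encoding described above can be sketched on a plain character matrix with rle(). The helper indel_blocks is hypothetical and works on characters, with "-" assumed to be the gap symbol, whereas DNAbin2indel takes a "DNAbin" object.

```r
# Hypothetical helper: for each row, write the gap length at the
# column where each run of "-" starts, and zero everywhere else.
indel_blocks <- function(x) {
  out <- matrix(0, nrow(x), ncol(x), dimnames = list(rownames(x), NULL))
  for (i in seq_len(nrow(x))) {
    runs <- rle(x[i, ] == "-")
    # first column of every run (gap or not)
    starts <- cumsum(c(1, head(runs$lengths, -1)))
    is_gap <- runs$values
    out[i, starts[is_gap]] <- runs$lengths[is_gap]
  }
  out
}

# Made-up alignment: s1 has a 2-site gap and a 1-site gap, s2 has none
aln <- rbind(s1 = c("a", "-", "-", "c", "-"),
             s2 = c("a", "c", "g", "t", "a"))
blocks <- indel_blocks(aln)
```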
Value
a numeric matrix.
Author(s)
Emmanuel Paradis
See Also
DNAbin, as.DNAbin, del.gaps, seg.sites, image.DNAbin, checkAlignment
Tree Estimation Based on the Minimum Evolution Algorithm
Description
The two FastME functions (balanced and OLS) perform the minimum evolution algorithm of Desper and Gascuel (2002).
Usage
fastme.bal(X, nni = TRUE, spr = TRUE, tbr = FALSE)
fastme.ols(X, nni = TRUE)
Arguments
X | a distance matrix; may be an object of class |
nni | a logical value; TRUE to perform NNIs (default). |
spr | ditto for SPRs. |
tbr | ignored (see details). |
Details
The code to perform topology searches based on TBR (tree bisection and reconnection) did not run correctly and has been removed after the release of ape 5.3. A warning is issued if tbr = TRUE.
Value
an object of class "phylo".
Author(s)
original C code by Richard Desper; adapted and ported to R by Vincent Lefort <vincent.lefort@lirmm.fr>
References
Desper, R. and Gascuel, O. (2002) Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. Journal of Computational Biology, 9, 687–705.
See Also
nj, bionj, write.tree, read.tree, dist.dna
Examples
### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13, 10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12, 5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- fastme.bal(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- fastme.bal(dist.dna(woodmouse))
plot(trw)
Initialize a ‘corPhyl’ Structure Object
Description
Initialize a corPhyl correlation structure object. Does the same as Initialize.corStruct, but also checks the row names of data and builds an index.
Usage
## S3 method for class 'corPhyl'
Initialize(object, data, ...)
Arguments
object | An object inheriting from class |
data | The data to use. If it contains rownames, they are matched with the tree tip labels; otherwise the data are assumed to be in the same order as the tip labels and a warning is issued. |
... | some methods for this generic require additional arguments. None are used in this method. |
Value
An initialized object of the same class as object.
Author(s)
Julien Dutheil <dutheil@evolbio.mpg.de>
See Also
corClasses, Initialize.corStruct.
Theoretical Lineage-Through Time Plots
Description
This function draws the lineage-through-time (LTT) plots predicted under a speciation-extinction model (aka birth-death model) with specified values of speciation and extinction rates (which may vary with time).
A prediction interval is plotted by default, which requires a sample size to be defined (100 by default), and different curves can be combined.
Usage
LTT(birth = 0.1, death = 0, N = 100, Tmax = 50, PI = 95, scaled = TRUE, eps = 0.1, add = FALSE, backward = TRUE, ltt.style = list("black", 1, 1), pi.style = list("blue", 1, 2), ...)
Arguments
birth | the speciation rate, this may be either a numeric value or a function of time (named |
death | the same for the extinction rate. |
N | the size of the tree. |
Tmax | the age of the root of the tree. |
PI | the percentage value of the prediction interval; set this value to 0 to not draw this interval. |
scaled | a logical value specifying whether to scale the |
eps | a numerical value giving the resolution of the time axis. |
add | a logical value specifying whether to add to an existing plot (the default is to make a new plot). |
backward | a logical value: should the time axis be traced from the present (the default), or from the root of the tree? |
ltt.style | a list with three elements giving the style of the LTT curve with, respectively, the colour ( |
pi.style | the same for the prediction interval. |
... | arguments passed to |
Details
For the moment, this works well when birth and death are constant. Some improvements are in progress for time-dependent rates (but see below for an example).
Author(s)
Emmanuel Paradis
References
Hallinan, N. (2012) The generalized time variable reconstructed birth–death process. Journal of Theoretical Biology, 300, 265–276.
Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.
Paradis, E. (2015) Random phylogenies and the distribution of branching times. Journal of Theoretical Biology, 387, 39–45.
See Also
Examples
### predicted LTT plot under a Yule model with lambda = 0.1
### and 50 species after 50 units of time...
LTT(N = 50)
### ... and with a birth-death model with the same rate of
### diversification (try with N = 500):
LTT(0.2, 0.1, N = 50, PI = 0, add = TRUE, ltt.style = list("red", 2, 1))
### predictions under different tree sizes:
layout(matrix(1:4, 2, 2, byrow = TRUE))
for (N in c(50, 100, 500, 1000)) {
    LTT(0.2, 0.1, N = N)
    title(paste("N =", N))
}
layout(1)
## Not run:
### speciation rate decreasing with time
birth.logis <- function(t) 1/(1 + exp(0.02 * t + 4))
LTT(birth.logis)
LTT(birth.logis, 0.05)
LTT(birth.logis, 0.1)
## End(Not run)
Most Parsimonious Reconstruction
Description
This function does ancestral character reconstruction by parsimony as described in Hanazawa et al. (1995) and modified by Narushima and Hanazawa (1997).
Usage
MPR(x, phy, outgroup)
Arguments
x | a vector of integers. |
phy | an object of class |
outgroup | an integer or a character string giving the tip of |
Details
Hanazawa et al. (1995) and Narushima and Hanazawa (1997) used Farris's (1970) and Swofford and Maddison's (1987) framework to reconstruct ancestral states using parsimony. The character is assumed to take integer values. The algorithm finds the sets of values for each node as intervals with lower and upper values.
It is recommended to root the tree with the outgroup before the analysis, so plotting the values with nodelabels is simple.
Value
a matrix of integers with two columns named “lower” and “upper” giving the lower and upper values of the reconstructed sets for each node.
Author(s)
Emmanuel Paradis
References
Farris, J. M. (1970) Methods for computing Wagner trees. Systematic Zoology, 19, 83–92.
Hanazawa, M., Narushima, H. and Minaka, N. (1995) Generating most parsimonious reconstructions on a tree: a generalization of the Farris–Swofford–Maddison method. Discrete Applied Mathematics, 56, 245–265.
Narushima, H. and Hanazawa, M. (1997) A more efficient algorithm for MPR problems in phylogeny. Discrete Applied Mathematics, 80, 231–238.
Swofford, D. L. and Maddison, W. P. (1987) Reconstructing ancestral character states under Wagner parsimony. Mathematical Biosciences, 87, 199–229.
See Also
Examples
## the example in Narushima and Hanazawa (1997):
tr <- read.tree(text = "(((i,j)c,(k,l)b)a,(h,g)e,f)d;")
x <- c(1, 3, 0, 6, 5, 2, 4)
names(x) <- letters[6:12]
(o <- MPR(x, tr, "f"))
plot(tr)
nodelabels(paste0("[", o[, 1], ",", o[, 2], "]"))
tiplabels(x[tr$tip.label], adj = -2)
## some random data:
x <- rpois(30, 1)
tr <- rtree(30, rooted = FALSE)
MPR(x, tr, "t1")
Moran's I Autocorrelation Index
Description
This function computes Moran's I autocorrelation coefficient of x given a matrix of weights, using the method described by Gittleman and Kot (1990).
Usage
Moran.I(x, weight, scaled = FALSE, na.rm = FALSE, alternative = "two.sided")
Arguments
x | a numeric vector. |
weight | a matrix of weights. |
scaled | a logical indicating whether the coefficient should be scaled so that it varies between -1 and +1 (default to |
na.rm | a logical indicating whether missing values should be removed. |
alternative | a character string specifying the alternative hypothesis that is tested against the null hypothesis of no phylogenetic correlation; must be one of "two.sided", "less", or "greater", or any unambiguous abbreviation of these. |
Details
The matrix weight is used as “neighbourhood” weights, and Moran's I coefficient is computed using the formula:
I = \frac{n}{S_0} \frac{\sum_{i=1}^n\sum_{j=1}^n w_{i,j}(y_i - \overline{y})(y_j - \overline{y})}{\sum_{i=1}^n {(y_i -\overline{y})}^2}
with
y_i = observations
w_{i,j} = distance weight
n = number of observations
S_0 = \sum_{i=1}^n\sum_{j=1}^n w_{i,j}
The null hypothesis of no phylogenetic correlation is tested assuming normality of I under this null hypothesis. If the observed value of I is significantly greater than the expected value, then the values of x are positively autocorrelated, whereas if I observed < I expected, this will indicate negative autocorrelation.
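The formula above translates directly into base R. The function moran_formula below is a hypothetical helper that evaluates the expression exactly as written; it is not the source of Moran.I, which may preprocess the weights, so the two are not guaranteed to return identical numbers. The expected value under the null hypothesis, -1/(n - 1), is the standard result for Moran's I.

```r
# Hypothetical helper: Moran's I evaluated literally from the formula,
# with y the observations and w a square matrix of weights.
moran_formula <- function(y, w) {
  n <- length(y)
  dev <- y - mean(y)                 # y_i - mean(y)
  s0 <- sum(w)                       # S_0: sum of all weights
  (n / s0) * sum(w * outer(dev, dev)) / sum(dev^2)
}

# Made-up data: four observations on a chain, neighbours have weight 1
y <- c(1, 2, 3, 4)
w <- matrix(0, 4, 4)
w[abs(row(w) - col(w)) == 1] <- 1
i_obs <- moran_formula(y, w)
i_exp <- -1 / (length(y) - 1)        # expected value under the null
```

Here i_obs is larger than i_exp, which is the direction associated with positive autocorrelation; whether the difference is significant is what the test in Moran.I addresses.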
Value
A list containing the elements:
observed | the computed Moran's I. |
expected | the expected value of I under the null hypothesis. |
sd | the standard deviation of I under the null hypothesis. |
p.value | the P-value of the test of the null hypothesis against the alternative hypothesis specified in |
Author(s)
Julien Dutheil <dutheil@evolbio.mpg.de> and Emmanuel Paradis
References
Gittleman, J. L. and Kot, M. (1990) Adaptation: statistics and a null model for estimating phylogenetic effects. Systematic Zoology, 39, 227–241.
See Also
Examples
tr <- rtree(30)
x <- rnorm(30)
## weights w[i,j] = 1/d[i,j]:
w <- 1/cophenetic(tr)
## set the diagonal w[i,i] = 0 (instead of Inf...):
diag(w) <- 0
Moran.I(x, w)
Moran.I(x, w, alt = "l")
Moran.I(x, w, alt = "g")
Moran.I(x, w, scaled = TRUE) # usually the same
Construction of Consensus Distance Matrix With SDM
Description
This function implements the SDM method of Criscuolo et al. (2006) for a set of n distance matrices.
Usage
SDM(...)
Arguments
... | 2n elements (with n > 1), the first n elements are the distance matrices: these can be (symmetric) matrices, objects of class |
Details
Reconstructs a consensus distance matrix from a set of input distance matrices on overlapping sets of taxa. Potentially missing values in the supermatrix are represented by NA. An error is returned if the input distance matrices cannot be resolved to a consensus matrix.
Value
a 2-element list containing a distance matrix labelled by the union of the set of taxa of the input distance matrices, and a variance matrix associated with the returned distance matrix.
Author(s)
Andrei Popescu
References
Criscuolo, A., Berry, V., Douzery, E. J. P., and Gascuel, O. (2006) SDM: A fast distance-based approach for (super)tree building in phylogenomics. Systematic Biology, 55, 740–755.
See Also
bionj, fastme, njs, mvrs, triangMtd
Ancestral Character Estimation
Description
ace estimates ancestral character states, and the associated uncertainty, for continuous and discrete characters. If marginal = TRUE, a marginal estimation procedure is used. With this method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node.
The present implementation of marginal reconstruction for discrete characters does not calculate the most likely state for each node, integrating over all the possible states, over all the other nodes in the tree, in proportion to their probability. For more details, see the Note below.
logLik, deviance, and AIC are generic functions used to extract the log-likelihood, the deviance, or the Akaike information criterion of a fitted object. If no such values are available, NULL is returned.
anova is another generic function which is used to compare nested models: the significance of the additional parameter(s) is tested with likelihood ratio tests. You must ensure that the models are effectively nested (if they are not, the results will be meaningless). It is better to list the models from the smallest to the largest.
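The quantities mentioned above follow standard definitions, which can be checked by hand. The sketch below uses made-up log-likelihood values and parameter counts for two nested models; it does not call ace, and the variable names are illustrative only.

```r
# Made-up values for two nested models (assumptions for illustration)
ll_small <- -12.0; np_small <- 1     # smaller model: 1 parameter
ll_large <- -10.0; np_large <- 2     # larger model: 2 parameters

# Deviance and AIC from the log-likelihood (k = 2 is the usual penalty)
dev_large <- -2 * ll_large
aic_large <- -2 * ll_large + 2 * np_large

# Likelihood ratio test: twice the gain in log-likelihood, compared
# with a chi-squared distribution whose degrees of freedom equal the
# number of additional parameters
lrt <- 2 * (ll_large - ll_small)
p_value <- pchisq(lrt, df = np_large - np_small, lower.tail = FALSE)
```

With fitted "ace" objects, the same comparison is obtained by passing the models to anova, smallest first.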
Usage
ace(x, phy, type = "continuous",
    method = if (type == "continuous") "REML" else "ML",
    CI = TRUE, model = if (type == "continuous") "BM" else "ER",
    scaled = TRUE, kappa = 1, corStruct = NULL, ip = 0.1,
    use.expm = FALSE, use.eigen = TRUE, marginal = FALSE)
## S3 method for class 'ace'
print(x, digits = 4, ...)
## S3 method for class 'ace'
logLik(object, ...)
## S3 method for class 'ace'
deviance(object, ...)
## S3 method for class 'ace'
AIC(object, ..., k = 2)
## S3 method for class 'ace'
anova(object, ...)
Arguments
x | a vector or a factor; an object of class |
phy | an object of class |
type | the variable type; either |
method | a character specifying the method used for estimation. Four choices are possible: |
CI | a logical specifying whether to return the 95% confidence intervals of the ancestral state estimates (for continuous characters) or the likelihood of the different states (for discrete ones). |
model | a character specifying the model (ignored if |
scaled | a logical specifying whether to scale the contrast estimate (used only if |
kappa | a positive value giving the exponent transformation of the branch lengths (see details). |
corStruct | if |
ip | the initial value(s) used for the ML estimation procedure when |
use.expm | a logical specifying whether to use the package expm to compute the matrix exponential (relevant only if |
use.eigen | a logical (relevant if |
marginal | a logical (relevant if |
digits | the number of digits to be printed. |
object | an object of class |
k | a numeric value giving the penalty per estimated parameter; the default is |
... | further arguments passed to or from other methods. |
Details
If type = "continuous", the default model is Brownian motion where characters evolve randomly following a random walk. This model can be fitted by residual maximum likelihood (the default), maximum likelihood (Felsenstein 1973, Schluter et al. 1997), least squares (method = "pic", Felsenstein 1985), or generalized least squares (method = "GLS", Martins and Hansen 1997, Cunningham et al. 1998). In the last case, the specifications of phy and model are ignored: the model is instead given through a correlation structure with the option corStruct.
In the setting method = "ML" and model = "BM" (this used to be the default until ape 3.0-7) the maximum likelihood estimation is done simultaneously on the ancestral values and the variance of the Brownian motion process; these estimates are then used to compute the confidence intervals in the standard way. The REML method first estimates the ancestral value at the root (i.e., the phylogenetic mean), then the variance of the Brownian motion process is estimated by optimizing the residual log-likelihood. The ancestral values are finally inferred from the likelihood function given these two parameters. If method = "pic" or "GLS", the confidence intervals are computed using the expected variances under the model, so they depend only on the tree.
It can be shown that, with a continuous character, REML results in unbiased estimates of the variance of the Brownian motion process while ML gives a downward bias. Therefore the former is recommended.
For discrete characters (type = "discrete"), only maximum likelihood estimation is available (Pagel 1994) (see MPR for an alternative method). The model is specified through a numeric matrix with integer values taken as indices of the parameters. The numbers of rows and of columns of this matrix must be equal, and are taken to give the number of states of the character. For instance, matrix(c(0, 1, 1, 0), 2) will represent a model with two character states and equal rates of transition, matrix(c(0, 1, 2, 0), 2) a model with unequal rates, matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3) a model with three states and equal rates of transition (the diagonal is always ignored). There are short-cuts to specify these models: "ER" is an equal-rates model (e.g., the first and third examples above), "ARD" is an all-rates-different model (the second example), and "SYM" is a symmetrical model (e.g., matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), 3)). If a short-cut is used, the number of states is determined from the data.
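How an index matrix maps onto a rate matrix can be sketched in base R. The helper make_Q below is hypothetical (it is not a function of ape), and it assumes the usual convention for continuous-time Markov models that each row of the rate matrix sums to zero.

```r
# Build a rate matrix Q from an index matrix and a vector of rate values.
# 'make_Q' is a hypothetical helper written for this illustration.
make_Q <- function(index, rates) {
    stopifnot(nrow(index) == ncol(index), max(index) <= length(rates))
    Q <- matrix(0, nrow(index), ncol(index))
    pos <- index > 0        # entries equal to zero are not parameters
    diag(pos) <- FALSE      # the diagonal of the index matrix is ignored
    Q[pos] <- rates[index[pos]]
    diag(Q) <- -rowSums(Q)  # assumed convention: rows sum to zero
    Q
}

ER  <- matrix(c(0, 1, 1, 0), 2)   # one parameter shared by both transitions
ARD <- matrix(c(0, 1, 2, 0), 2)   # one parameter per transition

Q_er  <- make_Q(ER, 0.3)
Q_ard <- make_Q(ARD, c(0.3, 0.7))
Q_ard
```

With the ER index matrix both off-diagonal entries receive the same rate, whereas the ARD index matrix assigns the first rate to the entry indexed 1 and the second rate to the entry indexed 2.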
By default, the likelihoods of the different ancestral states of discrete characters are computed with a joint estimation procedure similar to the one described in Pupko et al. (2000). If marginal = TRUE, a marginal estimation procedure is used (this was the only choice until ape 3.1-1). With this method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node. With the joint estimation, all information is used for each node. The difference between these two methods is further explained in Felsenstein (2004, pp. 259-260) and in Yang (2006, pp. 121-126). The present implementation of the joint estimation uses a “two-pass” algorithm which is much faster than stochastic mapping while the estimates of both methods are very close.
With discrete characters it is necessary to compute the exponential of the rate matrix. The only possibility until ape 3.0-7 was the function matexpo in ape. If use.expm = TRUE and use.eigen = FALSE, the function expm, in the package of the same name, is used. matexpo is faster but quite inaccurate for large and/or asymmetric matrices. In case of doubt, use expm. Since ape 3.0-10, it is possible to use an eigen decomposition avoiding the need to compute the matrix exponential; see details in Lebl (2013, sect. 3.8.3). This is much faster and is now the default.
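The eigen decomposition approach can be illustrated in base R. This is a minimal sketch, not the code used inside ace: it assumes a diagonalizable rate matrix, and the rate and branch length are arbitrary values chosen for the illustration. The transition probabilities along a branch of length bl are obtained as V diag(exp(lambda * bl)) V^{-1}, where lambda and V are the eigenvalues and eigenvectors of the rate matrix.

```r
r  <- 0.3                                 # arbitrary transition rate
Q  <- matrix(c(-r, r, r, -r), 2)          # two-state equal-rates matrix
bl <- 1.5                                 # arbitrary branch length

# Eigen decomposition: Q = V diag(lambda) V^{-1}
e <- eigen(Q)
V <- e$vectors
P <- V %*% diag(exp(e$values * bl)) %*% solve(V)

# Closed form for this model: probability of no net change of state.
p_same <- 0.5 + 0.5 * exp(-2 * r * bl)

all.equal(P[1, 1], p_same)
rowSums(P)   # each row of P is a probability distribution
```

The decomposition is computed once per rate matrix and can then be reused for every branch length, which is presumably why this route is faster than calling a general matrix exponential for each branch.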
Since version 5.2 of ape, ace can take state uncertainty for discrete characters into account: this should be coded with R's NA only. More details:
https://www.mail-archive.com/r-sig-phylo@r-project.org/msg05286.html
Value
an object of class "ace" with the following elements:
ace | if |
CI95 | if |
sigma2 | if |
rates | if |
se | if |
index.matrix | if |
loglik | if |
lik.anc | if |
call | the function call. |
Note
Liam Revell points out that for discrete characters the ancestral likelihood values returned with marginal = FALSE are actually the marginal estimates, while setting marginal = TRUE returns the conditional (scaled) likelihoods of the subtree:
http://blog.phytools.org/2015/05/about-how-acemarginaltrue-does-not.html
Author(s)
Emmanuel Paradis, Ben Bolker
References
Cunningham, C. W., Omland, K. E. and Oakley, T. H. (1998) Reconstructing ancestral character states: a critical reappraisal. Trends in Ecology & Evolution, 13, 361–366.
Felsenstein, J. (1973) Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25, 471–492.
Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.
Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.
Lebl, J. (2013) Notes on Diffy Qs: Differential Equations for Engineers. https://www.jirka.org/diffyqs/.
Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.
Pagel, M. (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London. Series B. Biological Sciences, 255, 37–45.
Pupko, T., Pe'er, I., Shamir, R., and Graur, D. (2000) A fast algorithm for joint reconstruction of ancestral amino acid sequences. Molecular Biology and Evolution, 17, 890–896.
Schluter, D., Price, T., Mooers, A. O. and Ludwig, D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699–1711.
Yang, Z. (2006) Computational Molecular Evolution. Oxford: Oxford University Press.
See Also
MPR, corBrownian, compar.ou, anova
Reconstruction of ancestral sequences can be done with the package phangorn (see function ?ancestral.pml).
Examples
### Some random data...
data(bird.orders)
x <- rnorm(23)
### Compare the three methods for continuous characters:
ace(x, bird.orders)
ace(x, bird.orders, method = "pic")
ace(x, bird.orders, method = "GLS",
    corStruct = corBrownian(1, bird.orders))
### For discrete characters:
x <- factor(c(rep(0, 5), rep(1, 18)))
ans <- ace(x, bird.orders, type = "d")
#### Showing the likelihoods on each node:
plot(bird.orders, type = "c", FALSE, label.offset = 1)
co <- c("blue", "yellow")
tiplabels(pch = 22, bg = co[as.numeric(x)], cex = 2, adj = 1)
nodelabels(thermo = ans$lik.anc, piecol = co, cex = 0.75)
Add a Scale Bar to a Phylogeny Plot
Description
This function adds a horizontal bar giving the scale of the branch lengths to a plot of a phylogenetic tree on the current graphical device.
Usage
add.scale.bar(x, y, length = NULL, ask = FALSE, lwd = 1, lcol = "black", ...)
Arguments
x | x location of the bar (can be left missing). |
y | y location of the bar (can be left missing). |
length | a numeric value giving the length of the scale bar. If none is supplied, a value is calculated from the data. |
ask | a logical; if |
lwd | the width of the bar. |
lcol | the colour of the bar (use |
... | further arguments to be passed to |
Details
By default, the bar is placed in a corner of the graph depending on the direction of the tree. Otherwise both x and y must be specified (if only one is given it is ignored).
The further arguments (...) are used to format the text. They may be font, cex, col, and so on (see examples below, and the help page on text).
The function locator may be used to determine the x and y arguments.
Author(s)
Emmanuel Paradis
See Also
Examples
tr <- rtree(10)
layout(matrix(1:2, 2, 1))
plot(tr)
add.scale.bar()
plot(tr)
add.scale.bar(cex = 0.7, font = 2, col = "red")
layout(1)
Incomplete Distance Matrix Filling
Description
Fills missing entries of an incomplete distance matrix using the additive or the ultrametric procedure (see reference for details).
Usage
additive(X)
ultrametric(X)
Arguments
X | a distance matrix or an object of class |
Value
a distance matrix.
Author(s)
Andrei Popescu
References
Makarenkov, V. and Lapointe, F.-J. (2004) A weighted least-squares approach for inferring phylogenies from incomplete distance matrices. Bioinformatics, 20, 2113–2121.
Alignment Explorer With Multiple Devices
Description
This function helps to explore DNA alignments by zooming in. The user clicks twice to define the opposite corners of the portion which is extracted and drawn in a new window.
Usage
alex(x, ...)
Arguments
x | an object of class |
... | further arguments to pass to |
Details
This function works with a DNA alignment (freshly) plotted on an interactive graphical device (i.e., not a file) with image. After calling alex, the user clicks twice to define a rectangle in the alignment, then this portion of the alignment is extracted and plotted in a new window. The user can click as many times as needed on the alignment. The process is stopped by a right-click. If the user clicks twice outside the alignment, a message “Try again!” is printed.
Each time alex is called, the alignment is plotted in a new window without closing or deleting those possibly already plotted.
In all cases, the device where x is plotted is the active window after the operation. It should not be closed during the whole process.
Value
NULL
Author(s)
Emmanuel Paradis
See Also
Examples
## Not run: 
data(woodmouse)
image(woodmouse)
alex(woodmouse)
## End(Not run)
Compare DNA Sets
Description
Comparison of DNA sequence sets, particularly when aligned.
Usage
## S3 method for class 'DNAbin'
all.equal(target, current, plot = FALSE, ...)
Arguments
target,current | the two sets of sequences to be compared. |
plot | a logical value specifying whether to plot the sites that are different (only if the labels of both alignments are the same). |
... | further arguments passed to |
Details
If the two sets of DNA sequences are exactly identical, this function returns TRUE. Otherwise, a detailed comparison is made only if the labels (i.e., rownames) of target and current are the same (possibly in different orders). In all other cases, a brief description of the differences is returned (sometimes with recommendations to make further comparisons).
This function can be used for testing in programs using isTRUE (see examples below).
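The reason for wrapping the call in isTRUE can be seen with the generic all.equal in base R. In this sketch plain numeric vectors stand in for the two sets of sequences: when the objects differ, all.equal returns a description of the differences rather than FALSE, so its result cannot be used directly as a condition.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 4)

res <- all.equal(a, b)    # a character description of the difference
is.character(res)

# isTRUE() turns any non-TRUE result into FALSE, which is safe in if():
if (isTRUE(all.equal(a, a))) message("a and a are equal")
if (!isTRUE(res)) message("a and b differ: ", res)
```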
Value
TRUE if the two sets are identical; a list with two elements (message and different.sites) if a detailed comparison is done; or a vector of mode character.
Author(s)
Emmanuel Paradis
See Also
image.DNAbin, clustal, checkAlignment, the generic function: all.equal
Examples
data(woodmouse)
woodm2 <- woodmouse
woodm2[1, c(1:5, 10:12, 30:40)] <- as.DNAbin("g")
res <- all.equal(woodmouse, woodm2, plot = TRUE)
str(res)
## if used for testing in R programs:
isTRUE(all.equal(woodmouse, woodmouse)) # TRUE
isTRUE(all.equal(woodmouse, woodm2)) # FALSE
all.equal(woodmouse, woodmouse[15:1, ])
all.equal(woodmouse, woodmouse[-1, ])
all.equal(woodmouse, woodmouse[, -1])
## Not run: 
## To run the following you need internet access, and Clustal and MUSCLE
## correctly installed.
## Data from Johnson et al. (2006, Science)
refs <- paste("DQ082", 505:545, sep = "")
DNA <- read.GenBank(refs)
DNA.clustal <- clustal(DNA)
DNA.muscle <- muscle(DNA)
isTRUE(all.equal(DNA.clustal, DNA.muscle)) # FALSE
all.equal(DNA.clustal, DNA.muscle, TRUE)
## End(Not run)
Global Comparison of two Phylogenies
Description
This function makes a global comparison of two phylogenetic trees.
Usage
## S3 method for class 'phylo'
all.equal(target, current, use.edge.length = TRUE,
          use.tip.label = TRUE, index.return = FALSE,
          tolerance = .Machine$double.eps ^ 0.5,
          scale = NULL, ...)
Arguments
target | an object of class |
current | an object of class |
use.edge.length | if |
use.tip.label | if |
index.return | if |
tolerance | the numeric tolerance used to compare the branch lengths. |
scale | a positive number, comparison of branch lengths is made after scaling (i.e., dividing) them by this number. |
... | further arguments passed to or from other methods. |
Details
This function is meant to be an adaptation of the generic function all.equal for the comparison of phylogenetic trees.
A single phylogenetic tree may have several representations in the Newick format and in the "phylo" class of objects used in ‘ape’. One aim of the present function is to be able to identify whether two objects of class "phylo" represent the same phylogeny.
Value
A logical value, or a two-column matrix.
Note
The algorithm used here does not work correctly for the comparison of topologies (i.e., ignoring tip labels) of unrooted trees. This also affects unique.multiPhylo which calls the present function. See:
https://www.mail-archive.com/r-sig-phylo@r-project.org/msg01445.html.
Author(s)
Benoît Durand <b.durand@alfort.AFSSA.FR>
See Also
all.equal for the generic R function, comparePhylo
Examples
### maybe the simplest example of two representations
### for the same rooted tree...:
t1 <- read.tree(text = "(a:1,b:1);")
t2 <- read.tree(text = "(b:1,a:1);")
all.equal(t1, t2)
### ... compare with this:
identical(t1, t2)
### one just slightly more complicated...:
t3 <- read.tree(text = "((a:1,b:1):1,c:2);")
t4 <- read.tree(text = "(c:2,(a:1,b:1):1);")
all.equal(t3, t4) # == all.equal.phylo(t3, t4)
### ... here we force the comparison as lists:
all.equal.list(t3, t4)
Print DNA or AA Sequence Alignment
Description
This function displays in the console or a file an alignment of DNA or AA sequences. The first sequence is printed on the first row and the bases of the other sequences are replaced by dots if they are identical with the first sequence.
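The dot notation described above can be mimicked in base R on a small character matrix. This is only a sketch of the display rule, not the code of alview; the three short sequences are invented for the illustration.

```r
# A toy alignment stored as a character matrix (one row per sequence).
aln <- rbind(s1 = c("a", "c", "g", "t", "a"),
             s2 = c("a", "c", "g", "c", "a"),
             s3 = c("a", "t", "g", "t", "a"))

ref <- aln[1, ]   # the first sequence is printed in full
out <- aln
for (i in seq_len(nrow(aln))[-1]) {
    same <- aln[i, ] == ref
    out[i, same] <- "."   # identical bases are replaced by dots
}

# Print one line per sequence, in uppercase.
cat(paste(rownames(out), apply(toupper(out), 1, paste, collapse = "")),
    sep = "\n")
```

Only the positions that differ from the first sequence remain visible in the other rows, which makes variable sites easy to spot.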
Usage
alview(x, file = "", uppercase = TRUE, showpos = TRUE)
Arguments
x | a matrix or a list of DNA sequences (class |
file | a character string giving the name of the file where to print the sequences; by default, they are printed in the console. |
uppercase | a logical specifying whether to print the bases as uppercase letters. |
showpos | either a logical value specifying whether to display the site positions, or a numeric vector giving these positions (see examples). |
Details
The first line of the output shows the position of the last column of the printed alignment.
Author(s)
Emmanuel Paradis
See Also
DNAbin, image.DNAbin, alex, clustal, checkAlignment, all.equal.DNAbin
Examples
data(woodmouse)
alview(woodmouse[, 1:50])
alview(woodmouse[, 1:50], uppercase = FALSE)
## display only some sites:
j <- c(10, 49, 125, 567) # just random
x <- woodmouse[, j]
alview(x, showpos = FALSE) # no site position displayed
alview(x, showpos = j)
## Not run: 
alview(woodmouse, file = "woodmouse.txt")
## End(Not run)
Internal Ape Functions
Description
Internal ape functions which are undocumented but still exported because they are called by other packages. Use with care!
Tools to Explore Files
Description
These functions help to find files on the local disk.
Usage
Xplorefiles(from = "HOME", recursive = TRUE, ignore.case = TRUE)
editFileExtensions()
bydir(x)
Xplor(from = "HOME")
Arguments
from | the directory where to start the file search; by default, the ‘HOME’ directory. Use |
recursive | whether to search the subdirectories; |
ignore.case | whether to ignore the case of the file extensions; |
x | a list returned by |
Details
Xplorefiles looks for all files with a specified extension in their names. The default is to look for the following file types: CLUSTAL (.aln), FASTA (.fas, .fasta), FASTQ (.fq, .fastq), NEWICK (.nwk, .newick, .tre, .tree), NEXUS (.nex, .nexus), and PHYLIP (.phy). This list can be modified with editFileExtensions.
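A search by file extension of this kind can be sketched with list.files from base R. This is not necessarily how Xplorefiles is implemented: the scratch directory, the file names, and the restricted extension list below are assumptions made to keep the example self-contained.

```r
# Create a scratch directory with a few empty files (hypothetical names).
d <- file.path(tempdir(), "xplore_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("a.fas", "b.FASTA", "c.nwk", "notes.txt"))))

# Build a regular expression matching the FASTA extensions only.
ext <- c("fas", "fasta")
pat <- paste0("\\.(", paste(ext, collapse = "|"), ")$")

# Search the directory tree, ignoring the case of the extensions.
found <- list.files(d, pattern = pat, recursive = TRUE,
                    ignore.case = TRUE, full.names = TRUE)
basename(found)
```

The arguments recursive and ignore.case play the same role here as the arguments of the same names in Xplorefiles.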
bydir sorts the list of files by directories.
Xplor combines the other operations and opens the results in a Web browser with clickable links to the directories and files.
Value
Xplorefiles returns a list. bydir prints the file listings on the console.
Author(s)
Emmanuel Paradis
Examples
## Not run: 
x <- Xplorefiles()
x # all data files on your disk
bydir(x) # sorted by directories
bydir(x["fasta"]) # only the FASTA files
Xplorefiles(getwd(), recursive = FALSE) # look only in current dir
Xplor()
## End(Not run)
Conversion Among DNA Sequence Internal Formats
Description
These functions transform a set of DNA sequences among various internal formats.
Usage
as.alignment(x)
as.DNAbin(x, ...)
## S3 method for class 'character'
as.DNAbin(x, ...)
## S3 method for class 'list'
as.DNAbin(x, ...)
## S3 method for class 'alignment'
as.DNAbin(x, ...)
## S3 method for class 'DNAString'
as.DNAbin(x, ...)
## S3 method for class 'DNAStringSet'
as.DNAbin(x, ...)
## S3 method for class 'PairwiseAlignmentsSingleSubject'
as.DNAbin(x, ...)
## S3 method for class 'DNAMultipleAlignment'
as.DNAbin(x, ...)
## S3 method for class 'DNAbin'
as.character(x, ...)
Arguments
x | a matrix or a list containing the DNA sequences, or an object of class |
... | further arguments to be passed to or from other methods. |
Details
For as.alignment, the sequences given as argument should be stored as matrices or lists of single-character strings (the format used in ape before version 1.10). The returned object is in the format used in the package seqinr to store aligned sequences.
as.DNAbin is a generic function with methods so that it works with sequences stored into vectors, matrices, or lists. It can convert some S4 classes from the package Biostrings in BioConductor. For consistency within ape, this uses an S3-style syntax. To convert objects of class "DNAStringSetList", see the examples.
as.character is a generic function: the present method converts objects of class "DNAbin" into the format used before ape 1.10 (matrix of single characters, or list of vectors of single characters). This function must be used first to convert objects of class "DNAbin" into the class "alignment".
Value
an object of class "alignment" in the case of "as.alignment"; an object of class "DNAbin" in the case of "as.DNAbin"; a matrix of mode character or a list containing vectors of mode character in the case of "as.character".
Author(s)
Emmanuel Paradis
See Also
DNAbin, read.dna, read.GenBank, write.dna
Examples
data(woodmouse)
x <- as.character(woodmouse)
x[, 1:20]
str(as.alignment(x))
identical(as.DNAbin(x), woodmouse)
### conversion from BioConductor:
## Not run: 
if (require(Biostrings)) {
    data(phiX174Phage)
    X <- as.DNAbin(phiX174Phage)
    ## base frequencies:
    base.freq(X) # from ape
    alphabetFrequency(phiX174Phage) # from Biostrings
    ### for objects of class "DNAStringSetList"
    X <- lapply(x, as.DNAbin) # a list of lists
    ### to put all sequences in a single list:
    X <- unlist(X, recursive = FALSE)
    class(X) <- "DNAbin"
}
## End(Not run)
Split Frequencies and Conversion Among Split Classes
Description
bitsplits returns the bipartitions (aka splits) for a single tree or a list of trees. If at least one tree is rooted, an error is returned.
countBipartitions returns the frequencies of the bipartitions from a reference tree (phy) observed in a list of trees (X), all unrooted.
as.bitsplits and as.prop.part are generic functions for converting between the "bitsplits" and "prop.part" classes.
Usage
bitsplits(x)
countBipartitions(phy, X)
as.bitsplits(x)
## S3 method for class 'prop.part'
as.bitsplits(x)
## S3 method for class 'bitsplits'
print(x, ...)
## S3 method for class 'bitsplits'
sort(x, decreasing = FALSE, ...)
as.prop.part(x, ...)
## S3 method for class 'bitsplits'
as.prop.part(x, include.trivial = FALSE, ...)
Arguments
x | an object of the appropriate class. |
phy | an object of class |
X | an object of class |
decreasing | a logical value to sort the bipartitions in increasing (the default) or decreasing order of their frequency. |
include.trivial | a logical value specifying whether to include the trivial split with all tips in the returned object. |
... | further arguments passed to or from other methods. |
Details
These functions count bipartitions as defined by internal branches, so they work only with unrooted trees. The structure of the class "bitsplits" is described in a separate document on ape's website.
This data structure has a memory requirement proportional to n^2, so it can be inefficient with large trees (> 1000 tips), particularly if they are very different (i.e., with few shared splits). In any case, an error occurs if the product of the number of tips by the number of nodes is greater than 2^{31}-1 (~2.1 billion). A warning message is given if the tree(s) has (have) more than 46,341 tips. It may happen that the search for splits is interrupted if the data structure is full (with a warning message).
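The two numbers quoted above are related by simple arithmetic, which can be checked in base R. This is only a plausible reading of where the 46,341 threshold comes from (the square root of the largest integer R can represent), not a statement about the internal code.

```r
limit <- .Machine$integer.max   # largest integer value in R: 2^31 - 1
limit == 2^31 - 1

sqrt(limit)                     # just below 46341

# Products are computed with doubles here to avoid integer overflow.
46340^2 <= limit                # still within the limit
46341^2 >  limit                # exceeds the limit
```

Since the number of nodes of a tree is smaller than its number of tips, the product of both numbers stays below the square of the number of tips, so the error itself is triggered somewhat later than this threshold.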
Value
bitsplits, as.bitsplits, and sort return an object of class "bitsplits".
countBipartitions returns a vector of integers.
as.prop.part returns an object of class "prop.part".
Author(s)
Emmanuel Paradis
See Also
Examples
tr <- rtree(20)
pp <- prop.part(tr)
as.bitsplits(pp)
## works only with unrooted trees (ape 5.5):
countBipartitions(rtree(10, rooted = FALSE), rmtree(100, 10, rooted = FALSE))
Conversion Between Phylo and Matching Objects
Description
These functions convert objects between the classes "phylo" and "matching".
Usage
as.matching(x, ...)
## S3 method for class 'phylo'
as.matching(x, labels = TRUE, ...)
## S3 method for class 'matching'
as.phylo(x, ...)
Arguments
x | an object to convert as an object of class |
labels | a logical specifying whether the tip and node labels should be included in the returned matching. |
... | further arguments to be passed to or from other methods. |
Details
A matching is a representation where each tip and each node are given a number, and sibling groups are grouped in a “matching pair” (see Diaconis and Holmes 1998, for details). This coding system can be used only for binary (fully dichotomous) trees.
Diaconis and Holmes (1998) gave some conventions to ensure that a given tree has a unique representation as a matching. I have tried to follow them in the present functions.
Value
as.matching returns an object of class "matching" with the following components:
matching | a two-column numeric matrix where the columns represent the sibling pairs. |
tip.label | (optional) a character vector giving the tip labels where the ith element is the label of the tip numbered i in |
node.label | (optional) a character vector giving the node labels in the same order as in |
as.phylo.matching returns an object of class "phylo".
Note
Branch lengths are not supported in the present version.
Author(s)
Emmanuel Paradis
References
Diaconis, P. W. and Holmes, S. P. (1998) Matchings and phylogenetic trees. Proceedings of the National Academy of Sciences USA, 95, 14600–14602.
See Also
Examples
data(bird.orders)
m <- as.matching(bird.orders)
str(m)
tr <- as.phylo(m)
all.equal(tr, bird.orders, use.edge.length = FALSE)
Conversion Among Tree and Network Objects
Description
as.phylo is a generic function which converts an object into a tree of class "phylo". There are currently two methods for objects of class "hclust" and of class "phylog" (implemented in the package ade4). The default method is for any object inheriting the class "phylo" which is returned unchanged.
as.hclust.phylo is a method of the generic as.hclust which converts an object of class "phylo" into one of class "hclust". This can be used to convert an object of class "phylo" into one of class "dendrogram" (see examples).
as.network and as.igraph convert trees of class "phylo" into these respective classes defined in the packages of the same names (where the generics are defined).
old2new.phylo and new2old.phylo are utility functions for converting between the old and new coding of the class "phylo".
Usage
as.phylo(x, ...)
## Default S3 method:
as.phylo(x, ...)
## S3 method for class 'hclust'
as.phylo(x, ...)
## S3 method for class 'phylog'
as.phylo(x, ...)
## S3 method for class 'phylo'
as.hclust(x, ...)
old2new.phylo(phy)
new2old.phylo(phy)
## S3 method for class 'phylo'
as.network(x, directed = is.rooted(x), ...)
## S3 method for class 'phylo'
as.igraph(x, directed = is.rooted(x), use.labels = TRUE, ...)
Arguments
x | an object to be converted into another class. |
directed | a logical value: should the network be directed? By default, this depends on whether the tree is rooted or not. |
use.labels | a logical specifying whether to use labels to build the network of class |
... | further arguments to be passed to or from other methods. |
phy | an object of class |
Value
An object of class "hclust", "phylo", "network", or "igraph".
Note
In an object of class "hclust", the height gives the distance between the two sets that are being agglomerated. So these distances are divided by two when setting the branch lengths of a phylogenetic tree.
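The division by two can be checked on a toy example with hclust from base R. This sketch uses three invented points on a line and the default (complete) linkage; the names tip_branch and root_depth are introduced here for the illustration only.

```r
# Three points on a line: the pairwise distances are 4, 10, and 6.
pts <- matrix(c(0, 4, 10), ncol = 1,
              dimnames = list(c("a", "b", "c"), NULL))
hc <- hclust(dist(pts))   # complete linkage by default

hc$height                 # merging distances: 4 (a with b), then 10

# The distance between a and b is shared equally by their two branches.
tip_branch <- hc$height[1] / 2

# Every tip is at the same depth from the root: half the last height.
root_depth <- max(hc$height) / 2
c(tip_branch = tip_branch, root_depth = root_depth)
```

With these values, the path from a to b in the converted tree has length 2 + 2 = 4, which is the height at which the two tips were agglomerated.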
Author(s)
Emmanuel Paradis
See Also
hclust, as.hclust, dendrogram, as.phylo.formula
Examples
data(bird.orders)
hc <- as.hclust(bird.orders)
tr <- as.phylo(hc)
all.equal(bird.orders, tr) # TRUE
### shows the three plots for tree objects:
dend <- as.dendrogram(hc)
layout(matrix(c(1:3, 3), 2, 2))
plot(bird.orders, font = 1)
plot(hc)
par(mar = c(8, 0, 0, 0)) # leave space for the labels
plot(dend)
### how to get identical plots with
### plot.phylo and plot.dendrogram:
layout(matrix(1:2, 2, 1))
plot(bird.orders, font = 1, no.margin = TRUE, label.offset = 0.4)
par(mar = c(0, 0, 0, 8))
plot(dend, horiz = TRUE)
layout(1)
## Not run: 
### convert into networks:
if (require(network)) {
    x <- as.network(rtree(10))
    print(x)
    plot(x, vertex.cex = 1:4)
    plot(x, displaylabels = TRUE)
}
tr <- rtree(5)
if (require(igraph)) {
    print((x <- as.igraph(tr)))
    plot(x)
    print(as.igraph(tr, TRUE, FALSE))
    print(as.igraph(tr, FALSE, FALSE))
}
## End(Not run)
Conversion from Taxonomy Variables to Phylogenetic Trees
Description
The function as.phylo.formula (short form as.phylo) builds a phylogenetic tree (an object of class phylo) from a set of nested taxonomic variables.
Usage
## S3 method for class 'formula'
as.phylo(x, data = parent.frame(), collapse = TRUE, ...)
Arguments
x | a right-side formula describing the taxonomic relationship: |
data | the data.frame where to look for the variables (default to user's workspace). |
collapse | a logical value specifying whether to collapse single nodes in the returned tree (see details). |
... | further arguments to be passed from other methods. |
Details
Taxonomic variables must be nested and passed in the correct order: the higher clade must be on the left of the formula, for instance ~Order/Family/Genus/Species. In most cases, the resulting tree will be unresolved and will contain polytomies.
The option collapse = FALSE has the effect of adding single nodes in the tree when a given higher level has only one element in the level below (e.g., a monospecific genus); see the example below.
Value
an object of class "phylo".
Author(s)
Julien Dutheil <dutheil@evolbio.mpg.de>, Eric Marcon and Klaus Schliep
See Also
as.phylo, read.tree for a description of "phylo" objects, multi2di
Examples
data(carnivora)
frm <- ~SuperFamily/Family/Genus/Species
tr <- as.phylo(frm, data = carnivora, collapse = FALSE)
tr$edge.length <- rep(1, nrow(tr$edge))
plot(tr, show.node.label = TRUE)
Nnode(tr)
## compare with the tree where single nodes are collapsed:
Nnode(as.phylo(frm, data = carnivora, collapse = TRUE))
Axis on Side of Phylogeny
Description
This function adds a scaled axis on the side of a phylogeny plot.
Usage
axisPhylo(side = NULL, root.time = NULL, backward = TRUE, ...)
Arguments
side | a numeric value specifying the side where the axis is plotted: 1: below, 2: left, 3: above, 4: right. By default, this is taken from the direction of the plot. |
root.time | the time assigned to the root node of the tree. By default, this is taken from the |
backward | a logical value; if TRUE, the most distant tip from the root is considered as the origin of the time scale; if FALSE, this is the root node. |
... | further arguments to be passed to |
Details
The further arguments (...) are used to format the axis. They may be font, cex, col, las, and so on (see the help pages on axis and par).
Author(s)
Emmanuel Paradis, Klaus Schliep
See Also
plot.phylo, add.scale.bar, axis, par
Examples
tr <- rtree(30)
ch <- rcoal(30)
plot(ch)
axisPhylo()
plot(tr, "c", FALSE, direction = "u")
axisPhylo(las = 1)
Balance of a Dichotomous Phylogenetic Tree
Description
This function computes the balance of a phylogenetic tree, that is, for each node of the tree, the numbers of descendants (i.e., tips) on each of its daughter-branches. The tree must be fully dichotomous.
Usage
balance(phy)
Arguments
phy | an object of class |
Value
a numeric matrix with two columns and one row for each node of the tree. The columns give the numbers of descendants on each daughter-branch (the order of both columns being arbitrary). If the phylogeny phy has an element node.label, this is used as rownames for the returned matrix; otherwise the numbers (of mode character) of the matrix edge of phy are used as rownames.
Author(s)
Emmanuel Paradis
References
Aldous, D. J. (2001) Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statistical Science, 16, 23–34.
Base frequencies from DNA Sequences
Description
base.freq computes the frequencies (absolute or relative) of the four DNA bases (adenine, cytosine, guanine, and thymine) from a sample of sequences.
GC.content computes the proportion of G+C (using the previous function). All missing or unknown sites are ignored.
Ftab computes the contingency table with the absolute frequencies of the DNA bases from a pair of sequences.
Usage
base.freq(x, freq = FALSE, all = FALSE)
GC.content(x)
Ftab(x, y = NULL)
Arguments
x | a vector, a matrix, or a list which contains the DNA sequences. |
y | a vector with a single DNA sequence. |
freq | a logical specifying whether to return the proportions (the default) or the absolute frequencies (counts). |
all | a logical; by default only the counts of A, C, G, and T are returned. If |
Details
The base frequencies are computed over all sequences in the sample.
For Ftab, if the argument y is given then both x and y are coerced as vectors and must be of equal length. If y is not given, x must be a matrix or a list and only the two first sequences are used.
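The quantities described above can be reproduced in base R on two short, invented sequences stored as character vectors. This is a sketch of the computations and not the code of Ftab or GC.content, which operate on objects of class "DNAbin".

```r
s1 <- c("a", "c", "g", "t", "a", "c")
s2 <- c("a", "c", "a", "t", "g", "c")
bases <- c("a", "c", "g", "t")
stopifnot(length(s1) == length(s2))   # both sequences must be aligned

# Four by four contingency table: rows for s1, columns for s2.
# Using factors with fixed levels keeps the bases that are not observed.
tab <- table(factor(s1, levels = bases), factor(s2, levels = bases))
tab

# Proportion of G+C among the known bases of both sequences.
x <- c(s1, s2)
known <- x %in% bases
gc <- mean(x[known] %in% c("g", "c"))
gc
```

The diagonal of the table counts the sites where both sequences carry the same base, and the off-diagonal cells count the different kinds of substitutions.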
Value
A numeric vector with names c("a", "c", "g", "t") (and possibly "r", "m", ...), a single numeric value, or a four by four matrix with similar dimnames.
Author(s)
Emmanuel Paradis
See Also
seg.sites,nuc.div (inpegas),DNAbin
Examples
data(woodmouse)base.freq(woodmouse)base.freq(woodmouse, TRUE)base.freq(woodmouse, TRUE, TRUE)GC.content(woodmouse)Ftab(woodmouse)Ftab(woodmouse[1, ], woodmouse[2, ]) # same than aboveFtab(woodmouse[14:15, ]) # between the last twoExtended Version of the Birth-Death Models to Estimate Speciationand Extinction Rates
Description
This function fits by maximum likelihood a birth-death model to the combined phylogenetic and taxonomic data of a given clade. The phylogenetic data are given by a tree, and the taxonomic data by the number of species for each of its tips.
Usage
bd.ext(phy, S, conditional = TRUE)
Arguments
phy | an object of class "phylo". |
S | a numeric vector giving the number of species for each tip. |
conditional | whether probabilities should be conditioned on no extinction (mainly to compare results with previous analyses; see details). |
Details
A re-parametrization of the birth-death model studied by Kendall (1948) is used so that the likelihood has to be maximized over d/b and b - d, where b is the birth rate, and d the death rate.
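Since the estimates are expressed as d/b and b - d, the birth and death rates themselves have to be recovered afterwards. Writing a = d/b and r = b - d gives b = r/(1 - a) and d = a * b, provided a < 1. The values of a and r below are invented for illustration; they are not output from the function.

```r
## Hypothetical estimates on the re-parametrized scale
a <- 0.2    # d/b
r <- 0.08   # b - d

b <- r / (1 - a)  # birth (speciation) rate
d <- a * b        # death (extinction) rate

c(birth = b, death = d)
```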
The standard-errors of the estimated parameters are computed using a normal approximation of the maximum likelihood estimates.
If the argument S has names, then they are matched to the tip labels of phy. The user must be careful here since the function requires that both series of names perfectly match, so this operation may fail if there is a typing or syntax error. If both series of names do not match, the values of S are taken to be in the same order as the tip labels of phy, and a warning message is issued.
Note that the function does not check that the tree is effectively ultrametric, so if it is not, the returned result may not be meaningful.
If conditional = TRUE, the probabilities of the taxonomic data are calculated conditioned on no extinction (Rabosky et al. 2007). In previous versions of the present function (until ape 2.6-1), unconditional probabilities were used, resulting in underestimated extinction rates. Though it does not make much sense to use conditional = FALSE, this option is provided to compare results with previous analyses: if the species richnesses are relatively low, both versions will give similar results (see examples).
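The two caveats above (name matching and ultrametricity) can be checked before fitting. The following is a minimal sketch using the bird.orders data and the species numbers from the Examples; the helper variables names_ok and ultra_ok are introduced here for illustration only.

```r
library(ape)
data(bird.orders)

## Species richness per order (values from the Examples), in tip-label order
S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
names(S) <- bird.orders$tip.label  # assumes S follows the order of the tips

## Shuffle S: from now on only the names link each value to its tip
S2 <- S[sample(length(S))]

## Check both prerequisites before calling bd.ext()
names_ok <- setequal(names(S2), bird.orders$tip.label)
ultra_ok <- is.ultrametric(bird.orders)

if (names_ok && ultra_ok) bd.ext(bird.orders, S2)
```

If names_ok were FALSE, the function would fall back on the order of the values and issue a warning, which is usually not what is wanted for shuffled data.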
Author(s)
Emmanuel Paradis
References
Paradis, E. (2003) Analysis of diversification: combining phylogenetic and taxonomic data. Proceedings of the Royal Society of London. Series B. Biological Sciences, 270, 2499–2505.
Rabosky, D. L., Donnellan, S. C., Talaba, A. L. and Lovette, I. J. (2007) Exceptional among-lineage variation in diversification rates during the radiation of Australia's most diverse vertebrate clade. Proceedings of the Royal Society of London. Series B. Biological Sciences, 274, 2915–2923.
See Also
birthdeath, branching.times, diversi.gof, diversi.time, ltt.plot, yule, yule.cov, bd.time
Examples
### An example from Paradis (2003) using the avian orders:
data(bird.orders)
### Number of species in each order from Sibley and Monroe (1990):
S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders, S)
bd.ext(bird.orders, S, FALSE) # same as older versions
Time-Dependent Birth-Death Models
Description
This function fits a user-defined time-dependent birth-death model.
Usage
bd.time(phy, birth, death, BIRTH = NULL, DEATH = NULL, ip, lower, upper, fast = FALSE, boot = 0, trace = 0)
Arguments
phy | an object of class "phylo". |
birth | either a numeric (if speciation rate is assumed constant), or a (vectorized) function specifying how the birth (speciation) probability changes through time (see details). |
death | id. for extinction probability. |
BIRTH | (optional) a vectorized function giving the primitive of birth. |
DEATH | id. for death. |
ip | a numeric vector used as initial values for the estimation procedure. If missing, these values are guessed. |
lower, upper | the lower and upper bounds of the parameters. If missing, these values are guessed too. |
fast | a logical value specifying whether to use faster integration (see details). |
boot | the number of bootstrap replicates to assess the confidence intervals of the parameters. Not run by default. |
trace | an integer value. If non-zero, the fitting procedure is printed every |
Details
Details on how to specify the birth and death functions and their primitives can be found in the help page of yule.time.
The model is fitted by minimizing the least squares deviation between the observed and the predicted distributions of branching times. These computations rely heavily on numerical integrations. If fast = FALSE, integrations are done with R's integrate function. If fast = TRUE, a faster but less accurate function provided in ape is used. If fitting a complex model to a large phylogeny, a strategy might be to first use the latter option, and then to use the estimates as starting values with fast = FALSE.
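The two-stage strategy described above can be sketched as follows, here with the constant-rate model and the simulated tree from the Examples. The object names fit1 and fit2 are arbitrary, and for such a simple model the gain in speed is likely to be negligible; the point is only to show how the estimates are passed on through ip.

```r
library(ape)
set.seed(3)
tr <- rbdtree(0.1, 0.02)

## Stage 1: faster, less accurate integration
fit1 <- bd.time(tr, 0, 0, ip = c(0.1, 0.01), fast = TRUE)

## Stage 2: restart from the stage-1 estimates, using integrate()
fit2 <- bd.time(tr, 0, 0, ip = unname(fit1$par), fast = FALSE)

rbind(fast = fit1$par, accurate = fit2$par)  # parameter estimates
c(fast = fit1$SS, accurate = fit2$SS)        # minimized sums of squares
```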
Value
A list with the following components:
par: a vector of estimates with names taken from the parameters in the specified functions.
SS: the minimized sum of squares.
convergence: output convergence criterion from nlminb.
message: id.
iterations: id.
evaluations: id.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.
See Also
ltt.plot, birthdeath, yule.time, LTT
Examples
set.seed(3)
tr <- rbdtree(0.1, 0.02)
bd.time(tr, 0, 0) # fits a simple BD model
bd.time(tr, 0, 0, ip = c(.1, .01)) # 'ip' is useful here
## the classic logistic:
birth.logis <- function(a, b) 1/(1 + exp(-a*t - b))
## Not run: 
bd.time(tr, birth.logis, 0, ip = c(0, -2, 0.01))
## slow to get:
## $par
##            a            b        death
## -0.003486961 -1.995983179  0.016496454
##
## $SS
## [1] 20.73023
## End(Not run)
Phylogenetic Generalized Linear Mixed Model for Binary Data
Description
binaryPGLMM performs linear regression for binary phylogenetic data, estimating regression coefficients with approximate standard errors. It simultaneously estimates the strength of phylogenetic signal in the residuals and gives an approximate conditional likelihood ratio test for the hypothesis that there is no signal. Therefore, when applied without predictor (independent) variables, it gives a test for phylogenetic signal for binary data. The method uses a GLMM approach, alternating between penalized quasi-likelihood (PQL) to estimate the "mean components" and restricted maximum likelihood (REML) to estimate the "variance components" of the model.
binaryPGLMM.sim is a companion function that simulates binary phylogenetic data of the same structure analyzed by binaryPGLMM.
Usage
binaryPGLMM(formula, data = list(), phy, s2.init = 0.1, B.init = NULL, tol.pql = 10^-6, maxit.pql = 200, maxit.reml = 100)
binaryPGLMM.sim(formula, data = list(), phy, s2 = NULL, B = NULL, nrep = 1)
## S3 method for class 'binaryPGLMM'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
formula | a two-sided linear formula object describing the fixed-effects of the model; for example, Y ~ X. |
data | a data frame containing the variables named in formula. |
phy | a phylogenetic tree as an object of class "phylo". |
s2.init | an initial estimate of s2, the scaling component of the variance in the PGLMM. A value of s2 = 0 implies no phylogenetic signal. Note that the variance-covariance matrix given by the phylogeny phy is scaled to have determinant = 1. |
B.init | initial estimates of B, the matrix containing regression coefficients in the model. This matrix must have dim(B.init)=c(p+1,1), where p is the number of predictor (independent) variables; the first element of B corresponds to the intercept, and the remaining elements correspond in order to the predictor (independent) variables in the model. |
tol.pql | a control parameter dictating the tolerance for convergence for the PQL optimization. |
maxit.pql | a control parameter dictating the maximum number of iterations for the PQL optimization. |
maxit.reml | a control parameter dictating the maximum number of iterations for the REML optimization. |
x | an object of class "binaryPGLMM". |
s2 | in binaryPGLMM.sim, value of s2. See s2.init. |
B | in binaryPGLMM.sim, value of B, the matrix containing regression coefficients in the model. See B.init. |
nrep | in binaryPGLMM.sim, number of complete data sets produced. |
digits | the number of digits to print. |
... | further arguments passed to |
Details
The function estimates parameters for the model
Pr(Y = 1) = q
q = inverse.logit(b0 + b1 * x1 + b2 * x2 + ... + epsilon)
epsilon ~ Gaussian(0, s2 * V)
where V is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Although mathematically there is no requirement for V to be ultrametric, forcing V into ultrametric form can aid in the interpretation of the model, because in regression for binary dependent variables, only the off-diagonal elements (i.e., covariances) of matrix V are biologically meaningful (see Ives & Garland 2014).
The function converts a phylo tree object into a variance-covariance matrix, and further standardizes this matrix to have determinant = 1. This in effect standardizes the interpretation of the scalar s2. Although mathematically not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).
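The two standardizations mentioned above can be reproduced by hand. The sketch below follows the rescaling used in the MCMCglmm comparison of the Examples; whether binaryPGLMM performs exactly these steps internally is an assumption. The names logdet and logdet_std are introduced for illustration.

```r
library(ape)
set.seed(1)
n <- 20
phy <- compute.brlen(rtree(n), method = "Grafen", power = 1)

## Phylogenetic variance-covariance matrix rescaled so that det(V) = 1:
## det(V / c) = det(V) / c^n, hence c = det(V)^(1/n)
V <- vcv(phy)
V <- V / max(V)
logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
V <- V / exp(logdet / n)
logdet_std <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
logdet_std  # log-determinant after rescaling, numerically close to 0

## Continuous predictor standardized to mean 0 and variance 1
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1)) / sd(X1)
c(mean = mean(X1), var = var(X1))
```

Working with the log-determinant avoids underflow when the tree is large and det(V) is very small.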
The estimation method alternates between PQL to obtain estimates of the mean components of the model (this is the standard approach to estimating GLMs) and REML to obtain estimates of the variance components. This method gives relatively fast and robust estimation. Nonetheless, the estimates of the coefficients B will generally be biased upwards, as is typical of estimation for binary data. The standard errors of B are computed from the PQL results conditional on the estimate of s2 and therefore should tend to be too small. The function returns an approximate P-value for the hypothesis of no phylogenetic signal in the residuals (i.e., H0:s2 = 0) using an approximate likelihood ratio test based on the conditional REML likelihood (rather than the marginal likelihood). Simulations have shown that these P-values tend to be high (giving type II errors: failing to identify variances that in fact are statistically significantly different from zero).
It is a good idea to confirm statistical inferences using parametric bootstrapping, and the companion function binaryPGLMM.sim gives a simple tool for this. See Examples below.
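A parametric bootstrap along these lines could look like the sketch below: fit the model, simulate new responses from the fitted values of s2 and B with binaryPGLMM.sim, refit, and summarize the distribution of the estimates. The tree size, the number of replicates (nboot) and the percentile interval are arbitrary choices made here to keep the example short; a real analysis would use many more replicates.

```r
library(ape)
set.seed(1)
n <- 30
phy <- compute.brlen(rtree(n), method = "Grafen", power = 1)
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1)) / sd(X1)

## Simulated data playing the role of the observed data
dat <- data.frame(Y = rep(0, n), X1 = X1, row.names = phy$tip.label)
dat$Y <- as.numeric(binaryPGLMM.sim(Y ~ X1, phy = phy, data = dat, s2 = 0.5,
                                    B = matrix(c(0, 0.25), 2, 1), nrep = 1)$Y)
fit <- binaryPGLMM(Y ~ X1, phy = phy, data = dat)

## Bootstrap: simulate from the fitted model and refit each time
nboot <- 20
boot.B1 <- rep(NA_real_, nboot)
for (i in seq_len(nboot)) {
    d <- dat
    d$Y <- as.numeric(binaryPGLMM.sim(Y ~ X1, phy = phy, data = dat,
                                      s2 = fit$s2, B = matrix(fit$B, ncol = 1),
                                      nrep = 1)$Y)
    z <- try(binaryPGLMM(Y ~ X1, phy = phy, data = d), silent = TRUE)
    ## keep only replicates that converged
    if (!inherits(z, "try-error") && z$convergeflag == "converged")
        boot.B1[i] <- z$B[2]
}

## Percentile interval for the slope, ignoring non-converged replicates
quantile(boot.B1, c(0.025, 0.975), na.rm = TRUE)
```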
Value
An object of class "binaryPGLMM".
formula | formula specifying the regression model. |
B | estimates of the regression coefficients. |
B.se | approximate PQL standard errors of the regression coefficients. |
B.cov | approximate PQL covariance matrix for the regression coefficients. |
B.zscore | approximate PQL Z scores for the regression coefficients. |
B.pvalue | approximate PQL tests for the regression coefficients being different from zero. |
s2 | phylogenetic signal measured as the scalar magnitude of the phylogenetic variance-covariance matrix s2 * V. |
P.H0.s2 | approximate likelihood ratio test of the hypothesis H0 that s2 = 0. This test is based on the conditional REML (keeping the regression coefficients fixed) and is prone to inflated type 1 errors. |
mu | for each data point y, the estimate of p that y = 1. |
b | for each data point y, the estimate of inverse.logit(p). |
X | the predictor (independent) variables returned in matrix form (including 1s in the first column). |
H | residuals of the form b + (Y - mu)/(mu * (1 - mu)). |
B.init | the user-provided initial estimates of B. If B.init is not provided, these are estimated using glm() assuming no phylogenetic signal. The glm() estimates can generate convergence problems, so using small values (e.g., 0.01) is more robust but slower. |
VCV | the standardized phylogenetic variance-covariance matrix. |
V | estimate of the covariance matrix of H. |
convergeflag | flag for cases when convergence failed. |
iteration | number of total iterations performed. |
converge.test.B | final tolerance for B. |
converge.test.s2 | final tolerance for s2. |
rcondflag | number of times B is reset to 0.01. This is done when rcond(V) < 10^(-10), which implies that V cannot be inverted. |
Y | in binaryPGLMM.sim, the simulated values of Y. |
Author(s)
Anthony R. Ives
References
Ives, A. R. and Helmus, M. R. (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs, 81, 511–525.
Ives, A. R. and Garland, T., Jr. (2014) Phylogenetic regression for binary dependent variables. Pages 231–261 in L. Z. Garamszegi, editor. Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer-Verlag, Berlin Heidelberg.
See Also
package pez and its function communityPGLMM; package phylolm and its function phyloglm; package MCMCglmm
Examples
## Illustration of binaryPGLMM() with simulated data
# Generate random phylogeny
n <- 100
phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)
# Generate random data and standardize to have mean 0 and variance 1
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1))/sd(X1)
# Simulate binary Y
sim.dat <- data.frame(Y=array(0, dim=n), X1=X1, row.names=phy$tip.label)
sim.dat$Y <- binaryPGLMM.sim(Y ~ X1, phy=phy, data=sim.dat, s2=.5,
                             B=matrix(c(0,.25),nrow=2,ncol=1), nrep=1)$Y
# Fit model
binaryPGLMM(Y ~ X1, phy=phy, data=sim.dat)
## Not run: 
# Compare with phyloglm()
library(phylolm)
summary(phyloglm(Y ~ X1, phy=phy, data=sim.dat))
# Compare with glm() that does not account for phylogeny
summary(glm(Y ~ X1, data=sim.dat, family="binomial"))
# Compare with logistf() that does not account
# for phylogeny but is less biased than glm()
library(logistf)
logistf(Y ~ X1, data=sim.dat)
# Compare with MCMCglmm
library(MCMCglmm)
V <- vcv(phy)
V <- V/max(V)
detV <- exp(determinant(V)$modulus[1])
V <- V/detV^(1/n)
invV <- Matrix(solve(V),sparse=TRUE)
sim.dat$species <- phy$tip.label
rownames(invV) <- sim.dat$species
nitt <- 43000
thin <- 10
burnin <- 3000
prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)))
summary(MCMCglmm(Y ~ X1, random=~species, ginverse=list(species=invV),
    data=sim.dat, slice=TRUE, nitt=nitt, thin=thin, burnin=burnin,
    family="categorical", prior=prior, verbose=FALSE))
## Examine bias in estimates of B1 and s2 from binaryPGLMM with
# simulated data. Note that this will take a while.
Reps = 1000
s2 <- 0.4
B1 <- 1
meanEsts <- data.frame(n = Inf, B1 = B1, s2 = s2, Pr.s2 = 1, propconverged = 1)
for (n in c(160, 80, 40, 20)) {
  meanEsts.n <- data.frame(B1 = 0, s2 = 0, Pr.s2 = 0, convergefailure = 0)
  for (rep in 1:Reps) {
    phy <- compute.brlen(rtree(n = n), method = "Grafen", power = 1)
    X <- rTraitCont(phy, model = "BM", sigma = 1)
    X <- (X - mean(X))/var(X)
    sim.dat <- data.frame(Y = array(0, dim = n), X = X, row.names = phy$tip.label)
    sim <- binaryPGLMM.sim(Y ~ 1 + X, phy = phy, data = sim.dat, s2 = s2,
                           B = matrix(c(0,B1), nrow = 2, ncol = 1), nrep = 1)
    sim.dat$Y <- sim$Y
    z <- binaryPGLMM(Y ~ 1 + X, phy = phy, data = sim.dat)
    meanEsts.n[rep, ] <- c(z$B[2], z$s2, z$P.H0.s2, z$convergeflag == "converged")
  }
  converged <- meanEsts.n[,4]
  meanEsts <- rbind(meanEsts, c(n, mean(meanEsts.n[converged==1,1]),
                    mean(meanEsts.n[converged==1,2]),
                    mean(meanEsts.n[converged==1, 3] < 0.05),
                    mean(converged)))
}
meanEsts
# Results output for B1 = 1, s2 = 0.4; n = Inf gives the values used to
# simulate the data
#     n       B1        s2      Pr.s2 propconverged
# 1 Inf 1.000000 0.4000000 1.00000000         1.000
# 2 160 1.012719 0.4479946 0.36153072         0.993
# 3  80 1.030876 0.5992027 0.24623116         0.995
# 4  40 1.110201 0.7425203 0.13373860         0.987
# 5  20 1.249886 0.8774708 0.05727377         0.873
## Examine type I errors for estimates of B0 and s2 from binaryPGLMM()
# with simulated data. Note that this will take a while.
Reps = 1000
s2 <- 0
B0 <- 0
B1 <- 0
H0.tests <- data.frame(n = Inf, B0 = B0, s2 = s2, Pr.B0 = .05, Pr.s2 = .05, propconverged = 1)
for (n in c(160, 80, 40, 20)) {
  ests.n <- data.frame(B1 = 0, s2 = 0, Pr.B0 = 0, Pr.s2 = 0, convergefailure = 0)
  for (rep in 1:Reps) {
    phy <- compute.brlen(rtree(n = n), method = "Grafen", power = 1)
    X <- rTraitCont(phy, model = "BM", sigma = 1)
    X <- (X - mean(X))/var(X)
    sim.dat <- data.frame(Y = array(0, dim = n), X = X, row.names = phy$tip.label)
    sim <- binaryPGLMM.sim(Y ~ 1, phy = phy, data = sim.dat, s2 = s2,
                           B = matrix(B0, nrow = 1, ncol = 1), nrep = 1)
    sim.dat$Y <- sim$Y
    z <- binaryPGLMM(Y ~ 1, phy = phy, data = sim.dat)
    ests.n[rep, ] <- c(z$B[1], z$s2, z$B.pvalue, z$P.H0.s2, z$convergeflag == "converged")
  }
  converged <- ests.n[,5]
  H0.tests <- rbind(H0.tests, c(n, mean(ests.n[converged==1,1]),
                    mean(ests.n[converged==1,2]),
                    mean(ests.n[converged==1, 3] < 0.05),
                    mean(ests.n[converged==1, 4] < 0.05),
                    mean(converged)))
}
H0.tests
# Results for type I errors for B0 = 0 and s2 = 0; n = Inf gives the values
# used to simulate the data. These results show that binaryPGLMM() tends to
# have lower-than-nominal type I error rates; fewer than 0.05 of the simulated
# data sets have H0:B0=0 and H0:s2=0 rejected at the alpha=0.05 level.
#     n            B0         s2      Pr.B0      Pr.s2 propconverged
# 1 Inf  0.0000000000 0.00000000 0.05000000 0.05000000         1.000
# 2 160 -0.0009350357 0.07273163 0.02802803 0.04804805         0.999
# 3  80 -0.0085831477 0.12205876 0.04004004 0.03403403         0.999
# 4  40  0.0019303847 0.25486307 0.02206620 0.03711133         0.997
# 5  20  0.0181394905 0.45949266 0.02811245 0.03313253         0.996
## End(Not run)
Binds Trees
Description
This function binds together two phylogenetic trees to give a single object of class "phylo".
Usage
bind.tree(x, y, where = "root", position = 0, interactive = FALSE)
x + y
Arguments
x | an object of class "phylo". |
y | an object of class "phylo". |
where | an integer giving the number of the node or tip of the tree |
position | a numeric value giving the position from the tip or node given by |
interactive | if |
Details
The argument x can be seen as the receptor tree, whereas y is the donor tree. The root of y is then grafted on a location of x specified by where and, possibly, position. If y has a root edge, this is added as an internal branch in the resulting tree.
x + y is a shortcut for:
bind.tree(x, y, position = if (is.null(x$root.edge)) 0 else x$root.edge)
If only one of the trees has no branch length, the branch lengths of the other one are ignored with a warning.
If one (or both) of the trees has no branch length, it is possible to specify a value of 'position' to graft 'y' below the node of 'x' specified by 'where'. In this case, the exact value of 'position' is not important as long as it is greater than zero. The new node will be multichotomous if 'y' has no root edge. This can be solved by giving an arbitrary root edge to 'y' beforehand (e.g., y$root.edge <- 1): it will be deleted during the binding operation.
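The roles of where and position can be illustrated with two very small trees written directly in Newick format. The trees and the numeric values below are made up for this example.

```r
library(ape)

x <- read.tree(text = "((A:1,B:1):1,C:2);")  # receptor tree
y <- read.tree(text = "(D:0.5,E:0.5);")      # donor tree
y$root.edge <- 0.5  # a root edge, so that y is grafted as a clade of its own

## Graft y on the terminal branch leading to tip 1 of x ("A"),
## 0.4 units below that tip (the branch is 1 unit long)
z1 <- bind.tree(x, y, where = 1, position = 0.4)

## Graft y at the root of x (the default) with the `+` shortcut
z2 <- x + y

z1$tip.label
z2$tip.label
```

Plotting x, y, z1 and z2 side by side, as done in the Examples, is the easiest way to see where the donor tree ends up.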
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis
See Also
Examples
### binds the two clades of bird orders
treefile1 <- tempfile("tree", fileext = ".tre")
treefile2 <- tempfile("tree", fileext = ".tre")
cat("((Struthioniformes:21.8,Tinamiformes:21.8):4.1,",
    "((Craciformes:21.6,Galliformes:21.6):1.3,Anseriformes:22.9):3.0):2.1;",
    file = treefile1, sep = "\n")
cat("(Turniciformes:27.0,(Piciformes:26.3,((Galbuliformes:24.4,",
    "((Bucerotiformes:20.8,Upupiformes:20.8):2.6,",
    "(Trogoniformes:22.1,Coraciiformes:22.1):1.3):1.0):0.6,",
    "(Coliiformes:24.5,(Cuculiformes:23.7,(Psittaciformes:23.1,",
    "(((Apodiformes:21.3,Trochiliformes:21.3):0.6,",
    "(Musophagiformes:20.4,Strigiformes:20.4):1.5):0.6,",
    "((Columbiformes:20.8,(Gruiformes:20.1,Ciconiiformes:20.1):0.7):0.8,",
    "Passeriformes:21.6):0.9):0.6):0.6):0.8):0.5):1.3):0.7):1.0;",
    file = treefile2, sep = "\n")
tree.bird1 <- read.tree(treefile1)
tree.bird2 <- read.tree(treefile2)
unlink(c(treefile1, treefile2)) # clean-up
(birds <- tree.bird1 + tree.bird2)
layout(matrix(c(1, 2, 3, 3), 2, 2))
plot(tree.bird1)
plot(tree.bird2)
plot(birds)
### examples with random trees
x <- rtree(4, tip.label = LETTERS[1:4])
y <- rtree(4, tip.label = LETTERS[5:8])
x <- makeNodeLabel(x, prefix = "x_")
y <- makeNodeLabel(y, prefix = "y_")
x$root.edge <- y$root.edge <- .2
z <- bind.tree(x, y, po=.2)
plot(y, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("y")
plot(x, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("x")
plot(z, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("z <- bind.tree(x, y, po=.2)")
## make sure the terminal branch length is long enough:
x$edge.length[x$edge[, 2] == 2] <- 0.2
z <- bind.tree(x, y, 2, .1)
plot(y, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("y")
plot(x, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("x")
plot(z, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("z <- bind.tree(x, y, 2, .1)")
x <- rtree(50)
y <- rtree(50)
x$root.edge <- y$root.edge <- .2
z <- x + y
plot(y, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("y")
plot(x, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("x")
plot(z, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("z <- x + y")
layout(1)
Phylogeny of the Families of Birds From Sibley and Ahlquist
Description
This data set describes the phylogenetic relationships of the families of birds as reported by Sibley and Ahlquist (1990). Sibley and Ahlquist inferred this phylogeny from an extensive number of DNA/DNA hybridization experiments. The “tapestry” reported by these two authors (more than 1000 species out of the ca. 9000 extant bird species) generated a lot of debate.
The present tree is based on the relationships among families. A few families were not included in the figures in Sibley and Ahlquist, and thus are not included here either. The branch lengths were calculated from the values of ΔT50H as found in Sibley and Ahlquist (1990, figs. 354, 355, 356, and 369).
Usage
data(bird.families)
Format
The data are stored as an object of class "phylo" whose structure is described in the help page of the function read.tree.
Source
Sibley, C. G. and Ahlquist, J. E. (1990) Phylogeny and classification of birds: a study in molecular evolution. New Haven: Yale University Press.
See Also
Examples
data(bird.families)
op <- par(cex = 0.3)
plot(bird.families)
par(op)
Phylogeny of the Orders of Birds From Sibley and Ahlquist
Description
This data set describes the phylogenetic relationships of the orders of birds as reported by Sibley and Ahlquist (1990). Sibley and Ahlquist inferred this phylogeny from an extensive number of DNA/DNA hybridization experiments. The “tapestry” reported by these two authors (more than 1000 species out of the ca. 9000 extant bird species) generated a lot of debate.
The present tree is based on the relationships among orders. The branch lengths were calculated from the values of ΔT50H as found in Sibley and Ahlquist (1990, fig. 353).
Usage
data(bird.orders)
Format
The data are stored as an object of class "phylo" whose structure is described in the help page of the function read.tree.
Source
Sibley, C. G. and Ahlquist, J. E. (1990) Phylogeny and classification of birds: a study in molecular evolution. New Haven: Yale University Press.
See Also
Examples
data(bird.orders)
plot(bird.orders)
Estimation of Speciation and Extinction Rates With Birth-Death Models
Description
This function fits by maximum likelihood a birth-death model to the branching times computed from a phylogenetic tree using the method of Nee et al. (1994).
Usage
birthdeath(phy)
## S3 method for class 'birthdeath'
print(x, ...)
Arguments
phy | an object of class "phylo". |
x | an object of class "birthdeath". |
... | further arguments passed to the |
Details
Nee et al. (1994) used a re-parametrization of the birth-death model studied by Kendall (1948) so that the likelihood has to be maximized over d/b and b - d, where b is the birth rate, and d the death rate. This is the approach used by the present function.
This function computes the standard-errors of the estimated parameters using a normal approximation of the maximum likelihood estimates: this is likely to be inaccurate because of asymmetries of the likelihood function (Nee et al. 1995). In addition, 95% confidence intervals of both parameters are computed using profile likelihood: they are particularly useful if the estimate of d/b is at the boundary of the parameter space (i.e. 0, which is often the case).
Note that the function does not check that the tree is effectively ultrametric, so if it is not, the returned result may not be meaningful.
Value
An object of class "birthdeath" which is a list with the following components:
tree | the name of the tree analysed. |
N | the number of species. |
dev | the deviance (= -2 log lik) at its minimum. |
para | the estimated parameters. |
se | the corresponding standard-errors. |
CI | the 95% profile-likelihood confidence intervals. |
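The components listed above can be inspected directly on a fitted object. The sketch below uses the bird.orders data set shipped with the package and adds an explicit ultrametricity check, since the function itself does not perform one.

```r
library(ape)
data(bird.orders)

## birthdeath() does not check this, so do it beforehand
if (!is.ultrametric(bird.orders))
    warning("tree is not ultrametric: the estimates may not be meaningful")

fit <- birthdeath(bird.orders)
fit        # uses the print method
fit$N      # number of species (tips)
fit$para   # estimates of d/b and b - d
fit$se     # standard errors from the normal approximation
fit$CI     # 95% profile-likelihood confidence intervals
```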
Author(s)
Emmanuel Paradis
References
Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.
Nee, S., May, R. M. and Harvey, P. H. (1994) The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 344, 305–311.
Nee, S., Holmes, E. C., May, R. M. and Harvey, P. H. (1995) Estimating extinctions from molecular phylogenies. in Extinction Rates, eds. Lawton, J. H. and May, R. M., pp. 164–182, Oxford University Press.
See Also
branching.times, diversi.gof, diversi.time, ltt.plot, yule, bd.ext, yule.cov, bd.time
Tree Bipartition and Bootstrapping Phylogenies
Description
These functions analyse bipartitions found in a series of trees.
prop.part counts the number of bipartitions found in a series of trees given as .... If a single tree is passed, the returned object is a list of vectors with the tips descending from each node (i.e., clade compositions indexed by node number).
prop.clades counts the number of times the bipartitions present in phy are present in a series of trees given as ... or in the list previously computed and given with part.
boot.phylo performs a bootstrap analysis.
Usage
boot.phylo(phy, x, FUN, B = 100, block = 1, trees = FALSE, quiet = FALSE, rooted = is.rooted(phy), jumble = TRUE, mc.cores = 1)
prop.part(..., check.labels = TRUE)
prop.clades(phy, ..., part = NULL, rooted = FALSE)
## S3 method for class 'prop.part'
print(x, ...)
## S3 method for class 'prop.part'
summary(object, ...)
## S3 method for class 'prop.part'
plot(x, barcol = "blue", leftmar = 4, col = "red", ...)
Arguments
phy | an object of class "phylo". |
x | in the case of |
FUN | the function used to estimate |
B | the number of bootstrap replicates. |
block | the number of columns in |
trees | a logical specifying whether to return the bootstrapped trees ( |
quiet | a logical: a progress bar is displayed by default. |
rooted | a logical specifying whether the trees should be treated as rooted or not. |
jumble | a logical value. By default, the rows of |
mc.cores | the number of cores (CPUs) to be used (passed to parallel). |
... | either (i) a single object of class |
check.labels | a logical specifying whether to check the labels of each tree. If |
part | a list of partitions as returned by |
object | an object of class |
barcol | the colour used for the bars displaying the number of partitions in the upper panel. |
leftmar | the size of the margin on the left to display the tip labels. |
col | the colour used to visualise the bipartitions. |
Details
The argument FUN in boot.phylo must be the function used to estimate the tree from the original data matrix. Thus, if the tree was estimated with neighbor-joining (see nj), one may want something like FUN = function(xx) nj(dist.dna(xx)).
block in boot.phylo specifies the number of columns to be resampled together. For instance, if one wants to resample at the codon level, then block = 3 must be used.
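A short sketch of block resampling with the woodmouse data follows. Treating the alignment as starting in codon position 1, and trimming it to a multiple of three columns so that every block is complete, are assumptions made here for illustration; the number of replicates is kept small to save time.

```r
library(ape)
data(woodmouse)
set.seed(1)

## Keep a number of columns that is a multiple of 3
nc <- 3 * (ncol(woodmouse) %/% 3)
x <- woodmouse[, seq_len(nc)]

## FUN must be the function used to estimate the original tree
f <- function(xx) nj(dist.dna(xx))
tr <- f(x)

## Resample triplets of adjacent columns instead of single columns
bp <- boot.phylo(tr, x, f, B = 50, block = 3, quiet = TRUE)
bp  # one value per internal node of tr
```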
Using check.labels = FALSE in prop.part decreases computing times. This requires that (i) all trees have the same tip labels, and (ii) these labels are ordered similarly in all trees (in other words, the element tip.label is identical in all trees).
The plot function represents a contingency table of the different partitions (on the x-axis) in the lower panel, and their observed numbers in the upper panel. Any further arguments (...) are used to change the aspects of the points in the lower panel: these may be pch, col, bg, cex, etc. This function works only if there is an attribute labels in the object.
The print method displays the partitions and their numbers. The summary method extracts the numbers only.
Value
prop.part returns an object of class "prop.part" which is a list with an attribute "number". The elements of this list are the observed clades, and the attribute their respective numbers. If check.labels = FALSE is used, an attribute "labels" is added, and the vectors of the returned object contain the indices of these labels instead of the labels themselves.
prop.clades and boot.phylo return a numeric vector whose ith element is the number associated with the ith node of phy. If trees = TRUE, boot.phylo returns a list whose first element (named "BP") is like before, and the second element ("trees") is a list with the bootstrapped trees.
summary returns a numeric vector.
Note
prop.clades calls internally prop.part with the option check.labels = TRUE, which may be very slow. If the trees passed as ... fulfill conditions (i) and (ii) above, then it might be faster to first call, e.g., pp <- prop.part(...), then use the option part: prop.clades(phy, part = pp).
Since ape 3.5, prop.clades should return sensible results for all values of rooted: if FALSE, the numbers of bipartitions (or splits); if TRUE, the number of clades (of hopefully rooted trees).
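The use of part can be sketched with random trees as follows: the partitions are computed once and then reused, which is convenient when several reference trees are compared with the same series of trees. The numbers of trees and tips are arbitrary.

```r
library(ape)
set.seed(1)
TR <- rmtree(50, 8)   # 50 random rooted trees with the same 8 tip labels
phy <- TR[[1]]        # reference tree

## Compute the partitions once ...
pp <- prop.part(TR)

## ... and reuse them, instead of passing the trees each time
a <- prop.clades(phy, part = pp, rooted = TRUE)
b <- prop.clades(phy, TR, rooted = TRUE)

all.equal(a, b)  # both calls are expected to give the same counts
```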
Author(s)
Emmanuel Paradis
References
Efron, B., Halloran, E. and Holmes, S. (1996) Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences USA, 93, 13429–13434.
Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.
See Also
as.bitsplits, dist.topo, consensus, nodelabels
Examples
data(woodmouse)
f <- function(x) nj(dist.dna(x))
tr <- f(woodmouse)
### Are bootstrap values stable?
for (i in 1:5)
  print(boot.phylo(tr, woodmouse, f, quiet = TRUE))
### How many partitions in 100 random trees of 10 labels?...
TR <- rmtree(100, 10)
pp10 <- prop.part(TR)
length(pp10)
### ... and in 100 random trees of 20 labels?
TR <- rmtree(100, 20)
pp20 <- prop.part(TR)
length(pp20)
plot(pp10, pch = "x", col = 2)
plot(pp20, pch = "x", col = 2)
set.seed(2)
tr <- rtree(10) # rooted
## the following used to return a wrong result with ape <= 3.4:
prop.clades(tr, tr)
prop.clades(tr, tr, rooted = TRUE)
tr <- rtree(10, rooted = FALSE)
prop.clades(tr, tr) # correct
### an illustration of the use of prop.clades with bootstrap trees:
fun <- function(x) as.phylo(hclust(dist.dna(x), "average")) # upgma() in phangorn
tree <- fun(woodmouse)
## get 100 bootstrap trees:
bstrees <- boot.phylo(tree, woodmouse, fun, trees = TRUE)$trees
## get proportions of each clade:
clad <- prop.clades(tree, bstrees, rooted = TRUE)
## get proportions of each bipartition:
boot <- prop.clades(tree, bstrees)
layout(1)
par(mar = rep(2, 4))
plot(tree, main = "Bipartition vs. Clade Support Values")
drawSupportOnEdges(boot)
nodelabels(clad)
legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,
       pt.bg = c("green", "lightblue"), pt.cex = 2.5)
## Not run: 
## an example of double bootstrap:
tr <- f(woodmouse) # 'tr' was overwritten above: re-estimate the NJ tree
nrep1 <- 100
nrep2 <- 100
p <- ncol(woodmouse)
DB <- 0
for (b in 1:nrep1) {
    X <- woodmouse[, sample(p, p, TRUE)]
    DB <- DB + boot.phylo(tr, X, f, nrep2, quiet = TRUE)
}
DB
## to compare with:
boot.phylo(tr, woodmouse, f, 1e4)
## End(Not run)
Branching Times of a Phylogenetic Tree
Description
This function computes the branching times of a phylogenetic tree, that is the distance from each node to the tips, under the assumption that the tree is ultrametric. Note that the function does not check that the tree is effectively ultrametric, so if it is not, the returned result may not be meaningful.
Usage
branching.times(phy)
Arguments
phy | an object of class "phylo". |
Value
a numeric vector with the branching times. If the phylogeny phy has an element node.label, this is used as names for the returned vector; otherwise the numbers (of mode character) of the matrix edge of phy are used as names.
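The naming rule of the returned vector can be seen on a three-tip tree written in Newick format, with and without node labels. Both trees are made up for this example.

```r
library(ape)

## Ultrametric tree without node labels: names are the node numbers
tr <- read.tree(text = "((A:1,B:1):1,C:2);")
is.ultrametric(tr)
bt <- branching.times(tr)
bt  # the root (node 4) is at distance 2 from the tips, node 5 at distance 1

## Same tree with node labels: these are used as names instead
tr2 <- read.tree(text = "((A:1,B:1)AB:1,C:2)Root;")
bt2 <- branching.times(tr2)
bt2
```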
Author(s)
Emmanuel Paradis
See Also
Building Lists of Trees
Description
These functions help to build lists of trees of class "multiPhylo".
Usage
## S3 method for class 'phylo'c(..., recursive = TRUE)## S3 method for class 'multiPhylo'c(..., recursive = TRUE).compressTipLabel(x, ref = NULL).uncompressTipLabel(x)Arguments
... | one or several objects of class |
recursive | see details. |
x | an object of class |
ref | an optional vector of mode character to constrain the order of the tips. By default, the order from the first tree is used. |
Details
These c methods check all the arguments, and return by default a list of single trees unless some objects are not trees or lists of trees, in which case recursive is switched to FALSE and a warning message is given. If recursive = FALSE, the objects are simply concatenated into a list. Before ape 4.0, recursive was always set to FALSE.
.compressTipLabel transforms an object of class "multiPhylo" by checking that all trees have the same tip labels and renumbering the tips in the edge matrix so that the tip numbers are also the same, taking the first tree as the reference (duplicated labels are not allowed). The returned object has a unique vector of tip labels (attr(x, "TipLabel")).
.uncompressTipLabel does the reverse operation.
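The behaviour described above can be sketched as follows, assuming random trees from rtree that all share the same four tip labels (object names are arbitrary).

```r
library(ape)
set.seed(1)
x <- c(rtree(4), rtree(4))   # two single trees -> "multiPhylo"
y <- c(rtree(4), rtree(4))
z <- c(x, y)                 # recursive = TRUE: a flat list of 4 trees
length(z)
class(z)
a <- .compressTipLabel(z)    # tip labels stored once, as an attribute
attr(a, "TipLabel")
b <- .uncompressTipLabel(a)  # each tree gets its own tip.label again
```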
Value
An object of class "multiPhylo".
Author(s)
Emmanuel Paradis
See Also
Examples
x <- c(rtree(4), rtree(2))
x
y <- c(rtree(4), rtree(4))
z <- c(x, y)
z
print(z, TRUE)
try(.compressTipLabel(x)) # error
a <- .compressTipLabel(y)
.uncompressTipLabel(a) # back to y
## eventually compare str(a) and str(y)
Carnivora body sizes and life history traits
Description
Dataset adapted from Gittleman (1986), including 4 morphological variables (body and brain sizes), 8 life history trait variables and 5 taxonomic variables.
Usage
data(carnivora)
Format
A data frame with 112 observations on 17 variables.
| [,1] | Order | factor | Carnivora order |
| [,2] | SuperFamily | factor | Super family (Caniformia or Feliformia) |
| [,3] | Family | factor | Carnivora family |
| [,4] | Genus | factor | Carnivora genus |
| [,5] | Species | factor | Carnivora species |
| [,6] | FW | numeric | Female body weight (kg) |
| [,7] | SW | numeric | Average body weight of adult male and adult female (kg) |
| [,8] | FB | numeric | Female brain weight (g) |
| [,9] | SB | numeric | Average brain weight of adult male and adult female (g) |
| [,10] | LS | numeric | Litter size |
| [,11] | GL | numeric | Gestation length (days) |
| [,12] | BW | numeric | Birth weight (g) |
| [,13] | WA | numeric | Weaning age (days) |
| [,14] | AI | numeric | Age of independence (days) |
| [,15] | LY | numeric | Longevity (months) |
| [,16] | AM | numeric | Age of sexual maturity (days) |
| [,17] | IB | numeric | Inter-birth interval (months) |
Source
Gittleman, J. L. (1986) Carnivore life history patterns: allometric, phylogenetic and ecological associations. American Naturalist, 127: 744–771.
Examples
data(carnivora)
## Fig. 1 in Gittleman (1986):
plot(carnivora$BW ~ carnivora$FW, pch = (1:8)[carnivora$Family], log = "xy", xlab = "Female body weight (kg)", ylab = "Birth weight (g)", ylim = c(1, 2000))
legend("bottomright", legend = levels(carnivora$Family), pch = 1:8)
plot(carnivora$BW ~ carnivora$FB, pch = (1:8)[carnivora$Family], log = "xy", xlab = "Female brain weight (g)", ylab = "Birth weight (g)", ylim = c(1, 2000))
legend("bottomright", legend = levels(carnivora$Family), pch = 1:8)
Check DNA Alignments
Description
This function performs a series of diagnostics on a DNA alignment.
Usage
checkAlignment(x, check.gaps = TRUE, plot = TRUE, what = 1:4)
Arguments
x | an object of class |
check.gaps | a logical value specifying whether to check the distribution of alignment gaps. |
plot | a logical value specifying whether to do the plots. |
what | an integer value giving the plot to be done. By default, four plots are done on the same figure. |
Details
This function prints on the console a series of diagnostics on the set of aligned DNA sequences. If alignment gaps are present, their width distribution is analysed, as well as the width of contiguous base segments. The pattern of nucleotide diversity on each site is also analysed, and a relevant table is printed.
If plot = TRUE, four plots are done: an image of the alignment, the distribution of gap widths (if present), the Shannon index of nucleotide diversity along the sequence, and the number of observed bases along the sequence.
If the sequences contain many gaps, it might be better to set check.gaps = FALSE to skip the analysis of contiguous segments.
Value
NULL
Author(s)
Emmanuel Paradis
See Also
alview, image.DNAbin, all.equal.DNAbin
Examples
data(woodmouse)
checkAlignment(woodmouse)
layout(1)
Checking Labels
Description
Checking and correcting character strings, particularly before writing a Newick tree.
Usage
checkLabel(x)
Arguments
x | a vector of mode character. |
Details
This function deletes the leading and trailing spaces (including tabulations, new lines, and left or right parentheses at the beginning or end of the strings), substitutes the spaces inside the strings by underscores, and substitutes commas, colons, semicolons, and parentheses inside the strings by dashes.
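The rules above can be checked on a few made-up labels (the strings below are hypothetical and chosen only because they contain characters that are special in the Newick format).

```r
library(ape)
## hypothetical labels containing spaces, parentheses, and a semicolon:
labs <- c("  Homo sapiens ", "(Pan troglodytes)", "Mus musculus; strain A")
clean <- checkLabel(labs)
clean
## no space, comma, colon, semicolon, or parenthesis should remain:
any(grepl("[[:space:],:;()]", clean))
```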
Value
a vector of mode character.
Author(s)
Emmanuel Paradis
See Also
makeLabel, makeNodeLabel, mixedFontLabel, stripLabel, updateLabel
Examples
checkLabel(" Homo sapiens\t(Primates; World) ")
Check the Structure of a "phylo" Object
Description
This function takes as single argument an object (phy), checks its elements, and prints a diagnostic. All problems are printed with a label: FATAL (will likely cause an error or a crash) or MODERATE (may cause some problems).
This function is mainly intended for developers creating "phylo" objects from scratch.
Usage
checkValidPhylo(phy)
Arguments
phy | an object of class |
Value
NULL.
Author(s)
Emmanuel Paradis
Examples
tr <- rtree(3)
checkValidPhylo(tr)
tr$edge[1] <- 0
checkValidPhylo(tr)
Number of Cherries and Null Models of Trees
Description
This function calculates the number of cherries (see definition below) on a phylogenetic tree, and tests the null hypotheses that this number agrees with those predicted from two null models of trees (the Yule model, and the uniform model).
Usage
cherry(phy)
Arguments
phy | an object of class |
Details
A cherry is a pair of adjacent tips on a tree. The tree can be either rooted or unrooted, but the present function considers only rooted trees. The probability distribution function of the number of cherries on a tree depends on the speciation/extinction model that generated the tree.
McKenzie and Steel (2000) derived the probability distribution function of the number of cherries for two models: the Yule model and the uniform model. Broadly, in the Yule model, each extant species is equally likely to split into two daughter-species; in the uniform model, a branch is added to the tree on any of the already existing branches with a uniform probability.
The probabilities are computed using recursive formulae; however, for both models, the probability density function converges to a normal law with increasing number of tips in the tree. The function uses these normal approximations for a number of tips greater than or equal to 20.
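To make the definition of a cherry concrete, the sketch below counts them by hand on a random rooted tree before calling cherry. The manual count is only an illustration written for this example, not necessarily how the function is implemented; with 30 tips the normal approximations mentioned above apply.

```r
library(ape)
set.seed(42)
tr <- rtree(30)                 # rooted, fully dichotomous random tree
n <- Ntip(tr)
## parent nodes of the terminal edges:
parents <- tr$edge[tr$edge[, 2] <= n, 1]
## a cherry is an internal node whose two children are both tips:
n.cherries <- sum(tabulate(parents) == 2)
n.cherries
cherry(tr)                      # prints the count and the two tests
```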
Value
A NULL value is returned, the results are simply printed.
Author(s)
Emmanuel Paradis
References
McKenzie, A. and Steel, M. (2000) Distributions of cherries for two models of trees. Mathematical Biosciences, 164, 81–92.
See Also
Bat Phylogeny
Description
This phylogeny of bats (Mammalia: Chiroptera) is a supertree (i.e. a composite phylogeny constructed from several sources; see source for details).
Usage
data(chiroptera)
Format
The data are stored in RData (binary) format.
Source
Jones, K. E., Purvis, A., MacLarnon, A., Bininda-Emonds, O. R. P. and Simmons, N. B. (2002) A phylogenetic supertree of the bats (Mammalia: Chiroptera). Biological Reviews of the Cambridge Philosophical Society, 77, 223–259.
See Also
Examples
data(chiroptera)
str(chiroptera)
op <- par(cex = 0.3)
plot(chiroptera, type = "c")
par(op)
Molecular Dating With Mean Path Lengths
Description
This function estimates the node ages of a tree using the mean path lengths method of Britton et al. (2002). The branch lengths of the input tree are interpreted as (mean) numbers of substitutions.
Usage
chronoMPL(phy, se = TRUE, test = TRUE)
Arguments
phy | an object of class |
se | a logical specifying whether to compute the standard-errors of the node ages ( |
test | a logical specifying whether to test the molecular clock at each node ( |
Details
The mean path lengths (MPL) method estimates the age of a node with the mean of the distances from this node to all tips descending from it. Under the assumption of a molecular clock, standard-errors of the estimated node ages can be computed (Britton et al. 2002).
The test performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user.
Absolute dating can be done by multiplying the edge lengths found by calibrating one node age.
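As a small check of the definition above, the age of the root in the dated tree should equal the mean root-to-tip distance of the input tree. The sketch uses a random tree, and the calibration age of 50 time units is a made-up value for illustration only.

```r
library(ape)
set.seed(1)
tr <- rtree(8)                       # random dichotomous tree (not clock-like)
chr <- chronoMPL(tr, se = FALSE, test = FALSE)
n <- Ntip(tr)
## mean distance from the root to all tips in the input tree:
mpl.root <- mean(node.depth.edgelength(tr)[1:n])
## root-to-tip distance (the root age) in the dated tree:
age.root <- node.depth.edgelength(chr)[1]
all.equal(mpl.root, age.root)
## absolute dating: rescale so that the root is 50 time units old
## (hypothetical calibration)
chr.abs <- chr
chr.abs$edge.length <- chr$edge.length * 50 / age.root
```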
Value
an object of class "phylo" with branch lengths as estimated by the function. There are, by default, two attributes:
stderr | the standard-errors of the node ages. |
Pval | the P-value of the test of the molecular clock for each node. |
Note
The present version requires a dichotomous tree.
Author(s)
Emmanuel Paradis
References
Britton, T., Oxelman, B., Vinnersten, A. and Bremer, K. (2002) Phylogenetic dating with confidence intervals using mean path lengths. Molecular Phylogenetics and Evolution, 24, 58–65.
See Also
Examples
tr <- rtree(10)
tr$edge.length <- 5*tr$edge.length
chr <- chronoMPL(tr)
layout(matrix(1:4, 2, 2, byrow = TRUE))
plot(tr)
title("The original tree")
plot(chr)
axisPhylo()
title("The dated MPL tree")
plot(chr)
nodelabels(round(attr(chr, "stderr"), 3))
title("The standard-errors")
plot(tr)
nodelabels(round(attr(chr, "Pval"), 3))
title("The tests")
layout(1)
Molecular Dating With Penalized Likelihood
Description
This function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002). The branch lengths of the input tree are interpreted as mean numbers of substitutions (i.e., per site).
Usage
chronopl(phy, lambda, age.min = 1, age.max = NULL, node = "root", S = 1, tol = 1e-8, CV = FALSE, eval.max = 500, iter.max = 500, ...)
Arguments
phy | an object of class |
lambda | value of the smoothing parameter. |
age.min | numeric values specifying the fixed node ages (if |
age.max | numeric values specifying the oldest bound of the nodes known to be within an interval. |
node | the numbers of the nodes whose ages are given by |
S | the number of sites in the sequences; leave the default if branch lengths are in mean number of substitutions. |
tol | the value below which branch lengths are considered effectively zero. |
CV | whether to perform cross-validation. |
eval.max | the maximal number of evaluations of the penalized likelihood function. |
iter.max | the maximal number of iterations of the optimization algorithm. |
... | further arguments passed to control |
Details
The idea of this method is to use a trade-off between a parametric formulation where each branch has its own rate, and a nonparametric term where changes in rates are minimized between contiguous branches. A smoothing parameter (lambda) controls this trade-off. If lambda = 0, then the parametric component dominates and rates vary as much as possible among branches, whereas for increasing values of lambda, the variations are smoother and tend to a clock-like model (same rate for all branches).
lambda must be given. The known ages are given in age.min, and the corresponding node numbers in node. These two arguments must obviously be of the same length. By default, an age of 1 is assumed for the root, and the ages of the other nodes are estimated.
If age.max = NULL (the default), it is assumed that age.min gives exactly known ages. Otherwise, age.max and age.min must be of the same length and give the intervals for each node. Some nodes may be known exactly while the others are known within some bounds: the values will be identical in both arguments for the former (e.g., age.min = c(10, 5), age.max = c(10, 6), node = c(15, 18) means that the age of node 15 is 10 units of time, and the age of node 18 is between 5 and 6).
If two nodes are linked (i.e., one is the ancestor of the other) and have the same values of age.min and age.max (say, 10 and 15) this will result in an error because the medians of these values are used as initial times (here 12.5) giving initial branch length(s) equal to zero. The easiest way to solve this is to change slightly the given values, for instance use age.max = 14.9 for the youngest node, or age.max = 15.1 for the oldest one (or similarly for age.min).
The input tree may have multichotomies. If some internal branches are of zero-length, they are collapsed (with a warning), and the returned tree will have fewer nodes than the input one. The presence of zero-length terminal branches results in an error since it makes little sense to have zero-rate branches.
The cross-validation used here is different from the one proposed by Sanderson (2002). Here, each tip is dropped successively and the analysis is repeated with the reduced tree: the estimated dates for the remaining nodes are compared with the estimates from the full data. For the ith tip the following is calculated:
\sum_{j=1}^{n-2} \frac{(t_j - t_j^{-i})^2}{t_j},
where t_j is the estimated date for the jth node with the full phylogeny, t_j^{-i} is the estimated date for the jth node after removing tip i from the tree, and n is the number of tips.
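The cross-validation criterion for a single tip is simple arithmetic once the two sets of dates are available. The sketch below uses made-up node dates (both vectors are hypothetical, not output from chronopl) to show the calculation for one dropped tip.

```r
## hypothetical dates of the n - 2 nodes shared by both analyses:
t.full <- c(10, 6, 3)        # estimated with the full phylogeny
t.drop <- c(9.5, 6.2, 3.1)   # estimated after removing tip i
## influence of tip i on the date estimates:
D2.i <- sum((t.full - t.drop)^2 / t.full)
D2.i
```

Larger values indicate that removing the tip changes the estimated dates more strongly.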
The present version uses nlminb to optimise the penalized likelihood function: see its help page for details on parameters controlling the optimisation procedure.
Value
an object of class "phylo" with branch lengths as estimated by the function. There are three or four further attributes:
ploglik | the maximum penalized log-likelihood. |
rates | the estimated rates for each branch. |
message | the message returned by |
D2 | the influence of each observation on overall date estimates (if |
Note
The new function chronos replaces the present one, which is no longer maintained.
Author(s)
Emmanuel Paradis
References
Sanderson, M. J. (2002) Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19, 101–109.
See Also
Molecular Dating by Penalised Likelihood and Maximum Likelihood
Description
chronos is the main function fitting a chronogram to a phylogenetic tree whose branch lengths are in numbers of substitutions per site.
makeChronosCalib is a tool to prepare data frames with the calibration points of the phylogenetic tree.
chronos.control creates a list of parameters to be passed to chronos.
Usage
chronos(phy, lambda = 1, model = "correlated", quiet = FALSE, calibration = makeChronosCalib(phy), control = chronos.control())
## S3 method for class 'chronos'
print(x, ...)
makeChronosCalib(phy, node = "root", age.min = 1, age.max = age.min, interactive = FALSE, soft.bounds = FALSE)
chronos.control(...)
Arguments
phy | an object of class |
lambda | value of the smoothing parameter. |
model | a character string specifying the model of substitution rate variation among branches. The possible choices are: “correlated”, “relaxed”, “discrete”, “clock”, or an unambiguous abbreviation of these. |
quiet | a logical value; by default the calculation progress is displayed. |
calibration | a data frame (see details). |
control | a list of parameters controlling the optimisation procedure (see details). |
x | an object of class |
node | a vector of integers giving the node numbers for which a calibration point is given. The default is a short-cut for the root. |
age.min, age.max | vectors of numerical values giving the minimum and maximum ages of the nodes specified in |
interactive | a logical value. If |
soft.bounds | (currently unused) |
... | in the case of |
Details
chronos replaces chronopl but with a different interface and some extensions (see References).
The known dates (argument calibration) must be given in a data frame with the following column names: node, age.min, age.max, and soft.bounds (the last one is yet unused). For each row, these are, respectively: the number of the node in the “phylo” coding standard, the minimum age for this node, the maximum age, and a logical value specifying whether the bounds are soft. If age.min = age.max, this means that the age is exactly known. This data frame can be built with makeChronosCalib which returns by default a data frame with a single row giving age = 1 for the root. The data frame can be built interactively by clicking on the plotted tree.
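A calibration data frame can be built non-interactively as sketched below. The node numbers and ages are hypothetical and chosen only to show the layout of the data frame; they are not checked here against the topology of the random tree.

```r
library(ape)
set.seed(1)
tr <- rtree(10)
## default: a single row giving age = 1 for the root (node 11 here)
makeChronosCalib(tr)
## hypothetical calibrations: root known exactly (10 time units),
## node 15 known to be between 5 and 6 time units
cal <- makeChronosCalib(tr, node = c(11, 15),
                        age.min = c(10, 5), age.max = c(10, 6))
cal
```

The resulting data frame can then be passed to chronos through its calibration argument.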
The argument control allows one to change some parameters of the optimisation procedure. This must be a list with names. The available options with their default values are:
tol = 1e-8: tolerance for the estimation of the substitution rates.
iter.max = 1e4: the maximum number of iterations at each optimization step.
eval.max = 1e4: the maximum number of function evaluations at each optimization step.
nb.rate.cat = 10: the number of rate categories if model = "discrete" (set this parameter to 1 to fit a strict clock model).
dual.iter.max = 20: the maximum number of alternative iterations between rates and dates.
epsilon = 1e-6: the convergence diagnostic criterion.
Using model = "clock" is actually a short-cut to model = "discrete" and setting nb.rate.cat = 1 in the list passed to control.
The command chronos.control() returns a list with the default values of these parameters. They may be modified by passing them to this function, or directly in the list.
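Both ways of changing a control parameter can be sketched as follows (object names are arbitrary).

```r
library(ape)
## the default values listed above:
ctrl <- chronos.control()
str(ctrl)
## modify a parameter by passing it to the function ...
ctrl1 <- chronos.control(nb.rate.cat = 1)
## ... or directly in the list:
ctrl2 <- chronos.control()
ctrl2$nb.rate.cat <- 1
identical(ctrl1, ctrl2)
```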
Value
chronos returns an object of class c("chronos", "phylo"). There is a print method for it. There are additional attributes which can be visualised with str or extracted with attr.
makeChronosCalib returns a data frame.
chronos.control returns a list.
Author(s)
Emmanuel Paradis, Santiago Claramunt, Guillaume Louvel
References
Kim, J. and Sanderson, M. J. (2008) Penalized likelihood phylogenetic inference: bridging the parsimony-likelihood gap. Systematic Biology, 57, 665–674.
Paradis, E. (2013) Molecular dating of phylogenies by likelihood methods: a comparison of models and a new information criterion. Molecular Phylogenetics and Evolution, 67, 436–444.
Sanderson, M. J. (2002) Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19, 101–109.
See Also
Examples
library(ape)
tr <- rtree(10)
### the default is the correlated rate model:
chr <- chronos(tr)
### strict clock model:
ctrl <- chronos.control(nb.rate.cat = 1)
chr.clock <- chronos(tr, model = "discrete", control = ctrl)
### How different are the rates?
attr(chr, "rates")
attr(chr.clock, "rates")
## Not run: 
cal <- makeChronosCalib(tr, interactive = TRUE)
cal
### if you made mistakes, you can edit the data frame with:
### fix(cal)
chr <- chronos(tr, calibration = cal)
## End(Not run)
Multiple Sequence Alignment with External Applications
Description
These functions call their respective program from R to align a set of nucleotide sequences of class "DNAbin" or "AAbin". The application(s) must be installed separately and it is highly recommended to do this so that the executables are in a directory located on the PATH of the system.
This version includes an experimental version of muscle5 which calls MUSCLE5 (see the link to the documentation in the References below); muscle still calls MUSCLE version 3. Note that the executable of MUSCLE5 is also named ‘muscle’ by the default compilation setting.
The functions efastats and letterconf require MUSCLE5.
Usage
clustal(x, y, guide.tree, pw.gapopen = 10, pw.gapext = 0.1, gapopen = 10, gapext = 0.2, exec = NULL, MoreArgs = "", quiet = TRUE, original.ordering = TRUE, file)
clustalomega(x, y, guide.tree, exec = NULL, MoreArgs = "", quiet = TRUE, original.ordering = TRUE, file)
muscle(x, y, guide.tree, exec, MoreArgs = "", quiet = TRUE, original.ordering = TRUE, file)
muscle5(x, exec = "muscle", MoreArgs = "", quiet = FALSE, file, super5 = FALSE, mc.cores = 1)
tcoffee(x, exec = "t_coffee", MoreArgs = "", quiet = TRUE, original.ordering = TRUE)
efastats(X, exec = "muscle", quiet = FALSE)
letterconf(X, exec = "muscle")
Arguments
x | an object of class |
y | an object of class |
guide.tree | guide tree, an object of class |
pw.gapopen, pw.gapext | gap opening and gap extension penalties used by Clustal during pairwise alignments. |
gapopen, gapext | idem for global alignment. |
exec | a character string giving the name of the program, with its path if necessary. |
MoreArgs | a character string giving additional options. |
quiet | a logical: the default is to not print on R's console the messages from the external program. |
original.ordering | a logical specifying whether to return the aligned sequences in the same order as in |
file | a file with its path if results should be stored (can be missing). |
super5 | a logical value. By default, the PPP algorithm is used. |
mc.cores | the number of cores to be used by MUSCLE5. |
X | a list with several alignments of the same sequences, all with the same row order. |
Details
It is highly recommended to install the executables properly so that they are in a directory located on the PATH (i.e., accessible from any other directory). Alternatively, the full path to the executable may be given (e.g., exec = "~/muscle/muscle"), or a (symbolic) link may be copied in the working directory. For Debian and its derivatives (e.g., Ubuntu), it is recommended to use the binaries distributed by Debian.
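Whether the executables are reachable from R can be checked with the base function Sys.which before calling the alignment functions. This is only a sketch: the program names below are those mentioned on this page and may differ on a given system.

```r
## names of the executables as mentioned on this page
## (they may differ on your system):
progs <- c("muscle", "t_coffee", "clustalo", "clustalw")
found <- Sys.which(progs)
## an empty string means that the program was not found on the PATH:
data.frame(program = progs,
           path = unname(found),
           on.PATH = unname(nzchar(found)))
```

If a program is not found, its full path can be given through the exec argument instead.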
clustal tries to guess the name of the executable program depending on the operating system. Specifically, the following are used: “clustalw” under Linux, “clustalw2” under MacOS, and “clustalw2.exe” under Windows. For clustalomega, “clustalo[.exe]” is the default on all systems (with no specific path).
When called without arguments (i.e., clustal(), ...), the function prints the options of the program which may be passed to MoreArgs.
Since ape 5.1, clustal, clustalomega, and muscle can align AA sequences as well as DNA sequences.
Value
an object of class "DNAbin" or "AAbin" with the aligned sequences.
efastats returns a data frame.
letterconf opens the default Web browser.
Author(s)
Emmanuel Paradis, Franz Krah
References
Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G. and Thompson, J. D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research, 31, 3497–3500. http://www.clustal.org/
Edgar, R. C. (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797. http://www.drive5.com/muscle/muscle_userguide3.8.html
Notredame, C., Higgins, D. and Heringa, J. (2000) T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology, 302, 205–217. https://tcoffee.org/
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J. D. and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539. http://www.clustal.org/
See Also
image.DNAbin, del.gaps, all.equal.DNAbin, alex, alview, checkAlignment
Examples
## Not run: 
### display the options:
clustal()
clustalomega()
muscle()
tcoffee()
data(woodmouse)
### open gaps more easily:
clustal(woodmouse, pw.gapopen = 1, pw.gapext = 1)
### T-Coffee requires negative values (quite slow; muscle() is much faster):
tcoffee(woodmouse, MoreArgs = "-gapopen=-10 -gapext=-2")
## End(Not run)
Coalescent Intervals
Description
This function extracts or generates information about coalescent intervals (number of lineages, interval lengths, interval count, total depth) from a phylogenetic tree or a list of internode distances. The input tree needs to be ultra-metric (i.e. clock-like).
Usage
coalescent.intervals(x)
Arguments
x | either an ultra-metric phylogenetic tree (i.e. an object of class |
Value
An object of class "coalescentIntervals" with the following entries:
lineages | A vector with the number of lineages at the start of each coalescent interval. |
interval.length | A vector with the length of each coalescent interval. |
interval.count | The total number of coalescent intervals. |
total.depth | The sum of the lengths of all coalescent intervals. |
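The entries of the returned object can be inspected on a simulated tree. The sketch below assumes a random coalescent tree from rcoal, so that the input is ultra-metric and binary.

```r
library(ape)
set.seed(1)
tr <- rcoal(10)                  # random ultra-metric tree with 10 tips
ci <- coalescent.intervals(tr)
ci$lineages                      # number of lineages for each interval
ci$interval.count                # number of intervals
## the total depth is the sum of the interval lengths,
## which is also the age of the root:
all.equal(ci$total.depth, max(branching.times(tr)))
```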
Author(s)
Korbinian Strimmer
See Also
branching.times, collapsed.intervals, read.tree.
Examples
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
ci <- coalescent.intervals(tree.hiv) # from tree
ci
data("hivtree.table") # same tree, but in table format
ci <- coalescent.intervals(hivtree.table$size) # from vector of interval lengths
ci
Collapse Single Nodes
Description
collapse.singles deletes the single nodes (i.e., with a single descendant) in a tree.
has.singles tests for the presence of single node(s) in a tree.
Usage
collapse.singles(tree, root.edge = FALSE)
has.singles(tree)
Arguments
tree | an object of class |
root.edge | whether to get the singleton edges from the root until the first bifurcating node and put them as |
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis, Klaus Schliep
See Also
Examples
## a tree with 3 tips and 3 nodes:
e <- c(4L, 6L, 6L, 5L, 5L, 6L, 1L, 5L, 3L, 2L)
dim(e) <- c(5, 2)
tr <- structure(list(edge = e, tip.label = LETTERS[1:3], Nnode = 3L), class = "phylo")
tr
has.singles(tr)
## the following shows that node #4 (ie, the root) is a singleton
## and node #6 is the first bifurcating node
tr$edge
## A bifurcating tree has less nodes than it has tips:
## the following used to fail with ape 4.1 or lower:
plot(tr)
collapse.singles(tr) # only 2 nodes
## give branch lengths to use the 'root.edge' option:
tr$edge.length <- runif(5)
str(collapse.singles(tr, TRUE)) # has a 'root.edge'
Collapsed Coalescent Intervals
Description
This function takes a "coalescentIntervals" object and collapses neighbouring coalescent intervals into a single combined interval so that every collapsed interval is larger than epsilon. Collapsed coalescent intervals are used, e.g., to obtain the generalized skyline plot (skyline). For epsilon = 0 no interval is collapsed.
Usage
collapsed.intervals(ci, epsilon=0)
Arguments
ci | coalescent intervals (i.e. an object of class |
epsilon | collapsing parameter that controls the amount of smoothing (allowed range: from |
Details
Proceeding from the tips to the root of the tree, each small interval is pooled with the neighboring interval closer to the root. If the neighboring interval is also small, then pooling continues until the composite interval is larger than epsilon. Note that this approach prevents the occurrence of zero-length intervals at the present. For more details see Strimmer and Pybus (2001).
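The effect of epsilon can be seen by comparing the number of collapsed intervals for two values. This sketch uses a simulated coalescent tree, and the second value of epsilon (one fifth of the total depth) is an arbitrary choice for illustration.

```r
library(ape)
set.seed(1)
ci <- coalescent.intervals(rcoal(20))
## epsilon = 0: no interval is collapsed
cl0 <- collapsed.intervals(ci, 0)
cl0$collapsed.interval.count
## a larger epsilon pools small intervals together
cl <- collapsed.intervals(ci, epsilon = ci$total.depth / 5)
cl$collapsed.interval.count
cl$collapsed.interval   # collapsed interval to which each interval belongs
```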
Value
An object of class "collapsedIntervals" with the following entries:
lineages | A vector with the number of lineages at the start of each coalescent interval. |
interval.length | A vector with the length of each coalescent interval. |
collapsed.interval | A vector indicating for each coalescent interval to which collapsed interval it belongs. |
interval.count | The total number of coalescent intervals. |
collapsed.interval.count | The number of collapsed intervals. |
total.depth | The sum of the lengths of all coalescent intervals. |
epsilon | The value of the underlying smoothing parameter. |
Author(s)
Korbinian Strimmer
References
Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.
See Also
Examples
data("hivtree.table") # example tree
# coalescent intervals from vector of interval lengths
ci <- coalescent.intervals(hivtree.table$size)
ci
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
cl1
cl2
Cheverud's Comparative Method
Description
This function computes the phylogenetic variance component and the residual deviation for continuous characters, taking into account the phylogenetic relationships among species, following the comparative method described in Cheverud et al. (1985). The correction proposed by Rohlf (2001) is used.
Usage
compar.cheverud(y, W, tolerance = 1e-06, gold.tol = 1e-04)
Arguments
y | A vector containing the data to analyse. |
W | The phylogenetic connectivity matrix. All diagonal elements will be ignored. |
tolerance | Minimum difference allowed to consider eigenvalues as distinct. |
gold.tol | Precision to use in golden section search algorithm. |
Details
Model:
y = \rho W y + e
where e is the error term, assumed to be normally distributed. \rho is estimated by the maximum likelihood procedure given in Rohlf (2001), using a golden section search algorithm. The code of this function is indeed adapted from a MatLab code given in appendix in Rohlf's article, to correct a mistake in Cheverud's original paper.
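The model equation can be rearranged as (I - \rho W) y = e, which gives a simple way to simulate data under it. The base R sketch below is only an illustration: the connectivity matrix, the value of \rho, and the row normalization are assumptions made for this example, not the internals of compar.cheverud.

```r
set.seed(1)
## hypothetical connectivity matrix for 4 species (zero diagonal):
W <- matrix(c(0,   1/6, 1/6, 1/6,
              1/6, 0,   1/2, 1/2,
              1/6, 1/2, 0,   1,
              1/6, 1/2, 1,   0), 4, byrow = TRUE)
W <- W / rowSums(W)          # row normalization (an assumption of this sketch)
rho <- 0.5                   # hypothetical value
e <- rnorm(4)                # normally distributed error term
## y = rho W y + e  <=>  (I - rho W) y = e
y <- solve(diag(4) - rho * W, e)
## the simulated y satisfies the model equation:
all.equal(as.vector(rho * W %*% y + e), y)
```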
Value
A list with the following components:
rhohat | The maximum likelihood estimate of |
Wnorm | The normalized version of |
residuals | Error terms ( |
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de
References
Cheverud, J. M., Dow, M. M. and Leutenegger, W. (1985) The quantitative assessment of phylogenetic constraints in comparative analyses: sexual dimorphism in body weight among primates. Evolution, 39, 1335–1351.
Rohlf, F. J. (2001) Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution, 55, 2143–2160.
Harvey, P. H. and Pagel, M. D. (1991) The Comparative Method in Evolutionary Biology. Oxford University Press.
See Also
Examples
### Example from Harvey and Pagel's book:
y<-c(10,8,3,4)
W <- matrix(c(1,1/6,1/6,1/6,1/6,1,1/2,1/2,1/6,1/2,1,1,1/6,1/2,1,1), 4)
compar.cheverud(y,W)
### Example from Rohlf's 2001 article:
W<- matrix(c(
  0,1,1,2,0,0,0,0,
  1,0,1,2,0,0,0,0,
  1,1,0,2,0,0,0,0,
  2,2,2,0,0,0,0,0,
  0,0,0,0,0,1,1,2,
  0,0,0,0,1,0,1,2,
  0,0,0,0,1,1,0,2,
  0,0,0,0,2,2,2,0
),8)
W <- 1/W
W[W == Inf] <- 0
y<-c(-0.12,0.36,-0.1,0.04,-0.15,0.29,-0.11,-0.06)
compar.cheverud(y,W)
Comparative Analysis with GEEs
Description
compar.gee performs the comparative analysis using generalized estimating equations as described by Paradis and Claude (2002).
drop1 tests single effects of a fitted model output from compar.gee.
predict returns the predicted (fitted) values of the model.
Usage
compar.gee(formula, data = NULL, family = "gaussian", phy, corStruct, scale.fix = FALSE, scale.value = 1)
## S3 method for class 'compar.gee'
drop1(object, scope, quiet = FALSE, ...)
## S3 method for class 'compar.gee'
predict(object, newdata = NULL, type = c("link", "response"), ...)
Arguments
formula | a formula giving the model to be fitted. |
data | the name of the data frame where the variables in |
family | a function specifying the distribution assumed for the response; by default a Gaussian distribution (with link identity) is assumed (see |
phy | an object of class |
corStruct | a (phylogenetic) correlation structure. |
scale.fix | logical, indicates whether the scale parameter should be fixed (TRUE) or estimated (FALSE, the default). |
scale.value | if |
object | an object of class |
scope | <unused>. |
quiet | a logical specifying whether to display a warning message about eventual “marginality principle violation”. |
newdata | a data frame with column names matching the variables in the formula of the fitted object (see |
type | a character string specifying the type of predicted values. By default, the linear (link) prediction is returned. |
... | further arguments to be passed to |
Details
If a data frame is specified for the argument data, then its rownames are matched to the tip labels of phy. The user must be careful here since the function requires that both series of names perfectly match, so this operation may fail if there is a typing or syntax error. If both series of names do not match, the values in the data frame are taken to be in the same order as the tip labels of phy, and a warning message is issued.
If data = NULL, then it is assumed that the variables are in the same order as the tip labels of phy.
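Because a mismatch between the names only produces a warning, it can be worth checking the names and ordering the rows explicitly before fitting. The sketch below reuses the primate tree and values from the examples of this page, deliberately stored in a different row order; it does not call compar.gee itself, which requires the gee package.

```r
library(ape)
tree <- read.tree(text = "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
## a data frame whose rows are NOT in the order of the tip labels:
dat <- data.frame(X = c(-1.46968, 2.02815, 2.37024, 3.61092, 4.09434),
                  Y = c(2.30259, 2.89037, 3.36730, 3.33220, 4.74493),
                  row.names = c("Galago", "Ateles", "Macaca", "Pongo", "Homo"))
## do both series of names match exactly (ignoring order)?
setequal(rownames(dat), tree$tip.label)
## reorder the rows so that the order is unambiguous:
dat <- dat[tree$tip.label, ]
dat
```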
Value
compar.gee returns an object of class "compar.gee" with the following components:
call | the function call, including the formula. |
effect.assign | a vector of integers assigning the coefficients to the effects (used by |
nobs | the number of observations. |
QIC | the quasilikelihood information criterion as defined by Pan (2001). |
coefficients | the estimated coefficients (or regression parameters). |
residuals | the regression residuals. |
family | a character string, the distribution assumed for the response. |
link | a character string, the link function used for the mean function. |
scale | the scale (or dispersion parameter). |
W | the variance-covariance matrix of the estimated coefficients. |
dfP | the phylogenetic degrees of freedom (see Paradis and Claude 2002 for details). |
drop1 returns an object of class "anova".
predict returns a vector or a data frame if newdata is used.
Note
The calculation of the phylogenetic degrees of freedom is likely to be approximate for non-Brownian correlation structures (this will be refined soon).
The calculation of the quasilikelihood information criterion (QIC) needs to be tested.
Author(s)
Emmanuel Paradis
References
Pan, W. (2001) Akaike's information criterion in generalized estimating equations. Biometrics, 57, 120–125.
Paradis, E. and Claude, J. (2002) Analysis of comparative data using generalized estimating equations. Journal of Theoretical Biology, 218, 175–185.
See Also
read.tree, pic, compar.lynch, drop1
Examples
### The example in Phylip 3.5c (originally from Lynch 1991)
### (the same analysis than in help(pic)...)
tr <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = tr)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
### Both regressions... the results are quite close to those obtained
### with pic().
compar.gee(X ~ Y, phy = tree.primates)
compar.gee(Y ~ X, phy = tree.primates)
### Now do the GEE regressions through the origin: the results are quite
### different!
compar.gee(X ~ Y - 1, phy = tree.primates)
compar.gee(Y ~ X - 1, phy = tree.primates)
Lynch's Comparative Method
Description
This function computes the heritable additive value and the residual deviation for continuous characters, taking into account the phylogenetic relationships among species, following the comparative method described in Lynch (1991).
Usage
compar.lynch(x, G, eps = 1e-4)
Arguments
x | either a matrix, a vector, or a data.frame containing the data with species as rows and variables as columns. |
G | a matrix that can be interpreted as an among-species correlation matrix. |
eps | a numeric value to detect convergence of the EM algorithm. |
Details
The parameter estimates are computed with the EM (expectation-maximization) algorithm. This algorithm usually converges but may reach local optima of the likelihood function. It is recommended to run the function several times in order to detect these potential local optima. The ‘optimal’ value for eps actually depends on the range of the data and may be changed by the user in order to check the stability of the parameter estimates. Convergence occurs when the differences between two successive iterations of the EM algorithm, for both the residual and the additive values, are less than or equal to eps.
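The stability check suggested above can be sketched as follows. This is only an illustration, assuming the primate data used elsewhere in this manual and two arbitrary values of eps; the component names (lik, vare) are those listed in the Value section.

```r
library(ape)

tr <- read.tree(text = "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
G <- vcv(tr, corr = TRUE)  # among-species correlation matrix

## Fit with a loose threshold and with the default threshold
fit.loose   <- compar.lynch(cbind(X, Y), G = G, eps = 1e-2)
fit.default <- compar.lynch(cbind(X, Y), G = G, eps = 1e-4)

## Compare the log-likelihoods and the residual variance estimates
c(loose = fit.loose$lik, default = fit.default$lik)
fit.loose$vare - fit.default$vare
```

Large differences between the two fits would suggest that the estimates are sensitive to the convergence threshold.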
Value
A list with the following components:
vare | estimated residual variance-covariance matrix. |
vara | estimated additive effect variance-covariance matrix. |
u | estimates of the phylogeny-wide means. |
A | additive value estimates. |
E | residual value estimates. |
lik | logarithm of the likelihood for the entire set of observed taxon-specific means. |
Note
The present function does not perform the estimation of ancestral phenotypes as proposed by Lynch (1991). This will be implemented in a future version.
Author(s)
Julien Claude julien.claude@umontpellier.fr
References
Lynch, M. (1991) Methods for the analysis of comparative data in evolutionary biology. Evolution, 45, 1065–1080.
See Also
Examples
### The example in Lynch (1991)
x <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = x)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
compar.lynch(cbind(X, Y), G = vcv.phylo(tree.primates, cor = TRUE))
Ornstein–Uhlenbeck Model for Continuous Characters
Description
This function fits an Ornstein–Uhlenbeck model given a phylogenetic tree and a continuous character. The user specifies the node(s) where the optimum changes. The parameters are estimated by maximum likelihood; their standard errors are computed assuming normality of these estimates.
Usage
compar.ou(x, phy, node = NULL, alpha = NULL)
Arguments
x | a numeric vector giving the values of a continuous character. |
phy | an object of class |
node | a vector giving the number(s) of the node(s) where the parameter ‘theta’ (the trait optimum) is assumed to change. The node(s) can be specified with their labels if |
alpha | the value of |
Details
The Ornstein–Uhlenbeck (OU) process can be seen as a generalization of the Brownian motion process. In the latter, characters are assumed to evolve randomly under a random walk, that is, change is equally likely in any direction. In the OU model, change is more likely in the direction of an optimum (denoted \theta) with a strength controlled by a parameter denoted \alpha.
The present function fits a model where the optimum parameter \theta is allowed to vary throughout the tree. This is specified with the argument node: \theta changes after each node whose number is given there. Note that the optimum changes only for the lineages which are descendants of this node.
Hansen (1997) recommends not estimating \alpha together with the other parameters. The present function allows this by giving a numeric value to the argument alpha. By default, this parameter is estimated, but this seems to yield very large standard errors, thus validating Hansen's recommendation. In practice, a “poor man's estimation” of \alpha can be done by repeating the function call with different values of alpha, and selecting the one that minimizes the deviance (see Hansen 1997 for an example).
If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.
The user must be careful here since the function requires that both series of names match perfectly, so this operation may fail if there is a typing or syntax error. If the two series of names do not match, the values in x are taken to be in the same order as the tip labels of phy, and a warning message is issued.
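The “poor man's estimation” of \alpha described above can be sketched as a simple grid search. The grid of candidate values and the simulated trait are assumptions made for illustration only; a fit that fails is recorded as NA instead of stopping the loop.

```r
library(ape)

data(bird.orders)
set.seed(1)
x <- rnorm(23)  # simulated trait, taken in the order of the tip labels

## Hypothetical grid of candidate values for alpha
alphas <- c(0.05, 0.1, 0.2, 0.5)

## Deviance of the model fitted with alpha fixed at each candidate value
dev <- sapply(alphas, function(a)
    tryCatch(compar.ou(x, bird.orders, alpha = a)$deviance,
             error = function(e) NA_real_))

## Candidate value with the smallest deviance
data.frame(alpha = alphas, deviance = dev)
alphas[which.min(dev)]
```

A finer grid around the selected value can then be used to refine the choice.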
Value
an object of class "compar.ou" which is a list with the following components:
deviance | the deviance (= -2 * loglik). |
para | a data frame with the maximum likelihood estimates and their standard errors. |
call | the function call. |
Note
The inversion of the variance-covariance matrix in the likelihood function appeared to be somewhat problematic. The present implementation uses a Cholesky decomposition with the function chol2inv instead of the usual function solve.
Author(s)
Emmanuel Paradis
References
Hansen, T. F. (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution, 51, 1341–1351.
See Also
ace, compar.lynch, corBrownian, corMartins, pic
Examples
data(bird.orders)
### This is likely to give you estimates close to 0, 1, and 0
### for alpha, sigma^2, and theta, respectively:
compar.ou(x <- rnorm(23), bird.orders)
### Much better with a fixed alpha:
compar.ou(x, bird.orders, alpha = 0.1)
### Let us 'mimick' the effect of different optima
### for the two clades of birds...
x <- c(rnorm(5, 0), rnorm(18, 5))
### ... the model with two optima:
compar.ou(x, bird.orders, node = 25, alpha = .1)
### ... and the model with a single optimum:
compar.ou(x, bird.orders, node = NULL, alpha = .1)
### => Compare both models with the difference in deviances
## which follows a chi^2 with df = 1.
Compare Two "phylo" Objects
Description
This function compares two phylogenetic trees, rooted or unrooted, and returns a detailed report of this comparison.
Usage
comparePhylo(x, y, plot = FALSE, force.rooted = FALSE, use.edge.length = FALSE, commons = TRUE, location = "bottomleft", ...)
## S3 method for class 'comparePhylo'
print(x, ...)
Arguments
x,y | two objects of class |
plot | a logical value. If |
force.rooted | a logical value. If |
use.edge.length | a logical value passed to |
commons | whether to show the splits in common (the default), or the splits specific to each tree (applies only for unrooted trees). |
location | location of where to position the |
... | further parameters used by |
Details
In all cases, the numbers of tips and of nodes and the tip labels are compared.
If both trees are rooted, or if force.rooted = TRUE, the clade compositions of each tree are compared. If both trees are also ultrametric, their branching times are compared.
If both trees are unrooted and have the same number of nodes, the bipartitions (aka splits) are compared.
If plot = TRUE, the edge lengths are not used by default because in some situations with unrooted trees, some splits might not be visible if the corresponding internal edge length is very short. To use edge lengths, set use.edge.length = TRUE.
Value
an object of class "comparePhylo" which is a list with messages from the comparison and, optionally, tables comparing branching times.
Author(s)
Emmanuel Paradis, Klaus Schliep
See Also
Examples
## two unrooted trees but force comparison as rooted:
a <- read.tree(text = "(a,b,(c,d));")
b <- read.tree(text = "(a,c,(b,d));")
comparePhylo(a, b, plot = TRUE, force.rooted = TRUE)
## two random unrooted trees:
c <- rtree(5, rooted = FALSE)
d <- rtree(5, rooted = FALSE)
comparePhylo(c, d, plot = TRUE)
Branch Lengths Computation
Description
This function computes branch lengths of a tree using different methods.
Usage
compute.brlen(phy, method = "Grafen", power = 1, ...)
Arguments
phy | an object of class |
method | the method to be used to compute the branch lengths; this must be one of the following: (i) |
power | The power at which heights must be raised (see below). |
... | further argument(s) to be passed to |
Details
Grafen's (1989) computation of branch lengths: each node is given a ‘height’, namely the number of leaves of the subtree minus one, 0 for leaves. Each height is scaled so that the root height is 1, and then raised to the power 'rho' (> 0). Branch lengths are then computed as the difference between the heights of the two nodes at the ends of each branch.
If one or several numeric values are provided as method, they are recycled if necessary. If a function is given instead, further arguments are given in place of ... (they must be named, see examples).
Zero-length branches are not treated as multichotomies, and thus may need to be collapsed (see di2multi).
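Grafen's computation described above can be reproduced by hand as a check. This is a minimal sketch on a small illustrative tree with an arbitrary value of rho; it assumes that node.depth returns the number of tips descending from each tip and node (counting 1 for a tip).

```r
library(ape)

tr <- read.tree(text = "(((a,b),c),(d,e));")
rho <- 0.5  # illustrative power

## Number of tips descending from each tip and node (tips count as 1)
ndesc <- node.depth(tr)

## Heights: (number of leaves - 1), scaled so that the root height is 1
h <- (ndesc - 1) / (Ntip(tr) - 1)

## Branch length: height of the parent node minus height of the child,
## both raised to the power rho
bl <- h[tr$edge[, 1]]^rho - h[tr$edge[, 2]]^rho

all.equal(bl, compute.brlen(tr, method = "Grafen", power = rho)$edge.length)
```

Because the heights telescope along each path, every root-to-tip distance equals 1 whatever the value of rho.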
Value
An object of class phylo with branch lengths.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de and Emmanuel Paradis
References
Grafen, A. (1989) The phylogenetic regression. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 326, 119–157.
See Also
read.tree for a description of phylo objects, di2multi, multi2di
Examples
data(bird.orders)
plot(compute.brlen(bird.orders, 1))
plot(compute.brlen(bird.orders, runif, min = 0, max = 5))
layout(matrix(1:4, 2, 2))
plot(compute.brlen(bird.orders, power=1), main=expression(rho==1))
plot(compute.brlen(bird.orders, power=3), main=expression(rho==3))
plot(compute.brlen(bird.orders, power=0.5), main=expression(rho==0.5))
plot(compute.brlen(bird.orders, power=0.1), main=expression(rho==0.1))
layout(1)
Compute and Set Branching Times
Description
This function computes the branch lengths of a tree given its branching times (aka node ages or heights).
Usage
compute.brtime(phy, method = "coalescent", force.positive = NULL)
Arguments
phy | an object of class |
method | either |
force.positive | a logical value (see details). |
Details
By default, a set of random branching times is generated from a simple coalescent, and the option force.positive is set to TRUE so that no branch length is negative.
If a numeric vector is passed to method, it is taken as the branching times of the nodes with respect to their numbers (i.e., the first element of method is the branching time of the node numbered n + 1 [= the root], the second element that of the node numbered n + 2, and so on), and force.positive is set to FALSE. This may result in negative branch lengths. To avoid this, one should use force.positive = TRUE, in which case the branching times are reordered if necessary.
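The correspondence between the elements of method and the node numbers can be illustrated on a small tree. This is a sketch with arbitrary branching times chosen so that every node is younger than its ancestor, so no reordering is needed.

```r
library(ape)

tr <- read.tree(text = "((a,b),(c,d));")  # 4 tips: nodes 5 (root), 6 and 7

## Branching times in node-number order: the root comes first
bt <- c(3, 1, 2)
tr2 <- compute.brtime(tr, method = bt)

tr2$edge.length       # time of the parent minus time of the child
branching.times(tr2)  # recovers the supplied times
is.ultrametric(tr2)
```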
Value
An ultrametric object of class "phylo" with branch lengths.
Author(s)
Emmanuel Paradis
See Also
Examples
tr <- rtree(10)
layout(matrix(1:4, 2))
plot(compute.brtime(tr)); axisPhylo()
plot(compute.brtime(tr, force.positive = FALSE)); axisPhylo()
plot(compute.brtime(tr, 1:9)); axisPhylo() # a bit nonsense
plot(compute.brtime(tr, 1:9, TRUE)); axisPhylo()
layout(1)
Consensus Trees
Description
Given a series of trees, this function returns the consensus tree. By default, the strict-consensus tree is computed. To get the majority-rule consensus tree, use p = 0.5. Any value between 0.5 and 1 can be used.
Usage
consensus(..., p = 1, check.labels = TRUE, rooted = FALSE)
Arguments
... | either (i) a single object of class |
p | a numeric value between 0.5 and 1 giving the proportion for a clade to be represented in the consensus tree. |
check.labels | a logical specifying whether to check the labels of each tree. If |
rooted | a logical specifying whether the trees should be treated as rooted or not. |
Details
Using check.labels = FALSE results in a considerable decrease in computing times. This requires that all trees have the same tip labels, and that these labels are ordered similarly in all trees (in other words, the element tip.label is identical in all trees).
Until ape 5.6-2, the trees passed to this function were implicitly treated as rooted, even when the option rooted = FALSE was used. This is now fixed (see PR65 on GitHub) so that, by default, the trees are explicitly treated as unrooted (even if is.rooted returns TRUE). Thus, results may now differ from previous analyses (setting rooted = TRUE might help to replicate previous results).
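The effect of p can be illustrated with three small trees, two of which agree. This is a sketch on made-up trees; rooted = TRUE is used here only so that the clades are easy to read off the Newick strings.

```r
library(ape)

## Three illustrative rooted trees: two share the clades (a,b) and (c,d)
trees <- read.tree(text = c("((a,b),(c,d));",
                            "((a,b),(c,d));",
                            "((a,c),(b,d));"))

strict   <- consensus(trees, p = 1,   rooted = TRUE)
majority <- consensus(trees, p = 0.5, rooted = TRUE)

## The strict consensus keeps only clades present in all trees,
## the majority-rule consensus also keeps those found in most trees
Nnode(strict)
Nnode(majority)
```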
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis
References
Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.
See Also
Pairwise Distances from a Phylogenetic Tree
Description
cophenetic.phylo computes the pairwise distances between the pairs of tips from a phylogenetic tree using its branch lengths.
dist.nodes does the same but between all nodes, internal and terminal, of the tree.
Usage
## S3 method for class 'phylo'
cophenetic(x)
dist.nodes(x, fail.if.no.length = FALSE)
Arguments
x | an object of class |
fail.if.no.length | a logical value. If the tree has no branch lengths, these are all fixed to one (with a warning) so the computation is done. If you prefer to catch the case of no branch lengths with an error, set this option to |
Value
a numeric matrix with colnames and rownames set to the names of the tips (as given by the element tip.label of the argument x), or, in the case of dist.nodes, the numbers of the tips and the nodes (as given by the element edge).
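The difference between the two functions can be seen on a small tree. This is a minimal sketch with an illustrative three-tip tree.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")

## Distances between tips: a 3 x 3 matrix labelled with the tip labels
cophenetic(tr)

## Distances between all tips and nodes: a 5 x 5 matrix labelled
## with the tip and node numbers
dist.nodes(tr)
```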
Author(s)
Emmanuel Paradis
See Also
read.tree to read tree files in Newick format, cophenetic for the generic function
Plots two phylogenetic trees face to face with links between the tips.
Description
This function plots two trees face to face with the links if specified. It is possible to rotate the branches of each tree around the nodes by clicking.
Usage
cophyloplot(x, y, assoc = NULL, use.edge.length = FALSE, space = 0, length.line = 1, gap = 2, type = "phylogram", rotate = FALSE, col = par("fg"), lwd = par("lwd"), lty = par("lty"), show.tip.label = TRUE, font = 3, ...)
Arguments
x,y | two objects of class |
assoc | a matrix with 2 columns specifying the associations between the tips. If NULL, no links will be drawn. |
use.edge.length | a logical indicating whether the branch lengths should be used to plot the trees; default is FALSE. |
space | a positive value that specifies the distance between the two trees. |
length.line | a positive value that specifies the length of the horizontal line associated with each taxon. Default is 1. |
gap | a value specifying the distance between the tips of the phylogeny and the lines. |
type | a character string specifying the type of phylogeny to be drawn; it must be one of "phylogram" (the default) or "cladogram". |
rotate | a logical indicating whether the nodes of the phylogeny can be rotated by clicking. Default is FALSE. |
col | a character vector indicating the color to be used for the links; recycled as necessary. |
lwd | same as col, for the line width. |
lty | same as col, for the line type. |
show.tip.label | a logical indicating whether to show the tip labels on the phylogeny (defaults to 'TRUE', i.e. the labels are shown). |
font | an integer specifying the type of font for the labels: 1 (plain text), 2 (bold), 3 (italic, the default), or 4 (bold italic). |
... | (unused) |
Details
The aim of this function is to plot simultaneously two phylogenetic trees with associated taxa. The two trees do not necessarily have the same number of tips and more than one tip in one phylogeny can be associated with a tip in the other.
The association matrix used to draw the links has to be a matrix with two columns containing the names of the tips. One row in the matrix represents one link on the plot. The first column of the matrix has to contain tip labels of the first tree (x) and the second column of the matrix, tip labels of the second tree (y). There is no limit (low or high) for the number of rows in the matrix. A matrix with two columns and one row will give a plot with one link.
The arguments gap, length.line and space usually have to be adjusted to get a nice plot of the two phylogenies. Note that the function takes into account the length of the character strings corresponding to the names at the tips, so that the lines do not overwrite those names.
The rotate argument can be used to transform both phylogenies in order to get a more readable plot (typically by decreasing the number of crossing lines). This can be done by clicking on the nodes. Pressing the Escape key or right-clicking returns to the console.
Author(s)
Damien de Vienne damien.de-vienne@u-psud.fr
See Also
plot.phylo, rotate, rotateConstr
Examples
#two random trees
tree1 <- rtree(40)
tree2 <- rtree(20)
#creation of the association matrix:
association <- cbind(tree2$tip.label, tree2$tip.label)
cophyloplot(tree1, tree2, assoc = association, length.line = 4, space = 28, gap = 3)
#plot with rotations
## Not run: 
cophyloplot(tree1, tree2, assoc=association, length.line=4, space=28, gap=3, rotate=TRUE)
## End(Not run)
Blomberg et al.'s Correlation Structure
Description
The “ACDC” (accelerated/decelerated) model assumes that continuous traits evolve under a Brownian motion model whose rate accelerates (if g < 1) or decelerates (if g > 1) through time. If g = 1, then the model reduces to a Brownian motion model.
Usage
corBlomberg(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corBlomberg'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)
## S3 method for class 'corBlomberg'
coef(object, unconstrained = TRUE, ...)
Arguments
value | the (initial) value of the parameter |
phy | an object of class |
form | a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. |
fixed | a logical specifying whether |
object | an (initialized) object of class |
covariate | an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object). |
corr | a logical value specifying whether to return the correlation matrix (the default) or the variance-covariance matrix. |
unconstrained | a logical value. If |
... | further arguments passed to or from other methods. |
Value
an object of class "corBlomberg", the coefficients from an object of this class, or the correlation matrix of an initialized object of this class. In most situations, only corBlomberg will be called by the user.
Author(s)
Emmanuel Paradis
References
Blomberg, S. P., Garland, Jr, T., and Ives, A. R. (2003) Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution, 57, 717–745.
Brownian Correlation Structure
Description
Expected covariance under a Brownian model (Felsenstein 1985, Martins and Hansen 1997):
V_{ij} = \gamma \times t_a
where t_a is the distance on the phylogeny between the root and the most recent common ancestor of taxa i and j, and \gamma is a constant.
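The formula above can be checked numerically with \gamma = 1. This is a minimal sketch on an illustrative tree; it builds V from the root-to-MRCA distances using mrca and dist.nodes and compares the result with vcv.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")
n <- Ntip(tr)
root <- n + 1

m <- mrca(tr)            # node number of the MRCA of each pair of tips
diag(m) <- seq_len(n)    # a tip is taken as its own MRCA
d <- dist.nodes(tr)      # distances between all tips and nodes

## t_a: distance from the root to the MRCA of each pair (gamma = 1)
V <- matrix(d[root, m], nrow = n,
            dimnames = list(tr$tip.label, tr$tip.label))

V
all.equal(V, vcv(tr), check.attributes = FALSE)
```

The diagonal of V holds the root-to-tip distances, that is, the variances.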
Usage
corBrownian(value=1, phy, form=~1)
## S3 method for class 'corBrownian'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corBrownian'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)
Arguments
value | The |
phy | An object of class |
object | An (initialized) object of class |
corr | a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix. |
form | a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. |
covariate | an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object). |
unconstrained | a logical value. If 'TRUE' the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If 'FALSE' the coefficients are returned in "natural", possibly constrained, form. Defaults to 'TRUE'. |
... | some methods for these generics require additional arguments. None are used in these methods. |
Value
An object of class corBrownian, or the coefficient from an object of this class (actually returns numeric(0)), or the correlation matrix of an initialized object of this class.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de
References
Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.
Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.
See Also
Phylogenetic Correlation Structures
Description
Classes of phylogenetic correlation structures ("corPhyl") available in ape.
corBrownian: Brownian motion model (Felsenstein 1985)
corMartins: The covariance matrix defined in Martins and Hansen (1997)
corGrafen: The covariance matrix defined in Grafen (1989)
corPagel: The covariance matrix defined in Freckleton et al. (2002)
corBlomberg: The covariance matrix defined in Blomberg et al. (2003)
See the help page of each class for references and a detailed description.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de, Emmanuel Paradis
See Also
corClasses and gls in the nlme library, corBrownian, corMartins, corGrafen, corPagel, corBlomberg, vcv, vcv2phylo
Examples
library(nlme)
txt <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = txt)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
dat <- data.frame(Species = Species, X = X, Y = Y)
m1 <- gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species))
summary(m1)
m2 <- gls(Y ~ X, dat, correlation=corMartins(1, tree.primates, form = ~Species))
summary(m2)
corMatrix(m2$modelStruct$corStruct)
m3 <- gls(Y ~ X, dat, correlation=corGrafen(1, tree.primates, form = ~Species))
summary(m3)
corMatrix(m3$modelStruct$corStruct)
Grafen's (1989) Correlation Structure
Description
Grafen's (1989) covariance structure. Branch lengths are computed using Grafen's method (see compute.brlen). The covariance matrix is then the traditional variance-covariance matrix for a phylogeny.
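The construction described above can be sketched directly with compute.brlen and vcv. This is an illustration on a small tree with an arbitrary value of rho, not necessarily the internal code path of corGrafen.

```r
library(ape)

tr <- read.tree(text = "(((a,b),c),(d,e));")

## Grafen branch lengths with rho = 0.5, then the usual phylogenetic
## variance-covariance matrix computed from them
V <- vcv(compute.brlen(tr, method = "Grafen", power = 0.5))
V
```

Since Grafen's heights are scaled so that the root height is 1, all the variances on the diagonal are equal to 1.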
Usage
corGrafen(value, phy, form=~1, fixed = FALSE)
## S3 method for class 'corGrafen'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corGrafen'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)
Arguments
value | The |
phy | An object of class |
object | An (initialized) object of class |
corr | a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix. |
fixed | an optional logical value indicating whether the coefficients should be allowed to vary in the optimization, or kept fixed at their initial value. Defaults to 'FALSE', in which case the coefficients are allowed to vary. |
form | a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. |
covariate | an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object). |
unconstrained | a logical value. If 'TRUE' the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If 'FALSE' the coefficients are returned in "natural", possibly constrained, form. Defaults to 'TRUE'. |
... | some methods for these generics require additional arguments. None are used in these methods. |
Value
An object of class corGrafen, or the rho coefficient from an object of this class, or the correlation matrix of an initialized object of this class.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de
References
Grafen, A. (1989) The phylogenetic regression. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 326, 119–157.
See Also
corClasses, compute.brlen, vcv.phylo.
Martins's (1997) Correlation Structure
Description
Martins and Hansen's (1997) covariance structure:
V_{ij} = \gamma \times e^{-\alpha t_{ij}}
where t_{ij} is the phylogenetic distance between taxa i and j, and \gamma is a constant.
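The formula above can be evaluated by hand from the tip-to-tip distances. This is a minimal sketch with illustrative values of \alpha and \gamma (named alpha and gam in the code); it does not call corMartins itself.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")
alpha <- 0.5  # illustrative value
gam <- 1      # illustrative value of the constant gamma

tij <- cophenetic(tr)          # phylogenetic distances between tips
V <- gam * exp(-alpha * tij)   # V_ij = gamma * exp(-alpha * t_ij)
V
```

The covariance decays exponentially with the distance between taxa, and each diagonal element equals \gamma because t_{ii} = 0.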
Usage
corMartins(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corMartins'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corMartins'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)
Arguments
value | The |
phy | An object of class |
object | An (initialized) object of class |
corr | a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix. |
fixed | an optional logical value indicating whether the coefficients should be allowed to vary in the optimization, or kept fixed at their initial value. Defaults to 'FALSE', in which case the coefficients are allowed to vary. |
form | a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. |
covariate | an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object). |
unconstrained | a logical value. If 'TRUE' the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If 'FALSE' the coefficients are returned in "natural", possibly constrained, form. Defaults to 'TRUE'. |
... | some methods for these generics require additional arguments. None are used in these methods. |
Value
An object of class corMartins, or the alpha coefficient from an object of this class, or the correlation matrix of an initialized object of this class.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de
References
Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.
See Also
Pagel's “lambda” Correlation Structure
Description
The correlation structure from the present model is derived from the Brownian motion model by multiplying the off-diagonal elements (i.e., the covariances) by \lambda. The variances are thus the same as for a Brownian motion model.
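The transformation described above can be written in a few lines. This is a minimal sketch with an illustrative tree and an arbitrary value of \lambda; it works on the Brownian variance-covariance matrix returned by vcv rather than on a corPagel object.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")
lambda <- 0.5  # illustrative value

V <- vcv(tr)         # Brownian motion variance-covariance matrix
Vl <- lambda * V     # rescale every element...
diag(Vl) <- diag(V)  # ...then restore the variances
Vl
```

With lambda = 1 the Brownian matrix is recovered, and with lambda = 0 all covariances vanish.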
Usage
corPagel(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corPagel'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)
## S3 method for class 'corPagel'
coef(object, unconstrained = TRUE, ...)
Arguments
value | the (initial) value of the parameter |
phy | an object of class |
form | a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. |
fixed | a logical specifying whether |
object | an (initialized) object of class |
covariate | an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object). |
corr | a logical value specifying whether to return the correlation matrix (the default) or the variance-covariance matrix. |
unconstrained | a logical value. If |
... | further arguments passed to or from other methods. |
Value
an object of class "corPagel", the coefficients from an object of this class, or the correlation matrix of an initialized object of this class. In most situations, only corPagel will be called by the user.
Author(s)
Emmanuel Paradis
References
Freckleton, R. P., Harvey, P. H. and Pagel, M. (2002) Phylogenetic analysis and comparative data: a test and review of evidence. American Naturalist, 160, 712–726.
Pagel, M. (1999) Inferring the historical patterns of biological evolution. Nature, 401, 877–884.
Correlations among Multiple Traits with Phylogenetic Signal
Description
This function calculates Pearson correlation coefficients for multiple continuous traits that may have phylogenetic signal, allowing users to specify measurement error as the standard error of trait values at the tips of the phylogenetic tree. Phylogenetic signal for each trait is estimated from the data assuming that trait evolution is given by an Ornstein-Uhlenbeck process. Thus, the function allows the estimation of phylogenetic signal in multiple traits while incorporating correlations among traits. It is also possible to include independent variables (covariates) for each trait to remove possible confounding effects. corphylo() returns the correlation matrix for trait values, estimates of phylogenetic signal for each trait, and regression coefficients for independent variables affecting each trait.
Usage
corphylo(X, U = list(), SeM = NULL, phy = NULL, REML = TRUE,
         method = c("Nelder-Mead", "SANN"), constrain.d = FALSE, reltol = 10^-6,
         maxit.NM = 1000, maxit.SA = 1000, temp.SA = 1, tmax.SA = 1, verbose = FALSE)
## S3 method for class 'corphylo'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
X | an n x p matrix with p columns containing the values for the n taxa. Rows of X should have rownames matching the taxon names in phy. |
U | a list of p matrices corresponding to the p columns of X, with each matrix containing independent variables for the corresponding column of X. The rownames of each matrix within U must be the same as X, or alternatively, the order of values in rows must match those in X. If U is omitted, only the mean (aka intercept) for each column of X is estimated. If U[[i]] is NULL, only an intercept is estimated for X[, i]. If all values of U[[i]][j] are the same, this variable is automatically dropped from the analysis (i.e., there is no offset in the regression component of the model). |
SeM | an n x p matrix with p columns containing standard errors of the trait values in X. The rownames of SeM must be the same as X, or alternatively, the order of values in rows must match those in X. If SeM is omitted, the trait values are assumed to be known without error. If only some traits have measurement errors, the remaining traits can be given zero-valued standard errors. |
phy | a phylo object giving the phylogenetic tree. The tip labels of phy must match the rownames of X, or alternatively, the order of the tips must match the order of the rows of X. |
REML | whether REML or ML is used for model fitting. |
method | in optim(), either Nelder-Mead simplex minimization or SANN (simulated annealing) minimization is used. If SANN is used, it is followed by Nelder-Mead minimization. |
constrain.d | if constrain.d is TRUE, the estimates of d are constrained to be between zero and 1. This can make estimation more stable and can be tried if convergence is problematic. This does not necessarily lead to loss of generality of the results, because before using corphylo, branch lengths of phy can be transformed so that the "starter" tree has strong phylogenetic signal. |
reltol | a control parameter dictating the relative tolerance for convergence in the optimization; see optim(). |
maxit.NM | a control parameter dictating the maximum number of iterations in the optimization with Nelder-Mead minimization; see optim(). |
maxit.SA | a control parameter dictating the maximum number of iterations in the optimization with SANN minimization; see optim(). |
temp.SA | a control parameter dictating the starting temperature in the optimization with SANN minimization; see optim(). |
tmax.SA | a control parameter dictating the number of function evaluations at each temperature in the optimization with SANN minimization; see optim(). |
verbose | if TRUE, the model logLik and running estimates of the correlation coefficients and values of d are printed each iteration during optimization. |
x | an object of class corphylo. |
digits | the number of digits to be printed. |
... | arguments passed to and from other methods. |
Details
For the case of two variables, the function estimates parameters for the model of the form, for example,
X[1] = B[1,0] + B[1,1] * u[1,1] + \epsilon[1]
X[2] = B[2,0] + B[2,1] * u[2,1] + \epsilon[2]
\epsilon ~ Gaussian(0, V)
where B[1,0], B[1,1], B[2,0], and B[2,1] are regression coefficients, and V is a variance-covariance matrix containing the correlation coefficient r, parameters of the OU process d1 and d2, and diagonal matrices M1 and M2 of measurement standard errors for X[1] and X[2]. The matrix V is 2n x 2n, with n x n blocks given by
V[1,1] = C[1,1](d1) + M1
V[1,2] = C[1,2](d1,d2)
V[2,1] = C[2,1](d1,d2)
V[2,2] = C[2,2](d2) + M2
where C[i,j](d1,d2) are derived from phy under the assumption of joint OU evolutionary processes for each trait (see Zheng et al. 2009). This formulation extends in the obvious way to more than two traits.
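The block layout of V can be sketched in base R as follows. This is an illustration only: the helper ou_block is a hypothetical name, its formula is copied from the simulation code in the Examples of this help page, and the small matrix Vphy and all parameter values are made up. As in that simulation code, the measurement standard errors enter V as squared values on the diagonal.

```r
## Sketch only: assemble the 2n x 2n matrix V for two traits from n x n blocks.
## 'ou_block' is a hypothetical helper; its formula is copied from the
## simulation code in the Examples of this help page.
ou_block <- function(di, dj, Vphy) {
    n <- nrow(Vphy)
    ## tau[a, b] = Vphy[b, b] - Vphy[a, b]
    tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy
    (di^tau * dj^t(tau) * (1 - (di * dj)^Vphy)) / (1 - di * dj)
}

## Toy phylogenetic covariance matrix for n = 3 taxa (made-up values)
Vphy <- matrix(c(1.0, 0.6, 0.2,
                 0.6, 1.0, 0.2,
                 0.2, 0.2, 1.0), nrow = 3, byrow = TRUE)
n <- nrow(Vphy)
d <- c(0.3, 0.95)                       # OU parameters d1 and d2
r <- 0.7                                # correlation between the two traits
R <- matrix(c(1, r, r, 1), nrow = 2)
se <- cbind(rep(0.2, n), rep(0.4, n))   # measurement standard errors

C <- matrix(0, 2 * n, 2 * n)
for (i in 1:2) for (j in 1:2) {
    rows <- (n * (i - 1) + 1):(i * n)
    cols <- (n * (j - 1) + 1):(j * n)
    C[rows, cols] <- R[i, j] * ou_block(d[i], d[j], Vphy)
}
V <- C + diag(as.numeric(se^2))         # measurement error on the diagonal only
dim(V)                                  # 2n x 2n
```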
Value
An object of class "corphylo".
cor.matrix | the p x p matrix of correlation coefficients. |
d | values of d from the OU process for each trait. |
B | estimates of the regression coefficients, including intercepts. Coefficients are named according to the list U. For example, B1.2 is the coefficient corresponding to U[[1]][, 2], and if column 2 in U[[1]] is named "colname2", then the coefficient will be B1.colname2. Intercepts have the form B1.0. |
B.se | standard errors of the regression coefficients. |
B.cov | covariance matrix for regression coefficients. |
B.zscore | Z scores for the regression coefficients. |
B.pvalue | tests for the regression coefficients being different from zero. |
logLik | the log likelihood for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE). |
AIC | AIC for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE). |
BIC | BIC for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE). |
REML | whether REML is used rather than ML (TRUE or FALSE). |
constrain.d | whether or not values of d were constrained to be between 0 and 1 (TRUE or FALSE). |
XX | values of X in vectorized form, with each trait X[, i] standardized to have mean zero and standard deviation one. |
UU | design matrix with values in UU corresponding to XX; each variable U[[i]][, j] is standardized to have mean zero and standard deviation one. |
MM | vector of measurement standard errors corresponding to XX, with the standard errors suitably standardized. |
Vphy | the phylogenetic covariance matrix computed from phy and standardized to have determinant equal to one. |
R | covariance matrix of trait values relative to the standardized values of XX. |
V | overall estimated covariance matrix of residuals for XX including trait correlations, phylogenetic signal, and measurement error variances. This matrix can be used to simulate data for parametric bootstrapping. See examples. |
C | matrix V excluding measurement error variances. |
convcode | the convergence code provided by optim(). |
niter | number of iterations performed by optim(). |
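The use of the V component for parametric bootstrapping can be sketched as below. This is a minimal illustration in base R, assuming a small made-up covariance matrix in place of the V returned by corphylo; each simulated column would then be reshaped and refitted in an actual bootstrap.

```r
## Sketch only: draw residual vectors with covariance V, as one would do for a
## parametric bootstrap. V is a made-up 3 x 3 covariance matrix standing in
## for the 'V' component of a fitted "corphylo" object.
set.seed(1)
V <- matrix(c(1.0, 0.5, 0.2,
              0.5, 2.0, 0.3,
              0.2, 0.3, 1.5), nrow = 3, byrow = TRUE)
L <- t(chol(V))                  # lower-triangular factor: L %*% t(L) equals V
nsim <- 5000
sims <- L %*% matrix(rnorm(nrow(V) * nsim), nrow = nrow(V))  # one column per draw
round(cov(t(sims)), 1)           # close to V when nsim is large
```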
Author(s)
Anthony R. Ives
References
Zheng, L., A. R. Ives, T. Garland, B. R. Larget, Y. Yu, and K. F. Cao. 2009. New multivariate tests for phylogenetic signal and trait correlations applied to ecophysiological phenotypes of nine Manglietia species. Functional Ecology 23:1059–1069.
Examples
## Simple example using data without correlations or phylogenetic
## signal. This illustrates the structure of the input data.
phy <- rcoal(10, tip.label = 1:10)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)
rownames(X) <- phy$tip.label
U <- list(NULL, matrix(rnorm(10, mean = 10, sd = 4), nrow = 10, ncol = 1))
rownames(U[[2]]) <- phy$tip.label
SeM <- matrix(c(0.2, 0.4), nrow = 10, ncol = 2)
rownames(SeM) <- phy$tip.label
corphylo(X = X, SeM = SeM, U = U, phy = phy, method = "Nelder-Mead")

## Not run: 
## Simulation example for the correlation between two variables. The
## example compares the estimates of the correlation coefficients from
## corphylo when measurement error is incorporated into the analyses with
## three other cases: (i) when measurement error is excluded, (ii) when
## phylogenetic signal is ignored (assuming a "star" phylogeny), and (iii)
## neither measurement error nor phylogenetic signal are included.

## In the simulations, variable 2 is associated with a single
## independent variable. This requires setting up a list U that has 2
## elements: element U[[1]] is NULL and element U[[2]] is a n x 1 vector
## containing simulated values of the independent variable.

# Set up parameter values for simulating data
n <- 50
phy <- rcoal(n, tip.label = 1:n)
R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)
d <- c(0.3, .95)
B2 <- 1
Se <- c(0.2, 1)
SeM <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE)
rownames(SeM) <- phy$tip.label

# Set up needed matrices for the simulations
p <- length(d)
star <- stree(n)
star$edge.length <- array(1, dim = c(n, 1))
star$tip.label <- phy$tip.label
Vphy <- vcv(phy)
Vphy <- Vphy/max(Vphy)
Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n)
tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy
C <- matrix(0, nrow = p * n, ncol = p * n)
for (i in 1:p) for (j in 1:p) {
    Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j])
    C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd
}
MM <- matrix(SeM^2, ncol = 1)
V <- C + diag(as.numeric(MM))

## Perform a Cholesky decomposition of V. This is used to generate
## phylogenetic signal: a vector of independent normal random variables,
## when multiplied by the transpose of the Cholesky decomposition of V will
## have covariance matrix equal to V.
iD <- t(chol(V))

# Perform Nrep simulations and collect the results
Nrep <- 100
cor.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1)
d.list <- matrix(0, nrow = Nrep, ncol = 2)
d.noM.list <- matrix(0, nrow = Nrep, ncol = 2)
B.list <- matrix(0, nrow = Nrep, ncol = 3)
B.noM.list <- matrix(0, nrow = Nrep, ncol = 3)
B.noP.list <- matrix(0, nrow = Nrep, ncol = 3)
for (rep in 1:Nrep) {
    XX <- iD %*% rnorm(2 * n)
    X <- matrix(XX, nrow = n, ncol = 2)
    rownames(X) <- phy$tip.label
    U <- list(NULL, matrix(rnorm(n, mean = 2, sd = 10), nrow = n, ncol = 1))
    rownames(U[[2]]) <- phy$tip.label
    colnames(U[[2]]) <- "V1"
    X[,2] <- X[,2] + B2[1] * U[[2]][,1] - B2[1] * mean(U[[2]][,1])
    z <- corphylo(X = X, SeM = SeM, U = U, phy = phy, method = "Nelder-Mead")
    z.noM <- corphylo(X = X, U = U, phy = phy, method = "Nelder-Mead")
    z.noP <- corphylo(X = X, SeM = SeM, U = U, phy = star, method = "Nelder-Mead")
    cor.list[rep] <- z$cor.matrix[1, 2]
    cor.noM.list[rep] <- z.noM$cor.matrix[1, 2]
    cor.noP.list[rep] <- z.noP$cor.matrix[1, 2]
    cor.noMP.list[rep] <- cor(cbind(lm(X[,1] ~ 1)$residuals,
                                    lm(X[,2] ~ U[[2]])$residuals))[1,2]
    d.list[rep, ] <- z$d
    d.noM.list[rep, ] <- z.noM$d
    B.list[rep, ] <- z$B
    B.noM.list[rep, ] <- z.noM$B
    B.noP.list[rep, ] <- z.noP$B
    show(c(rep, z$convcode, z$cor.matrix[1, 2], z$d))
}
correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list),
                     mean(cor.noP.list), mean(cor.noMP.list))
rownames(correlation) <- c("True", "With SeM and Phy", "Without SeM",
                           "Without Phy", "Without Phy or SeM")
correlation
signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list))
rownames(signal.d) <- c("True", "With SeM and Phy", "Without SeM")
signal.d
est.B <- rbind(c(0, 0, B2), colMeans(B.list), colMeans(B.noM.list),
               colMeans(B.noP.list))
rownames(est.B) <- c("True", "With SeM and Phy", "Without SeM", "Without Phy")
colnames(est.B) <- rownames(z$B)
est.B

# Example simulation output
# correlation
#                         [,1]
# True               0.7000000
# With SeM and Phy   0.7055958
# Without SeM        0.3125253
# Without Phy        0.4054043
# Without Phy or SeM 0.3476589

# signal.d
#                      [,1]      [,2]
# True             0.300000 0.9500000
# With SeM and Phy 0.301513 0.9276663
# Without SeM      0.241319 0.4872675

# est.B
#                         B1.0      B2.0     B2.V1
# True              0.00000000 0.0000000 1.0000000
# With SeM and Phy -0.01285834 0.2807215 0.9963163
# Without SeM       0.01406953 0.3059110 0.9977796
# Without Phy       0.02139281 0.3165731 0.9942140
## End(Not run)
Phylogenetic Correlogram
Description
This function computes a correlogram from taxonomic levels.
Usage
correlogram.formula(formula, data = NULL, use = "all.obs")
Arguments
formula | a formula of the type |
data | a data frame containing the variables specified in the formula. If |
use | a character string specifying how to handle missing values (i.e., |
Details
See the vignette in R: vignette("MoranI").
Value
An object of class correlogram which is a data frame with three columns:
obs | the computed Moran's I |
p.values | the corresponding P-values |
labels | the names of each level |
or an object of class correlogramList containing a list of objects of class correlogram if several variables are given as response in formula.
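For orientation, the statistic stored in the obs column can be sketched in base R. This is an illustration under assumptions, not the code used by correlogram.formula: the helper moran_i is a hypothetical name, the trait values are made up, and the weighting scheme (weight 1 for two species in the same genus, 0 otherwise) is assumed for a single taxonomic level.

```r
## Sketch only: Moran's I for one taxonomic level, with weight 1 for pairs of
## species in the same group and 0 otherwise (an assumed weighting scheme).
moran_i <- function(x, w) {
    diag(w) <- 0                       # a species is not its own neighbour
    z <- x - mean(x)
    (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

## Made-up trait values for six species in three genera
x <- c(1.0, 1.2, 3.1, 2.9, 5.0, 5.3)
genus <- c("a", "a", "b", "b", "c", "c")
w <- outer(genus, genus, "==") * 1     # same-genus indicator matrix
moran_i(x, w)                          # positive: congeners have similar values
```

With equal weights for all pairs of species the statistic reduces to -1/(n - 1), which is the expected value of Moran's I under no autocorrelation.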
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de and Emmanuel Paradis
See Also
Examples
data(carnivora)
### Using the formula interface:
co <- correlogram.formula(SW ~ Order/SuperFamily/Family/Genus,
                          data = carnivora)
co
plot(co)
### Several correlograms on the same plot:
cos <- correlogram.formula(SW + FW ~ Order/SuperFamily/Family/Genus,
                           data = carnivora)
cos
plot(cos)
NEXUS Data Example
Description
Example of Protein data in NEXUS format (Maddison et al., 1997). Data is written in interleaved format using a single DATA block. Original data from Rokas et al. (2002).
Usage
data(cynipids)
Format
ASCII text in NEXUS format
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
Rokas, A., Nylander, J. A. A., Ronquist, F. and Stone, G. N. (2002) A maximum likelihood analysis of eight phylogenetic markers in Gallwasps (Hymenoptera: Cynipidae): implications for insect phylogenetic studies. Molecular Phylogenetics and Evolution, 22, 206–219.
Probability Density Under Birth–Death Models
Description
These functions compute the probability density under some birth–death models, that is the probability of obtaining x species after a time t given how speciation and extinction probabilities vary through time (these may be constant, or even equal to zero for extinction).
Usage
dyule(x, lambda = 0.1, t = 1, log = FALSE)
dbd(x, lambda, mu, t, conditional = FALSE, log = FALSE)
dbdTime(x, birth, death, t, conditional = FALSE,
        BIRTH = NULL, DEATH = NULL, fast = FALSE)
Arguments
x | a numeric vector of species numbers (see Details). |
lambda | a numerical value giving the probability of speciation; can be a vector with several values for |
mu | id. for extinction. |
t | id. for the time(s). |
log | a logical value specifying whether the probabilities should be returned log-transformed; the default is |
conditional | a logical specifying whether the probabilities should be computed conditional under the assumption of no extinction after time |
birth,death | a (vectorized) function specifying how the speciation or extinction probability changes through time (see |
BIRTH,DEATH | a (vectorized) function giving the primitive of |
fast | a logical value specifying whether to use faster integration (see |
Details
These three functions compute the probabilities to observe x species starting from a single one after time t (assumed to be continuous). The first function is a short-cut for the second one with mu = 0 and with default values for the two other arguments. dbdTime is for time-varying lambda and mu specified as R functions.
dyule is vectorized simultaneously on its three arguments x, lambda, and t, according to R's rules of recycling arguments. dbd is vectorized simultaneously on x and t (to make likelihood calculations easy), and dbdTime is vectorized only on x; the other arguments are shortened, with a warning, if necessary.
The returned value is, logically, zero for values of x out of range, i.e., negative or zero for dyule or if conditional = TRUE. However, it is not checked whether the values of x are positive non-integers: in that case the probabilities are still computed and returned.
The details on the form of the arguments birth, death, BIRTH, DEATH, and fast can be found in the links below.
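As a point of reference for the simplest case, the textbook result for a Yule process started from a single lineage is a geometric distribution with parameter exp(-lambda * t). The sketch below writes this in base R; yule_density is a hypothetical name, and it is assumed, not checked here, that dyule returns the same values.

```r
## Sketch only: the standard Yule (pure-birth) result for a process started
## from a single lineage, written in base R. 'yule_density' is a hypothetical
## name; it is assumed, not verified here, to agree with dyule().
yule_density <- function(x, lambda = 0.1, t = 1, log = FALSE) {
    p <- exp(-lambda * t)              # P(no speciation along one lineage)
    dgeom(x - 1, prob = p, log = log)  # zero density for x < 1
}

x <- 0:10
yule_density(x)                 # the density at x = 0 is zero
sum(yule_density(1:1000))       # sums to 1 over the support
```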
Value
a numeric vector.
Note
If you use these functions to calculate a likelihood function, it is strongly recommended to compute the log-likelihood with, for instance in the case of a Yule process, sum(dyule( , log = TRUE)) (see examples).
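The reason for this recommendation is numerical underflow: the product of many small densities becomes exactly zero in double precision, while the sum of the log-densities stays finite. The sketch below illustrates this in base R with geometric densities (the textbook Yule result), so that it runs without ape; the simulated counts are an assumption standing in for real data.

```r
## Sketch only: why the log-likelihood should be summed on the log scale.
## Densities are geometric (the textbook Yule result), computed in base R.
set.seed(42)
lambda <- 0.05; t <- 50
p <- exp(-lambda * t)
x <- rgeom(1000, prob = p) + 1                 # 1000 simulated species counts

ll_sum  <- sum(dgeom(x - 1, p, log = TRUE))    # stable
ll_prod <- log(prod(dgeom(x - 1, p)))          # the product underflows to 0
c(ll_sum = ll_sum, ll_prod = ll_prod)          # a finite value versus -Inf
```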
Author(s)
Emmanuel Paradis
References
Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.
See Also
Examples
x <- 0:10
plot(x, dyule(x), type = "h", main = "Density of the Yule process")
text(7, 0.85, expression(list(lambda == 0.1, t == 1)))

y <- dbd(x, 0.1, 0.05, 10)
z <- dbd(x, 0.1, 0.05, 10, conditional = TRUE)
d <- rbind(y, z)
colnames(d) <- x
barplot(d, beside = TRUE, ylab = "Density", xlab = "Number of species",
        legend = c("unconditional", "conditional on\nno extinction"),
        args.legend = list(bty = "n"))
title("Density of the birth-death process")
text(17, 0.4, expression(list(lambda == 0.1, mu == 0.05, t == 10)))

## Not run: 
### generate 1000 values from a Yule process with lambda = 0.05
x <- replicate(1e3, Ntip(rlineage(0.05, 0)))
### the correct way to calculate the log-likelihood...:
sum(dyule(x, 0.05, 50, log = TRUE))
### ... and the wrong way:
log(prod(dyule(x, 0.05, 50)))
### a third, less preferred, way:
sum(log(dyule(x, 0.05, 50)))
## End(Not run)
Definition of Vectors for Plotting or Annotating
Description
This function can be used to define vectors to annotate a set of taxon names, labels, etc. It should facilitate the (re)definition of colours or similar attributes for plotting trees or other graphics.
Usage
def(x, ..., default = NULL, regexp = FALSE)
Arguments
x | a vector of mode character. |
... | a series of statements defining the attributes. |
default | the default to be used (see details). |
regexp | a logical value specifying whether the statements defined in |
Details
The idea of this function is to make the definition of colours, etc.,simpler than what is done usually. A typical use is:
def(tr$tip.label, Homo_sapiens = "blue")
which will return a vector of character strings all "black" except one matching the tip label "Homo_sapiens" which will be "blue". Another use could be:
def(tr$tip.label, Homo_sapiens = 2)
which will return a vector of numerical values all 1 except for "Homo_sapiens" which will be 2. Several definitions can be done, e.g.:
def(tr$tip.label, Homo_sapiens = "blue", Pan_paniscus = "red")
The default value is determined with respect to the mode of the values given with the ... (either "black" or 1).
If regexp = TRUE is used, then the names of the statements must be quoted, e.g.:
def(tr$tip.label, "^Pan_" = "red", regexp = TRUE)
will return "red" for all labels starting with "Pan_".
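The behaviour described above can be mimicked in a few lines of base R. This is a simplified stand-in written for illustration: def_sketch is a hypothetical name and is not the code used by def, and the labels are made up.

```r
## Sketch only: a simplified stand-in for the behaviour described above.
## 'def_sketch' is a hypothetical name and is not the code used by def().
def_sketch <- function(x, ..., default = NULL, regexp = FALSE) {
    values <- list(...)
    ## the default depends on the mode of the first value: "black" or 1
    if (is.null(default))
        default <- if (is.character(values[[1]])) "black" else 1
    out <- rep(default, length(x))
    for (nm in names(values)) {
        hit <- if (regexp) grepl(nm, x) else x == nm
        out[hit] <- values[[nm]]
    }
    out
}

labels <- c("Homo_sapiens", "Pan_paniscus", "Pan_troglodytes")
def_sketch(labels, Homo_sapiens = "blue")
def_sketch(labels, Homo_sapiens = 2)
def_sketch(labels, "^Pan_" = "red", regexp = TRUE)
```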
Value
a vector of the same length as x.
Author(s)
Emmanuel Paradis
Examples
data(bird.orders)
a <- def(bird.orders$tip.label, Galliformes = 2)
str(a) # numeric
plot(bird.orders, font = a)
co <- def(bird.orders$tip.label, Passeriformes = "red", Trogoniformes = "blue")
str(co) # character
plot(bird.orders, tip.color = co)
### use of a regexp (so we need to quote it) to colour all orders
### with names starting with "C" (and change the default):
co2 <- def(bird.orders$tip.label, "^C" = "gold", default = "grey", regexp = TRUE)
plot(bird.orders, tip.color = co2)
Vertex Degrees in Trees and Networks
Description
degree is a generic function to calculate the degree of all nodes in a tree or in a network.
Usage
degree(x, ...)
## S3 method for class 'phylo'
degree(x, details = FALSE, ...)
## S3 method for class 'evonet'
degree(x, details = FALSE, ...)
Arguments
x | an object (tree, network, ...). |
details | whether to return the degree of each node in the tree, or a summary table (the default). |
... | arguments passed to methods. |
Details
The degree of a node (or vertex) in a network is defined by the number of branches (or edges) that connect to this node. In a phylogenetic tree, the tips (or terminal nodes) are of degree one, and the (internal) nodes are of degree two or more.
There are currently two methods for the classes "phylo" and "evonet". The default of these functions is to return a summary table with the degrees observed in the tree or network in the first column, and the number of nodes in the second column. If details = TRUE, a vector giving the degree of each node (as numbered in the edge matrix) is returned.
The validity of the object is not checked, so degree can be used to check problems with badly conformed trees.
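Because the degree of a node is simply the number of times its number appears in the edge matrix, the calculation can be sketched in base R. The edge matrix below is written by hand for a small rooted tree and is an assumption made for illustration; the final table is only analogous to the summary returned by degree.

```r
## Sketch only: node degrees computed directly from an edge matrix, in the
## layout used by "phylo" objects (column 1 = ancestor, column 2 = descendant).
## The tree is ((t1,t2),t3); with tips numbered 1-3, root 4, inner node 5.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
deg <- tabulate(as.integer(edge), nbins = max(edge))  # one count per node number
deg                                   # tips: 1; root (node 4): 2; node 5: 3
table(deg)                            # number of nodes observed for each degree
```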
Value
a data frame if details = FALSE, or a vector of integers otherwise.
Author(s)
Emmanuel Paradis
See Also
Examples
data(bird.orders)
degree(bird.orders)
degree(bird.orders, details = TRUE)

data(bird.families)
degree(bird.families)

degree(rtree(10)) # 10, 1, 8
degree(rtree(10, rooted = FALSE)) # 10, 0, 8
degree(stree(10)) # 10 + 1 node of degree 10
Delete Alignment Gaps in DNA or AA Sequences
Description
These functions remove gaps ("-") in a sample of DNA sequences.
Usage
del.gaps(x)
del.colgapsonly(x, threshold = 1, freq.only = FALSE)
del.rowgapsonly(x, threshold = 1, freq.only = FALSE)
Arguments
x | a matrix, a list, or a vector containing the DNA or AA sequences; only matrices for |
threshold | the largest gap proportion to delete the column or row. |
freq.only | if |
Details
del.gaps removes all gaps, so the returned sequences may not all have the same length and are therefore returned in a list.
del.colgapsonly removes the columns with a proportion at least threshold of gaps. Thus by default, only the columns with gaps only are removed (useful when a small matrix is extracted from a large alignment). del.rowgapsonly does the same for the rows.
The class of the input sequences is respected and kept unchanged, unless it contains neither "DNAbin" nor "AAbin" in which case the object is first converted into the class "DNAbin".
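The column rule can be illustrated on a plain character matrix. This is a sketch under assumptions: ape works on "DNAbin" or "AAbin" objects rather than character matrices, and the alignment and object names below are made up.

```r
## Sketch only: column filtering on a plain character matrix. ape works on
## "DNAbin"/"AAbin" objects; the matrix and names below are made up.
aln <- rbind(s1 = c("a", "-", "c", "-"),
             s2 = c("a", "-", "g", "t"),
             s3 = c("a", "-", "c", "-"))
gap_prop <- colMeans(aln == "-")      # proportion of gaps in each column
gap_prop                              # 0, 1, 0, 2/3
threshold <- 1
kept <- aln[, gap_prop < threshold, drop = FALSE]  # drops only the all-gap column
kept
```

Lowering threshold to 0.5 in this sketch would also drop the last column, where two of the three sequences have a gap.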
Value
del.gaps returns a vector (if there is only one input sequence) or a list of sequences; del.colgapsonly and del.rowgapsonly return a matrix of sequences or a numeric vector (with names for the second function) if freq.only = TRUE.
Author(s)
Emmanuel Paradis
See Also
base.freq, seg.sites, image.DNAbin, checkAlignment
Delta Plots
Description
This function makes a \delta plot following Holland et al. (2002).
Usage
delta.plot(X, k = 20, plot = TRUE, which = 1:2)
Arguments
X | a distance matrix, may be an object of class “dist”. |
k | an integer giving the number of intervals in the plot. |
plot | a logical specifying whether to draw the |
which | a numeric vector indicating which plots are done; 1: the histogram of the |
Details
See Holland et al. (2002) for details and interpretation.
The computing time of this function is proportional to the fourth power of the number of observations (O(n^4)), so calculations may be very long with only a slight increase in sample size.
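The O(n^4) cost comes from visiting every set of four observations (quartet). The sketch below computes the \delta value of one quartet following the definition given by Holland et al. (2002), and counts the quartets for a few sample sizes. The helper delta_quartet is a hypothetical name, not a function exported by ape, and the distance matrix is made up from a small tree.

```r
## Sketch only: the delta value of one quartet, as defined by Holland et al.
## (2002), and the number of quartets that must be visited. 'delta_quartet'
## is a hypothetical helper, not a function exported by ape.
delta_quartet <- function(D, q) {
    x <- q[1]; y <- q[2]; u <- q[3]; v <- q[4]
    s <- sort(c(D[x, y] + D[u, v],      # the three ways to pair four taxa
                D[x, u] + D[y, v],
                D[x, v] + D[y, u]))
    if (s[3] == s[1]) 0 else (s[3] - s[2]) / (s[3] - s[1])
}

## Distances from the tree ((A:1,B:1):1,(C:1,D:1):1): perfectly tree-like
D <- matrix(c(0, 2, 4, 4,
              2, 0, 4, 4,
              4, 4, 0, 2,
              4, 4, 2, 0), nrow = 4, byrow = TRUE)
delta_quartet(D, 1:4)      # 0 for tree-like distances
choose(c(10, 20, 40), 4)   # number of quartets: 210, 4845, 91390
```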
Value
This function returns invisibly a named list with two components:
counts: the counts for the histogram of \delta_q values
delta.bar: the mean \delta value for each observation
Author(s)
Emmanuel Paradis
References
Holland, B. R., Huber, K. T., Dress, A. and Moulton, V. (2002) Delta plots: a tool for analyzing phylogenetic distance data. Molecular Biology and Evolution, 19, 2051–2059.
See Also
Examples
data(woodmouse)
d <- dist.dna(woodmouse)
delta.plot(d)
layout(1)
delta.plot(d, 40, which = 1)
Pairwise Distances from DNA Sequences
Description
This function computes a matrix of pairwise distances from DNA sequences using a model of DNA evolution. Eleven substitution models (and the raw distance) are currently available.
Usage
dist.dna(x, model = "K80", variance = FALSE, gamma = FALSE,
         pairwise.deletion = FALSE, base.freq = NULL, as.matrix = FALSE)
Arguments
x | a matrix or a list containing the DNA sequences; this must be of class |
model | a character string specifying the evolutionary model to be used; must be one of |
variance | a logical indicating whether to compute the variances of the distances; defaults to |
gamma | a value for the gamma parameter possibly used to apply a correction to the distances (by default no correction is applied). |
pairwise.deletion | a logical indicating whether to delete the sites with missing data in a pairwise way. The default is to delete the sites with at least one missing data for all sequences (ignored if |
base.freq | the base frequencies to be used in the computations (if applicable). By default, the base frequencies are computed from the whole set of sequences. |
as.matrix | a logical indicating whether to return the results as a matrix. The default is to return an object of class dist. |
Details
The molecular evolutionary models available through the option model have been extensively described in the literature. A brief description is given below; more details can be found in the references.

raw, N: This is simply the proportion or the number of sites that differ between each pair of sequences. This may be useful to draw “saturation plots”. The options variance and gamma have no effect, but pairwise.deletion may have.

TS, TV: These are the numbers of transitions and transversions, respectively.

JC69: This model was developed by Jukes and Cantor (1969). It assumes that all substitutions (i.e. a change of a base by another one) have the same probability. This probability is the same for all sites along the DNA sequence. This last assumption can be relaxed by assuming that the substitution rate varies among sites following a gamma distribution whose parameter must be given by the user. By default, no gamma correction is applied. Another assumption is that the base frequencies are balanced and thus equal to 0.25.

K80: The distance derived by Kimura (1980), sometimes referred to as “Kimura's 2-parameters distance”, has the same underlying assumptions as the Jukes–Cantor distance except that two kinds of substitutions are considered: transitions (A <-> G, C <-> T), and transversions (A <-> C, A <-> T, C <-> G, G <-> T). They are assumed to have different probabilities. A transition is the substitution of a purine (A, G) by another one, or the substitution of a pyrimidine (C, T) by another one. A transversion is the substitution of a purine by a pyrimidine, or vice-versa. Both transition and transversion rates are the same for all sites along the DNA sequence. Jin and Nei (1990) modified the Kimura model to allow for variation among sites following a gamma distribution. As for the Jukes–Cantor model, the gamma parameter must be given by the user. By default, no gamma correction is applied.

F81: Felsenstein (1981) generalized the Jukes–Cantor model by relaxing the assumption of equal base frequencies. The formulae used in this function were taken from McGuire et al. (1999).

K81: Kimura (1981) generalized his model (Kimura 1980) by assuming different rates for two kinds of transversions: A <-> C and G <-> T on one side, and A <-> T and C <-> G on the other. This is what Kimura called his “three substitution types model” (3ST), and is sometimes referred to as “Kimura's 3-parameters distance”.

F84: This model generalizes K80 by relaxing the assumption of equal base frequencies. It was first introduced by Felsenstein in 1984 in Phylip, and is fully described by Felsenstein and Churchill (1996). The formulae used in this function were taken from McGuire et al. (1999).

BH87: Barry and Hartigan (1987) developed a distance based on the observed proportions of changes among the four bases. This distance is not symmetric.

T92: Tamura (1992) generalized the Kimura model by relaxing the assumption of equal base frequencies. This is done by taking into account the bias in G+C content in the sequences. The substitution rates are assumed to be the same for all sites along the DNA sequence.

TN93: Tamura and Nei (1993) developed a model which assumes distinct rates for both kinds of transition (A <-> G versus C <-> T), and transversions. The base frequencies are not assumed to be equal and are estimated from the data. A gamma correction of the inter-site variation in substitution rates is possible.

GG95: Galtier and Gouy (1995) introduced a model where the G+C content may change through time. Different rates are assumed for transitions and transversions.

logdet: The Log-Det distance, developed by Lockhart et al. (1994), is related to BH87. However, this distance is symmetric. Formulae from Gu and Li (1996) are used. dist.logdet in phangorn uses a different implementation that gives substantially different distances for low-diverging sequences.

paralin: Lake (1994) developed the paralinear distance which can be viewed as another variant of the Barry–Hartigan distance.

indel: this counts the number of sites where there is an insertion/deletion gap in one sequence and not in the other.

indelblock: same as before but contiguous gaps are counted as a single unit. Note that the distance between -A- and A-- is 3 because there are three different blocks of gaps, whereas the “indel” distance will be 2.
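To make the first models of this list concrete, the sketch below computes the raw, JC69 and K80 distances for one pair of aligned sequences in base R, using the textbook formulae without gamma correction. It is an illustration only: the two sequences are made up, contain no gaps or ambiguous bases, and the object names are assumptions; dist.dna handles many more situations.

```r
## Sketch only: textbook JC69 and K80 distances for one pair of aligned
## sequences without gaps or ambiguities. The sequences are made up, and the
## object names are hypothetical; dist.dna() handles many more cases.
s1 <- strsplit("acgtacgtacgtacgtacgt", "")[[1]]
s2 <- strsplit("acgtgcgtacttacgcacgt", "")[[1]]

purine <- c("a", "g")
differs <- s1 != s2
## a transition changes a base within its class (purine or pyrimidine)
is_ts <- differs & ((s1 %in% purine) == (s2 %in% purine))
P <- mean(is_ts)               # proportion of transitions
Q <- mean(differs & !is_ts)    # proportion of transversions
p <- P + Q                     # raw distance (proportion of differing sites)

jc69 <- -3/4 * log(1 - 4 * p / 3)
k80  <- -1/2 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))
c(raw = p, JC69 = jc69, K80 = k80)
```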
Value
an object of class dist (by default), or a numeric matrix if as.matrix = TRUE. If model = "BH87", a numeric matrix is returned because the Barry–Hartigan distance is not symmetric.
If variance = TRUE an attribute called "variance" is given to the returned object.
Note
If the sequences are very different, most evolutionary distances are undefined and a non-finite value (Inf or NaN) is returned. You may do dist.dna(, model = "raw") to check whether some values are higher than 0.75.
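The 0.75 threshold can be seen directly from the textbook Jukes–Cantor correction, whose logarithm is undefined once the raw proportion of differing sites reaches that value. The sketch below is in base R and uses made-up raw distances; other models have different limits.

```r
## Sketch only: the JC69 correction breaks down as the raw distance
## approaches and exceeds 0.75 (textbook formula, written in base R).
p <- c(0.50, 0.74, 0.75, 0.80)
jc69 <- suppressWarnings(-3/4 * log(1 - 4 * p / 3))
data.frame(raw = p, JC69 = jc69)   # finite, finite, Inf, NaN
```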
Author(s)
Emmanuel Paradis
References
Barry, D. and Hartigan, J. A. (1987) Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261–276.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.
Felsenstein, J. and Churchill, G. A. (1996) A Hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93–104.
Galtier, N. and Gouy, M. (1995) Inferring phylogenies from DNA sequences of unequal base compositions. Proceedings of the National Academy of Sciences USA, 92, 11317–11321.
Gu, X. and Li, W.-H. (1996) Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Molecular Biology and Evolution, 13, 1375–1383.
Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Mammalian Protein Metabolism, ed. Munro, H. N., pp. 21–132, New York: Academic Press.
Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Kimura, M. (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA, 78, 454–458.
Jin, L. and Nei, M. (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Molecular Biology and Evolution, 7, 82–102.
Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proceedings of the National Academy of Sciences USA, 91, 1455–1459.
Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution, 11, 605–612.
McGuire, G., Prentice, M. J. and Wright, F. (1999) Improved error bounds for genetic distances from DNA sequences. Biometrics, 55, 1064–1070.
Tamura, K. (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Molecular Biology and Evolution, 9, 678–687.
Tamura, K. and Nei, M. (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10, 512–526.
See Also
read.GenBank, read.dna, write.dna, DNAbin, dist.gene, cophenetic.phylo, dist
Pairwise Distances from Genetic Data
Description
This function computes a matrix of distances between pairs of individuals from a matrix or a data frame of genetic data.
Usage
dist.gene(x, method = "pairwise", pairwise.deletion = FALSE,
          variance = FALSE)
Arguments
x | a matrix or a data frame (will be coerced as a matrix). |
method | a character string specifying the method used to compute the distances; two choices are available: |
pairwise.deletion | a logical indicating whether to delete the columns with missing data on a pairwise basis. The default is to delete the columns with at least one missing observation. |
variance | a logical, indicates whether the variance of the distances should be returned (default to |
Details
This function is meant to be very general and accepts different kinds of data (alleles, haplotypes, SNP, DNA sequences, ...). The rows of the data matrix represent the individuals, and the columns the loci.
In the case of the pairwise method, the distance d between two individuals is the number of loci for which they differ, and the associated variance is d(L - d)/L, where L is the number of loci.
In the case of the percentage method, this distance is divided by L, and the associated variance is d(1 - d)/L.
For more elaborate distances with DNA sequences, see the function dist.dna.
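The two distances and their variances can be worked through by hand for a single pair of individuals. The sketch below does this in base R with made-up genotypes at L = 8 loci; the object names are assumptions made for illustration.

```r
## Sketch only: the two distances and their variances for one pair of
## individuals, using made-up genotypes at L = 8 loci.
ind1 <- c("A", "B", "A", "C", "B", "A", "A", "C")
ind2 <- c("A", "B", "B", "C", "A", "A", "C", "C")
L <- length(ind1)

d_pair <- sum(ind1 != ind2)             # number of differing loci: 3
v_pair <- d_pair * (L - d_pair) / L     # variance, pairwise method: 1.875
d_perc <- d_pair / L                    # proportion of differing loci: 0.375
v_perc <- d_perc * (1 - d_perc) / L     # variance, percentage method
c(d_pair = d_pair, v_pair = v_pair, d_perc = d_perc, v_perc = v_perc)
```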
Value
an object of class dist. If variance = TRUE an attribute called "variance" is given to the returned object.
Note
Missing data (NA) are coded and treated in R's usual way.
Author(s)
Emmanuel Paradis
See Also
dist.dna, cophenetic.phylo, dist
Topological Distances Between Two Trees
Description
This function computes the topological distance between two phylogenetic trees or among trees in a list (if y = NULL) using different methods.
Usage
dist.topo(x, y = NULL, method = "PH85", mc.cores = 1)
Arguments
x | an object of class |
y | an (optional) object of class |
method | a character string giving the method to be used: either |
mc.cores | the number of cores (CPUs) to be used (passed to parallel). |
Details
Two methods are available: the one by Penny and Hendy (1985, originally from Robinson and Foulds 1981), and the branch length score by Kuhner and Felsenstein (1994). The trees are always considered as unrooted.
The topological distance is defined as twice the number of internal branches defining different bipartitions of the tips (Robinson and Foulds 1981; Penny and Hendy 1985). Rzhetsky and Nei (1992) proposed a modification of the original formula to take multifurcations into account.
The branch length score may be seen as similar to the previous distance but taking branch lengths into account. Kuhner and Felsenstein (1994) proposed to calculate the square root of the sum of the squared differences of the (internal) branch lengths defining similar bipartitions (or splits) in both trees.
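The two definitions above can be sketched in base R on hand-written bipartitions. The example below is only an illustration under stated assumptions: the two 5-tip trees, the split labels, and the branch lengths are made up, and a split missing from one tree is counted with length zero in that tree, which is an assumption of this sketch and is not confirmed by the text above. The helper names are hypothetical and this is not the code of dist.topo.

```r
# Internal branches of two unrooted trees on tips A-E, each identified by the
# side of its bipartition that does not contain tip "E" (hypothetical labels).
# Tree 1: ((A,B),C,(D,E))    Tree 2: ((A,C),B,(D,E))
bl1 <- c(AB = 1.0, ABC = 0.5)   # internal branch lengths of tree 1
bl2 <- c(AC = 2.0, ABC = 1.5)   # internal branch lengths of tree 2

only1  <- setdiff(names(bl1), names(bl2))    # splits found only in tree 1
only2  <- setdiff(names(bl2), names(bl1))    # splits found only in tree 2
shared <- intersect(names(bl1), names(bl2))  # splits found in both trees

# topological distance: number of internal branches not shared by the trees
# (for two binary trees this is twice the number of splits unique to one tree)
ph85 <- length(only1) + length(only2)

# branch length score: root of the summed squared differences, with a missing
# split counted as length 0 (assumption of this sketch)
score <- sqrt(sum((bl1[shared] - bl2[shared])^2) +
              sum(bl1[only1]^2) + sum(bl2[only2]^2))

ph85
score
```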
Value
a single numeric value if both x and y are used, an object of class "dist" otherwise.
Note
The geodesic distance of Billera et al. (2001) has been disabled: see the package distory on CRAN.
Author(s)
Emmanuel Paradis
References
Billera, L. J., Holmes, S. P. and Vogtmann, K. (2001) Geometry of the space of phylogenetic trees. Advances in Applied Mathematics, 27, 733–767.
Kuhner, M. K. and Felsenstein, J. (1994) Simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11, 459–468.
Nei, M. and Kumar, S. (2000) Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.
Penny, D. and Hendy, M. D. (1985) The use of tree comparison metrics. Systematic Zoology, 34, 75–82.
Robinson, D. F. and Foulds, L. R. (1981) Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131–147.
Rzhetsky, A. and Nei, M. (1992) A simple method for estimating and testing minimum-evolution trees. Molecular Biology and Evolution, 9, 945–967.
See Also
Examples
ta <- rtree(30, rooted = FALSE)
tb <- rtree(30, rooted = FALSE)
dist.topo(ta, ta) # 0
dist.topo(ta, tb) # unlikely to be 0
## rmtopology() simulates unrooted trees by default:
TR <- rmtopology(100, 10)
## these trees have 7 internal branches, so the maximum distance
## between two of them is 14:
DTR <- dist.topo(TR)
table(DTR)
Tests of Constant Diversification Rates
Description
This function computes two tests of the distribution of branching times using the Cramér–von Mises and Anderson–Darling goodness-of-fit tests. By default, it is assumed that the diversification rate is constant, and an exponential distribution is assumed for the branching times. In this case, the expected distribution under this model is computed with a rate estimated from the data. Alternatively, the user may specify an expected cumulative distribution function (z): in this case, x and z must be of the same length. See the examples for how to compute the latter from a sample of expected branching times.
Usage
diversi.gof(x, null = "exponential", z = NULL)
Arguments
x | a numeric vector with the branching times. |
null | a character string specifying the null distribution for the branching times. Only two choices are possible: either |
z | used if |
Details
The Cramér–von Mises and Anderson–Darling tests compare the empirical distribution function (EDF) of the observations to an expected cumulative distribution function. By contrast to the Kolmogorov–Smirnov test where the greatest difference between these two functions is used, in both tests all differences are taken into account.
The distributions of both test statistics depend on the null hypothesis, and on whether or not some parameters were estimated from the data. However, these distributions are not known precisely and critical values were determined by Stephens (1974) using simulations. These critical values were used for the present function.
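To make the comparison between the EDF and the expected distribution concrete, here is a minimal base R sketch of the standard textbook forms of the two statistics under an exponential null whose rate is estimated from the data. The simulated sample is hypothetical, and the sketch omits the small-sample modifications and the critical values of Stephens (1974), so its output is not expected to reproduce what diversi.gof prints.

```r
set.seed(1)
x <- sort(rexp(50, rate = 2))   # hypothetical "branching times", sorted
n <- length(x)

# expected cumulative distribution at each observation: exponential with
# a rate estimated from the data
z <- pexp(x, rate = 1 / mean(x))
i <- seq_len(n)

# Cramer-von Mises statistic: all squared differences between the expected
# distribution and the EDF mid-points contribute
W2 <- 1 / (12 * n) + sum((z - (2 * i - 1) / (2 * n))^2)

# Anderson-Darling statistic: same idea, with more weight on the tails
A2 <- -n - sum((2 * i - 1) * (log(z) + log(1 - rev(z)))) / n

W2
A2
```

If any value of z equals 0 or 1, the logarithms make A2 infinite, which is why the example of this help page adds 1 to the upper bound of the uniform sample.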
Value
A NULL value is returned, the results are simply printed.
Author(s)
Emmanuel Paradis
References
Paradis, E. (1998) Testing for constant diversification rates using molecular phylogenies: a general approach based on statistical tests for goodness of fit. Molecular Biology and Evolution, 15, 476–479.
Stephens, M. A. (1974) EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730–737.
See Also
branching.times, diversi.time, ltt.plot, birthdeath, yule, yule.cov
Examples
data(bird.families)
x <- branching.times(bird.families)
### suppose we have a sample of expected branching times `y';
### for simplicity, take them from a uniform distribution:
y <- runif(500, 0, max(x) + 1) # + 1 to avoid A2 = Inf
### now compute the expected cumulative distribution:
x <- sort(x)
N <- length(x)
ecdf <- numeric(N)
for (i in 1:N) ecdf[i] <- sum(y <= x[i])/500
### finally do the test:
diversi.gof(x, "user", z = ecdf)
Analysis of Diversification with Survival Models
Description
This function fits survival models to a set of branching times, some of which may be known approximately (censored). Three models are fitted: Model A assuming constant diversification, Model B assuming that diversification follows a Weibull law, and Model C assuming that diversification changes with a breakpoint at time ‘Tc’. The models are fitted by maximum likelihood.
Usage
diversi.time(x, census = NULL, censoring.codes = c(1, 0), Tc = NULL)
Arguments
x | a numeric vector with the branching times. |
census | a vector of the same length as ‘x’ used as an indicator variable; thus, it must have only two values, one coding for accurately known branching times, and the other for censored branching times. This argument can be of any mode (numeric, character, logical), or can even be a factor. |
censoring.codes | a vector of length two giving the codes used for |
Tc | a single numeric value specifying the break-point time to fit Model C. If none is provided, then it is set arbitrarily to the mean of the analysed branching times. |
Details
The principle of the method is to consider each branching time as an event: if the branching time is accurately known, then it is a failure event; if it is approximately known then it is a censoring event. An analogy is thus made between the failure (or hazard) rate estimated by the survival models and the diversification rate of the lineage. Time is here considered from present to past.
Model B assumes a monotonically changing diversification rate. The parameter that controls the change of this rate is called beta. If beta is greater than one, then the diversification rate decreases through time; if it is less than one, then the rate increases through time. If beta is equal to one, then Model B reduces to Model A.
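The role of beta can be illustrated with the hazard function of a Weibull law. The sketch below assumes the common parametrization h(t) = alpha * beta * (alpha * t)^(beta - 1); this parametrization, the function name weibull_hazard, and the numeric values are assumptions made for the illustration and are not taken from diversi.time. Because time runs from present to past, a hazard that increases with t corresponds to a diversification rate that decreases toward the present.

```r
# Weibull hazard under the assumed parametrization (hypothetical helper)
weibull_hazard <- function(t, alpha, beta) {
  alpha * beta * (alpha * t)^(beta - 1)
}

t <- c(0.5, 1, 2, 4)   # time before present
alpha <- 0.3

h_const <- weibull_hazard(t, alpha, beta = 1)    # constant: Model A
h_gt1   <- weibull_hazard(t, alpha, beta = 2)    # larger in the past
h_lt1   <- weibull_hazard(t, alpha, beta = 0.5)  # smaller in the past

rbind(t, h_const, h_gt1, h_lt1)
```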
Value
A NULL value is returned, the results are simply printed.
Author(s)
Emmanuel Paradis
References
Paradis, E. (1997) Assessing temporal variations in diversification rates from phylogenies: estimation and hypothesis testing. Proceedings of the Royal Society of London. Series B. Biological Sciences, 264, 1141–1147.
See Also
branching.times, diversi.gof, ltt.plot, birthdeath, bd.ext, yule, yule.cov
Diversity Contrast Test
Description
This function performs the diversity contrast test comparing pairs of sister-clades.
Usage
diversity.contrast.test(x, method = "ratiolog", alternative = "two.sided", nrep = 0, ...)
Arguments
x | a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades. |
method | a character string specifying the kind of test: |
alternative | a character string defining the alternativehypothesis: |
nrep | the number of replications of the randomization test; by default, a Wilcoxon test is done. |
... | arguments passed to the function |
Details
If method = "ratiolog", the test described in Barraclough et al. (1996) is performed. If method = "proportion", the version in Barraclough et al. (1995) is used. If method = "difference", the signed difference is used (Sargent 2004). If method = "logratio", then this is Wiegmann et al.'s (1993) version. These four tests are essentially different versions of the same test (Vamosi and Vamosi 2005, Vamosi 2007). See Paradis (2012) for a comparison of their statistical performance with other tests.
If nrep = 0, a Wilcoxon test is done on the species diversity contrasts with the null hypothesis that they are distributed around zero. If nrep > 0, a randomization procedure is done where the signs of the diversity contrasts are randomly chosen. This is used to create a distribution of the test statistic which is compared with the observed value (the sum of the diversity contrasts).
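The randomization procedure described above can be sketched in a few lines of base R. The example uses the signed difference as the contrast and the data of the Examples section; the way the two-sided P-value is derived from the randomized sums is an assumption of this sketch, so its result need not match the output of diversity.contrast.test.

```r
# data from Vamosi & Vamosi (2005), as in the Examples of this help page
fleshy <- c(1, 1, 1, 1, 1, 3, 3, 5, 9, 16, 33, 40, 50, 100, 216, 393, 850,
            947, 1700)
dry <- c(2, 64, 300, 89, 67, 4, 34, 10, 150, 35, 2, 60, 81, 1, 3, 1, 11, 1, 18)

contrast <- fleshy - dry     # signed differences
observed <- sum(contrast)    # observed test statistic

set.seed(42)
nrep <- 1000
# randomly reassign the sign of each contrast and recompute the sum
null_stats <- replicate(nrep, {
  signs <- sample(c(-1, 1), length(contrast), replace = TRUE)
  sum(abs(contrast) * signs)
})

# two-sided P-value (assumed definition): proportion of randomized sums at
# least as extreme as the observed one
p_two_sided <- mean(abs(null_stats) >= abs(observed))
p_two_sided
```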
Value
a single numeric value with the P-value.
Author(s)
Emmanuel Paradis
References
Barraclough, T. G., Harvey, P. H. and Nee, S. (1995) Sexual selection and taxonomic diversity in passerine birds. Proceedings of the Royal Society of London. Series B. Biological Sciences, 259, 211–215.
Barraclough, T. G., Harvey, P. H., and Nee, S. (1996) Rate of rbcL gene sequence evolution and species diversification in flowering plants (angiosperms). Proceedings of the Royal Society of London. Series B. Biological Sciences, 263, 589–591.
Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.
Sargent, R. D. (2004) Floral symmetry affects speciation rates in angiosperms. Proceedings of the Royal Society of London. Series B. Biological Sciences, 271, 603–608.
Vamosi, S. M. (2007) Endless tests: guidelines for analysing non-nested sister-group comparisons. An addendum. Evolutionary Ecology Research, 9, 717.
Vamosi, S. M. and Vamosi, J. C. (2005) Endless tests: guidelines for analysing non-nested sister-group comparisons. Evolutionary Ecology Research, 7, 567–579.
Wiegmann, B., Mitter, C. and Farrell, B. (1993) Diversification of carnivorous parasitic insects: extraordinary radiation or specialized dead end? American Naturalist, 142, 737–754.
See Also
slowinskiguyer.test, mcconwaysims.test, richness.yule.test
Examples
### data from Vamosi & Vamosi (2005):
fleshy <- c(1, 1, 1, 1, 1, 3, 3, 5, 9, 16, 33, 40, 50, 100, 216, 393, 850, 947, 1700)
dry <- c(2, 64, 300, 89, 67, 4, 34, 10, 150, 35, 2, 60, 81, 1, 3, 1, 11, 1, 18)
x <- cbind(fleshy, dry)
diversity.contrast.test(x)
diversity.contrast.test(x, alt = "g")
diversity.contrast.test(x, alt = "g", nrep = 1e4)
slowinskiguyer.test(x)
mcconwaysims.test(x)
dN/dS Ratio
Description
This function computes the pairwise ratios dN/dS for a set of aligned DNA sequences using Li's (1993) method.
Usage
dnds(x, code = 1, codonstart = 1, quiet = FALSE, details = FALSE, return.categories = FALSE)
Arguments
x | an object of class |
code | an integer value giving the genetic code to be used. Currently, the codes 1 to 6 are supported. |
codonstart | an integer giving where to start the translation. This should be 1, 2, or 3, but larger values are accepted and have the effect of starting the translation further within the sequence. |
quiet | single logical value: whether to indicate progress of calculations. |
details | single logical value (see details). |
return.categories | a logical value: if |
Details
Since ape 5.6, the degeneracy of each codon is calculated directly from the genetic code using the function trans. A consequence is that ambiguous bases are ignored (see solveAmbiguousBases).
If details = TRUE, a table is printed for each pair of sequences giving the numbers of transitions and transversions for each category of degeneracy (nondegenerate, twofold, and fourfold). This is helpful when non-meaningful values are returned (e.g., NaN, Inf, negative values).
Value
an object of class "dist", or a numeric matrix if return.categories = TRUE.
Author(s)
Emmanuel Paradis
References
Li, W.-H. (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of Molecular Evolution, 36, 96–99.
See Also
AAbin, trans, alview, solveAmbiguousBases
Examples
data(woodmouse)
res <- dnds(woodmouse, quiet = TRUE) # NOT correct
res2 <- dnds(woodmouse, code = 2, quiet = TRUE) # using the correct code
identical(res, res2) # FALSE...
cor(res, res2) # ... but very close
## There are a few N's in the woodmouse data, but this does not affect
## greatly the results:
res3 <- dnds(solveAmbiguousBases(woodmouse), code = 2, quiet = TRUE)
cor(res, res3)
## a simple example showing the usefulness of 'details = TRUE'
X <- as.DNAbin(matrix(c("C", "A", "G", "G", "T", "T"), 2, 3))
alview(X)
dnds(X, quiet = TRUE) # NaN
dnds(X, details = TRUE) # only a TV at a nondegenerate site
Remove Tips in a Phylogenetic Tree
Description
drop.tip removes the terminal branches of a phylogenetic tree, possibly removing the corresponding internal branches. keep.tip does the opposite operation (i.e., returns the induced tree).
extract.clade does the inverse operation: it keeps all the tips from a given node, and deletes all the other tips.
Usage
drop.tip(phy, tip, ...)
## S3 method for class 'phylo'
drop.tip(phy, tip, trim.internal = TRUE, subtree = FALSE, root.edge = 0, rooted = is.rooted(phy), collapse.singles = TRUE, interactive = FALSE, ...)
## S3 method for class 'multiPhylo'
drop.tip(phy, tip, ...)
keep.tip(phy, tip, ...)
## S3 method for class 'phylo'
keep.tip(phy, tip, ...)
## S3 method for class 'multiPhylo'
keep.tip(phy, tip, ...)
extract.clade(phy, node, root.edge = 0, collapse.singles = TRUE, interactive = FALSE)
Arguments
phy | an object of class |
tip | a vector of mode numeric or character specifying the tips to delete. |
trim.internal | a logical specifying whether to delete the corresponding internal branches. |
subtree | a logical specifying whether to output in the tree how many tips have been deleted and where. |
root.edge | an integer giving the number of internal branches to be used to build the new root edge. This has no effect if |
rooted | a logical indicating whether the tree must be treated as rooted or not. This makes it possible to force the tree to be considered as unrooted (see examples). See details about a possible root.edge element in the tree. |
collapse.singles | a logical specifying whether to delete the internal nodes of degree 2. |
node | a node number or label. |
interactive | if |
... | arguments passed from and to methods. |
Details
The argument tip can be either character or numeric. In the first case, it gives the labels of the tips to be deleted; in the second case the numbers of these labels in the vector phy$tip.label are given.
This also applies to node, but if this argument is character and the tree has no node label, this results in an error. If more than one value is given with node (i.e., a vector of length two or more), only the first one is used with a warning.
If trim.internal = FALSE, the new tips are given "NA" as labels, unless there are node labels in the tree in which case they are used.
If subtree = TRUE, the returned tree has one or several terminal branches named with node labels if available. Otherwise it is indicated how many tips have been removed (with a label "[x_tips]"). This is done for as many monophyletic groups that have been deleted.
Note that subtree = TRUE implies trim.internal = TRUE.
To understand how the option root.edge works, see the examples below. If rooted = FALSE and the tree has a root edge, the latter is removed in the output.
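The relationship between the three functions can be seen on a very small tree. The following sketch uses only functions from ape (read.tree, drop.tip, keep.tip, extract.clade, getMRCA); the five-tip tree and the object names are chosen for the illustration.

```r
library(ape)

tr <- read.tree(text = "(A:1,(B:1,(C:1,(D:1,E:1):1):1):1):1;")

# remove two tips given by their labels
pruned <- drop.tip(tr, c("A", "B"))

# keep.tip() with the complementary set of tips retains the same tips
kept <- keep.tip(tr, c("C", "D", "E"))

# extract.clade() works from a node: here the most recent common
# ancestor of D and E, found with getMRCA()
node_DE <- getMRCA(tr, c("D", "E"))
clade_DE <- extract.clade(tr, node_DE)

pruned$tip.label
kept$tip.label
clade_DE$tip.label
```

Using getMRCA avoids hard-coding a node number, which depends on how the tree was read or reordered.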
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis, Klaus Schliep, Joseph Brown
See Also
Examples
data(bird.families)
tip <- c("Eopsaltriidae", "Acanthisittidae", "Pittidae", "Eurylaimidae",
"Philepittidae", "Tyrannidae", "Thamnophilidae", "Furnariidae",
"Formicariidae", "Conopophagidae", "Rhinocryptidae", "Climacteridae",
"Menuridae", "Ptilonorhynchidae", "Maluridae", "Meliphagidae",
"Pardalotidae", "Petroicidae", "Irenidae", "Orthonychidae",
"Pomatostomidae", "Laniidae", "Vireonidae", "Corvidae",
"Callaeatidae", "Picathartidae", "Bombycillidae", "Cinclidae",
"Muscicapidae", "Sturnidae", "Sittidae", "Certhiidae",
"Paridae", "Aegithalidae", "Hirundinidae", "Regulidae",
"Pycnonotidae", "Hypocoliidae", "Cisticolidae", "Zosteropidae",
"Sylviidae", "Alaudidae", "Nectariniidae", "Melanocharitidae",
"Paramythiidae", "Passeridae", "Fringillidae")
plot(drop.tip(bird.families, tip))
plot(drop.tip(bird.families, tip, trim.internal = FALSE))
data(bird.orders)
plot(drop.tip(bird.orders, 6:23, subtree = TRUE))
plot(drop.tip(bird.orders, c(1:5, 20:23), subtree = TRUE))
plot(drop.tip(bird.orders, c(1:20, 23), subtree = TRUE))
plot(drop.tip(bird.orders, c(1:20, 23), subtree = TRUE, rooted = FALSE))
### Examples of the use of `root.edge'
tr <- read.tree(text = "(A:1,(B:1,(C:1,(D:1,E:1):1):1):1):1;")
drop.tip(tr, c("A", "B"), root.edge = 0) # = (C:1,(D:1,E:1):1);
drop.tip(tr, c("A", "B"), root.edge = 1) # = (C:1,(D:1,E:1):1):1;
drop.tip(tr, c("A", "B"), root.edge = 2) # = (C:1,(D:1,E:1):1):2;
drop.tip(tr, c("A", "B"), root.edge = 3) # = (C:1,(D:1,E:1):1):3;
Draw Additional Edges on a Plotted Tree
Description
edges draws edges on a plotted tree. fancyarrows enhances arrows with triangle and harpoon heads; it can be called from edges.
Usage
edges(nodes0, nodes1, arrows = 0, type = "classical", ...)
fancyarrows(x0, y0, x1, y1, length = 0.25, angle = 30, code = 2, col = par("fg"), lty = par("lty"), lwd = par("lwd"), type = "triangle", ...)
Arguments
nodes0,nodes1 | vectors of integers giving the tip and/or node numbers where to start and to end the edges (recycled if necessary). |
arrows | an integer between 0 and 3; 0: lines (the default); 1: an arrow head is drawn at |
type | if the previous argument is not 0, the type of arrow head: |
x0,y0,x1,y1 | the coordinates of the start and end points for |
length,angle,code,col,lty,lwd | default options similar to those of |
... | further arguments passed to |
Details
The first function is helpful when drawing reticulations on a phylogeny,especially if computed from the edge matrix.
fancyarrows does not work with log-transformed scale(s).
Author(s)
Emmanuel Paradis
See Also
Examples
set.seed(2)
tr <- rcoal(6)
plot(tr, "c")
edges(10, 9, col = "red", lty = 2)
edges(10:11, 8, col = c("blue", "green")) # recycling of 'nodes1'
edges(1, 2, lwd = 2, type = "h", arrows = 3, col = "green")
nodelabels()
Evolutionary Networks
Description
evonet builds a network from a tree of class "phylo". There are print, plot, and reorder methods as well as a few conversion functions.
Usage
evonet(phy, from, to = NULL)
## S3 method for class 'evonet'
print(x, ...)
## S3 method for class 'evonet'
plot(x, col = "blue", lty = 1, lwd = 1, alpha = 0.5, arrows = 0, arrow.type = "classical", ...)
## S3 method for class 'evonet'
Nedge(phy)
## S3 method for class 'evonet'
reorder(x, order = "cladewise", index.only = FALSE, ...)
## S3 method for class 'evonet'
as.phylo(x, ...)
## S3 method for class 'evonet'
as.networx(x, weight = NA, ...)
## S3 method for class 'evonet'
as.network(x, directed = TRUE, ...)
## S3 method for class 'evonet'
as.igraph(x, directed = TRUE, use.labels = TRUE, ...)
as.evonet(x, ...)
## S3 method for class 'phylo'
as.evonet(x, ...)
read.evonet(file = "", text = NULL, comment.char = "", ...)
write.evonet(x, file = "", ...)
Arguments
phy | an object of class |
x | an object of class |
from | a vector (or a matrix if |
to | a vector of the same length as |
col,lty,lwd | colors, line type and width of the reticulations (recycled if necessary). |
alpha | a value between 0 and 1 specifying the transparency of the reticulations. |
arrows,arrow.type | see |
order,index.only | see |
weight | a numeric vector giving the weights for the reticulations when converting to the class |
directed | a logical: should the network be considered as directed? |
use.labels | a logical specifying whether to use the tip and node labels when building the network of class |
file,text,comment.char | see |
... | arguments passed to other methods. |
Details
evonet is a constructor function that checks the arguments.
The classes "networx", "network", and "igraph" are defined in the packages phangorn, network, and igraph, respectively.
read.evonet reads networks from files in extended Newick format (Cardona et al. 2008).
Value
an object of class c("evonet", "phylo") which is made of an object of class "phylo" plus an element reticulation coding additional edges among nodes and uses the same coding rules as the edge matrix.
The conversion functions return an object of the appropriate class.
Author(s)
Emmanuel Paradis, Klaus Schliep
References
Cardona, G., Rosselló, F., and Valiente, G. (2008) Extended Newick: it is time for a standard representation of phylogenetic networks. BMC Bioinformatics, 9, 532.
See Also
as.networx in package phangorn
Examples
tr <- rcoal(5)
(x <- evonet(tr, 6:7, 8:9))
plot(x)
## simple example of extended Newick format:
(enet <- read.evonet(text = "((a:2,(b:1)#H1:1):1,(#H1,c:1):2);"))
plot(enet, arrows=1)
## from Fig. 2 in Cardona et al. 2008:
z <- read.evonet(text = "((1,((2,(3,(4)Y#H1)g)e,(((Y#H1, 5)h,6)f)X#H2)c)a,((X#H2,7)d,8)b)r;")
z
plot(z)
## Not run: 
if (require(igraph)) {
    plot(as.igraph(z))
}
## End(Not run)
Incomplete distances and edge weights of unrooted topology
Description
This function implements a method for checking whether an incomplete set of distances satisfies certain conditions that might make it uniquely determine the edge weights of a given topology, T. It prints information about whether the graph with vertex set the set of leaves, denoted by X, and edge set the set of non-missing distance pairs, denoted by L, is connected or strongly non-bipartite. It then also checks whether L is a triplet cover for T.
Usage
ewLasso(X, phy)
Arguments
X | a distance matrix. |
phy | an unrooted tree of class |
Details
Missing values must be represented by either NA or a negative value.
This implements a method for checking whether an incomplete set of distances satisfies certain conditions that might make it uniquely determine the edge weights of a given topology, T. It prints information about whether the graph, G, with vertex set the set of leaves, denoted by X, and edge set the set of non-missing distance pairs, denoted by L, is connected or strongly non-bipartite. It also checks whether L is a triplet cover for T. If G is not connected, then T does not need to be the only topology satisfying the input incomplete distances. If G is not strongly non-bipartite then the edge-weights of the edges of T are not the unique ones for which the input distance is satisfied. If L is a triplet cover, then the input distance matrix uniquely determines the edge weights of T. See Dress et al. (2012) for details.
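Of the three conditions above, connectivity of the graph G is the simplest to illustrate. The base R sketch below builds G from an incomplete distance matrix and checks whether it is connected with a breadth-first search. The matrix and the helper is_connected are hypothetical and are not part of ape; the strongly non-bipartite and triplet cover checks are not covered.

```r
# Incomplete distance matrix on four leaves; NA (or a negative value) marks
# a missing distance (hypothetical data)
X <- matrix(c( 0,  1, NA, NA,
               1,  0,  2, NA,
              NA,  2,  0,  3,
              NA, NA,  3,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# edge set L: pairs of distinct leaves with a non-missing distance
known <- !is.na(X) & X >= 0
diag(known) <- FALSE

# hypothetical helper: breadth-first search from the first vertex
is_connected <- function(adj) {
  visited <- rep(FALSE, nrow(adj))
  visited[1L] <- TRUE
  queue <- 1L
  while (length(queue) > 0) {
    v <- queue[1L]
    queue <- queue[-1L]
    nb <- which(adj[v, ] & !visited)   # unvisited neighbours of v
    visited[nb] <- TRUE
    queue <- c(queue, nb)
  }
  all(visited)
}

connected_full <- is_connected(known)

# dropping the only known distance between b and c disconnects the graph
known2 <- known
known2["b", "c"] <- known2["c", "b"] <- FALSE
connected_cut <- is_connected(known2)

c(connected_full, connected_cut)
```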
Value
NULL, the results are printed in the console.
Author(s)
Andrei Popescu
References
Dress, A. W. M., Huber, K. T., and Steel, M. (2012) ‘Lassoing’ a phylogenetic tree I: basic properties, shellings and covers. Journal of Mathematical Biology, 65(1), 77–105.
Gamma-Statistic of Pybus and Harvey
Description
This function computes the gamma-statistic which summarizes the information contained in the inter-node intervals of a phylogeny. It is assumed that the tree is ultrametric. Note that the function does not check that the tree is actually ultrametric, so if it is not, the returned result may not be meaningful.
Usage
gammaStat(phy)
Arguments
phy | an object of class |
Details
The gamma-statistic is a summary of the information contained in the inter-node intervals of a phylogeny; it follows, under the assumption that the clade diversified with constant rates, a normal distribution with mean zero and standard-deviation unity (Pybus and Harvey 2000). Thus, the null hypothesis that the clade diversified with constant rates may be tested with 2*(1 - pnorm(abs(gammaStat(phy)))) for a two-tailed test, or 1 - pnorm(abs(gammaStat(phy))) for a one-tailed test, both returning the corresponding P-value.
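The two expressions above only involve the standard normal distribution, so they can be tried without a tree. In the sketch below, the value of g is a made-up number standing in for the result of gammaStat(phy).

```r
g <- -2.1   # hypothetical value standing in for gammaStat(phy)

# two-tailed test of constant-rates diversification
p_two_tailed <- 2 * (1 - pnorm(abs(g)))

# one-tailed test
p_one_tailed <- 1 - pnorm(abs(g))

c(two_tailed = p_two_tailed, one_tailed = p_one_tailed)
```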
Value
a numeric vector of length one.
Author(s)
Emmanuel Paradis
References
Pybus, O. G. and Harvey, P. H. (2000) Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London. Series B. Biological Sciences, 267, 2267–2272.
See Also
branching.times, ltt.plot, skyline
Read Annotations from GenBank
Description
This function connects to the GenBank database and reads sequence annotations using accession number(s) given as argument.
Usage
getAnnotationsGenBank(access.nb, quiet = TRUE)
Arguments
access.nb | a vector of mode character giving the accession numbers. |
quiet | a logical value indicating whether to show the progress of the downloads. |
Details
The sequence annotations (a.k.a. feature list) are returned in a data frame with five or six columns: start, end, type, product, others, and gene (the last being optional). This is the same information that can be downloaded from NCBI's Web interface by clicking on ‘Send to:’, ‘File’, and then selecting ‘Feature Table’ under ‘Format’.
A warning is given if some features are incomplete (this information is then dropped from the returned object).
A warning is given if some accession numbers are not found on GenBank.
Value
One of the following: (i) a data frame if access.nb contains a single accession number; (ii) a list of data frames if access.nb contains several accession numbers, the names are set with access.nb (if some accession numbers are not found on GenBank, the corresponding entries are set to NULL); (iii) NULL if none of the accession numbers is found on GenBank.
Author(s)
Emmanuel Paradis
References
https://www.ncbi.nlm.nih.gov/Sequin/table.html (Note: it seems this URL is broken; 2022-01-03)
See Also
Examples
## The 8 sequences of tanagers (Ramphocelus):
ref <- c("U15717", "U15718", "U15719", "U15720",
         "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
## Not run: 
annot.rampho <- getAnnotationsGenBank(ref)
annot.rampho
## check all annotations are the same:
unique(do.call(rbind, annot.rampho)[, -5])
## End(Not run)
Phylogenetic Tree of 193 HIV-1 Sequences
Description
This data set describes an estimated clock-like phylogeny of 193 HIV-1 group M sequences sampled in the Democratic Republic of Congo.
Usage
data(hivtree.newick)
data(hivtree.table)
Format
hivtree.newick is a string with the tree in Newick format. The data frame hivtree.table contains the corresponding internode distances.
Source
This is a data example from Strimmer and Pybus (2001).
References
Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.
See Also
coalescent.intervals, collapsed.intervals
Calculate Numbers of Phylogenetic Trees
Description
howmanytrees calculates the number of possible phylogenetic trees for a given number of tips.
LargeNumber is a utility function to compute (approximately) large numbers from the power a^b.
Usage
howmanytrees(n, rooted = TRUE, binary = TRUE, labeled = TRUE, detail = FALSE)
LargeNumber(a, b)
## S3 method for class 'LargeNumber'
print(x, latex = FALSE, digits = 1, ...)
Arguments
n | a positive numeric integer giving the number of tips. |
rooted | a logical indicating whether the trees are rooted (default is |
binary | a logical indicating whether the trees are bifurcating (default is |
labeled | a logical indicating whether the trees have tips labeled (default is |
detail | a logical indicating whether any intermediate calculations should be returned (default is |
a,b | two numbers. |
x | an object of class |
latex | a logical value specifying whether to print the number in LaTeX code in addition to returning it. |
digits | the number of digits printed for the real part of the large number (unused if |
... | (unused). |
Details
In the cases of labeled binary trees, the calculation is done directly and a single numeric value is returned (or an object of class "LargeNumber").
For multifurcating trees, and bifurcating, rooted, unlabeled trees, the calculation is done iteratively for 1 to n tips. Thus the user can print all the intermediate values if detail = TRUE, or only a single value if detail = FALSE (the default).
For multifurcating trees, if detail = TRUE, a matrix is returned with the number of tips as rows (named from 1 to n), and the number of nodes as columns (named from 1 to n - 1). For bifurcating, rooted, unlabeled trees, a vector is returned with names equal to the number of tips (from 1 to n).
The number of unlabeled trees (aka tree shapes) can be computed only for the rooted binary cases.
Note that if an infinite value (Inf) is returned this does not mean that there is an infinite number of trees (this cannot be if the number of tips is finite), but that the calculation is beyond the limits of the computer. Only for the cases of rooted, binary, labeled topologies an approximate number is returned in the form of a "LargeNumber" object.
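For labeled binary trees, the direct calculation mentioned above corresponds to the well-known double-factorial counts: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees for n tips. The base R sketch below shows these formulas and how working on the log10 scale avoids overflow for large n. The function names are hypothetical, and whether howmanytrees is implemented in this way is not stated in this help page.

```r
# number of rooted, labeled, binary trees: (2n - 3)!! = 1 * 3 * ... * (2n - 3)
n_rooted <- function(n) {
  if (n < 2) return(1)
  prod(seq(1, 2 * n - 3, by = 2))
}

# number of unrooted, labeled, binary trees: (2n - 5)!!
n_unrooted <- function(n) {
  if (n < 3) return(1)
  prod(seq(1, 2 * n - 5, by = 2))
}

# same count for rooted trees on the log10 scale, usable for large n
log10_rooted <- function(n) {
  if (n < 2) return(0)
  sum(log10(seq(1, 2 * n - 3, by = 2)))
}

n_rooted(4)        # small cases can be checked by hand: 1 * 3 * 5
n_unrooted(4)      # 1 * 3
n_rooted(10)
log10_rooted(50)   # order of magnitude for 50 tips
```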
Value
a single numeric value, an object of class "LargeNumber", or in the case where detail = TRUE is used, a named vector or matrix.
Author(s)
Emmanuel Paradis
References
Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.
Examples
### Table 3.1 in Felsenstein 2004:
for (i in c(1:20, 30, 40, 50))
    cat(paste(i, howmanytrees(i), sep = "\t"), sep ="\n")
### Table 3.6:
howmanytrees(8, binary = FALSE, detail = TRUE)
Graphical Identification of Nodes and Tips
Description
This function allows the user to identify a clade on a plotted tree by clicking on the plot with the mouse. The tree, specified in the argument x, must be plotted beforehand.
Usage
## S3 method for class 'phylo'
identify(x, nodes = TRUE, tips = FALSE, labels = FALSE, quiet = FALSE, ...)
Arguments
x | an object of class |
nodes | a logical specifying whether to identify the node. |
tips | a logical specifying whether to return the tip information. |
labels | a logical specifying whether to return the labels; by default only the numbers are returned. |
quiet | a logical controlling whether to print a message inviting the user to click on the tree. |
... | further arguments to be passed to or from other methods. |
Details
By default, the clade is identified by its number as found in the ‘edge’ matrix of the tree. If tips = TRUE, the tips descending from the identified node are returned, possibly together with the node. If labels = TRUE, the labels are returned (if the tree has no node labels, then the node number is returned).
The node is identified by the shortest distance where the click occurs. If the click occurs close to a tip, the function returns its information.
Value
A list with one or two vectors named "tips" and/or "nodes" with the identification of the tips and/or of the nodes.
Note
This function does not add anything on the plot, but it can be wrapped with, e.g., nodelabels (see example), or its results can be sent to, e.g., drop.tip.
Author(s)
Emmanuel Paradis
See Also
plot.phylo, nodelabels, identify for the generic function
Examples
## Not run: 
tr <- rtree(20)
f <- function(col) {
    o <- identify(tr)
    nodelabels(node=o$nodes, pch = 19, col = col)
}
plot(tr)
f("red") # click close to a node
f("green")
## End(Not run)
Plot of DNA Sequence Alignment
Description
This function plots an image of an alignment of nucleotide sequences.
Usage
## S3 method for class 'DNAbin'
image(x, what, col, bg = "white", xlab = "", ylab = "", show.labels = TRUE, cex.lab = 1, legend = TRUE, grid = FALSE, show.bases = FALSE, base.cex = 1, base.font = 1, base.col = "black", scheme = "Ape_NT", ...)
Arguments
x | a matrix of DNA sequences (class |
what | a vector of characters specifying the bases to visualize. If missing, this is set to “a”, “g”, “c”, “t”, “n”, and “-” (in this order). |
col | a vector of colours. If missing, this is set to “red”, “yellow”, “green”, “blue”, “grey”, and “black”. If it is shorter (or longer) than |
bg | the colour used for nucleotides whose base is not among |
xlab | the label for the x-axis; none by default. |
ylab | Idem for the y-axis. Note that by default, the labels of the sequences are printed on the y-axis (see next option). |
show.labels | a logical controlling whether the sequence labels are printed ( |
cex.lab | a single numeric controlling the size of the sequence labels.Use |
legend | a logical controlling whether the legend is plotted ( |
grid | a logical controlling whether to draw a grid ( |
show.bases | a logical controlling whether to show the base symbols ( |
base.cex,base.font,base.col | control the aspect of the base symbols (ignored if the previous is |
scheme | a predefined color scheme. For amino acids, the options are "Ape_AA", "Zappo_AA", "Clustal", "Polarity" and "Transmembrane_tendency"; for nucleotides, "Ape_NT" and "RY_NT". |
... | further arguments passed to |
Details
The idea of this function is to allow flexible plotting and colouring of a nucleotide alignment. By default, the most common bases (a, g, c, t, and n) and alignment gap are plotted using a standard colour scheme.
It is possible to plot only one base specified as what with a chosen colour: this might be useful to check, for instance, the distribution of alignment gaps (image(x, "-")) or missing data (see examples).
Author(s)
Emmanuel Paradis, Klaus Schliep
See Also
DNAbin, del.gaps, alex, alview, all.equal.DNAbin, clustal, grid, image.AAbin
Examples
data(woodmouse)
image(woodmouse)
rug(seg.sites(woodmouse), -0.02, 3, 1)
image(woodmouse, "n", "blue") # show missing data
image(woodmouse, c("g", "c"), "green") # G+C
par(mfcol = c(2, 2))
### barcoding style:
for (x in c("a", "g", "c", "t"))
    image(woodmouse, x, "black", cex.lab = 0.5, cex.axis = 0.7)
par(mfcol = c(1, 1))
### zoom on a portion of the data:
image(woodmouse[11:15, 1:50], c("a", "n"), c("blue", "grey"))
grid(50, 5, col = "black")
### see the guanines on a black background:
image(woodmouse, "g", "yellow", "black")
### Amino acid
X <- trans(woodmouse, 2)
image(X) # default ape colors
image(X, scheme="Clustal") # Clustal coloring
Test for Binary Tree
Description
This function tests whether a phylogenetic tree is binary.
Usage
is.binary(phy)
## S3 method for class 'phylo'
is.binary(phy)
## S3 method for class 'multiPhylo'
is.binary(phy)
## S3 method for class 'tree'
is.binary(phy)
Arguments
phy | an object of class |
Details
The test differs depending on whether the tree is rooted or not. An unrooted tree is considered binary if all its nodes are of degree three (i.e., three edges connect to each node). A rooted tree is considered binary if all nodes (including the root node) have exactly two descendant nodes, so that they are of degree three except the root, which is of degree two.
The test ignores branch lengths. Consider using di2multi if you want to treat zero-length branches as resulting from multichotomies.
is.binary.tree is deprecated and will be removed soon: currently it calls is.binary.
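The degree rule described above can be sketched directly on an ‘edge’ matrix in base R. This is only an illustration, not the code used by is.binary: the helper name is_binary_edge and the two hand-built matrices are hypothetical, and the sketch assumes the usual "phylo" numbering (tips numbered 1 to n, internal nodes after them).

```r
# Hypothetical helper (not part of ape): apply the degree rule to an
# 'edge' matrix whose rows are (parent, child) pairs.
is_binary_edge <- function(edge, n_tip, rooted) {
  # number of children of each internal node
  n_child <- tabulate(edge[, 1])[-seq_len(n_tip)]
  if (rooted) return(all(n_child == 2L))
  # unrooted case: every internal node must touch exactly three edges
  degree <- tabulate(as.vector(edge))[-seq_len(n_tip)]
  all(degree == 3L)
}

# ((a,b),c); as a rooted tree: root = 4, internal node = 5
rooted_edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
# (a,b,c); as an unrooted tree: a single internal node of degree three
star_edge <- rbind(c(4, 1), c(4, 2), c(4, 3))

is_binary_edge(rooted_edge, n_tip = 3, rooted = TRUE)
is_binary_edge(star_edge, n_tip = 3, rooted = FALSE)
is_binary_edge(star_edge, n_tip = 3, rooted = TRUE)
```

The same three-tip star is binary when read as an unrooted tree but not when read as a rooted one, which is why the test needs to know whether the tree is rooted.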
Value
a logical vector.
Author(s)
Emmanuel Paradis
See Also
is.rooted, is.ultrametric, multi2di
Examples
is.binary(rtree(10))
is.binary(rtree(10, rooted = FALSE))
is.binary(stree(10))
x <- setNames(rmtree(10, 10), LETTERS[1:10])
is.binary(x)
Check Compatibility of Splits
Description
is.compatible is a generic function with a method for the class "bitsplits". It checks whether a set of splits is compatible using the arecompatible function.
Usage
is.compatible(obj)
## S3 method for class 'bitsplits'
is.compatible(obj)
arecompatible(x, y, n)
Arguments
obj | an object of class |
x,y | a vector of mode raw. |
n | the number of taxa in the splits. |
Value
TRUE if the splits are compatible, FALSE otherwise.
Author(s)
Andrei Popescu
See Also
Is Group Monophyletic
Description
This function tests whether a list of tip labels is monophyletic on a given tree.
Usage
is.monophyletic(phy, tips, reroot = !is.rooted(phy), plot = FALSE, ...)
Arguments
phy | a phylogenetic tree description of class |
tips | a vector of mode numeric or character specifying the tips to be tested. |
reroot | a logical. If |
plot | a logical. If |
... | further arguments passed to |
Details
If phy is rooted, the test is done on the rooted tree; otherwise the tree is first unrooted, then arbitrarily rerooted, in order to be independent of the current position of the root. That is, the test asks if tips could be monophyletic given any favourable rooting of phy.
If phy is unrooted, the test is done on an unrooted tree, unless reroot = FALSE is specified.
If tip labels in the list tips are given as characters, they need to be spelled as in the object phy.
Value
TRUE or FALSE.
Author(s)
Johan Nylander <jnylander@users.sourceforge.net>
See Also
Examples
## Test one monophyletic and one paraphyletic group on the bird.orders tree
## Not run: data("bird.orders")
## Not run: is.monophyletic(phy = bird.orders, tips = c("Ciconiiformes", "Gruiformes"))
## Not run: is.monophyletic(bird.orders, c("Passeriformes", "Ciconiiformes", "Gruiformes"))
Test if a Tree is Ultrametric
Description
This function tests whether a tree is ultrametric using the distances from each tip to the root.
Usage
is.ultrametric(phy, ...)
## S3 method for class 'phylo'
is.ultrametric(phy, tol = .Machine$double.eps^0.5, option = 1, ...)
## S3 method for class 'multiPhylo'
is.ultrametric(phy, tol = .Machine$double.eps^0.5, option = 1, ...)
Arguments
phy | an object of class |
tol | a numeric >= 0; variations below this value are considered non-significant. |
option | an integer (1 or 2; see details). |
... | arguments passed among methods. |
Details
The test is based on the distances from each tip to the root and a criterion: if option = 1, the criterion is the scaled range, (max - min)/max; if option = 2, the variance is used (this was the method used until ape 3.5). The default criterion is invariant to linear changes (rescaling) of the branch lengths.
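The two criteria can be compared on a plain vector of tip-to-root distances. The sketch below uses base R only; the helper name ultrametric_crit, the example distances, and the strict "less than tol" comparison are assumptions made for illustration and are not taken from the ape source.

```r
# Hypothetical helper: 'd' holds the distances from each tip to the root.
ultrametric_crit <- function(d, tol = .Machine$double.eps^0.5, option = 1) {
  crit <- if (option == 1) (max(d) - min(d)) / max(d) else var(d)
  crit < tol  # assumed comparison: below tol means "ultrametric"
}

d <- c(1, 1.5, 1)                         # clearly not ultrametric
ultrametric_crit(d, option = 1)
ultrametric_crit(d, option = 2)

# Same tree with all branch lengths multiplied by 1e-12:
ultrametric_crit(d * 1e-12, option = 1)   # scaled range is unchanged
ultrametric_crit(d * 1e-12, option = 2)   # variance drops below tol
```

The last two calls show why the scaled range is the safer default: multiplying every branch length by a constant leaves it unchanged, whereas the variance shrinks with the square of that constant and can fall below tol for a tree that is far from ultrametric.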
Value
a logical vector.
Author(s)
Emmanuel Paradis
See Also
Examples
is.ultrametric(rtree(10))
is.ultrametric(rcoal(10))
Plot Multiple Chronograms on the Same Scale
Description
The main argument is a list of (rooted) trees which are plotted on the same scale.
Usage
kronoviz(x, layout = length(x), horiz = TRUE, ...,
         direction = ifelse(horiz, "rightwards", "upwards"), side = 2)
Arguments
x | a list of (rooted) trees of class |
layout | an integer giving the number of trees plotted simultaneously; by default all. |
horiz | a logical specifying whether the trees should be plotted rightwards (the default) or upwards. |
... | further arguments passed to |
direction | a character string specifying the direction of the tree. Four values are possible: "rightwards" (the default), "leftwards", "upwards", and "downwards". |
side | Where to put the axis, see example. |
Details
The size of the individual plots is proportional to the size of the trees.
Value
NULL
Author(s)
Emmanuel Paradis, Klaus Schliep
See Also
Examples
TR <- replicate(10, rcoal(sample(11:20, size = 1)), simplify = FALSE)
kronoviz(TR)
kronoviz(TR, side = 1)
kronoviz(TR, horiz = FALSE, type = "c", show.tip.label = FALSE)
kronoviz(TR, direction = "d", side = c(1,2))
Label Management
Description
These functions work on a vector of character strings storing bi- or trinomial species names, typically “Genus_species_subspecies”.
Usage
label2table(x, sep = NULL, as.is = FALSE)
stripLabel(x, species = FALSE, subsp = TRUE, sep = NULL)
abbreviateGenus(x, genus = TRUE, species = FALSE, sep = NULL)
Arguments
x | a vector of mode character. |
sep | the separator (a single character) between the taxonomic levels (see details). |
as.is | a logical specifying whether to convert characters into factors (like in |
species, subsp, genus | a logical specifying whether the taxonomic level is affected by the operation. |
Details
label2table returns a data frame with three columns named “genus”, “species”, and “subspecies” (with NA if the level is missing).
stripLabel deletes the subspecies names from the input. If species = TRUE, the species names are also removed, thus returning only the genus names.
abbreviateGenus abbreviates the genus names keeping only the first letter. If species = TRUE, the species names are abbreviated.
By default, these functions try to guess what is the separator between the genus, species and/or subspecies names. If an underscore is present in the input, then this character is assumed to be the separator; otherwise, a space. If this does not work, you can specify sep to its appropriate value.
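The separator rule and the three-column layout described above can be sketched in base R. The helpers guess_sep and split_label are hypothetical names introduced for illustration; they are not the functions exported by ape and may differ from them in edge cases (for example labels with more than three parts, which are simply cut to three here).

```r
# Hypothetical helpers illustrating the documented behaviour.
guess_sep <- function(x) {
  # an underscore anywhere in the input wins; otherwise assume a space
  if (any(grepl("_", x, fixed = TRUE))) "_" else " "
}

split_label <- function(x, sep = NULL) {
  if (is.null(sep)) sep <- guess_sep(x)
  parts <- strsplit(x, sep, fixed = TRUE)
  # pad each label to three levels with NA, keep at most three
  m <- vapply(parts,
              function(p) c(p, rep(NA_character_, 3))[1:3],
              character(3))
  out <- as.data.frame(t(m), stringsAsFactors = FALSE)
  names(out) <- c("genus", "species", "subspecies")
  out
}

x <- c("Panthera_leo", "Panthera_tigris_altaica")
tab <- split_label(x)
tab
```

Passing sep explicitly bypasses the guess, which is the fallback suggested above when labels mix underscores and spaces.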
Value
A vector of mode character or a data frame.
Author(s)
Emmanuel Paradis
See Also
makeLabel, makeNodeLabel, mixedFontLabel, updateLabel, checkLabel
Examples
x <- c("Panthera_leo", "Panthera_pardus", "Panthera_onca", "Panthera_uncia",
       "Panthera_tigris_altaica", "Panthera_tigris_amoyensis")
label2table(x)
stripLabel(x)
stripLabel(x, TRUE)
abbreviateGenus(x)
abbreviateGenus(x, species = TRUE)
abbreviateGenus(x, genus = FALSE, species = TRUE)
Ladderize a Tree
Description
This function reorganizes the internal structure of the tree to get the ladderized effect when plotted.
Usage
ladderize(phy, right = TRUE)
Arguments
phy | an object of class |
right | a logical specifying whether the smallest clade is on the right-hand side (when the tree is plotted upwards), or the opposite (if |
Author(s)
Emmanuel Paradis
See Also
Examples
tr <- rcoal(50)
layout(matrix(1:4, 2, 2))
plot(tr, main = "normal")
plot(ladderize(tr), main = "right-ladderized")
plot(ladderize(tr, FALSE), main = "left-ladderized")
layout(matrix(1, 1))
Leading and Trailing Alignment Gaps to N
Description
Substitutes leading and trailing alignment gaps in aligned sequences with N (i.e., A, C, G, or T). The gaps in the middle of the sequences are left unchanged.
Usage
latag2n(x)
Arguments
x | an object of class |
Details
This function is called by others in ape and in pegas. It is documented here in case it needs to be called by other packages.
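What the substitution does to a single sequence can be sketched on a plain character vector in base R. The helper latag_to_n is a hypothetical name; it works on characters rather than on "DNAbin" objects, and its handling of a sequence made only of gaps (left unchanged) is an assumption, not documented behaviour.

```r
# Hypothetical helper: 'bases' is one aligned sequence, one character
# per site, with "-" for alignment gaps.
latag_to_n <- function(bases) {
  non_gap <- which(bases != "-")
  if (length(non_gap) == 0L) return(bases)  # assumption: all-gap input untouched
  site <- seq_along(bases)
  # sites before the first base or after the last base are leading/trailing gaps
  outer_gap <- site < non_gap[1L] | site > non_gap[length(non_gap)]
  bases[outer_gap] <- "n"
  bases
}

s <- c("-", "-", "a", "-", "c", "-")
latag_to_n(s)
```

Only the two leading gaps and the final gap are recoded; the gap between "a" and "c" stays a gap because it lies inside the sequence.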
Value
an object of class "DNAbin".
Author(s)
Emmanuel Paradis
See Also
Examples
x <- as.DNAbin(matrix(c("-", "A", "G", "-", "T", "C"), 2, 3))
y <- latag2n(x)
alview(x)
alview(y)
Multiple regression through the origin
Description
Function lmorigin computes a multiple linear regression and performs tests of significance of the equation parameters (F-test of R-square and t-tests of regression coefficients) using permutations.
The regression line can be forced through the origin. Testing the significance in that case requires a special permutation procedure. This option was developed for the analysis of independent contrasts, which requires regression through the origin. A permutation test, described by Legendre & Desdevises (2009), is needed to analyze contrasts that are not normally distributed.
Usage
lmorigin(formula, data, origin=TRUE, nperm=999, method=NULL, silent=FALSE)
Arguments
formula | |
data | A data frame containing the two variables specified in the formula. |
origin | |
nperm | Number of permutations for the tests. If |
method | |
silent | Informative messages and the time to compute the tests will not be written to the R console if silent=TRUE. Useful when the function is called by a numerical simulation function. |
Details
The permutation F-test of R-square is always done by permutation of the raw data. When there is a single explanatory variable, permutation of the raw data is used for the t-test of the single regression coefficient, whatever the method chosen by the user. The rationale is found in Anderson & Legendre (1999).
The print.lmorigin function prints out the results of the parametric tests (in all cases) and the results of the permutational tests (when nperm > 0).
Value
reg | The regression output object produced by function |
p.param.t.2tail | Parametric probabilities for 2-tailed tests of the regression coefficients. |
p.param.t.1tail | Parametric probabilities for 1-tailed tests of the regression coefficients. Each test is carried out in the direction of the sign of the coefficient. |
p.perm.t.2tail | Permutational probabilities for 2-tailed tests of the regression coefficients. |
p.perm.t.1tail | Permutational probabilities for 1-tailed tests of the regression coefficients. Each test is carried out in the direction of the sign of the coefficient. |
p.perm.F | Permutational probability for the F-test of R-square. |
origin | TRUE if regression through the origin has been computed, FALSE if multiple regression with estimation of the intercept has been used. |
nperm | Number of permutations used in the permutation tests. |
method | Permutation method for the t-tests of the regression coefficients: |
var.names | Vector containing the names of the variables used in the regression. |
call | The function call. |
Author(s)
Pierre Legendre, Universite de Montreal
References
Anderson, M. J. and Legendre, P. (1999) An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation, 62, 271–303.
Legendre, P. and Desdevises, Y. (2009) Independent contrasts and regression through the origin. Journal of Theoretical Biology, 259, 727–743.
Sokal, R. R. and Rohlf, F. J. (1995) Biometry - The principles and practice of statistics in biological research. Third edition. New York: W. H. Freeman.
Examples
## Example 1 from Sokal & Rohlf (1995) Table 16.1
## SO2 air pollution in 41 cities of the USA
data(lmorigin.ex1)
out <- lmorigin(SO2 ~ ., data=lmorigin.ex1, origin=FALSE, nperm=99)
out
## Example 2: Contrasts computed on the phylogenetic tree of Lamellodiscus
## parasites. Response variable: non-specificity index (NSI); explanatory
## variable: maximum host size. Data from Table 1 of Legendre & Desdevises
## (2009).
data(lmorigin.ex2)
out <- lmorigin(NSI ~ MaxHostSize, data=lmorigin.ex2, origin=TRUE, nperm=99)
out
## Example 3: random numbers
y <- rnorm(50)
X <- as.data.frame(matrix(rnorm(250),50,5))
out <- lmorigin(y ~ ., data=X, origin=FALSE, nperm=99)
out
Lineages Through Time Plot
Description
These functions provide tools for plotting the numbers of lineages through time from phylogenetic trees.
Usage
ltt.plot(phy, xlab = "Time", ylab = "N",
         backward = TRUE, tol = 1e-6, ...)
ltt.lines(phy, backward = TRUE, tol = 1e-6, ...)
mltt.plot(phy, ..., dcol = TRUE, dlty = FALSE, legend = TRUE,
          xlab = "Time", ylab = "N", log = "", backward = TRUE,
          tol = 1e-6)
ltt.coplot(phy, backward = TRUE, ...)
ltt.plot.coords(phy, backward = TRUE, tol = 1e-6, type = "S")
Arguments
phy | an object of class |
xlab | a character string (or a variable of mode character) giving the label for the |
ylab | idem for the |
backward | a logical value: should the time axis be traced from the present (the default), or from the root of the tree? |
tol | a numeric value (see details). |
... | in the cases of |
dcol | a logical specifying whether the different curves should be differentiated with colors (default is |
dlty | a logical specifying whether the different curves should be differentiated with patterns of dots and dashes (default is |
legend | a logical specifying whether a legend should be plotted. |
log | a character string specifying which axis(es) to be log-transformed; must be one of the following: |
type | either |
Details
ltt.plot does a simple lineages through time (LTT) plot. Additional arguments (...) may be used to change, for instance, the limits on the axes (with xlim and/or ylim) or other graphical settings (col for the color, lwd for the line thickness, lty for the line type may be useful; see par for an exhaustive listing of graphical parameters). The y-axis can be log-transformed by adding the following option: log = "y".
The option tol is used as follows: first the most distant tip from the root is found, then all tips whose distance to the root does not differ from the previous one by more than tol are considered to be contemporaneous with it.
If the tree is not ultrametric, the plot is done assuming the tips, except the most distant from the root, represent extinction events. If a root edge is present, it is taken into account.
ltt.lines adds a LTT curve to an existing plot. Additional arguments (...) may be used to change the settings of the added line.
mltt.plot does a multiple LTT plot taking as arguments one or several trees. These trees may be given as objects of class "phylo" (single trees) and/or "multiPhylo" (multiple trees). Any number of objects may be given. This function is mainly for exploratory analyses with the advantages that the axes are set properly to view all lines, and the legend is plotted by default. The plot will certainly make sense if all trees have their most-distant-from-the-root tips contemporaneous (i.e., trees with only extinct lineages will not be represented properly). For more flexible settings of line drawings, it may be better to combine ltt.plot() with successive calls of ltt.lines() (see examples).
ltt.coplot is meant to show how to set a tree and a LTT plot on the same scales. All extra arguments modify only the appearance of the tree. The code can be easily edited and tailored.
Value
ltt.plot.coords returns a two-column matrix with the time points and the number of lineages, respectively. type = "S" returns the number of lineages to the left of (or "up to") the corresponding point in time, while type = "s" returns the number of lineages to the right of this point (i.e., between that time and the next).
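For intuition, the two-column layout can be built by hand in the simplest case. The sketch below is an assumption-laden illustration rather than the code of ltt.plot.coords: it supposes a binary, ultrametric tree without root edge, time measured backward from the present, and counts "up to" each time point; the helper name ltt_coords_sketch and the branching times are hypothetical.

```r
# Hypothetical helper: 'bt' holds the node ages (branching times) of a
# binary ultrametric tree with length(bt) + 1 tips.
ltt_coords_sketch <- function(bt) {
  bt <- sort(bt, decreasing = TRUE)          # oldest split first
  cbind(time = c(-bt, 0),                    # present is time 0
        N = seq_len(length(bt) + 1L))        # one more lineage per split
}

xy <- ltt_coords_sketch(c(3, 1, 2))          # 4-tip tree, root 3 time units ago
xy
plot(xy, type = "S", xlab = "Time", ylab = "N")
```

Each row gives the number of lineages present up to that time point, so the count at the present (time 0) equals the number of tips.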
Author(s)
Emmanuel Paradis
References
Harvey, P. H., May, R. M. and Nee, S. (1994) Phylogenies without fossils. Evolution, 48, 523–529.
Nee, S., Holmes, E. C., Rambaut, A. and Harvey, P. H. (1995) Inferring population history from molecular phylogenies. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 349, 25–31.
See Also
kronoviz, skyline, LTT, branching.times, birthdeath, bd.ext, yule.cov, bd.time; plot for the basic plotting function in R
Examples
data(bird.families)
opar <- par(mfrow = c(2, 1))
ltt.plot(bird.families)
title("Lineages Through Time Plot of the Bird Families")
ltt.plot(bird.families, log = "y")
title(main = "Lineages Through Time Plot of the Bird Families",
      sub = "(with logarithmic transformation of the y-axis)")
par(opar)
### to plot the tree and the LTT plot together
data(bird.orders)
layout(matrix(1:4, 2, 2))
plot(bird.families, show.tip.label = FALSE)
ltt.plot(bird.families, main = "Bird families")
plot(bird.orders, show.tip.label = FALSE)
ltt.plot(bird.orders, main = "Bird orders")
layout(1)
### better with ltt.coplot():
ltt.coplot(bird.families, show.tip.label = FALSE, x.lim = 27.5)
data(chiroptera)
chiroptera <- compute.brlen(chiroptera)
ltt.coplot(chiroptera, show.tip.label = FALSE, type = "c")
### with extinct lineages and a root edge:
omar <- par("mar")
set.seed(31)
tr <- rlineage(0.2, 0.15)
tr$root.edge <- 5
ltt.coplot(tr, show.tip.label = FALSE, x.lim = 55)
## compare with:
## ltt.coplot(drop.fossil(tr), show.tip.label = FALSE)
layout(1)
par(mar = omar)
mltt.plot(bird.families, bird.orders)
### Generates 10 random trees with 23 tips:
TR <- replicate(10, rcoal(23), FALSE)
### Give names to each tree:
names(TR) <- paste("random tree", 1:10)
### And specify the class of the list so that mltt.plot()
### does not trash it!
class(TR) <- "multiPhylo"
mltt.plot(TR, bird.orders)
### And now for something (not so) completely different:
ltt.plot(bird.orders, lwd = 2)
for (i in 1:10) ltt.lines(TR[[i]], lty = 2)
legend(-20, 10, lwd = c(2, 1), lty = c(1, 2), bty = "n",
       legend = c("Bird orders", "Random (coalescent) trees"))
Label Management
Description
This is a generic function with methods for character vectors, trees of class "phylo", lists of trees of class "multiPhylo", and DNA sequences of class "DNAbin". All options for the class character may be used in the other methods.
Usage
makeLabel(x, ...)
## S3 method for class 'character'
makeLabel(x, len = 99, space = "_", make.unique = TRUE,
          illegal = "():;,[]", quote = FALSE, ...)
## S3 method for class 'phylo'
makeLabel(x, tips = TRUE, nodes = TRUE, ...)
## S3 method for class 'multiPhylo'
makeLabel(x, tips = TRUE, nodes = TRUE, ...)
## S3 method for class 'DNAbin'
makeLabel(x, ...)
Arguments
x | a vector of mode character or an object for which labels are to be changed. |
len | the maximum length of the labels: those longer than ‘len’ will be truncated. |
space | the character to replace spaces, tabulations, and linebreaks. |
make.unique | a logical specifying whether duplicate labels should be made unique by appending numerals; |
illegal | a string specifying the characters to be deleted. |
quote | a logical specifying whether to quote the labels; |
tips | a logical specifying whether tip labels are to be modified; |
nodes | a logical specifying whether node labels are to be modified; |
... | further arguments to be passed to or from other methods. |
Details
The option make.unique does not work in exactly the same way as the function of the same name: numbers are suffixed to all labels that are identical (without separator). See the examples.
If there are 10–99 identical labels, the labels returned are "xxx01", "xxx02", etc., or "xxx001", "xxx002", etc., if there are 100–999 of them, and so on. The number of digits added preserves the option ‘len’.
The default for ‘len’ makes labels short enough to be read by PhyML. Clustal accepts labels up to 30 characters long.
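The zero-padded suffix pattern described above can be reproduced with a few lines of base R. This is a sketch of the numbering scheme only: the helper suffix_duplicates is a hypothetical name, it does not truncate to ‘len’, and it is not the code used by makeLabel.

```r
# Hypothetical helper: append a zero-padded counter, without separator,
# to every label that occurs more than once.
suffix_duplicates <- function(x) {
  out <- x
  for (lab in unique(x[duplicated(x)])) {
    i <- which(x == lab)
    width <- nchar(as.character(length(i)))  # 1 digit for 2-9, 2 for 10-99, ...
    out[i] <- paste0(lab, formatC(seq_along(i), width = width, flag = "0"))
  }
  out
}

suffix_duplicates(rep("a", 3))
lab12 <- suffix_duplicates(c(rep("xxx", 12), "y"))
lab12
make.unique(rep("a", 3))   # base R, for comparison: first label is left as is
```

Unlike make.unique from base R, every member of a group of identical labels receives a suffix, and labels that are already unique are returned untouched.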
Value
An object of the appropriate class.
Note
The current version does not perform well when trying to make very short unique labels (e.g., less than 5 characters long).
Author(s)
Emmanuel Paradis
See Also
makeNodeLabel, make.unique, make.names, abbreviate, mixedFontLabel, label2table, updateLabel, checkLabel
Examples
x <- rep("a", 3)
makeLabel(x)
make.unique(x) # <- from R's base
x <- rep("aaaaa", 2)
makeLabel(x, len = 3) # made unique and of length 3
makeLabel(x, len = 3, make.unique = FALSE)
Makes Node Labels
Description
This function makes node labels in a tree in a flexible way.
Usage
makeNodeLabel(phy, ...)
## S3 method for class 'phylo'
makeNodeLabel(phy, method = "number", prefix = "Node", nodeList = list(), ...)
## S3 method for class 'multiPhylo'
makeNodeLabel(phy, method = "number", prefix = "Node", nodeList = list(), ...)
Arguments
phy | an object of class |
method | a character string giving the method used to create the labels. Three choices are possible: |
prefix | the prefix used if |
nodeList | a named list specifying how nodes are named if |
... | further arguments passed to |
Details
The three methods are described below:
“number”: The labels are created with 1, 2, ... prefixed with the argument prefix; thus the default is to have Node1, Node2, ... Set prefix = "" to have only numbers.
“md5sum”: For each node, the labels of the tips descendant from this node are extracted, sorted alphabetically, and written into a temporary file, then the md5sum of this file is extracted and used as label. This results in a 32-character string which is unique (even across trees) for a given set of tip labels.
“user”: the argument nodeList must be a list with names, the latter will be used as node labels. For each element of nodeList, the tip labels of the tree are searched for patterns present in this element: this is done using grep. Then the most recent common ancestor of the matching tips is given the corresponding names as labels. This is repeated for each element of nodeList.
The method "user" can be used in combination with either of the two others (see examples). Note that this method only modifies the specified node labels (so that if the other nodes already have labels they are not modified) while the two others change all labels.
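The idea behind the md5sum method can be sketched in base R with tools::md5sum. The helper node_md5 is a hypothetical name, and the file layout (one sorted tip label per line) is an assumption, so the strings it returns should not be expected to match those produced by makeNodeLabel.

```r
# Hypothetical helper: hash the sorted tip labels of one clade.
node_md5 <- function(tips) {
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(sort(tips), f)      # sorting makes the hash order-independent
  unname(tools::md5sum(f))       # 32-character hexadecimal digest
}

h1 <- node_md5(c("Pan_paniscus", "Pan_troglodytes"))
h2 <- node_md5(c("Pan_troglodytes", "Pan_paniscus"))
h3 <- node_md5(c("Homo_sapiens", "Homo_erectus"))
h1
```

Because the labels are sorted before hashing, the same set of tips gives the same label whatever the order in which the tips appear in a tree, which is what allows nodes to be matched across trees.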
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis
See Also
makeLabel, grep, mixedFontLabel, label2table, checkLabel
Examples
tr <-
"((Pan_paniscus,Pan_troglodytes),((Homo_sapiens,Homo_erectus),Homo_abilis));"
tr <- read.tree(text = tr)
tr <- makeNodeLabel(tr, "u", nodeList = list(Pan = "Pan", Homo = "Homo"))
plot(tr, show.node.label = TRUE)
### does not erase the previous node labels:
tr <- makeNodeLabel(tr, "u", nodeList = list(Hominid = c("Pan","Homo")))
plot(tr, show.node.label = TRUE)
### the two previous commands could be combined:
L <- list(Pan = "Pan", Homo = "Homo", Hominid = c("Pan","Homo"))
tr <- makeNodeLabel(tr, "u", nodeList = L)
### combining different methods:
tr <- makeNodeLabel(tr, c("n", "u"), prefix = "#", nodeList = list(Hominid = c("Pan","Homo")))
plot(tr, show.node.label = TRUE)
Mantel Test for Similarity of Two Matrices
Description
This function computes Mantel's permutation test for similarity of two matrices. It permutes the rows and columns of the second matrix randomly and calculates a Z-statistic.
Usage
mantel.test(m1, m2, nperm = 999, graph = FALSE,
            alternative = "two.sided", ...)
Arguments
m1 | a numeric matrix giving a measure of pairwise distances, correlations, or similarities among observations. |
m2 | a second numeric matrix giving another measure of pairwise distances, correlations, or similarities among observations. |
nperm | the number of times to permute the data. |
graph | a logical indicating whether to produce a summary graph (by default the graph is not plotted). |
alternative | a character string defining the alternative hypothesis: |
... | further arguments to be passed to |
Details
The function calculates a Z-statistic for the Mantel test, equal to the sum of the pairwise product of the lower triangles of the permuted matrices, for each permutation of rows and columns. It compares the permuted distribution with the Z-statistic observed for the actual data.
The present implementation can analyse symmetric as well as (since version 5.1 of ape) asymmetric matrices (see Mantel 1967, Sects. 4 and 5). The diagonals of both matrices are ignored.
If graph = TRUE, the function plots the density estimate of the permutation distribution along with the observed Z-statistic as a vertical line.
The ... argument allows the user to give further options to the plot function: the title may be changed with main =, the axis labels with xlab = and ylab =, and so on.
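The permutation scheme described above can be sketched in base R for the symmetric case. This is an illustration under stated assumptions, not the code of mantel.test: the helper names are hypothetical, only symmetric matrices are handled, and only a one-sided ("greater") P-value is computed, using the common (count + 1)/(nperm + 1) convention.

```r
# Hypothetical helpers for a symmetric-matrix Mantel test.
mantel_z <- function(m1, m2) sum((m1 * m2)[lower.tri(m1)])

permute_rowscols <- function(m) {
  i <- sample(nrow(m))
  m[i, i]                        # same permutation applied to rows and columns
}

mantel_sketch <- function(m1, m2, nperm = 999) {
  z_obs <- mantel_z(m1, m2)
  z_perm <- replicate(nperm, mantel_z(m1, permute_rowscols(m2)))
  list(z.stat = z_obs,
       p = (sum(z_perm >= z_obs) + 1) / (nperm + 1))  # one-sided, "greater"
}

set.seed(1)
d <- as.matrix(dist(matrix(rnorm(20), 10)))   # 10 x 10 distance matrix
res <- mantel_sketch(d, d, nperm = 99)        # a matrix against itself
res
```

Permuting rows and columns together keeps each observation's distances attached to it, which is what distinguishes the Mantel test from simply shuffling the entries of the matrix.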
Value
z.stat | the |
p | |
alternative | the alternative hypothesis. |
Author(s)
Original code in S by Ben Bolker, ported to R by Julien Claude
References
Mantel, N. (1967) The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220.
Manly, B. F. J. (1986) Multivariate statistical methods: a primer. London: Chapman & Hall.
Examples
q1 <- matrix(runif(36), nrow = 6)
q2 <- matrix(runif(36), nrow = 6)
diag(q1) <- diag(q2) <- 0
mantel.test(q1, q2, graph = TRUE,
            main = "Mantel test: a random example with 6 X 6 matrices
representing asymmetric relationships",
            xlab = "z-statistic", ylab = "Density",
            sub = "The vertical line shows the observed z-statistic")
Three Matrices
Description
Three matrices respectively representing Serological (asymmetric), DNA hybridization (asymmetric) and Anatomical (symmetric) distances among 9 families.
Usage
data(mat3)
Format
A data frame with 27 observations and 9 variables.
Source
Lapointe, F.-J., J. A. W. Kirsch and J. M. Hutcheon. 1999. Total evidence, consensus, and bat phylogeny: a distance-based approach. Molecular Phylogenetics and Evolution 11: 55-66.
See Also
Five Trees
Description
Three partly similar trees, two independent trees.
Usage
data(mat5M3ID)
Format
A data frame with 250 observations and 50 variables.
Source
Data provided by V. Campbell.
See Also
Five Independent Trees
Description
Five independent additive trees.
Usage
data(mat5Mrand)
Format
A data frame with 250 observations and 50 variables.
Source
Data provided by V. Campbell.
See Also
Matrix Exponential
Description
This function computes the exponential of a square matrix using a spectral decomposition.
Usage
matexpo(x)
Arguments
x | a square matrix of mode numeric. |
Value
a numeric matrix of the same dimensions as ‘x’.
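The spectral approach can be written in a few lines of base R: with x = V diag(lambda) V^-1, the exponential is V diag(exp(lambda)) V^-1. The sketch below is an illustration, not the code of matexpo (which is compiled); the helper name matexpo_sketch is hypothetical and the input is assumed to be square and diagonalisable.

```r
# Hypothetical helper: matrix exponential through an eigendecomposition.
matexpo_sketch <- function(x) {
  e <- eigen(x)
  V <- e$vectors
  out <- V %*% diag(exp(e$values), nrow = nrow(x)) %*% solve(V)
  Re(out)   # drop null imaginary parts if eigen() returned complex values
}

### the rate matrix of the example below:
m <- matrix(0.1, 4, 4)
diag(m) <- -0.3
P1 <- matexpo_sketch(m * 1)
P50 <- matexpo_sketch(m * 50)
rowSums(P1)   # each row of a transition-probability matrix sums to one
round(P50, 6) # close to the equilibrium value of 0.25 everywhere
```

The decomposition fails or becomes numerically unstable for matrices that are not diagonalisable; in such cases a dedicated routine (for instance from the expm package listed in Suggests) is a safer choice.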
Author(s)
Emmanuel Paradis
Examples
### a simple rate matrix:
m <- matrix(0.1, 4, 4)
diag(m) <- -0.3
### towards equilibrium:
for (t in c(1, 5, 10, 50)) print(matexpo(m*t))
McConway-Sims Test of Homogeneous Diversification
Description
This function performs the McConway–Sims test that a trait or variable does not affect diversification rate.
Usage
mcconwaysims.test(x)
Arguments
x | a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades. |
Details
The McConway–Sims test compares a series of sister-clades where one of the two is characterized by a trait supposed to affect diversification rate. The null hypothesis is that the trait does not affect diversification. The alternative hypothesis is that diversification rate is increased or decreased by the trait (by contrast to the Slowinski–Guyer test). The test is a likelihood-ratio of a null Yule model and an alternative model with two parameters.
Value
a data frame with the χ² statistic, the number of degrees of freedom, and the P-value.
Author(s)
Emmanuel Paradis
References
McConway, K. J. and Sims, H. J. (2004) A likelihood-based method for testing for nonstochastic variation of diversification rates in phylogenies. Evolution, 58, 12–23.
Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.
See Also
balance, slowinskiguyer.test, rc in geiger, shift.test in apTreeshape
Examples
### simulate 10 clades with lambda = 0.1 and mu = 0.09:
n0 <- replicate(10, balance(rbdtree(.1, .09, Tmax = 35))[1])
### simulate 10 clades with lambda = 0.15 and mu = 0.1:
n1 <- replicate(10, balance(rbdtree(.15, .1, Tmax = 35))[1])
x <- cbind(n1, n0)
mcconwaysims.test(x)
slowinskiguyer.test(x)
richness.yule.test(x, 35)
Reversible Jump MCMC to Infer Demographic History
Description
These functions implement a reversible jump MCMC framework to infer the demographic history, as well as corresponding confidence bands, from a genealogical tree. The computed demographic history is a continuous and smooth function in time. mcmc.popsize runs the actual MCMC chain and outputs information about the sampling steps, extract.popsize generates from this MCMC output a table of population size in time, and plot.popsize and lines.popsize provide utility functions to plot the corresponding demographic functions.
Usage
mcmc.popsize(tree,nstep, thinning=1, burn.in=0,progress.bar=TRUE,
    method.prior.changepoints=c("hierarchical", "fixed.lambda"), max.nodes=30,
    lambda=0.5, gamma.shape=0.5, gamma.scale=2,
    method.prior.heights=c("skyline", "constant", "custom"),
    prior.height.mean,
    prior.height.var)
extract.popsize(mcmc.out, credible.interval=0.95, time.points=200, thinning=1, burn.in=0)
## S3 method for class 'popsize'
plot(x, show.median=TRUE, show.years=FALSE,
     subst.rate, present.year, xlab = NULL,
     ylab = "Effective population size", log = "y", ...)
## S3 method for class 'popsize'
lines(x, show.median=TRUE,show.years=FALSE, subst.rate, present.year, ...)
Arguments
tree | Either an ultrametric tree (i.e. an object of class |
nstep | Number of MCMC steps, i.e. length of the Markov chain (suggested value: 10,000-50,000). |
thinning | Thinning factor (suggested value: 10-100). |
burn.in | Number of steps dropped from the chain to allow for a burn-in phase (suggested value: 1000). |
progress.bar | Show progress bar during the MCMC run. |
method.prior.changepoints | If |
max.nodes | Upper limit for the number of internal nodes of the approximating spline (default: 30). |
lambda | Smoothing parameter. For |
gamma.shape | Shape parameter of the gamma function from which |
gamma.scale | Scale parameter of the gamma function from which |
method.prior.heights | Determines the prior for the heights of the change points. If |
prior.height.mean | Function describing the mean of the prior distribution for the heights (only used if |
prior.height.var | Function describing the variance of the prior distribution for the heights (only used if |
mcmc.out | Output from |
credible.interval | Probability mass of the confidence band (default: 0.95). |
time.points | Number of discrete time points in the table output by |
x | Table with population size versus time, as computed by |
show.median | Plot median rather than mean as point estimate for demographic function (default: TRUE). |
show.years | Option that determines whether the time is plotted in units of substitutions (default) or in years (requires specification of substitution rate and year of present). |
subst.rate | Substitution rate (see option show.years). |
present.year | Present year (see option show.years). |
xlab | label on the x-axis (depends on the value of |
ylab | label on the y-axis. |
log | log-transformation of axes; by default, the y-axis is log-transformed. |
... | Further arguments to be passed on to |
Details
Please refer to Opgen-Rhein et al. (2005) for methodological details, and the help page of skyline for information on a related approach.
Author(s)
Rainer Opgen-Rhein and Korbinian Strimmer. Parts of the rjMCMC sampling procedure are adapted from R code by Karl Broman.
References
Opgen-Rhein, R., Fahrmeir, L. and Strimmer, K. 2005. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evolutionary Biology, 5, 6.
See Also
skyline and skylineplot.
Examples
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0, progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!
# make list of population size versus time
popsize <- extract.popsize(mcmc.out)
# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
Mixed Font Labels for Plotting
Description
This function helps to format labels with bits of text in different font shapes (italics, bold, or bolditalics) and different separators. The output is intended to be used for plotting.
Usage
mixedFontLabel(..., sep = " ", italic = NULL, bold = NULL, parenthesis = NULL, always.upright = c("sp.", "spp.", "ssp."))
Arguments
... | vectors of mode character to be formatted. They may be of different lengths in which case the shortest ones are recycled. |
sep | a vector of mode character giving the separators to be printed between the elements in |
italic | a vector of integers specifying the elements in |
bold | id. in boldface. |
parenthesis | id. within parentheses. |
always.upright | a vector of mode character giving the strings not to be printed in italics. Use |
Details
The idea is to have different bits of text in different vectors that are put together to make a vector of R expressions. This vector is interpreted by graphical functions to format the text. A simple use may be mixedFontLabel(genus, species, italic = 1:2), but it is more interesting when mixing fonts (see examples).
To have an element in bolditalics, its number must be given in both italic and bold.
The vector returned by this function may be assigned as the tip.label element of a tree of class "phylo", or even as its node.label element.
Value
A vector of mode expression.
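The bold-italic rule from Details can be illustrated with a minimal sketch. The vectors genus and species below are illustrative values chosen here, not data shipped with the package; element 2 is listed in both italic and bold so that it should be rendered in bold italics.

```r
library(ape)

# illustrative label parts (hypothetical values)
genus <- c("Homo", "Pan")
species <- c("sapiens", "paniscus")

# element 1 (genus) in italics only; element 2 (species) is named in both
# 'italic' and 'bold', which requests bold italics for it
labs <- mixedFontLabel(genus, species, italic = 1:2, bold = 2)

is.expression(labs)  # the result is a vector of mode expression
length(labs)         # one label per element of the input vectors
```

The resulting vector can then be assigned to the tip.label element of a tree before plotting, as in the Examples below.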
Author(s)
Emmanuel Paradis
See Also
makeLabel, makeNodeLabel, label2table, updateLabel, checkLabel
Examples
tr <- read.tree(text = "((a,(b,c)),d);")
genus <- c("Gorilla", "Pan", "Homo", "Pongo")
species <- c("gorilla", "spp.", "sapiens", "pygmaeus")
geo <- c("Africa", "Africa", "World", "Asia")
tr$tip.label <- mixedFontLabel(genus, species, geo, italic = 1:2, parenthesis = 3)
layout(matrix(c(1, 2), 2))
plot(tr)
tr$tip.label <- mixedFontLabel(genus, species, geo, sep = c(" ", " | "), italic = 1:2, bold = 3)
plot(tr)
layout(1)
Find Most Recent Common Ancestors Between Pairs
Description
mrca returns for each pair of tips (and nodes) its most recent common ancestor (MRCA).
getMRCA returns the MRCA of two or more tips.
Usage
mrca(phy, full = FALSE)
getMRCA(phy, tip)
Arguments
phy | an object of class |
full | a logical indicating whether to return the MRCAs among all tips and nodes (if |
tip | a vector of mode numeric or character specifying the tips; can also be node numbers. |
Details
For mrca, the diagonal is set to the number of the tips (and nodes if full = TRUE). If full = FALSE, the colnames and rownames are set with the tip labels of the tree; otherwise the numbers are given as names.
For getMRCA, if tip is of length one or zero then NULL is returned.
Value
a matrix of mode numeric (mrca) or a single numeric value (getMRCA).
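As a small illustration of the behaviour described above, the following sketch uses a four-tip tree written here for the purpose (the tree and object names are assumptions, not part of the package examples).

```r
library(ape)

# illustrative tree: tips are numbered 1-4 and the root is node Ntip + 1 = 5
tr <- read.tree(text = "((a,b),(c,d));")

m <- mrca(tr)                     # 4 x 4 matrix named with the tip labels
m["a", "b"]                       # MRCA of tips a and b (an internal node)
ab <- getMRCA(tr, c("a", "b"))    # same node, obtained for this pair only
ac <- getMRCA(tr, c("a", "c"))    # a and c only share the root
single <- getMRCA(tr, "a")        # a single tip: NULL is returned
mfull <- mrca(tr, full = TRUE)    # tips and nodes: names are the numbers
dim(mfull)
```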
Author(s)
Emmanuel Paradis, Klaus Schliep, Joseph W. Brown
Minimum Spanning Tree
Description
The function mst finds the minimum spanning tree between a set of observations using a matrix of pairwise distances.
The plot method plots the minimum spanning tree showing the links where the observations are identified by their numbers.
Usage
mst(X)
## S3 method for class 'mst'
plot(x, graph = "circle", x1 = NULL, x2 = NULL, ...)
Arguments
X | either a matrix that can be interpreted as a distance matrix, or an object of class |
x | an object of class |
graph | a character string indicating the type of graph to plot the minimum spanning tree; two choices are possible: |
x1 | a numeric vector giving the coordinates of the observations on the x-axis. Both |
x2 | a numeric vector giving the coordinates of the observations on the y-axis. Both |
... | further arguments to be passed to |
Details
These functions provide two ways to plot the minimum spanning tree which try to space the observations as much as possible in order to show the links as clearly as possible. The option graph = "circle" simply plots the observations regularly on a circle, whereas graph = "nsca" uses a non-symmetric correspondence analysis where each observation is represented at the centroid of its neighbours.
Alternatively, the user may use any system of coordinates for the observations, for instance a principal components analysis (PCA) if the distances were computed from an original matrix of continuous variables.
Value
an object of class "mst" which is a square numeric matrix of size equal to the number of observations with either 1 if a link between the corresponding observations was found, or 0 otherwise. The names of the rows and columns of the distance matrix, if available, are given as rownames and colnames to the returned object.
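The structure of the returned matrix can be checked with a short sketch. The random data and the object names are assumptions made for illustration; the check relies only on the fact that a spanning tree on n observations has n - 1 links.

```r
library(ape)

set.seed(1)
X <- matrix(runif(50), 10, 5)   # 10 observations, arbitrary data
M <- mst(dist(X))

A <- unclass(M)                 # drop the class to get a plain 0/1 matrix
n <- nrow(A)
n_links <- sum(A) / 2           # each link appears in two cells
n_links                         # a spanning tree has n - 1 links
symmetric <- all(A == t(A))     # links are recorded in both directions
symmetric
```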
Author(s)
Yvonnick Noel <noel@univ-lille3.fr>, Julien Claude <julien.claude@umontpellier.fr> and Emmanuel Paradis
See Also
Examples
require(stats)
X <- matrix(runif(200), 20, 10)
d <- dist(X)
PC <- prcomp(X)
M <- mst(d)
opar <- par(mfcol = c(2, 2))
plot(M)
plot(M, graph = "nsca")
plot(M, x1 = PC$x[, 1], x2 = PC$x[, 2])
par(opar)
Collapse and Resolve Multichotomies
Description
These two functions collapse or resolve multichotomies in phylogenetic trees.
Usage
multi2di(phy, ...)
## S3 method for class 'phylo'
multi2di(phy, random = TRUE, equiprob = TRUE, ...)
## S3 method for class 'multiPhylo'
multi2di(phy, random = TRUE, equiprob = TRUE, ...)
di2multi(phy, ...)
## S3 method for class 'phylo'
di2multi(phy, tol = 1e-08, ...)
## S3 method for class 'multiPhylo'
di2multi(phy, tol = 1e-08, ...)
Arguments
phy | an object of class |
random | a logical value specifying whether to resolve the multichotomies randomly (the default) or in the order they appear in the tree (if |
equiprob | a logical value: should topologies be generated with equal probabilities; see details in |
tol | a numeric value giving the tolerance to consider a branch length significantly greater than zero. |
... | arguments passed among methods. |
Details
multi2di transforms all multichotomies into a series of dichotomies with one (or several) branch(es) of length zero.
di2multi deletes all branches smaller than tol and collapses the corresponding dichotomies into a multichotomy.
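A minimal sketch of the round trip described above, using a small rooted tree written for this illustration (the tree and object names are assumptions). Resolving a node with four descendants requires two new internal branches, both of length zero.

```r
library(ape)

# illustrative rooted tree with one internal node having four descendants
tr <- read.tree(text = "((a:1,b:1,c:1,d:1):1,e:1);")
is.binary(tr)                        # FALSE

tr2 <- multi2di(tr)                  # random resolution by default
is.binary(tr2)                       # TRUE
n_zero <- sum(tr2$edge.length == 0)  # branches of length zero that were added
n_zero

tr3 <- di2multi(tr2)                 # collapse branches shorter than 'tol'
Nnode(tr3) == Nnode(tr)              # back to the original number of nodes
```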
Value
an object of the same class as the input.
Author(s)
Emmanuel Paradis
See Also
Examples
data(bird.families)
is.binary(bird.families)
is.binary(multi2di(bird.families))
all.equal(di2multi(multi2di(bird.families)), bird.families)
### To see the results of randomly resolving a trichotomy:
tr <- read.tree(text = "(a:1,b:1,c:1);")
layout(matrix(1:4, 2, 2))
for (i in 1:4) plot(multi2di(tr), use.edge.length = FALSE, cex = 1.5)
layout(1)
Manipulating Lists of Trees
Description
These are extraction and replacement operators for lists of trees stored in the class "multiPhylo".
Usage
## S3 method for class 'multiPhylo'
x[i]
## S3 method for class 'multiPhylo'
x[[i]]
## S3 method for class 'multiPhylo'
x$name
## S3 replacement method for class 'multiPhylo'
x[i] <- value
## S3 replacement method for class 'multiPhylo'
x[[i]] <- value
## S3 replacement method for class 'multiPhylo'
x$i <- value
Arguments
x,value | an object of class |
i | index(ices) of the tree(s) to select from a list; this may be a vector of integers, logicals, or names. |
name | a character string specifying the tree to be extracted. |
Details
The subsetting operator [ keeps the class correctly ("multiPhylo").
The replacement operators check the labels of value if x has a single vector of tip labels for all trees (see examples).
Value
An object of class "phylo" ([[, $) or of class "multiPhylo" ([ and the replacement operators).
Author(s)
Emmanuel Paradis
See Also
Examples
x <- rmtree(10, 20)
names(x) <- paste("tree", 1:10, sep = "")
x[1:5]
x[1] # subsetting
x[[1]] # extraction
x$tree1 # same as above
x[[1]] <- rtree(20)
y <- .compressTipLabel(x)
## up to here 'x' and 'y' have exactly the same information
## but 'y' has a unique vector of tip labels for all the trees
x[[1]] <- rtree(10) # no error
try(y[[1]] <- rtree(10)) # error
try(x[1] <- rtree(20)) # error
## use instead one of the two:
x[1] <- list(rtree(20))
x[1] <- c(rtree(20))
x[1:5] <- rmtree(5, 20) # replacement
x[11:20] <- rmtree(10, 20) # elongation
x # 20 trees
Minimum Variance Reduction
Description
Phylogenetic tree construction based on the minimum variance reduction.
Usage
mvr(X, V)
mvrs(X, V, fs = 15)
Arguments
X | a distance matrix. |
V | a variance matrix. |
fs | agglomeration criterion parameter: it is coerced to an integer and must be at least equal to one. |
Details
The MVR method can be seen as a version of BIONJ which is not restricted to the Poisson model of variance (Gascuel 2000).
Value
an object of class "phylo".
Author(s)
Andrei Popescu
References
Criscuolo, A. and Gascuel, O. (2008). Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinformatics, 9, 166.
Gascuel, O. (2000). Data model and classification by trees: the minimum variance reduction (MVR) method. Journal of Classification, 17, 67–99.
See Also
Examples
data(woodmouse)
rt <- dist.dna(woodmouse, variance = TRUE)
v <- attr(rt, "variance")
tr <- mvr(rt, v)
plot(tr, "u")
Neighbor-Joining Tree Estimation
Description
This function performs the neighbor-joining tree estimation of Saitou and Nei (1987).
Usage
nj(X)
Arguments
X | a distance matrix; may be an object of class “dist”. |
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis
References
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
Studier, J. A. and Keppler, K. J. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Molecular Biology and Evolution, 5, 729–731.
See Also
write.tree, read.tree, dist.dna, bionj, fastme, njs
Examples
### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13, 10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12, 5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- nj(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw)
Tree Reconstruction from Incomplete Distances With NJ* or bio-NJ*
Description
Reconstructs a phylogenetic tree from a distance matrix with possibly missing values.
Usage
njs(X, fs = 15)
bionjs(X, fs = 15)
Arguments
X | a distance matrix. |
fs | argument of the agglomerative criterion: it is coerced to an integer and must be at least equal to one. |
Details
Missing values are represented by either NA or any negative number.
Basically, the Q* criterion is applied to all the pairs of leaves, and the s highest scoring ones are chosen for further analysis by the agglomeration criteria that better handle missing distances (see references for details).
Value
an object of class "phylo".
Author(s)
Andrei Popescu
References
Criscuolo, A., Gascuel, O. (2008) Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinformatics, 9, 166.
See Also
Examples
data(woodmouse)
d <- dist.dna(woodmouse)
dm <- d
dm[sample(length(dm), size = 3)] <- NA
dist.topo(njs(dm), nj(d)) # often 0
dm[sample(length(dm), size = 10)] <- NA
dist.topo(njs(dm), nj(d)) # sometimes 0
node.dating
Description
Estimate the dates of a rooted phylogenetic tree from the tip dates.
Usage
estimate.mu(t, node.dates, p.tol = 0.05)
estimate.dates(t, node.dates, mu = estimate.mu(t, node.dates), min.date = -.Machine$double.xmax, show.steps = 0, opt.tol = 1e-8, nsteps = 1000, lik.tol = 0, is.binary = is.binary.phylo(t))
Arguments
t | an object of class "phylo" |
node.dates | a numeric vector of dates for the tips, in the same order as 't$tip.label' or a vector of dates for all of the nodes. |
p.tol | p-value cutoff for failed regression. |
mu | mutation rate. |
min.date | the minimum bound on the dates of nodes |
show.steps | print the log likelihood every show.steps. If 0, output is suppressed. |
opt.tol | tolerance for optimization precision. |
lik.tol | tolerance for likelihood comparison. |
nsteps | the maximum number of steps to run. |
is.binary | if TRUE, will run a faster optimization method that only works if the tree is binary; otherwise will use optimize() as the optimization method. |
Details
This code duplicates the functionality of the program Tip.Dates (see references). The dates of the internal nodes of 't' are estimated using a maximum likelihood approach.
't' must be rooted and have branch lengths in units of expected substitutions per site.
'node.dates' can be either a numeric vector of dates for the tips or a numeric vector for all of the nodes of 't'. 'estimate.mu' will use all of the values given in 'node.dates' to estimate the mutation rate. Dates can be censored with NA. 'node.dates' must contain all of the tip dates when it is a parameter of 'estimate.dates'. If only tip dates are given, then 'estimate.dates' will run an initial step to estimate the dates of the internal nodes. If 'node.dates' contains dates for some of the nodes, 'estimate.dates' will use those dates as priors in the initial step. If all of the dates for nodes are given, then 'estimate.dates' will not run the initial step.
If 'is.binary' is set to FALSE, 'estimate.dates' uses the "optimize" function as the optimization method. By default, R's "optimize" function uses a precision of ".Machine$double.eps^0.25", which is about 0.0001 on a 64-bit system. This should be set to a smaller value if the branch lengths of 't' are very short. If 'is.binary' is set to TRUE, 'estimate.dates' uses calculus to determine the maximum likelihood at each step, which is faster. The bounds of permissible values are reduced by 'opt.tol'.
'estimate.dates' has several criteria to decide how many steps it will run. If 'lik.tol' and 'nsteps' are both 0, then 'estimate.dates' will only run the initial step. If 'lik.tol' is greater than 0 and 'nsteps' is 0, then 'estimate.dates' will run until the difference between successive steps is less than 'lik.tol'. If 'lik.tol' is 0 and 'nsteps' is greater than 0, then 'estimate.dates' will run the initial step and then 'nsteps' steps. If 'lik.tol' and 'nsteps' are both greater than 0, then 'estimate.dates' will run the initial step and then either 'nsteps' steps or until the difference between successive steps is less than 'lik.tol'.
Value
The estimated mutation rate as a numeric vector of length one for estimate.mu.
The estimated dates of all of the nodes of the tree as a numeric vector with length equal to the number of nodes in the tree.
Note
This model assumes that the tree follows a molecular clock. It only performs a rudimentary statistical test of the molecular clock hypothesis.
Author(s)
Bradley R. Jones <email: brj1@sfu.ca>
References
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.
Rambaut, A. (2000) Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics, 16, 395–399.
Jones, Bradley R., and Poon, Art F. Y. (2016) node.dating: dating ancestors in phylogenetic trees in R. Bioinformatics, 33, 932–934.
See Also
Examples
t <- rtree(100)
tip.date <- rnorm(t$tip.label, mean = node.depth.edgelength(t)[1:Ntip(t)])^2
t <- rtt(t, tip.date)
mu <- estimate.mu(t, tip.date)
## Run for 100 steps
node.date <- estimate.dates(t, tip.date, mu, nsteps = 100)
## Run until the difference between successive log likelihoods is
## less than $10^{-4}$ starting with the 100th step's results
node.date <- estimate.dates(t, node.date, mu, nsteps = 0, lik.tol = 1e-4)
## To rescale the tree over time
t$edge.length <- node.date[t$edge[, 2]] - node.date[t$edge[, 1]]
Depth and Heights of Nodes and Tips
Description
These functions return the depths or heights of nodes and tips.
Usage
node.depth(phy, method = 1)
node.depth.edgelength(phy)
node.height(phy, clado.style = FALSE)
Arguments
phy | an object of class "phylo". |
method | an integer value (1 or 2); 1: the node depths are proportional to the number of tips descending from each node, 2: they are evenly spaced. |
clado.style | a logical value; if |
Details
node.depth computes the depth of a node depending on the value of method (see the option node.depth in plot.phylo). The value of 1 is given to the tips.
node.depth.edgelength does the same but using branch lengths.
node.height computes the heights of nodes and tips as plotted by a phylogram or a cladogram.
Value
A numeric vector indexed with the node numbers of the matrix ‘edge’ of phy.
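A short sketch of the first two functions on a three-tip tree written for this illustration (the tree and object names are assumptions). With the default method = 1 the depth of a node is its number of descendant tips, and node.depth.edgelength measures each node's distance from the root along the branches.

```r
library(ape)

# illustrative tree: tips a, b, c are 1-3, the root is 4, node (a,b) is 5
tr <- read.tree(text = "((a:1,b:1):1,c:2);")

nd <- node.depth(tr)             # tips get 1, nodes their number of descendant tips
nd
nde <- node.depth.edgelength(tr) # distance of each tip/node from the root
nde
```

Both vectors are indexed by the node numbers used in tr$edge, so nd[4] and nde[4] refer to the root.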
Author(s)
Emmanuel Paradis
See Also
Labelling the Nodes, Tips, and Edges of a Tree
Description
These functions add labels to or near the nodes, the tips, or the edges of a tree using text or plotting symbols. The text can be framed.
Usage
nodelabels(text, node, adj = c(0.5, 0.5), frame = "rect", pch = NULL, thermo = NULL, pie = NULL, piecol = NULL, col = "black", bg = "lightblue", horiz = FALSE, width = NULL, height = NULL, ...)
tiplabels(text, tip, adj = c(0.5, 0.5), frame = "rect", pch = NULL, thermo = NULL, pie = NULL, piecol = NULL, col = "black", bg = "yellow", horiz = FALSE, width = NULL, height = NULL, offset = 0, ...)
edgelabels(text, edge, adj = c(0.5, 0.5), frame = "rect", pch = NULL, thermo = NULL, pie = NULL, piecol = NULL, col = "black", bg = "lightgreen", horiz = FALSE, width = NULL, height = NULL, date = NULL, ...)
Arguments
text | a vector of mode character giving the text to be printed. Can be left empty. |
node | a vector of mode numeric giving the numbers of the nodes where the text or the symbols are to be printed. Can be left empty. |
tip | a vector of mode numeric giving the numbers of the tips where the text or the symbols are to be printed. Can be left empty. |
edge | a vector of mode numeric giving the numbers of the edges where the text or the symbols are to be printed. Can be left empty. |
adj | one or two numeric values specifying the horizontal and vertical, respectively, justification of the text or symbols. By default, the text is centered horizontally and vertically. If a single value is given, this alters only the horizontal position of the text. |
frame | a character string specifying the kind of frame to be printed around the text. This must be one of "rect" (the default), "circle", "none", or any unambiguous abbreviation of these. |
pch | a numeric giving the type of plotting symbol to be used; this is recycled if necessary. See |
thermo | a numeric vector giving some proportions (values between 0 and 1) for each node, or a numeric matrix giving some proportions (the rows must sum to one). It can be a data frame which is then converted into a matrix. |
pie | same as |
piecol | a list of colours (given as a character vector) to be used by |
col | a character string giving the color to be used for the text or the plotting symbols; this is recycled if necessary. |
bg | a character string giving the color to be used for the background of the text frames or of the plotting symbols if it applies; this is recycled if necessary. |
... | further arguments passed to the |
horiz,width,height | parameters controlling the aspect of thermometers; by default, their width and height are determined automatically. |
offset | offset of the tip labels (can be negative). |
date | specifies the positions of labels on edges of chronograms with respect to the time scale. |
Details
These three functions have the same optional arguments and work in the same way.
If the argument text is missing and pch and thermo are left as NULL, then the numbers of the nodes (or of the tips) are printed.
If node, tip, or edge is missing, then the text or the symbols are printed on all nodes, tips, or edges.
The option cex can be used to change the size of all types of labels.
A simple call of these functions with no arguments (e.g., nodelabels()) prints the numbers of all nodes (or tips).
In the case of tiplabels, it is useful in most cases to adjust the options x.lim and label.offset (and possibly show.tip.label) of plot.phylo (see the examples).
Author(s)
Emmanuel Paradis, Ben Bolker, and Jim Lemon
See Also
plot.phylo, edges, mixedFontLabel
Examples
tr <- read.tree(text = "((Homo,Pan),Gorilla);")
plot(tr)
nodelabels("7.3 Ma", 4, frame = "r", bg = "yellow", adj = 0)
nodelabels("5.4 Ma", 5, frame = "c", bg = "tomato", font = 3)
## A trick by Liam Revell when there are many categories:
plot(tr, x.lim = c(-1, 4))
nodelabels(node = 4, pie = matrix(rep(1, 100), 1), cex = 5)
op <- par(fg = "transparent")
nodelabels(node = 5, pie = matrix(rep(1, 100), 1), cex = 5)
par(op)
data(bird.orders)
plot(bird.orders, use.edge.length = FALSE, font = 1)
bs <- round(runif(22, 90, 100), 0) # some imaginary bootstrap values
bs2 <- round(runif(22, 90, 100), 0)
bs3 <- round(runif(22, 90, 100), 0)
nodelabels(bs, adj = 1.2)
nodelabels(bs2, adj = -0.2, bg = "yellow")
### something more classical
plot(bird.orders, use.edge.length = FALSE, font = 1)
nodelabels(bs, adj = -0.2, frame = "n", cex = 0.8)
nodelabels(bs2, adj = c(1.2, 1), frame = "n", cex = 0.8)
nodelabels(bs3, adj = c(1.2, -0.2), frame = "n", cex = 0.8)
### the same but we play with the font
plot(bird.orders, use.edge.length = FALSE, font = 1)
nodelabels(bs, adj = -0.2, frame = "n", cex = 0.8, font = 2)
nodelabels(bs2, adj = c(1.2, 1), frame = "n", cex = 0.8, font = 3)
nodelabels(bs3, adj = c(1.2, -0.2), frame = "n", cex = 0.8)
plot(bird.orders, "c", use.edge.length = FALSE, font = 1)
nodelabels(thermo = runif(22), cex = .8)
plot(bird.orders, "u", FALSE, font = 1, lab4ut = "a")
nodelabels(cex = .75, bg = "yellow")
### representing two characters at the tips (you could have as many
### as you want)
plot(bird.orders, "c", FALSE, font = 1, label.offset = 3, x.lim = 31, no.margin = TRUE)
tiplabels(pch = 21, bg = gray(1:23/23), cex = 2, adj = 1.4)
tiplabels(pch = 19, col = c("yellow", "red", "blue"), adj = 2.5, cex = 2)
### This can be used to highlight tip labels:
plot(bird.orders, font = 1)
i <- c(1, 7, 18)
tiplabels(bird.orders$tip.label[i], i, adj = 0)
### Some random data to compare piecharts and thermometres:
tr <- rtree(15)
x <- runif(14, 0, 0.33)
y <- runif(14, 0, 0.33)
z <- runif(14, 0, 0.33)
x <- cbind(x, y, z, 1 - x - y - z)
layout(matrix(1:2, 1, 2))
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(pie = x, cex = 1.3)
text(4.5, 15, "Are you \"pie\"...", font = 4, cex = 1.5)
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), cex = 1.3)
text(4.5, 15, "... or \"thermo\"?", font = 4, cex = 1.5)
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), cex = 1.3)
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), width = 3, horiz = TRUE)
layout(1)
plot(tr, main = "Showing Edge Lengths")
edgelabels(round(tr$edge.length, 3), srt = 90)
plot(tr, "p", FALSE)
edgelabels("above", adj = c(0.5, -0.25), bg = "yellow")
edgelabels("below", adj = c(0.5, 1.25), bg = "lightblue")
Find Paths of Nodes
Description
This function finds paths of nodes in a tree. The nodes can be internal and/or terminal (i.e., tips).
Usage
nodepath(phy, from = NULL, to = NULL)
Arguments
phy | an object of class |
from,to | integers giving node or tip numbers. |
Details
By default, this function returns all the paths from the root to each tip of the tree. If both arguments from and to are specified, the shortest path of nodes linking them is returned.
Value
a list of vectors of integers (by default), or a single vector of integers.
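The two calling modes can be compared on a fixed three-tip tree, sketched here with names chosen for illustration (tr, paths and p13 are assumptions). In this tree the root is node 4 and the ancestor of a and b is node 5, so the path linking tips 1 and 3 has to pass through nodes 5 and 4.

```r
library(ape)

# illustrative tree: tips a, b, c are 1-3, the root is 4, node (a,b) is 5
tr <- read.tree(text = "((a,b),c);")

paths <- nodepath(tr)       # default: one root-to-tip path per tip
paths
p13 <- nodepath(tr, 1, 3)   # single path linking tip 1 (a) and tip 3 (c)
p13
```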
Author(s)
Emmanuel Paradis
See Also
Examples
tr <- rtree(2)
nodepath(tr)
nodepath(tr, 1, 2)
Test of host-parasite coevolution
Description
Function parafit tests the hypothesis of coevolution between a clade of hosts and a clade of parasites. The null hypothesis (H0) of the global test is that the evolution of the two groups, as revealed by the two phylogenetic trees and the set of host-parasite association links, has been independent. Tests of individual host-parasite links are also available as an option.
The method, which is described in detail in Legendre et al. (2002), requires some estimates of the phylogenetic trees or phylogenetic distances, and also a description of the host-parasite associations (H-P links) observed in nature.
Usage
parafit(host.D, para.D, HP, nperm = 999, test.links = FALSE, seed = NULL, correction = "none", silent = FALSE)
Arguments
host.D | A matrix of phylogenetic or patristic distances among the hosts (object class: |
para.D | A matrix of phylogenetic or patristic distances among the parasites (object class: |
HP | A rectangular matrix with hosts as rows and parasites as columns. The matrix contains 1's when a host-parasite link has been observed in nature between the host in the row and the parasite in the column, and 0's otherwise. |
nperm | Number of permutations for the tests. If |
test.links |
|
seed |
|
correction | Correction methods for negative eigenvalues (details below): |
silent | Informative messages and the time to compute the tests will not be written to the R console if silent=TRUE. Useful when the function is called by a numerical simulation function. |
Details
Two types of test are produced by the program: a global test of coevolution and, optionally, a test on the individual host-parasite (H-P) link.
The function computes principal coordinates for the host and the parasite distance matrices. The principal coordinates (all of them) act as a complete representation of either the phylogenetic distance matrix or the phylogenetic tree.
Phylogenetic distance matrices are normally Euclidean. Patristic distance matrices are additive, thus they are metric and Euclidean. Euclidean matrices are fully represented by real-valued principal coordinate axes. For non-Euclidean matrices, negative eigenvalues are produced; complex principal coordinate axes are associated with the negative eigenvalues. So, the program rejects matrices that are not Euclidean and stops.
Negative eigenvalues can be corrected for by one of two methods: the Lingoes or the Cailliez correction. It is up to the user to decide which correction method should be applied. This is done by selecting the option correction="lingoes" or correction="cailliez". Details on these correction methods are given in the help file of the pcoa function.
The principle of the global test is the following (H0: independent evolution of the hosts and parasites): (1) Compute matrix D = C t(A) B. Note: D is a fourth-corner matrix (sensu Legendre et al. 1997), where A is the H-P link matrix, B is the matrix of principal coordinates computed from the host.D matrix, and C is the matrix of principal coordinates computed from the para.D matrix. (2) Compute the statistic ParaFitGlobal, the sum of squares of all values in matrix D. (3) Permute at random, separately, each row of matrix A, obtaining matrix A.perm. Compute D.perm = C
The test of each individual H-P link is carried out as follows (H0: this particular link is random): (1) Remove one link (k) from matrix A. (2) Compute matrix D = C t(A) B. (3a) Compute trace(k), the sum of squares of all values in matrix D. (3b) Compute the statistic ParaFitLink1 = (trace - trace(k)) where trace is the ParaFitGlobal statistic. (3c) Compute the statistic ParaFitLink2 = (trace - trace(k)) / (tracemax - trace) where tracemax is the maximum value that can be taken by trace. (4) Permute at random, separately, each row of matrix A, obtaining A.perm. Use the same sequences of permutations as were used in the test of ParaFitGlobal. Using the values of trace and trace.perm saved during the global test, compute the permuted values of the two statistics, ParaFit1.perm and ParaFit2.perm. (5) Repeat step 4 a large number of times. (6) Add the reference value of ParaFit1 to the distribution of ParaFit1.perm values; add the reference value of ParaFit2 to the distribution of ParaFit2.perm values. Calculate the permutational probabilities associated to ParaFit1 and ParaFit2.
The print.parafit function prints out the results of the global test and, optionally, the results of the tests of the individual host-parasite links.
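Steps (1) and (2) of the global test can be sketched by hand as follows. This is only an illustration under stated assumptions, not the internal code of parafit: the principal coordinates are taken from pcoa with objects in rows, so the product D = C t(A) B is written in its conformable form t(C) t(A) B, and the object names A, B, C, D and stat are chosen here. The value is not claimed to match the one reported by parafit, and no permutation test is performed.

```r
library(ape)

data(gopher.D)
data(lice.D)
data(HP.links)

A <- as.matrix(HP.links)                 # H-P links: hosts in rows, parasites in columns
B <- pcoa(as.matrix(gopher.D))$vectors   # host principal coordinates (hosts in rows)
C <- pcoa(as.matrix(lice.D))$vectors     # parasite principal coordinates (parasites in rows)

# step (1): fourth-corner matrix, written so that the dimensions conform
D <- t(C) %*% t(A) %*% B

# step (2): sum of squares of all values in D
stat <- sum(D^2)
stat
```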
Value
ParaFitGlobal | The statistic of the global H-P test. |
p.global | The permutational p-value associated with the ParaFitGlobal statistic. |
link.table | The results of the tests of individual H-P links, including the ParaFitLink1 and ParaFitLink2 statistics and the p-values obtained from their respective permutational tests. |
para.per.host | Number of parasites per host. |
host.per.para | Number of hosts per parasite. |
nperm | Number of permutations for the tests. |
Author(s)
Pierre Legendre, Universite de Montreal
References
Hafner, M. S., P. D. Sudman, F. X. Villablanca, T. A. Spradling, J. W. Demastes and S. A. Nadler. 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science, 265, 1087–1090.
Legendre, P., Y. Desdevises and E. Bazin. 2002. A statistical test for host-parasite coevolution. Systematic Biology, 51(2), 217–234.
See Also
Examples
## Gopher and lice data from Hafner et al. (1994)
data(gopher.D)
data(lice.D)
data(HP.links)
res <- parafit(gopher.D, lice.D, HP.links, nperm=99, test.links=TRUE)
# res # or else: print(res)
Principal Coordinate Analysis
Description
Function pcoa computes principal coordinate decomposition (also called classical scaling) of a distance matrix D (Gower 1966). It implements two correction methods for negative eigenvalues.
Usage
pcoa(D, correction="none", rn=NULL)
## S3 method for class 'pcoa'
biplot(x, Y=NULL, plot.axes = c(1,2), dir.axis1=1, dir.axis2=1, rn=NULL, main=NULL, ...)
Arguments
D | A distance matrix of class |
correction | Correction methods for negative eigenvalues (details below): |
rn | An optional vector of row names, of length n, for the n objects. |
x | Output object from |
Y | Any rectangular data table containing explanatory variables to be projected onto the ordination plot. That table may contain, for example, the community composition data used to compute D, or any transformation of these data; see examples. |
plot.axes | The two PCoA axes to plot. |
dir.axis1 | = -1 to reverse axis 1 for the projection of points and variables. Default value: +1. |
dir.axis2 | = -1 to reverse axis 2 for the projection of points and variables. Default value: +1. |
main | An optional title. |
... | Other graphical arguments passed to function. |
Details
This function implements two methods for correcting for negative eigenvalues in principal coordinate analysis (PCoA). Negative eigenvalues can be produced in PCoA when decomposing distance matrices produced by coefficients that are not Euclidean (Gower and Legendre 1986, Legendre and Legendre 1998).
In pcoa, when negative eigenvalues are present in the decomposition results, the distance matrix D can be modified using either the Lingoes or the Cailliez procedure to produce results without negative eigenvalues.
In the Lingoes (1971) procedure, a constant c1, equal to twice the absolute value of the largest negative eigenvalue of the original principal coordinate analysis, is added to each original squared distance in the distance matrix, except the diagonal values. A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
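The effect of the Lingoes procedure can be reproduced with base R alone. The sketch below is not the code used by pcoa: the helper gower_eig and the toy matrix D are assumptions made for illustration, and c1 here denotes the absolute value of the most negative eigenvalue, so that 2 * c1 is the quantity added to the squared distances.

```r
# eigenvalues of the Gower-centred matrix decomposed in PCoA
gower_eig <- function(D) {
  n <- nrow(D)
  H <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  eigen(H %*% (-0.5 * D^2) %*% H, symmetric = TRUE, only.values = TRUE)$values
}

# toy distance matrix violating the triangle inequality, hence not Euclidean
D <- matrix(c(0, 1, 3, 1,
              1, 0, 1, 3,
              3, 1, 0, 1,
              1, 3, 1, 0), 4, 4)

ev <- gower_eig(D)           # contains a negative eigenvalue
c1 <- abs(min(ev))           # absolute value of the most negative eigenvalue

D2 <- sqrt(D^2 + 2 * c1)     # add 2 * c1 to every squared distance ...
diag(D2) <- 0                # ... except the diagonal values

ev2 <- gower_eig(D2)         # no negative eigenvalue remains
round(ev, 6)
round(ev2, 6)
```

In this toy case the corrected decomposition has n - 2 = 2 positive eigenvalues and 2 null eigenvalues, up to numerical rounding.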
In the Cailliez (1983) procedure, a constant c2 is added to theoriginal distances in the distance matrix, except the diagonalvalues. The calculation of c2 is described in Legendre and Legendre(1998). A new principal coordinate analysis, performed on the modifieddistances, has at most (n-2) positive eigenvalues, at least 2 nulleigenvalues, and no negative eigenvalue.
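As a rough illustration of the correction, the sketch below builds a small dissimilarity matrix that is metric but not Euclidean and compares the uncorrected decomposition with the Lingoes-corrected one. The four objects and their distances are made up for the example, and the value "lingoes" for the correction argument is assumed to select the Lingoes procedure.

```r
library(ape)

## Four hypothetical objects: object 1 is at distance 1 from the other three,
## which are at distance 2 from each other. The triangle inequality holds,
## but no Euclidean configuration has these distances.
m <- matrix(2, 4, 4)
m[1, ] <- m[, 1] <- 1
diag(m) <- 0
D <- as.dist(m)

res.none <- pcoa(D)                          # no correction
res.lin  <- pcoa(D, correction = "lingoes")  # Lingoes correction (assumed value)

res.none$values$Eigenvalues  # includes one negative eigenvalue
res.lin$note                 # describes the correction that was applied
res.lin$vectors.cor          # principal coordinates after correction
```

Comparing res.none$values with res.lin$values shows how the eigenvalue table changes once the correction is applied.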
In all cases, only the eigenvectors corresponding to positive eigenvalues are shown in the output list. The eigenvectors are scaled to the square root of the corresponding eigenvalues. Gower (1966) has shown that eigenvectors scaled in that way preserve the original distance (in the D matrix) among the objects. These eigenvectors can be used to plot ordination graphs of the objects.
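The distance-preserving property can be checked numerically. The sketch below uses simulated data (an assumption made for the example) and Euclidean distances, so no negative eigenvalues arise; the distances recomputed from the principal coordinates should then match the original ones up to rounding error.

```r
library(ape)

set.seed(1)
## Hypothetical data: 10 objects described by 3 variables
X <- matrix(rnorm(30), nrow = 10)
D <- dist(X)                 # Euclidean distances: no negative eigenvalues

res <- pcoa(D)
res$values$Eigenvalues       # eigenvalues of the decomposition
D.pcoa <- dist(res$vectors)  # distances among objects in PCoA space

## Largest discrepancy between original and reconstructed distances
max(abs(as.vector(D) - as.vector(D.pcoa)))
```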
We recommend not using PCoA to produce ordinations from the chord, chi-square, abundance profile, or Hellinger distances. It is easier to first transform the community composition data using the following transformations, available in the decostand function of the vegan package, and then carry out a principal component analysis (PCA) on the transformed data:
Chord transformation: decostand(spiders,"normalize")
Transformation to relative abundance profiles: decostand(spiders,"total")
Hellinger transformation: decostand(spiders,"hellinger")
Chi-square transformation: decostand(spiders,"chi.square")
The ordination results will be identical and the calculations shorter. This two-step ordination method, called transformation-based PCA (tb-PCA), was described by Legendre and Gallagher (2001).
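The equivalence between the two routes can be sketched as follows. To keep the example self-contained, the Hellinger transformation is written out in base R rather than with decostand, and the community table is simulated; both choices are assumptions made for illustration only.

```r
library(ape)

set.seed(42)
## Hypothetical community table: 12 sites, 5 species (all counts > 0)
Y <- matrix(rpois(60, lambda = 5) + 1, nrow = 12)

## Hellinger transformation written out in base R:
## square root of the relative abundances within each site (row)
Y.hel <- sqrt(Y / rowSums(Y))

## Route 1: PCoA of the Hellinger distance
## (the Euclidean distance among the transformed rows)
res.pcoa <- pcoa(dist(Y.hel))

## Route 2: PCA of the transformed data (tb-PCA)
res.pca <- prcomp(Y.hel)

## Site scores on axis 1 agree up to an arbitrary sign
cbind(PCoA = abs(res.pcoa$vectors[, 1]), PCA = abs(res.pca$x[, 1]))
```

The PCA route avoids computing and decomposing the n x n distance matrix, which is where the shorter calculations come from.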
The biplot.pcoa function produces plots for any pair of principal coordinates. The original variables can be projected onto the ordination plot.
Value
correction | The values of parameter |
note | A note describing the type of correction done, if any. |
values | The eigenvalues and related information: |
Eigenvalues | All eigenvalues (positive, null, negative). |
Relative_eig | Relative eigenvalues. |
Corr_eig | Corrected eigenvalues (Lingoes correction); Legendre and Legendre (1998, p. 438, eq. 9.27). |
Rel_corr_eig | Relative eigenvalues after Lingoes or Cailliez correction. |
Broken_stick | Expected fractions of variance under the broken stick model. |
Cumul_eig | Cumulative relative eigenvalues. |
Cum_corr_eig | Cumulative corrected relative eigenvalues. |
Cumul_br_stick | Cumulative broken stick fractions. |
vectors | The principal coordinates with positive eigenvalues. |
trace | The trace of the distance matrix. This is also the sum of all eigenvalues, positive and negative. |
vectors.cor | The principal coordinates with positive eigenvalues from the distance matrix corrected using the method specified by parameter |
trace.cor | The trace of the corrected distance matrix. This is also the sum of its eigenvalues. |
Author(s)
Pierre Legendre, Université de Montréal
References
Cailliez, F. (1983) The analytical solution of the additive constant problem. Psychometrika, 48, 305–308.
Gower, J. C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–338.
Gower, J. C. and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48.
Legendre, P. and Gallagher, E. D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia, 129, 271–280.
Legendre, P. and Legendre, L. (1998) Numerical Ecology, 2nd English edition. Amsterdam: Elsevier Science BV.
Lingoes, J. C. (1971) Some boundary conditions for a monotone analysis of symmetric matrices. Psychometrika, 36, 195–203.
Examples
## Oribatid mite data from Borcard and Legendre (1994)
## Not run: 
if (require(vegan)) {
data(mite) # Community composition data, 70 peat cores, 35 species
## Select rows 1:30. Species 35 is absent from these rows. Transform to log
mite.log <- log(mite[1:30, -35] + 1) # Equivalent: log1p(mite[1:30, -35])
## Principal coordinate analysis and simple ordination plot
mite.D <- vegdist(mite.log, "bray")
res <- pcoa(mite.D)
res$values
biplot(res)
## Project unstandardized and standardized species on the PCoA ordination plot
mite.log.st = apply(mite.log, 2, scale, center=TRUE, scale=TRUE)
par(mfrow=c(1,2))
biplot(res, mite.log)
biplot(res, mite.log.st)
# Reverse the ordination axes in the plot
par(mfrow=c(1,2))
biplot(res, mite.log, dir.axis1=-1, dir.axis2=-1)
biplot(res, mite.log.st, dir.axis1=-1, dir.axis2=-1)
}
## End(Not run)
Tree Annotation
Description
phydataplot plots data on a tree in a way that adapts to the type of tree. ring does the same for circular trees.
Both functions match the data with the labels of the tree.
Usage
phydataplot(x, phy, style = "bars", offset = 1, scaling = 1, continuous = FALSE, width = NULL, legend = "below", funcol = rainbow, ...)
ring(x, phy, style = "ring", offset = 1, ...)
Arguments
x | a vector, a factor, a matrix, or a data frame. |
phy | the tree (which must be already plotted). |
style | a character string specifying the type of graphics; can be abbreviated (see details). |
offset | the space between the tips of the tree and the plot. |
scaling | the scaling factor to apply to the data. |
continuous | (used if style = "mosaic") a logical specifying whether to treat the values in |
width | (used if style = "mosaic") the width of the cells; by default, all the available space is used. |
legend | (used if style = "mosaic") the place where to draw the legend; one of |
funcol | (used if style = "mosaic") the function used to generate the colours (see details and examples). |
... | further arguments passed to the graphical functions. |
Details
The possible values for style are “bars”, “segments”, “image”, “arrows”, “boxplot”, “dotchart”, or “mosaic” for phydataplot, and “ring”, “segments”, or “arrows” for ring.
style = "image" works only with square matrices (e.g., similarities). If you want to plot a DNA alignment in the same way as image.DNAbin, try style = "mosaic".
style = "mosaic" can plot any kind of matrix, possibly after discretizing its values (using continuous). The default colour palette is taken from the function rainbow. If you want to use specified colours, a function simply returning the vector of colours must be used, possibly with names if you want to assign a specific colour to each value (see examples).
Note
For the moment, only rightwards trees are supported (does not apply to circular trees).
Author(s)
Emmanuel Paradis
See Also
plot.phylo, nodelabels, fancyarrows
Examples
## demonstrates matching with names:
tr <- rcoal(n <- 10)
x <- 1:n
names(x) <- tr$tip.label
plot(tr, x.lim = 11)
phydataplot(x, tr)
## shuffle x but matching names with tip labels reorders them:
phydataplot(sample(x), tr, "s", lwd = 3, lty = 3)
## adapts to the tree:
plot(tr, "f", x.l = c(-11, 11), y.l = c(-11, 11))
phydataplot(x, tr, "s")
## leave more space with x.lim to show a barplot and a dotchart:
plot(tr, x.lim = 22)
phydataplot(x, tr, col = "yellow")
phydataplot(x, tr, "d", offset = 13)
ts <- rcoal(N <- 100)
X <- rTraitCont(ts) # names are set
dd <- dist(X)
op <- par(mar = rep(0, 4))
plot(ts, x.lim = 10, cex = 0.4, font = 1)
phydataplot(as.matrix(dd), ts, "i", offset = 0.2)
par(xpd = TRUE, mar = op$mar)
co <- c("blue", "red"); l <- c(-2, 2)
X <- X + abs(min(X)) # move scale so X >= 0
plot(ts, "f", show.tip.label = FALSE, x.lim = l, y.lim = l, open.angle = 30)
phydataplot(X, ts, "s", col = co, offset = 0.05)
ring(X, ts, "ring", col = co, offset = max(X) + 0.1) # the same info as a ring
## as many rings as you want...
co <- c("blue", "yellow")
plot(ts, "r", show.tip.label = FALSE, x.l = c(-1, 1), y.l = c(-1, 1))
for (o in seq(0, 0.4, 0.2)) {
    co <- rev(co)
    ring(0.2, ts, "r", col = rep(co, each = 5), offset = o)
}
lim <- c(-5, 5)
co <- rgb(0, 0.4, 1, alpha = 0.1)
y <- seq(0.01, 1, 0.01)
plot(ts, "f", x.lim = lim, y.lim = lim, show.tip.label = FALSE)
ring(y, ts, offset = 0, col = co, lwd = 0.1)
for (i in 1:3) {
    y <- y + 1
    ring(y, ts, offset = 0, col = co, lwd = 0.1)
}
## rings can be in the background
plot(ts, "r", plot = FALSE)
ring(1, ts, "r", col = rainbow(100), offset = -1)
par(new = TRUE)
plot(ts, "r", font = 1, edge.color = "white")
## might be more useful:
co <- c("lightblue", "yellow")
plot(ts, "r", plot = FALSE)
ring(0.1, ts, "r", col = sample(co, size = N, rep = TRUE), offset = -.1)
par(new = TRUE)
plot(ts, "r", font = 1)
## if x is matrix:
tx <- rcoal(m <- 20)
X <- runif(m, 0, 0.5); Y <- runif(m, 0, 0.5)
X <- cbind(X, Y, 1 - X - Y)
rownames(X) <- tx$tip.label
plot(tx, x.lim = 6)
co <- rgb(diag(3))
phydataplot(X, tx, col = co)
## a variation:
plot(tx, show.tip.label = FALSE, x.lim = 5)
phydataplot(X, tx, col = co, offset = 0.05, border = NA)
plot(tx, "f", show.tip.label = FALSE, open.angle = 180)
ring(X, tx, col = co, offset = 0.05)
Z <- matrix(rnorm(m * 5), m)
rownames(Z) <- rownames(X)
plot(tx, x.lim = 5)
phydataplot(Z, tx, "bo", scaling = .5, offset = 0.5, boxfill = c("gold", "skyblue"))
## plot an alignment with a NJ tree:
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
phydataplot(woodmouse[, 1:50], trw, "m", 0.02, border = NA)
## use type = "mosaic" on a 30x5 matrix:
tr <- rtree(n <- 30)
p <- 5
x <- matrix(sample(3, size = n*p, replace = TRUE), n, p)
dimnames(x) <- list(paste0("t", 1:n), LETTERS[1:p])
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2)
## change the aspect:
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2, width = 2, border = "white", lwd = 3, legend = "side")
## user-defined colour:
f <- function(n) c("yellow", "blue", "red")
phydataplot(x, tr, "m", 18, width = 2, border = "white", lwd = 3, legend = "side", funcol = f)
## alternative colour function...:
## fb <- function(n) c("3" = "red", "2" = "blue", "1" = "yellow")
## ... but since the values are sorted alphabetically,
## both f and fb will produce the same plot.
## use continuous = TRUE with two different scales:
x[] <- 1:(n*p)
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2, width = 1.5, continuous = TRUE, legend = "side", funcol = colorRampPalette(c("white", "darkgreen")))
phydataplot(x, tr, "m", 18, width = 1.5, continuous = 5, legend = "side", funcol = topo.colors)
Fits a Bunch of Models with PhyML
Description
This function calls PhyML and fits successively 28 models of DNA evolution. The results are saved on disk, as PhyML usually does, and returned in R as a vector with the log-likelihood value of each model.
Usage
phymltest(seqfile, format = "interleaved", itree = NULL, exclude = NULL, execname = NULL, append = TRUE)
## S3 method for class 'phymltest'
print(x, ...)
## S3 method for class 'phymltest'
summary(object, ...)
## S3 method for class 'phymltest'
plot(x, main = NULL, col = "blue", ...)
Arguments
seqfile | a character string giving the name of the file that contains the DNA sequences to be analysed by PhyML. |
format | a character string specifying the format of the DNA sequences: either |
itree | a character string giving the name of a file with a tree in Newick format to be used as an initial tree by PhyML. If |
exclude | a vector of mode character giving the models to be excluded from the analysis. These must be among those below, and follow the same syntax. |
execname | a character string specifying the name of the PhyML executable. This argument can be left as |
append | a logical indicating whether to erase previous PhyML output files if present; the default is to not erase. |
x | an object of class |
object | an object of class |
main | a title for the plot; if left |
col | a colour used for the segments showing the AIC values (blue by default). |
... | further arguments passed to or from other methods. |
Details
The present function requires version 3.0.1 of PhyML; it won't work with older versions.
The user must take care to set correctly the three different paths involved here: the path to PhyML's binary, the path to the sequence file, and the path to R's working directory. The function should work if all three paths are different, and there should be no problem if they are all the same.
The following syntax is used for the models:
"X[Y][Z]00[+I][+G]"
where "X" is the first letter of the author of the model, "Y" and "Z" are possibly other co-authors of the model, "00" is the year of the publication of the model, and "+I" and "+G" indicate whether the presence of invariant sites and/or a gamma distribution of substitution rates have been specified. Thus, Kimura's model is denoted "K80" and not "K2P". The exception to this rule is the general time-reversible model, which is simply denoted "GTR".
The seven substitution models used are: "JC69", "K80", "F81", "F84", "HKY85", "TN93", and "GTR". These models are then altered by adding "+I" and/or "+G", thus resulting in four variants for each of them (e.g., "JC69", "JC69+I", "JC69+G", "JC69+I+G"). Some of these models are described in the help page of dist.dna.
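The naming scheme can be made concrete by generating the 28 model names from the seven base models and the four variants. This is only an illustrative sketch in base R; the order in which phymltest itself stores or fits the models may differ.

```r
## The seven substitution models named in the text
base <- c("JC69", "K80", "F81", "F84", "HKY85", "TN93", "GTR")
## The four variants: plain, +I, +G, +I+G
suffix <- c("", "+I", "+G", "+I+G")

## All combinations, grouped by base model
models <- paste0(rep(base, each = length(suffix)), suffix)
length(models)  # 28
models[1:4]     # "JC69" "JC69+I" "JC69+G" "JC69+I+G"
```

Names written in this syntax are what the exclude argument expects, for instance exclude = c("F84", "F84+I", "F84+G", "F84+I+G") to skip one family of models.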
When a gamma distribution of substitution rates is specified, four categories are used (which is PhyML's default behaviour), and the “alpha” parameter is estimated from the data.
For the models with a different substitution rate for transitions and transversions, these rates are left free and estimated from the data (and not constrained with a ratio of 4 as in PhyML's default).
The option path2exec has been removed in the present version: the path to PhyML's executable can be specified with the option execname.
Value
phymltest returns an object of class "phymltest": a numeric vector with the models as names.
The print method prints an object of class "phymltest" as a matrix with the name of the models, the number of free parameters, the log-likelihood value, and the value of the Akaike information criterion (AIC = -2 * loglik + 2 * number of free parameters).
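The AIC formula above can be applied by hand as in the following sketch. The log-likelihoods and the numbers of free parameters are invented for illustration; they are not output from PhyML and are not the parameter counts used internally by the print method.

```r
## Hypothetical log-likelihoods and numbers of free parameters
## (values invented for illustration only)
loglik <- c(JC69 = -1250.4, K80 = -1231.9, HKY85 = -1220.3)
npar   <- c(JC69 = 1, K80 = 2, HKY85 = 5)

## AIC = -2 * loglik + 2 * number of free parameters
aic <- -2 * loglik + 2 * npar
aic
names(which.min(aic))  # model with the lowest AIC
```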
The summary method prints all the possible likelihood ratio tests for an object of class "phymltest".
The plot method plots the values of AIC of an object of class "phymltest" on a vertical scale.
Note
It is important to note that the models fitted by this function are only a small fraction of the models possible with PhyML. For instance, it is possible to vary the number of categories in the (discretized) gamma distribution of substitution rates, and many parameters can be fixed by the user. The results from the present function should rather be taken as indicative of a best model.
Author(s)
Emmanuel Paradis
References
Posada, D. and Crandall, K. A. (2001) Selecting the best-fit model of nucleotide substitution. Systematic Biology, 50, 580–601.
Guindon, S. and Gascuel, O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696–704. http://www.atgc-montpellier.fr/phyml/
See Also
Examples
### A `fake' example with random likelihood values: it does not
### make sense, but does not need PhyML and gives you a flavour
### of what the output looks like:
x <- runif(28, -100, -50)
names(x) <- ape:::.phymltest.model
class(x) <- "phymltest"
x
summary(x)
plot(x)
plot(x, main = "", col = "red")
### This example needs PhyML, copy/paste or type the
### following commands if you want to try them, eventually
### changing setwd() and the options of phymltest()
## Not run: 
setwd("D:/phyml_v2.4/exe") # under Windows
data(woodmouse)
write.dna(woodmouse, "woodmouse.txt")
X <- phymltest("woodmouse.txt")
X
summary(X)
plot(X)
## End(Not run)
Phylogenetically Independent Contrasts
Description
Compute the phylogenetically independent contrasts using the method described by Felsenstein (1985).
Usage
pic(x, phy, scaled = TRUE, var.contrasts = FALSE, rescaled.tree = FALSE)
Arguments
x | a numeric vector. |
phy | an object of class |
scaled | logical, indicates whether the contrasts should be scaled with their expected variances (default to |
var.contrasts | logical, indicates whether the expected variances of the contrasts should be returned (default to |
rescaled.tree | logical, if |
Details
If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.
The user must be careful here since the function requires that both series of names perfectly match. If both series of names do not match, the values in x are taken to be in the same order as the tip labels of phy, and a warning message is issued.
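The matching by names can be checked with a small sketch. It reuses the primate tree and trait values from this help page's own example; the variable names pic.ordered and pic.shuffled are introduced here for illustration.

```r
library(ape)

tr <- read.tree(text = "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
x <- c(Homo = 4.09434, Pongo = 3.61092, Macaca = 2.37024,
       Ateles = 2.02815, Galago = -1.46968)

pic.ordered  <- pic(x, tr)       # names in the same order as the tip labels
pic.shuffled <- pic(rev(x), tr)  # same names, reversed order

## Matching by name gives the same contrasts in both cases
all.equal(pic.ordered, pic.shuffled)
```

If the names are removed with unname(), the values are taken in the order given, so a reordered vector would then silently give different contrasts.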
Value
either a vector of phylogenetically independent contrasts (if var.contrasts = FALSE), or a two-column matrix with the phylogenetically independent contrasts in the first column and their expected variance in the second column (if var.contrasts = TRUE). If the tree has node labels, these are used as labels of the returned object.
If rescaled.tree = TRUE, a list is returned with two elements named “contr” with the above results and “rescaled.tree” with the tree and its rescaled branch lengths (see Felsenstein 1985).
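The different return values can be explored with the sketch below, again using the primate data from this page's example. It also illustrates, under the assumption that scaling divides each contrast by the square root of its expected variance, how the scaled contrasts relate to the unscaled ones.

```r
library(ape)

tr <- read.tree(text = "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
x <- c(Homo = 4.09434, Pongo = 3.61092, Macaca = 2.37024,
       Ateles = 2.02815, Galago = -1.46968)

## Two-column matrix: unscaled contrasts and their expected variances
raw <- pic(x, tr, scaled = FALSE, var.contrasts = TRUE)
raw

## Scaled contrasts (the default) compared with a manual rescaling
std <- pic(x, tr)
cbind(manual = raw[, 1] / sqrt(raw[, 2]), pic = std)

## List with the contrasts and the tree with rescaled branch lengths
res <- pic(x, tr, rescaled.tree = TRUE)
names(res)
```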
Author(s)
Emmanuel Paradis
References
Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.
See Also
read.tree, compar.gee, compar.lynch, pic.ortho, varCompPhylip
Examples
### The example in Phylip 3.5c (originally from Lynch 1991)
x <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = x)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
names(X) <- names(Y) <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
pic.X <- pic(X, tree.primates)
pic.Y <- pic(Y, tree.primates)
cor.test(pic.X, pic.Y)
lm(pic.Y ~ pic.X - 1) # both regressions
lm(pic.X ~ pic.Y - 1) # through the origin
Phylogenetically Independent Orthonormal Contrasts
Description
This function computes the orthonormal contrasts using the method described by Felsenstein (2008). Only a single trait can be analyzed; there can be several observations per species.
Usage
pic.ortho(x, phy, var.contrasts = FALSE, intra = FALSE)
Arguments
x | a numeric vector or a list of numeric vectors. |
phy | an object of class |
var.contrasts | logical, indicates whether the expected variances of the contrasts should be returned (default to |
intra | logical, whether to return the intraspecific contrasts. |
Details
The data x can be in two forms: a vector if there is a single observation for each species, or a list whose elements are vectors containing the individual observations for each species. These vectors may be of different lengths.
If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.
Value
either a vector of contrasts, or a two-column matrix with the contrasts in the first column and their expected variances in the second column (if var.contrasts = TRUE). If the tree has node labels, these are used as labels of the returned object.
If intra = TRUE, the attribute "intra", a list of vectors with the intraspecific contrasts or NULL for the species with a single observation, is attached to the returned object.
Author(s)
Emmanuel Paradis
References
Felsenstein, J. (2008) Comparative methods with sampling error and within-species variation: Contrasts revisited and revised. American Naturalist, 171, 713–725.
See Also
Examples
tr <- rcoal(30)
### a single observation per species:
x <- rTraitCont(tr)
pic.ortho(x, tr)
pic.ortho(x, tr, TRUE)
### different number of observations per species:
x <- lapply(sample(1:5, 30, TRUE), rnorm)
pic.ortho(x, tr, intra = TRUE)
Plot a Correlogram
Description
These functions plot correlograms previously computed with correlogram.formula.
Usage
## S3 method for class 'correlogram'
plot(x, legend = TRUE, test.level = 0.05, col = c("grey", "red"), type = "b", xlab = "", ylab = "Moran's I", pch = 21, cex = 2, ...)
## S3 method for class 'correlogramList'
plot(x, lattice = TRUE, legend = TRUE, test.level = 0.05, col = c("grey", "red"), xlab = "", ylab = "Moran's I", type = "b", pch = 21, cex = 2, ...)
Arguments
x | an object of class |
legend | should a legend be added on the plot? |
test.level | the level used to discriminate the plotting symbols with colours considering the P-values. |
col | two colours for the plotting symbols: the first one is used if the P-value is greater than or equal to |
type | the type of plot to produce (see |
xlab | an optional character string for the label on the x-axis (none by default). |
ylab | the default label on the y-axis. |
pch | the type of plotting symbol. |
cex | the default size for the plotting symbols. |
lattice | when plotting several correlograms, should they be plotted in trellis-style with lattice (the default), or together on the same plot? |
... | other parameters passed to the |
Details
When plotting several correlograms with lattice, some options have no effect: legend, type, and pch (pch=19 is always used in this situation).
When using pch between 1 and 20 (i.e., non-filled symbols), the colours specified in col are also used for the lines joining the points. To keep black lines, it is better to leave pch between 21 and 25.
Author(s)
Emmanuel Paradis
See Also
Plot Phylogenies
Description
These functions plot phylogenetic trees.
Usage
## S3 method for class 'phylo'
plot(x, type = "phylogram", use.edge.length = TRUE, node.pos = NULL, show.tip.label = TRUE, show.node.label = FALSE, edge.color = NULL, edge.width = NULL, edge.lty = NULL, node.color = NULL, node.width = NULL, node.lty = NULL, font = 3, cex = par("cex"), adj = NULL, srt = 0, no.margin = FALSE, root.edge = FALSE, label.offset = 0, underscore = FALSE, x.lim = NULL, y.lim = NULL, direction = "rightwards", lab4ut = NULL, tip.color = par("col"), plot = TRUE, rotate.tree = 0, open.angle = 0, node.depth = 1, align.tip.label = FALSE, ...)
## S3 method for class 'multiPhylo'
plot(x, layout = 1, ...)
Arguments
x | an object of class |
type | a character string specifying the type of phylogeny to be drawn; it must be one of "phylogram" (the default), "cladogram", "fan", "unrooted", "radial", "tidy", or any unambiguous abbreviation of these. |
use.edge.length | a logical indicating whether to use the edge lengths of the phylogeny to draw the branches (the default) or not (if |
node.pos | a numeric taking the value 1 or 2 which specifies the vertical position of the nodes with respect to their descendants. If |
show.tip.label | a logical indicating whether to show the tip labels on the phylogeny (defaults to |
show.node.label | a logical indicating whether to show the node labels on the phylogeny (defaults to |
edge.color | a vector of mode character giving the colours used to draw the branches of the plotted phylogeny. These are taken to be in the same order as the component |
edge.width | a numeric vector giving the width of the branches of the plotted phylogeny. These are taken to be in the same order as the component |
edge.lty | same as the previous argument but for line types; 1: plain, 2: dashed, 3: dotted, 4: dotdash, 5: longdash, 6: twodash. |
node.color | a vector of mode character giving the colours used to draw the perpendicular lines associated with each node of the plotted phylogeny. These are taken to be in the same order as the component |
node.width | as the previous argument, but for line widths. |
node.lty | as the previous argument, but for line types; 1: plain, 2: dashed, 3: dotted, 4: dotdash, 5: longdash, 6: twodash. |
font | an integer specifying the type of font for the labels: 1 (plain text), 2 (bold), 3 (italic, the default), or 4 (bold italic). |
cex | a numeric value giving the factor scaling of the tip and node labels (Character EXpansion). The default is to take the current value from the graphical parameters. |
adj | a numeric specifying the justification of the text strings of the labels: 0 (left-justification), 0.5 (centering), or 1 (right-justification). This option has no effect if |
srt | a numeric giving how much the labels are rotated in degrees (negative values are allowed resulting in clockwise rotation); the value has an effect respectively to the value of |
no.margin | a logical. If |
root.edge | a logical indicating whether to draw the root edge (defaults to FALSE); this has no effect if ‘use.edge.length = FALSE’ or if ‘type = "unrooted"’. |
label.offset | a numeric giving the space between the nodes and the tips of the phylogeny and their corresponding labels. This option has no effect if |
underscore | a logical specifying whether the underscores in tip labels should be written as spaces (the default) or left as is (if |
x.lim | a numeric vector of length one or two giving the limit(s) of the x-axis. If |
y.lim | same as above for the y-axis. |
direction | a character string specifying the direction of the tree. Four values are possible: "rightwards" (the default), "leftwards", "upwards", and "downwards". |
lab4ut | (= labels for unrooted trees) a character string specifying the display of tip labels for unrooted trees (can be abbreviated): either |
tip.color | the colours used for the tip labels, possibly recycled (see examples). |
plot | a logical controlling whether to draw the tree. If |
rotate.tree | for "fan", "unrooted", or "radial" trees: the rotation of the whole tree in degrees (negative values are accepted). |
open.angle | if |
node.depth | an integer value (1 or 2) used if branch lengths are not used to plot the tree; 1: the node depths are proportional to the number of tips descending from each node (the default, and previously the only possibility), 2: they are evenly spaced. |
align.tip.label | a logical value or an integer. If |
layout | the number of trees to be plotted simultaneously. |
... | further arguments to be passed to |
Details
If x is a list of trees (i.e., an object of class "multiPhylo"), then any further argument may be passed with ... and could be any one of those listed above for a single tree.
The font format of the labels of the nodes and the tips is the same.
If no.margin = TRUE, the margins are set to zero and are not restored after plotting the tree, so that the user can access the coordinates system of the plot.
The option ‘node.pos’ allows the user to alter the vertical position (i.e., ordinates) of the nodes. If node.pos = 1, then the ordinate of a node is the mean of the ordinates of its direct descendants (nodes and/or tips). If node.pos = 2, then the ordinate of a node is the mean of the ordinates of all the tips of which it is the ancestor. If node.pos = NULL (the default), then its value is determined with respect to other options: if type = "phylogram" then ‘node.pos = 1’; if type = "cladogram" and use.edge.length = FALSE then ‘node.pos = 2’; if type = "cladogram" and use.edge.length = TRUE then ‘node.pos = 1’. Remember that in this last situation, the branch lengths make sense when projected on the x-axis.
If adj is not specified, then the value is determined with respect to direction: if direction = "leftwards" then adj = 1 (0 otherwise).
If the arguments x.lim and y.lim are not specified by the user, they are determined roughly by the function. This may not always give a nice result: the user may check these values with the (invisibly) returned list (see “Value:”).
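One way to inspect and adjust the limits is sketched below. The owl tree is taken from this page's examples; the object name pp and the 30 percent enlargement are arbitrary choices made for illustration.

```r
library(ape)

tr <- read.tree(text = "(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);")

## The list is returned invisibly, so it must be assigned to be inspected
pp <- plot(tr)
pp$x.lim
pp$y.lim

## Replot with more room on the right, e.g. for long tip labels
plot(tr, x.lim = c(pp$x.lim[1], 1.3 * pp$x.lim[2]))
```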
If you use align.tip.label = TRUE with type = "fan", you will almost certainly have to set x.lim and y.lim manually.
If you manually resize the graphical device (windows or X11) you may need to replot the tree.
Value
plot.phylo invisibly returns a list with the following components, whose values are those used for the current plot:
type | |
use.edge.length | |
node.pos | |
node.depth | |
show.tip.label | |
show.node.label | |
font | |
cex | |
adj | |
srt | |
no.margin | |
label.offset | |
x.lim | |
y.lim | |
direction | |
tip.color | |
Ntip | |
Nnode | |
root.time | |
align.tip.label |
Note
The argument asp cannot be passed with ....
Author(s)
Emmanuel Paradis, Martin Smith, Damien de Vienne
References
van der Ploeg, A. (2014) Drawing non-layered tidy trees in linear time. Journal of Software: Practice and Experience, 44, 1467–1484.
See Also
read.tree, trex, kronoviz, add.scale.bar, axisPhylo, nodelabels, edges, plot for the basic plotting function in R
Examples
### An extract from Sibley and Ahlquist (1990)
x <- "(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);"
tree.owls <- read.tree(text= x)
plot(tree.owls)
### Show the types of trees.
layout(matrix(1:6, 3, 2))
plot(tree.owls, main = "With branch lengths")
plot(tree.owls, type = "c")
plot(tree.owls, type = "u")
plot(tree.owls, use.edge.length = FALSE, main = "Without branch lengths")
plot(tree.owls, type = "c", use.edge.length = FALSE)
plot(tree.owls, type = "u", use.edge.length = FALSE)
layout(1)
data(bird.orders)
### using random colours and thickness
plot(bird.orders, edge.color = sample(colors(), length(bird.orders$edge)/2), edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
title("Random colours and branch thickness")
### rainbow colouring...
X <- c("red", "orange", "yellow", "green", "blue", "purple")
plot(bird.orders, edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE), edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
title("Rainbow colouring")
plot(bird.orders, type = "c", use.edge.length = FALSE, edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE), edge.width = rep(5, length(bird.orders$edge)/2))
segments(rep(0, 6), 6.5:1.5, rep(2, 6), 6.5:1.5, lwd = 5, col = X)
text(rep(2.5, 6), 6.5:1.5, paste(X, "..."), adj = 0)
title("Character mapping...")
plot(bird.orders, "u", font = 1, cex = 0.75)
data(bird.families)
plot(bird.families, "u", lab4ut = "axial", font = 1, cex = 0.5)
plot(bird.families, "r", font = 1, cex = 0.5)
### cladogram with oblique tip labels
plot(bird.orders, "c", FALSE, direction = "u", srt = -40, x.lim = 25.5)
### facing trees with different informations...
tr <- bird.orders
tr$tip.label <- rep("", 23)
layout(matrix(1:2, 1, 2), c(5, 4))
plot(bird.orders, "c", FALSE, adj = 0.5, no.margin = TRUE, label.offset = 0.8, edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE), edge.width = rep(5, length(bird.orders$edge)/2))
text(7.5, 23, "Facing trees with\ndifferent informations", font = 2)
plot(tr, "p", direction = "l", no.margin = TRUE, edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
### Recycling of arguments gives a lot of possibilities
### for tip labels:
plot(bird.orders, tip.col = c(rep("red", 5), rep("blue", 18)), font = c(rep(3, 5), rep(2, 17), 1))
plot(bird.orders, tip.col = c("blue", "green"), cex = 23:1/23 + .3, font = 1:3)
co <- c(rep("blue", 9), rep("green", 35))
plot(bird.orders, "f", edge.col = co)
plot(bird.orders, edge.col = co)
layout(1)
## tidy trees
tr <- rtree(100)
layout(matrix(1:2, 2))
plot(tr)
axis(2)
plot(tr, "t")
axis(2)
## around 20 percent gain on the y-axis
Extra Functions to Plot and Annotate Phylogenies
Description
These are extra functions to plot and annotate phylogenies, mostly calling basic graphical functions in ape.
Usage
plotBreakLongEdges(phy, n = 1, ...)
drawSupportOnEdges(value, ...)
Arguments
phy | an object of class |
n | the number of long branches to be broken. |
value | the values to be printed on the internal branches of the tree. |
... | further arguments to be passed to |
Details
drawSupportOnEdges assumes the tree is unrooted, so the vector value should have as many values as the number of internal branches (= number of nodes - 1). If there is one additional value, it is assumed that it relates to the root node and is dropped (see examples).
Value
NULL
Author(s)
Emmanuel Paradis
See Also
plot.phylo, edgelabels, boot.phylo, plotTreeTime
Examples
tr <- rtree(10)
tr$edge.length[c(1, 18)] <- 100
op <- par(mfcol = 1:2)
plot(tr); axisPhylo()
plotBreakLongEdges(tr, 2); axisPhylo()
## from ?boot.phylo:
f <- function(x) nj(dist.dna(x))
data(woodmouse)
tw <- f(woodmouse) # NJ tree with K80 distance
set.seed(1)
## bootstrap with 100 replications:
(bp <- boot.phylo(tw, woodmouse, f, quiet = TRUE))
## the first value relates to the root node and is always 100
## it is ignored below:
plot(tw, "u")
drawSupportOnEdges(bp)
## more readable but the tree is really unrooted:
plot(tw)
drawSupportOnEdges(bp)
par(op)
Plot Variance Components
Description
Plot previously estimated variance components.
Usage
## S3 method for class 'varcomp'
plot(x, xlab = "Levels", ylab = "Variance", type = "b", ...)
Arguments
x | A varcomp object |
xlab | x axis label |
ylab | y axis label |
type | plot type ("l", "p" or "b", see |
... | Further argument sent to the |
Value
The same as xyplot.
Author(s)
Julien Dutheil <dutheil@evolbio.mpg.de>
See Also
Plot Tree With Time Axis
Description
This function plots a non-ultrametric tree where the tips are not contemporary together with their dates on the x-axis.
Usage
plotTreeTime(phy, tip.dates, show.tip.label = FALSE, y.lim = NULL, color = TRUE, ...)
Arguments
phy | an object of class "phylo". |
tip.dates | a vector of the same length as the number of tips in |
show.tip.label | a logical value; see |
y.lim | by default, one fifth of the plot is left below the tree; use this option to change this behaviour. |
color | a logical value specifying whether to use colors for the lines linking the tips to the time axis. If |
... | other arguments to be passed to |
Details
The vector tip.dates may be numeric or of class “Date”. In either case, the time axis is set accordingly. The length of this vector must be equal to the number of tips of the tree: the dates are matched to the tip numbers. Missing values are allowed.
Value
NULL
Author(s)
Emmanuel Paradis
See Also
Examples
dates <- as.Date(.leap.seconds)
tr <- rtree(length(dates))
plotTreeTime(tr, dates)
## handling NA's:
dates[11:26] <- NA
plotTreeTime(tr, dates)
## dates can be on an arbitrary scale, e.g., [-1, 1]:
plotTreeTime(tr, runif(Ntip(tr), -1, 1))
Compact Display of a Phylogeny
Description
These functions print a compact summary of a phylogeny, or a list of phylogenies, on the console.
Usage
## S3 method for class 'phylo'
print(x, printlen = 6, ...)
## S3 method for class 'multiPhylo'
print(x, details = FALSE, ...)
## S3 method for class 'multiPhylo'
str(object, ...)
Arguments
x | an object of class "phylo" or "multiPhylo". |
object | an object of class "multiPhylo". |
printlen | the number of labels to print (6 by default). |
details | a logical indicating whether to print information on all trees. |
... | further arguments passed to or from other methods. |
Value
NULL.
Author(s)
Ben Bolker and Emmanuel Paradis
See Also
read.tree, summary.phylo, print for the generic R function
Examples
x <- rtree(10)
print(x)
print(x, printlen = 10)
x <- rmtree(2, 10)
print(x)
print(x, TRUE)
str(x)
Random DNA Sequences
Description
This function generates random sets of DNA sequences.
Usage
rDNAbin(n, nrow, ncol, base.freq = rep(0.25, 4), prefix = "Ind_")
Arguments
n | a vector of integers giving the lengths of the sequences. Can be missing in which case |
nrow, ncol | two single integer values giving the number of sequences and the number of sites, respectively (ignored if |
base.freq | the base frequencies. |
prefix | the prefix used to give labels to the sequences; by default these are Ind_1, ... Ind_n (or Ind_nrow). |
Details
If n is used, this function generates a list with sequence lengths given by the values in n. If n is missing, a matrix is generated.
The purpose of this function is to generate a set of sequences of a specific size. To simulate sequences on a phylogenetic tree, see simSeq in phangorn (very efficient), and the package phylosim (more for pedagogy).
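The two calling conventions described above can be contrasted with a short sketch. The sequence lengths, the seed, and the base frequencies below are arbitrary illustrative values.

```r
library(ape)

set.seed(42)

## 'n' supplied: a list of sequences with the requested lengths
x <- rDNAbin(c(5, 8, 12))
lengths(x)

## 'n' missing: an nrow x ncol matrix, here with unequal base frequencies
y <- rDNAbin(nrow = 3, ncol = 10, base.freq = c(0.4, 0.1, 0.1, 0.4))
dim(y)
rownames(y)
```

The list form is convenient for unaligned data, whereas the matrix form mimics an alignment.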
Value
an object of class "DNAbin".
Note
It is not recommended to use this function to generate objects larger than two billion bases (2 Gb).
Author(s)
Emmanuel Paradis
See Also
Examples
rDNAbin(1:10)
rDNAbin(rep(10, 10))
rDNAbin(nrow = 10, ncol = 10)
Continuous Character Simulation
Description
This function simulates the evolution of a continuous character along a phylogeny. The calculation is done recursively from the root. See Paradis (2012, pp. 232 and 324) for an introduction.
Usage
rTraitCont(phy, model = "BM", sigma = 0.1, alpha = 1, theta = 0, ancestor = FALSE, root.value = 0, ...)
Arguments
phy | an object of class "phylo". |
model | a character (either |
sigma | a numeric vector giving the standard-deviation of the random component for each branch (can be a single value). |
alpha | if |
theta | if |
ancestor | a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned). |
root.value | a numeric giving the value at the root. |
... | further arguments passed to |
Details
There are three possibilities to specify model:
"BM": a Brownian motion model is used. If the argument sigma has more than one value, its length must be equal to the number of branches of the tree. This makes it possible to specify a model with variable rates of evolution. You must be careful that branch numbering is done with the tree in “postorder” order: to see the order of the branches you can use: tr <- reorder(tr, "po"); plot(tr); edgelabels(). The arguments alpha and theta are ignored.
"OU": an Ornstein-Uhlenbeck model is used. The above indexing rule is used for the three parameters sigma, alpha, and theta. This may be interesting for the last one to model varying phenotypic optima. The exact updating formulae from Gillespie (1996) are used; they reduce to the BM formulae if alpha = 0.
A function: it must be of the form foo(x, l) where x is the trait of the ancestor and l is the branch length. It must return the value of the descendant. The arguments sigma, alpha, and theta are ignored.
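The branch-wise indexing of sigma under the "BM" option can be sketched as follows. The tree, the seed, and the choice of which branches evolve faster are arbitrary assumptions made for illustration.

```r
library(ape)

set.seed(1)
tr <- reorder(rcoal(10), "postorder")   # branch indices follow this order

## one sigma per branch: faster evolution on the first four branches
sigma <- rep(0.1, Nedge(tr))
sigma[1:4] <- 0.5

x <- rTraitCont(tr, model = "BM", sigma = sigma)
x
```

Plotting tr with edgelabels() shows which branches received the larger value.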
Value
A numeric vector with names taken from the tip labels of phy. If ancestor = TRUE, the node labels are used if present, otherwise, “Node1”, “Node2”, etc.
Author(s)
Emmanuel Paradis
References
Gillespie, D. T. (1996) Exact numerical simulation of the Ornstein-Uhlenbeck process and its integral. Physical Review E, 54, 2084–2091.
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.
See Also
Examples
data(bird.orders)
rTraitCont(bird.orders) # BM with sigma = 0.1
### OU model with two optima:
tr <- reorder(bird.orders, "postorder")
plot(tr)
edgelabels()
theta <- rep(0, Nedge(tr))
theta[c(1:4, 15:16, 23:24)] <- 2
## sensitive to 'alpha' and 'sigma':
rTraitCont(tr, "OU", theta = theta, alpha=.1, sigma=.01)
### an imaginary model with stasis 0.5 time unit after a node, then
### BM evolution with sigma = 0.1:
foo <- function(x, l) {
    if (l <= 0.5) return(x)
    x + (l - 0.5)*rnorm(1, 0, 0.1)
}
tr <- rcoal(20, br = runif)
rTraitCont(tr, foo, ancestor = TRUE)
### a cumulative Poisson process:
bar <- function(x, l) x + rpois(1, l)
(x <- rTraitCont(tr, bar, ancestor = TRUE))
plot(tr, show.tip.label = FALSE)
Y <- x[1:20]
A <- x[-(1:20)]
nodelabels(A)
tiplabels(Y)
Discrete Character Simulation
Description
This function simulates the evolution of a discrete character along a phylogeny. If model is a character or a matrix, evolution is simulated with a Markovian model; the transition probabilities are calculated for each branch with P = e^{Qt} where Q is the rate matrix given by model and t is the branch length. The calculation is done recursively from the root. See Paradis (2006, p. 101) for a general introduction applied to evolution.
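To make the formula P = e^{Qt} concrete, here is a minimal base-R sketch for a two-state equal-rates model. The eigendecomposition used to exponentiate the matrix, and the convention that each diagonal element of Q is minus the sum of the other rates in its row, are assumptions chosen for illustration and are not taken from the package code.

```r
## two-state equal-rates model (illustrative values)
rate <- 0.1
Q <- matrix(c(-rate,  rate,
               rate, -rate), 2, 2, byrow = TRUE)
t <- 2                                   # branch length

## matrix exponential through the eigendecomposition of Q
e <- eigen(Q)
P <- e$vectors %*% diag(exp(e$values * t)) %*% solve(e$vectors)

P                                        # transition probabilities; rows sum to 1
## closed form for this model: probability of ending in the other state
(1 - exp(-2 * rate * t)) / 2
```

The off-diagonal entries of P agree with the closed form, which gives a simple sanity check on the per-branch transition probabilities.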
Usage
rTraitDisc(phy, model = "ER", k = if (is.matrix(model)) ncol(model) else 2, rate = 0.1, states = LETTERS[1:k], freq = rep(1/k, k), ancestor = FALSE, root.value = 1, ...)
Arguments
phy | an object of class "phylo". |
model | a character, a square numeric matrix, or a function specifying the model (see details). |
k | the number of states of the character. |
rate | the rate of change used if |
states | the labels used for the states; by default “A”, “B”,... |
freq | a numeric vector giving the equilibrium relative frequencies of each state; by default the frequencies are equal. |
ancestor | a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned). |
root.value | an integer giving the value at the root (by default, it's the first state). To have a random value, use |
... | further arguments passed to |
Details
There are three possibilities to specify model:
A matrix: it must be a numeric square matrix; the diagonal is always ignored. The arguments k and rate are ignored.
A character: these are the same short-cuts as in the function ace: "ER" is an equal-rates model, "ARD" is an all-rates-different model, and "SYM" is a symmetrical model. Note that the argument rate must be of the appropriate length, i.e., 1, k(k - 1), or k(k - 1)/2 for the three models, respectively. The rate matrix Q is then filled column-wise.
A function: it must be of the form foo(x, l) where x is the trait of the ancestor and l is the branch length. It must return the value of the descendant as an integer.
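The required lengths of rate for the three character short-cuts can be illustrated with k = 3 states. This is a sketch only: the tree, the seed, and the rate values are arbitrary assumptions.

```r
library(ape)

set.seed(2)
tr <- rcoal(15)
k <- 3                                   # three states, labelled by default

## "ER": a single rate
x_er  <- rTraitDisc(tr, model = "ER",  k = k, rate = 0.1)
## "SYM": k(k - 1)/2 = 3 rates
x_sym <- rTraitDisc(tr, model = "SYM", k = k, rate = c(0.1, 0.2, 0.3))
## "ARD": k(k - 1) = 6 rates
x_ard <- rTraitDisc(tr, model = "ARD", k = k, rate = seq(0.1, 0.6, by = 0.1))

table(x_ard)
```

Supplying a rate vector of the wrong length for the chosen short-cut is expected to stop with an error.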
Value
A factor with names taken from the tip labels of phy. If ancestor = TRUE, the node labels are used if present, otherwise, “Node1”, “Node2”, etc.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2006) Analysis of Phylogenetics and Evolution with R. New York: Springer.
See Also
Examples
data(bird.orders)
### the following two are the same:
rTraitDisc(bird.orders)
rTraitDisc(bird.orders, model = matrix(c(0, 0.1, 0.1, 0), 2))
### two-state model with irreversibility:
rTraitDisc(bird.orders, model = matrix(c(0, 0, 0.1, 0), 2))
### simple two-state model:
tr <- rcoal(n <- 40, br = runif)
x <- rTraitDisc(tr, ancestor = TRUE)
plot(tr, show.tip.label = FALSE)
nodelabels(pch = 19, col = x[-(1:n)])
tiplabels(pch = 19, col = x[1:n])
### an imaginary model with stasis 0.5 time unit after a node, then
### random evolution:
foo <- function(x, l) {
    if (l < 0.5) return(x)
    sample(2, size = 1)
}
tr <- rcoal(20, br = runif)
x <- rTraitDisc(tr, foo, ancestor = TRUE)
plot(tr, show.tip.label = FALSE)
co <- c("blue", "yellow")
cot <- c("white", "black")
Y <- x[1:20]
A <- x[-(1:20)]
nodelabels(A, bg = co[A], col = cot[A])
tiplabels(Y, bg = co[Y], col = cot[Y])
Multivariate Character Simulation
Description
This function simulates the evolution of a multivariate set of traits along a phylogeny. The calculation is done recursively from the root.
Usage
rTraitMult(phy, model, p = 1, root.value = rep(0, p), ancestor = FALSE, asFactor = NULL, trait.labels = paste("x", 1:p, sep = ""), ...)
Arguments
phy | an object of class "phylo". |
model | a function specifying the model (see details). |
p | an integer giving the number of traits. |
root.value | a numeric vector giving the values at the root. |
ancestor | a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned). |
asFactor | the indices of the traits that are returned as factors (discrete traits). |
trait.labels | a vector of mode character giving the names of the traits. |
... | further arguments passed to |
Details
The model is specified with an R function of the form foo(x, l) where x is a vector of the traits of the ancestor and l is the branch length. Other arguments may be added. The function must return a vector of length p.
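A model function with an additional argument might look like the sketch below: two independent Brownian traits whose step variance grows with branch length. The function name bm2, its sigma argument, and the assumption that extra arguments given to rTraitMult are forwarded to the model function are illustrative, not taken from the text above.

```r
library(ape)

## hypothetical model: independent Brownian motion on each trait
bm2 <- function(x, l, sigma = 0.1)
    x + rnorm(length(x), mean = 0, sd = sigma * sqrt(l))

set.seed(3)
tr <- rcoal(12)
X <- rTraitMult(tr, bm2, p = 2, sigma = 0.2)

dim(X)       # one row per tip, one column per trait
names(X)     # default trait labels
```

Because the function returns a vector as long as its input x, the requirement of returning p values is met for any p.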
Value
A data frame with p columns whose names are given by trait.labels and row names taken from the labels of the tree.
Author(s)
Emmanuel Paradis
See Also
Examples
## correlated evolution of 2 continuous traits:
mod <- function(x, l) {
    y1 <- rnorm(1, x[1] + 0.5*x[2], 0.1)
    y2 <- rnorm(1, 0.5*x[1] + x[2], 0.1)
    c(y1, y2)
}
set.seed(11)
tr <- makeNodeLabel(rcoal(20))
x <- rTraitMult(tr, mod, 2, ancestor = TRUE)
op <- par(mfcol = c(2, 1))
plot(x, type = "n")
text(x, labels = rownames(x), cex = 0.7)
oq <- par(mar = c(0, 1, 0, 1), xpd = TRUE)
plot(tr, font = 1, cex = 0.7)
nodelabels(tr$node.label, cex = 0.7, adj = 1)
par(c(op, oq))
Read DNA Sequences from GenBank via Internet
Description
This function connects to the GenBank database, and reads nucleotide sequences using accession numbers given as arguments.
Usage
read.GenBank(access.nb, seq.names = access.nb, species.names = TRUE, as.character = FALSE, chunk.size = 400, quiet = TRUE, type = "DNA")
Arguments
access.nb | a vector of mode character giving the accession numbers. |
seq.names | the names to give to each sequence; by default the accession numbers are used. |
species.names | a logical indicating whether to attribute the species names to the returned object. |
as.character | a logical controlling whether to return the sequences as an object of class |
chunk.size | the number of sequences downloaded together (see details). |
quiet | a logical value indicating whether to show the progress of the downloads. If |
type | a character specifying to download "DNA" (nucleotide) or "AA" (amino acid) sequences. |
Details
The function uses the site https://www.ncbi.nlm.nih.gov/ from where the sequences are retrieved.
If species.names = TRUE, the returned list has an attribute "species" containing the names of the species taken from the field “ORGANISM” in GenBank.
Since ape 3.6, this function retrieves the sequences in FASTA format: this is more efficient and more flexible (scaffolds and contigs can be read) than what was done in previous versions. The option gene.names has been removed in ape 5.4; this information is also present in the description.
Setting species.names = FALSE is much faster (could be useful if you read a series of scaffolds or contigs, or if you already have the species names).
The argument chunk.size is set by default to 400 which is likely to work in many cases. If an error occurs such as “Cannot open file ...” showing the list of the accession numbers, then you may try decreasing chunk.size to 200 or 300.
If quiet = FALSE, the display is done chunk by chunk, so the message “Downloading sequences: 400 / 400 ...” means that the download from sequence 1 to sequence 400 is in progress (it is not possible to display a more accurate message because the download method depends on the platform).
Value
A list of DNA sequences made of vectors of class "DNAbin", or of single characters (if as.character = TRUE) with two attributes (species and description).
Author(s)
Emmanuel Paradis and Klaus Schliep
See Also
read.dna, write.dna, dist.dna, DNAbin
Examples
## This won't work if your computer is not connected
## to the Internet
## Get the 8 sequences of tanagers (Ramphocelus)
## as used in Paradis (1997)
ref <- c("U15717", "U15718", "U15719", "U15720", "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
## Not run: 
Rampho <- read.GenBank(ref)
## get the species names:
attr(Rampho, "species")
## build a matrix with the species names and the accession numbers:
cbind(attr(Rampho, "species"), names(Rampho))
## print the first sequence
## (can be done with `Rampho$U15717' as well)
Rampho[[1]]
## the description from each FASTA sequence:
attr(Rampho, "description")
## End(Not run)
Read Tree File in CAIC Format
Description
This function reads one tree from a CAIC file. A second file containing branch length values may also be passed (experimental).
Usage
read.caic(file, brlen = NULL, skip = 0, comment.char = "#", ...)
Arguments
file | a file name specified by either a variable of mode character, or a double-quoted string. |
brlen | a file name for the branch lengths file. |
skip | the number of lines of the input file to skip before beginning to read data (this is passed directly to scan()). |
comment.char | a single character, the remaining of the line after this character is ignored (this is passed directly to scan()). |
... | Further arguments to be passed to scan(). |
Details
Read a tree from a file in the format used by the CAIC and MacroCAIC programs.
Value
an object of class "phylo".
Warning
The branch length support is still experimental and was not fully tested.
Author(s)
Julien Dutheil <dutheil@evolbio.mpg.de>
References
Purvis, A. and Rambaut, A. (1995) Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. CABIOS, 11, 241–251.
See Also
Examples
## The same example as in read.tree, without branch lengths.
## An extract from Sibley and Ahlquist (1990)
fl <- tempfile("tree", fileext = ".tre")
cat("AAA","Strix_aluco","AAB","Asio_otus", "AB","Athene_noctua","B","Tyto_alba", file = fl, sep = "\n")
tree.owls <- read.caic(fl)
plot(tree.owls)
tree.owls
unlink(fl) # delete the temporary file
Read DNA Sequences in a File
Description
These functions read DNA sequences in a file, and return a matrix or a list of DNA sequences with the names of the taxa read in the file as rownames or names, respectively. By default, the sequences are returned in binary format, otherwise (if as.character = TRUE) in lowercase.
Usage
read.dna(file, format = "interleaved", skip = 0, nlines = 0, comment.char = "#", as.character = FALSE, as.matrix = NULL)
read.FASTA(file, type = "DNA")
read.fastq(file, offset = -33)
Arguments
file | a file name specified by either a variable of mode character, or a double-quoted string. Can also be a connection (which will be opened for reading if necessary, and if so |
format | a character string specifying the format of the DNA sequences. Four choices are possible: |
skip | the number of lines of the input file to skip before beginning to read data (ignored for FASTA files; see below). |
nlines | the number of lines to be read (by default the file is read until its end; ignored for FASTA files). |
comment.char | a single character, the remaining of the line after this character is ignored (ignored for FASTA files). |
as.character | a logical controlling whether to return the sequences as an object of class |
as.matrix | (used if |
type | a character string giving the type of the sequences: one of |
offset | the value to be added to the quality scores (the default applies to the Sanger format and should work for most recent FASTQ files). |
Details
read.dna follows the interleaved and sequential formats defined in PHYLIP (Felsenstein, 1993) but with the original feature that there is no restriction on the lengths of the taxa names. For these two formats, the first line of the file must contain the dimensions of the data (the numbers of taxa and the numbers of nucleotides); the sequences are considered as aligned and thus must be of the same lengths for all taxa. For the FASTA and FASTQ formats, the conventions defined in the references are followed; the sequences are taken as non-aligned. For all formats, the nucleotides can be arranged in any way with blanks and line-breaks inside (with the restriction that the first ten nucleotides must be contiguous for the interleaved and sequential formats, see below). The names of the sequences are read in the file. Particularities for each format are detailed below.
Interleaved: the function starts to read the sequences after it finds one or more spaces (or tabulations). All characters before the sequences are taken as the taxa names after removing the leading and trailing spaces (so spaces in taxa names are not allowed). It is assumed that the taxa names are not repeated in the subsequent blocks of nucleotides.
Sequential: the same criterion as for the interleaved format is used to start reading the sequences and the taxa names; the sequences are then read until the number of nucleotides specified in the first line of the file is reached. This is repeated for each taxon.
Clustal: this is the format output by the Clustal programs (.aln). It is close to the interleaved format: the differences are that the dimensions of the data are not indicated in the file, and the names of the sequences are repeated in each block.
FASTA: this looks like the sequential format but the taxa names (or a description of the sequence) are on separate lines beginning with a ‘greater than’ character ‘>’ (there may be leading spaces before this character). These lines are taken as taxa names after removing the ‘>’ and the possible leading and trailing spaces. All the data in the file before the first sequence are ignored.
The FASTQ format is explained in the references.
Compressed files must be read through connections (see examples). read.fastq can read compressed files directly (see examples).
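As a small illustration of the FASTA conventions and of the two storage modes, the sketch below writes a two-sequence file and reads it back in binary and in character form. The file contents and object names are made up for the example.

```r
library(ape)

fl <- tempfile("demo", fileext = ".fas")
cat(">s1", "ACGTAC", ">s2", "ACGTAA", file = fl, sep = "\n")

## binary storage (the default)
bin <- read.FASTA(fl)
class(bin)

## the same data as lowercase characters
chr <- read.dna(fl, format = "fasta", as.character = TRUE)
chr

unlink(fl)   # clean-up
```

The binary form is the one expected by most other functions of the package, such as dist.dna.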
Value
a matrix or a list (if format = "fasta") of DNA sequences stored in binary format, or of mode character (if as.character = TRUE).
read.FASTA always returns a list of class "DNAbin" or "AAbin".
read.fastq returns a list of class "DNAbin" with an attribute "QUAL" (see examples).
Author(s)
Emmanuel Paradis and RJ Ewing
References
Anonymous. FASTA format. https://en.wikipedia.org/wiki/FASTA_format
Anonymous. FASTQ format. https://en.wikipedia.org/wiki/FASTQ_format
Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington. http://evolution.genetics.washington.edu/phylip/phylip.html
See Also
read.GenBank, write.dna, DNAbin, dist.dna, woodmouse
Examples
## 1. Simple text files
TEXTfile <- tempfile("exdna", fileext = ".txt")
## 1a. Extract from data(woodmouse) in sequential format:
cat("3 40",
"No305 NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
"No304 ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
"No306 ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = TEXTfile, sep = "\n")
ex.dna <- read.dna(TEXTfile, format = "sequential")
str(ex.dna)
ex.dna
## 1b. The same data in interleaved format, ...
cat("3 40",
"No305 NTTCGAAAAA CACACCCACT",
"No304 ATTCGAAAAA CACACCCACT",
"No306 ATTCGAAAAA CACACCCACT",
" ACTAAAANTT ATCAGTCACT",
" ACTAAAAATT ATCAACCACT",
" ACTAAAAATT ATCAATCACT",
file = TEXTfile, sep = "\n")
ex.dna2 <- read.dna(TEXTfile)
## 1c. ... in clustal format, ...
cat("CLUSTAL (ape) multiple sequence alignment", "",
"No305 NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
"No304 ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
"No306 ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
" ************************** ****** ****",
file = TEXTfile, sep = "\n")
ex.dna3 <- read.dna(TEXTfile, format = "clustal")
## 1d. ... and in FASTA format
FASTAfile <- tempfile("exdna", fileext = ".fas")
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = FASTAfile, sep = "\n")
ex.dna4 <- read.dna(FASTAfile, format = "fasta")
## The 4 data objects are the same:
identical(ex.dna, ex.dna2)
identical(ex.dna, ex.dna3)
identical(ex.dna, ex.dna4)
## 2. How to read GZ compressed files
## create a GZ file and open a connection:
GZfile <- tempfile("exdna", fileext = ".fas.gz")
con <- gzfile(GZfile, "wt")
## write the data using the connection:
cat(">No305", "NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
    ">No304", "ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
    ">No306", "ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
    file = con, sep = "\n")
close(con) # close the connection
## read the GZ'ed file:
ex.dna5 <- read.dna(gzfile(GZfile), "fasta")
## This example is with a FASTA file but this works as well
## with the other formats described above.
## All 5 data objects are identical:
identical(ex.dna, ex.dna5)
unlink(c(TEXTfile, FASTAfile, GZfile)) # clean-up
## Not run: 
## 3. How to read files from a ZIP archive
## NOTE: since ape 5.7-1, all files in these examples are written
## in the temporary directory, thus the following commands work
## best when run in the user's working directory.
## write the woodmouse data in a FASTA file:
data(woodmouse)
write.dna(woodmouse, "woodmouse.fas", "fasta")
## archive a FASTA file in a ZIP file:
zip("myarchive.zip", "woodmouse.fas")
## Note: the file myarchive.zip is created if necessary
## Read the FASTA file from the ZIP archive without extraction:
wood2 <- read.dna(unz("myarchive.zip", "woodmouse.fas"), "fasta")
## Alternatively, unzip the archive:
fl <- unzip("myarchive.zip")
## the previous command eventually creates locally
## the fullpath archived with 'woodmouse.fas'
wood3 <- read.dna(fl, "fasta")
identical(woodmouse, wood2)
identical(woodmouse, wood3)
## End(Not run)
## read a FASTQ file from 1000 Genomes:
## Not run: 
a <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/"
file <- "SRR062641.filt.fastq.gz"
URL <- paste0(a, file)
download.file(URL, file)
## If the above command doesn't work, you may copy/paste URL in
## a Web browser instead.
X <- read.fastq(file)
X # 109,811 sequences
## get the qualities of the first sequence:
(qual1 <- attr(X, "QUAL")[[1]])
## the corresponding probabilities:
10^(-qual1/10)
## get the mean quality for each sequence:
mean.qual <- sapply(attr(X, "QUAL"), mean)
## can do the same for var, sd, ...
## End(Not run)
Read GFF Files
Description
This function reads a file in general feature format version 3 (GFF3) and returns a data frame.
Usage
read.gff(file, na.strings = c(".", "?"), GFF3 = TRUE)
Arguments
file | a file name specified by a character string. |
na.strings | the strings in the GFF file that will be converted to NA's (missing values). |
GFF3 | a logical value specifying whether the file is formatted according to version 3 of GFF. |
Details
The returned data frame has its (column) names correctly set (see References) and the categorical variables (seqid, source, type, strand, and phase) set as factors.
This function should be more efficient than using read.delim.
GFF2 (aka GTF) files can also be read: use GFF3 = FALSE to have the correct field names. Note that GFF2 files and GFF3 files have the same structure, although some fields are slightly different (see reference).
The file can be gz-compressed (see examples), but not zipped.
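The following sketch writes a tiny, made-up GFF3 file with two features and reads it back, so that the column names and the handling of "." as a missing value can be inspected without a download. The feature coordinates and attribute strings are invented for illustration.

```r
library(ape)

fl <- tempfile("demo", fileext = ".gff3")
cat("##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tParent=gene1",
    file = fl, sep = "\n")

g <- read.gff(fl)
names(g)                      # field names of the GFF3 columns
g$end - (g$start - 1)         # lengths of the two features
is.na(g$score)                # "." is read as a missing value

unlink(fl)   # clean-up
```

The same call with GFF3 = FALSE would only change some of the column names.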
Value
A data frame.
Author(s)
Emmanuel Paradis
References
https://en.wikipedia.org/wiki/General_feature_format
Examples
## Not run: 
## requires an Internet connection
d <- "https://ftp.ensembl.org/pub/release-86/gff3/homo_sapiens/"
f <- "Homo_sapiens.GRCh38.86.chromosome.MT.gff3.gz"
download.file(paste0(d, f), "mt_gff3.gz")
## If the above command doesn't work, you may copy/paste the full URL in
## a Web browser instead.
gff.mito <- read.gff("mt_gff3.gz")
## the lengths of the sequence features:
gff.mito$end - (gff.mito$start - 1)
table(gff.mito$type)
## where the exons start:
gff.mito$start[gff.mito$type == "exon"]
## End(Not run)
Read Tree File in Nexus Format
Description
This function reads one or several trees in a NEXUS file.
Usage
read.nexus(file, tree.names = NULL, force.multi = FALSE)
Arguments
file | a file name specified by either a variable of mode character, or a double-quoted string. |
tree.names | if there are several trees to be read, a vector of mode character giving names to the individual trees (by default, this uses the labels in the NEXUS file if these are present). |
force.multi | a logical value; if |
Details
The present implementation tries to follow as much as possible the NEXUS standard (but see the restriction below on TRANSLATION tables). Only the block “TREES” is read; the other data can be read with other functions (e.g., read.dna, read.table, ...).
If a TRANSLATION table is present it is assumed that only the tip labels are translated and they are all translated with integers without gap. Consequently, if nodes have labels in the tree(s) they are read as they are and not looked for in the translation table. The logic behind this is that in the vast majority of cases, node labels will be support values rather than proper taxa names. This is consistent with write.nexus which translates only the tip labels.
Using force.multi = TRUE when the file contains a single tree makes it possible to keep the tree name (as names of the list).
‘read.nexus’ tries to represent correctly trees with a badly represented root edge (i.e. with an extra pair of parentheses). For instance, the tree "((A:1,B:1):10);" will be read like "(A:1,B:1):10;" but a warning message will be issued in the former case as this is apparently not a valid Newick format. If there are two root edges (e.g., "(((A:1,B:1):10):10);"), then the tree is not read and an error message is issued.
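The effect of force.multi can be seen with a round trip through a temporary file. This sketch relies on write.nexus (listed under See Also) to produce the files; the random trees and object names are arbitrary.

```r
library(ape)

set.seed(4)
fl <- tempfile("trees", fileext = ".nex")

## several trees: the result is a "multiPhylo" list
write.nexus(rmtree(2, 5), file = fl)
many <- read.nexus(fl)
class(many)

## a single tree is returned as "phylo" unless force.multi = TRUE
write.nexus(rtree(5), file = fl)
one  <- read.nexus(fl)
kept <- read.nexus(fl, force.multi = TRUE)
class(one)
names(kept)                   # the tree name is kept as a list name

unlink(fl)   # clean-up
```

Keeping the list form is convenient when downstream code always expects a "multiPhylo" object, whatever the number of trees in the file.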
Value
an object of class "phylo" or "multiPhylo".
Author(s)
Emmanuel Paradis
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
See Also
read.tree, write.nexus, write.tree, read.nexus.data, write.nexus.data
Read Character Data In NEXUS Format
Description
read.nexus.data reads a file with sequences in the NEXUS format. nexus2DNAbin is a helper function to convert the output from the previous function into the class "DNAbin".
For the moment, only sequence data (DNA or protein) are supported.
Usage
read.nexus.data(file)
nexus2DNAbin(x)
Arguments
file | a file name specified by either a variable of mode character, or a double-quoted string. |
x | an object output by |
Details
This parser tries to read data from a file written in a restricted NEXUS format (see examples below).
Please see files ‘data.nex’ and ‘taxacharacters.nex’ for examples of formats that will work.
Some noticeable exceptions from the NEXUS standard (non-exhaustive list):
I: Comments must be either on separate lines or at the end of lines. Examples:
[Comment] — OK
Taxon ACGTACG [Comment] — OK
[Comment line 1
Comment line 2] — NOT OK!
Tax[Comment]on ACG[Comment]T — NOT OK!
II: No spaces (or comments) are allowed in the sequences. Examples:
name ACGT — OK
name AC GT — NOT OK!
III: No spaces are allowed in taxon names, not even if names are in single quotes. That is, single-quoted names are not treated as such by the parser. Examples:
Genus_species — OK
'Genus_species' — OK
'Genus species' — NOT OK!
IV: The trailing end that closes the matrix must be on a separate line. Examples:
taxon AACCGGT
end; — OK
taxon AACCGGT;
end; — OK
taxon AACCCGT; end; — NOT OK!
V: Multistate characters are not allowed. That is, NEXUS allows you to specify multiple character states at a character position either as an uncertainty, (XY), or as an actual appearance of multiple states, {XY}. This information is not handled by the parser. Examples:
taxon 0011?110 — OK
taxon 0011{01}110 — NOT OK!
taxon 0011(01)110 — NOT OK!
VI: The number of taxa must be on the same line as ntax. The same applies to nchar. Examples:
ntax = 12 — OK
ntax =
12 — NOT OK!
VII: The word “matrix” cannot occur anywhere in the file before the actual matrix command, unless it is in a comment. Examples:
BEGIN CHARACTERS;
TITLE 'Data in file "03a-cytochromeB.nex"';
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
["This is The Matrix"] — OK
MATRIX
BEGIN CHARACTERS;
TITLE 'Matrix in file "03a-cytochromeB.nex"'; — NOT OK!
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
MATRIX
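A file that respects the restrictions listed above might look like the one written by the sketch below: ntax and nchar on a single line, no spaces inside sequences or names, and the closing of the matrix on separate lines. The taxon names and sequences are invented, and the exact file layout is an assumption about what the parser accepts rather than a tested specification.

```r
library(ape)

fl <- tempfile("demo", fileext = ".nex")
cat("#NEXUS",
    "BEGIN DATA;",
    "DIMENSIONS NTAX=3 NCHAR=8;",
    "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
    "MATRIX",
    "Taxon_A ACGTACGT",
    "Taxon_B ACGTACGA",
    "Taxon_C ACG-ACG?",
    ";",
    "END;",
    file = fl, sep = "\n")

x <- read.nexus.data(fl)
names(x)
lengths(x)                    # one character state per element

d <- nexus2DNAbin(x)          # convert to the class "DNAbin"
d

unlink(fl)   # clean-up
```

Each element of x is a character vector with one state per site, which is why the lengths equal the NCHAR value.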
Value
A list of sequences each made of a single vector of mode character where each element is a (phylogenetic) character state.
Author(s)
Johan Nylander, Thomas Guillerme, and Klaus Schliep
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
See Also
read.nexus, write.nexus, write.nexus.data
Examples
## Use read.nexus.data to read a file in NEXUS format into object x
## Not run: 
x <- read.nexus.data("file.nex")
Read Tree File in Parenthetic Format
Description
This function reads a file which contains one or several trees in parenthetic format known as the Newick or New Hampshire format.
Usage
read.tree(file = "", text = NULL, tree.names = NULL, skip = 0, comment.char = "", keep.multi = FALSE, ...)
Arguments
file | a file name specified by either a variable of mode character, or a double-quoted string; if |
text | alternatively, the name of a variable of mode character which contains the tree(s) in parenthetic format. By default, this is ignored (set to |
tree.names | if there are several trees to be read, a vector of mode character that gives names to the individual trees; if |
skip | the number of lines of the input file to skip before beginning to read data (this is passed directly to |
comment.char | a single character, the remaining of the line after this character is ignored (this is passed directly to |
keep.multi | if |
... | further arguments to be passed to |
Details
The default option for file makes it possible to type the tree directly on the keyboard (or possibly to copy from an editor and paste in R's console) with, e.g., mytree <- read.tree().
‘read.tree’ tries to represent correctly trees with a badly represented root edge (i.e. with an extra pair of parentheses). For instance, the tree "((A:1,B:1):10);" will be read like "(A:1,B:1):10;" but a warning message will be issued in the former case as this is apparently not a valid Newick format. If there are two root edges (e.g., "(((A:1,B:1):10):10);"), then the tree is not read and an error message is issued.
If there are any characters preceding the first "(" in a line then these are assigned to the name. This is returned when a "multiPhylo" object is returned and tree.names = NULL.
Until ape 4.1, the default of comment.char was "#" (as in scan). This has been changed so that extended Newick files can be read.
Value
an object of class "phylo" with the following components:
edge | a two-column matrix of mode numeric where each row represents an edge of the tree; the nodes and the tips are symbolized with numbers; the tips are numbered 1, 2, ..., and the nodes are numbered after the tips. For each row, the first column gives the ancestor. |
edge.length | (optional) a numeric vector giving the lengths of the branches given by |
tip.label | a vector of mode character giving the names of the tips; the order of the names in this vector corresponds to the (positive) number in |
Nnode | the number of (internal) nodes. |
node.label | (optional) a vector of mode character giving the names of the nodes. |
root.edge | (optional) a numeric value giving the length of the branch at the root if it exists. |
If several trees are read in the file, the returned object is of class "multiPhylo", and is a list of objects of class "phylo". The name of each tree can be specified by tree.names, or can be read from the file (see details).
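As a minimal sketch of how these components fit together, the following reads a small three-tip tree (an arbitrary example, not taken from the package) and inspects each element. Tips are numbered 1 to 3 and internal nodes follow, with the root numbered Ntip + 1.

```r
library(ape)

tr <- read.tree(text = "((A:1,B:1):1,C:2);")

tr$tip.label    # tip names; position in this vector = tip number
tr$Nnode        # number of internal nodes
tr$edge         # column 1: ancestor, column 2: descendant
tr$edge.length  # one branch length per row of 'edge'

## the root is the only node that never appears as a descendant
root <- setdiff(tr$edge[, 1], tr$edge[, 2])
root
```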
Author(s)
Emmanuel Paradis and Daniel Lawson <dan.lawson@bristol.ac.uk>
References
Felsenstein, J. The Newick tree format. http://evolution.genetics.washington.edu/phylip/newicktree.html
Olsen, G. Interpretation of the "Newick's 8:45" tree format standard. http://evolution.genetics.washington.edu/phylip/newick_doc.html
Paradis, E. (2020) Definition of Formats for Coding Phylogenetic Trees in R. https://emmanuelparadis.github.io/misc/FormatTreeR.pdf
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.
See Also
write.tree, read.nexus, write.nexus, scan for the basic R function to read data in a file
Examples
### An extract from Sibley and Ahlquist (1990)
s <- "owls(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);"
treefile <- tempfile("tree", fileext = ".tre")
cat(s, file = treefile, sep = "\n")
tree.owls <- read.tree(treefile)
str(tree.owls)
tree.owls
tree.owls <- read.tree(treefile, keep.multi = TRUE)
tree.owls
names(tree.owls)
unlink(treefile) # clean-up
### Only the first three species using the option `text'
TREE <- "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"
TREE
tree.owls.bis <- read.tree(text = TREE)
str(tree.owls.bis)
tree.owls.bis
## tree with singleton nodes:
ts <- read.tree(text = "((((a))),d);")
plot(ts, node.depth = 2) # the default will overlap the singleton node with the tip
nodelabels()
## 'skeleton' tree with a singleton node:
tx <- read.tree(text = "(((,)),);")
plot(tx, node.depth = 2)
nodelabels()
## a tree with single quoted labels (the 2nd label is not quoted
## because it has no white spaces):
z <- "(('a: France, Spain (Europe)',b),'c: Australia [Outgroup]');"
tz <- read.tree(text = z)
plot(tz, font = 1)
Continuous Ancestral Character Estimation
Description
This function estimates ancestral character states, and the associated uncertainty, for continuous characters. It mainly works as the ace function, from which it differs, first, in the fact that computations are not performed by numerical optimisation but through matrix calculus. Second, besides classical Brownian-based reconstruction methods, it reconstructs ancestral states under Arithmetic Brownian Motion (ABM, i.e. Brownian with linear trend) and Ornstein-Uhlenbeck process (OU, i.e. Brownian with an attractive optimum).
Usage
reconstruct(x, phyInit, method = "ML", alpha = NULL,
            low_alpha = 0.0001, up_alpha = 1, CI = TRUE)
Arguments
x | a numerical vector. |
phyInit | an object of class |
method | a character specifying the method used for estimation. Six choices are possible: |
alpha | a numerical value which accounts for the attractive strength parameter of |
low_alpha | a lower bound for alpha, used only with methods |
up_alpha | an upper bound for alpha, used only with methods |
CI | a logical specifying whether to return the 95% confidence intervals of the ancestral state estimates. |
Details
For "ML", "REML" and "GLS", the default model is Brownian motion. This model can be fitted by maximum likelihood (method = "ML", Felsenstein 1973, Schluter et al. 1997), which is the default, by residual maximum likelihood (method = "REML"), or by generalized least squares (method = "GLS", Martins and Hansen 1997, Garland and Ives 2000).
"GLS_ABM" is based on the Brownian motion with trend model. Both "GLS_OU" and "GLS_OUS" are based on the Ornstein-Uhlenbeck model. They differ in that "GLS_OUS" assumes that the process starts from the optimum, while the root state has to be estimated for "GLS_OU", which may raise some issues (see Royer-Carenzi and Didier, 2016). Users may provide the attractive strength parameter alpha for these two models. "GLS_ABM", "GLS_OU" and "GLS_OUS" are all fitted by generalized least squares (Royer-Carenzi and Didier, 2016).
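A minimal sketch of how the Brownian-motion methods might be compared on the same data is given below. The trait values are random numbers generated only for illustration, and the object names fit_ml and fit_gls are arbitrary.

```r
library(ape)

data(bird.orders)
set.seed(1)
## simulated trait values, one per tip (illustration only)
x <- rnorm(Ntip(bird.orders), mean = 100)

## same Brownian motion model, two fitting methods
fit_ml  <- reconstruct(x, bird.orders, method = "ML")
fit_gls <- reconstruct(x, bird.orders, method = "GLS")

## ancestral estimates and their 95% confidence intervals
head(fit_ml$ace)
head(fit_ml$CI95)
head(fit_gls$ace)
```

Since the data here carry no phylogenetic signal, the estimates are only useful for checking the structure of the returned object, not for interpreting the values.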
Value
an object of class "ace" with the following elements:
ace | the estimates of the ancestral character values. |
CI95 | the estimated 95% confidence intervals. |
sigma2 | if |
loglik | if |
Note
GLS_ABM should not be used on an ultrametric tree.
GLS_OU may lead to aberrant reconstructions.
Author(s)
Manuela Royer-Carenzi, Gilles Didier
References
Felsenstein, J. (1973) Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25, 471–492.
Garland, T. and Ives, A. R. (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155, 346–364.
Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.
Royer-Carenzi, M. and Didier, G. (2016) A comparison of ancestral state reconstruction methods for quantitative characters. Journal of Theoretical Biology, 404, 126–142.
Schluter, D., Price, T., Mooers, A. O. and Ludwig, D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699–1711.
Yang, Z. (2006) Computational Molecular Evolution. Oxford: Oxford University Press.
See Also
Reconstruction of ancestral sequences can be done with the package phangorn (see function ?ancestral.pml).
Examples
### Some random data...
data(bird.orders)
x <- rnorm(23, m=100)
### Reconstruct ancestral quantitative characters:
reconstruct(x, bird.orders)
reconstruct(x, bird.orders, method = "GLS_OUS", alpha=NULL)
Internal Reordering of Trees
Description
reorder changes the internal structure of a phylogeny stored as an object of class "phylo". The tree returned is the same as the input tree, but the ordering of the edges may be different.
cladewise and postorder are convenience functions that return only the indices of the reordered edge matrices (see examples).
Usage
## S3 method for class 'phylo'
reorder(x, order = "cladewise", index.only = FALSE, ...)
## S3 method for class 'multiPhylo'
reorder(x, order = "cladewise", ...)
cladewise(x)
postorder(x)
Arguments
x | an object of class |
order | a character string: either |
index.only | should the function return only the ordered indices of the rows of the edge matrix? |
... | further arguments passed to or from other methods. |
Details
Because in a tree coded as an object of class "phylo" each branch is represented by a row in the element ‘edge’, there is an arbitrary choice for the ordering of these rows. reorder rearranges these rows according to one of three rules. In the "cladewise" order, each clade is formed by a series of contiguous rows. In the "postorder" order, the rows are arranged so that computations following a pruning-like algorithm over the tree (i.e., a postorder tree traversal) can be done by descending along these rows (conversely, a preorder tree traversal can be performed by moving from the last to the first row). The "pruningwise" order is an alternative “pruning” order which is actually a bottom-up traversal order (Valiente 2002). (This third choice might be removed in the future as it merely duplicates the second one, which is more efficient.) The possible multichotomies and branch lengths are preserved.
Note that for a given order, there are several possible orderings of the rows of ‘edge’.
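The postorder property can be checked directly on a small random tree. This is only a sketch: the helper expression computing rows_ok is written for this illustration and is not part of the package.

```r
library(ape)

set.seed(42)
tr <- rtree(6)

po  <- reorder(tr, "postorder")
idx <- reorder(tr, "postorder", index.only = TRUE)

attr(po, "order")

## the indices reproduce the reordered edge matrix
same_edges <- all(tr$edge[idx, ] == po$edge)

## in postorder, every row leading down from a node comes before
## the row that connects this node to its own ancestor
e <- po$edge
rows_ok <- all(sapply(seq_len(nrow(e)),
                      function(i) all(which(e[, 1] == e[i, 2]) < i)))
c(same_edges, rows_ok)
```

This ordering is what allows a single loop over the rows to accumulate values from the tips towards the root, as in the nr_desc function of the examples.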
Value
an object of class "phylo" (with the attribute "order" set accordingly), or a numeric vector if index.only = TRUE; if x is of class "multiPhylo", then an object of the same class.
Author(s)
Emmanuel Paradis
References
Valiente, G. (2002) Algorithms on Trees and Graphs. New York: Springer.
See Also
read.tree to read tree files in Newick format, reorder for the generic function
Examples
data(bird.families)
tr <- reorder(bird.families, "postorder")
all.equal(bird.families, tr) # uses all.equal.phylo actually
all.equal.list(bird.families, tr) # bypasses the generic
## get the number of descendants for each tip or node:
nr_desc <- function(x) {
    res <- numeric(max(x$edge))
    res[1:Ntip(x)] <- 1L
    for (i in postorder(x)) {
        tmp <- x$edge[i,1]
        res[tmp] <- res[tmp] + res[x$edge[i, 2]]
    }
    res
}
## apply it to a random tree:
tree <- rtree(10)
plot(tree, show.tip.label = FALSE)
tiplabels()
nodelabels()
nr_desc(tree)
Test of Diversification-Shift With the Yule Process
Description
This function performs a test of shift in diversification rate using probabilities from the Yule process.
Usage
richness.yule.test(x, t)
Arguments
x | a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades. |
t | a numeric vector giving the divergence times of each pair of clades in |
Value
a data frame with the χ² statistic, the number of degrees of freedom (= 1), and the P-value.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.
See Also
slowinskiguyer.test, mcconwaysims.test, diversity.contrast.test
Examples
### see example(mcconwaysims.test)
Tree Simulation Under the Time-Dependent Birth–Death Models
Description
These three functions simulate phylogenies under any time-dependent birth–death model: rlineage generates a complete tree including the species going extinct before present; rbdtree generates a tree with only the species living at present (thus the tree is ultrametric); rphylo generates a tree with a fixed number of species at present time. drop.fossil is a utility function to remove the extinct species.
Usage
rlineage(birth, death, Tmax = 50, BIRTH = NULL, DEATH = NULL, eps = 1e-6)
rbdtree(birth, death, Tmax = 50, BIRTH = NULL, DEATH = NULL, eps = 1e-6)
rphylo(n, birth, death, BIRTH = NULL, DEATH = NULL,
       T0 = 50, fossils = FALSE, eps = 1e-06)
drop.fossil(phy, tol = 1e-8)
Arguments
birth, death | a numeric value or a (vectorized) function specifying how speciation and extinction rates vary through time. |
Tmax | a numeric value giving the length of the simulation. |
BIRTH, DEATH | a (vectorized) function which is the primitive of |
eps | a numeric value giving the time resolution of the simulation; this may be increased (e.g., 0.001) to shorten computation times. |
n | the number of species living at present time. |
T0 | the time at present (for the backward-in-time algorithm). |
fossils | a logical value specifying whether to output the lineages going extinct. |
phy | an object of class |
tol | a numeric value giving the tolerance to consider a species as extinct. |
Details
These three functions use continuous-time algorithms: rlineage and rbdtree use the forward-in-time algorithms described in Paradis (2011), whereas rphylo uses a backward-in-time algorithm from Stadler (2011). The models are time-dependent birth–death models as described in Kendall (1948). Speciation (birth) and extinction (death) rates may be constant or vary through time according to an R function specified by the user. In the latter case, BIRTH and/or DEATH may be used if the primitives of birth and death are known. In these functions time is the formal argument and must be named t.
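As a sketch of how a primitive might be supplied, the logistic speciation rate used in the examples of this help page has a closed-form antiderivative, which can be checked numerically with integrate before being passed as BIRTH. The function name B and the constant mu are choices made for this illustration.

```r
library(ape)

## logistic speciation rate; the formal argument must be named 't'
b <- function(t) 1/(1 + exp(0.2*t - 1))
## its primitive (antiderivative), derived by hand for this sketch
B <- function(t) t - 5*log(1 + exp(0.2*t - 1))

## numerical check that B is indeed a primitive of b
num <- integrate(b, 0, 10)$value
ana <- B(10) - B(0)
all.equal(num, ana, tolerance = 1e-6)

## simulate with the primitive supplied
mu <- 0.07
set.seed(10)
tr <- rbdtree(b, mu, Tmax = 50, BIRTH = B)
tr
```

Supplying the primitive is presumably useful because it spares the numerical integration of the rate function during the simulation.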
Note that rphylo simulates trees in a way similar to what the package TreeSim does; the difference is in the parameterization of the time-dependent models, which is here the same as that used in the two other functions. In this parameterization scheme, time is measured from past to present (see details in Paradis 2015, which includes a comparison of these algorithms).
The difference between rphylo and rphylo(... fossils = TRUE) is the same as that between rbdtree and rlineage.
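The relation between the complete tree and the tree of extant species can be sketched as follows, with constant rates chosen arbitrarily for the illustration.

```r
library(ape)

set.seed(123)
## complete tree: 20 species living at present plus extinct lineages
full <- rphylo(20, birth = 0.1, death = 0.05, fossils = TRUE)

## remove the lineages that went extinct before present
extant <- drop.fossil(full)

Ntip(full)
Ntip(extant)
is.ultrametric(extant)
```

After dropping the extinct lineages, the tree should contain only the n species requested in the call to rphylo.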
Value
An object of class "phylo".
Author(s)
Emmanuel Paradis
References
Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.
Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.
Paradis, E. (2015) Random phylogenies and the distribution of branching times. Journal of Theoretical Biology, 387, 39–45.
Stadler, T. (2011) Simulating trees with a fixed number of extant species. Systematic Biology, 60, 676–684.
See Also
yule, yule.time, birthdeath, rtree, stree
Examples
set.seed(10)
plot(rlineage(0.1, 0)) # Yule process with lambda = 0.1
plot(rlineage(0.1, 0.05)) # simple birth-death process
b <- function(t) 1/(1 + exp(0.2*t - 1)) # logistic
layout(matrix(0:3, 2, byrow = TRUE))
curve(b, 0, 50, xlab = "Time", ylab = "")
mu <- 0.07
segments(0, mu, 50, mu, lty = 2)
legend("topright", c(expression(lambda), expression(mu)), lty = 1:2, bty = "n")
plot(rlineage(b, mu), show.tip.label = FALSE)
title("Simulated with 'rlineage'")
plot(rbdtree(b, mu), show.tip.label = FALSE)
title("Simulated with 'rbdtree'")
Roots Phylogenetic Trees
Description
root reroots a phylogenetic tree with respect to the specified outgroup or at the node specified in node.
unroot unroots a phylogenetic tree, or returns it unchanged if it is already unrooted.
is.rooted tests whether a tree is rooted.
Usage
root(phy, ...)
## S3 method for class 'phylo'
root(phy, outgroup, node = NULL, resolve.root = FALSE,
     interactive = FALSE, edgelabel = FALSE, ...)
## S3 method for class 'multiPhylo'
root(phy, outgroup, ...)
unroot(phy, ...)
## S3 method for class 'phylo'
unroot(phy, collapse.singles = FALSE, keep.root.edge = FALSE, ...)
## S3 method for class 'multiPhylo'
unroot(phy, collapse.singles = FALSE, keep.root.edge = FALSE, ...)
is.rooted(phy)
## S3 method for class 'phylo'
is.rooted(phy)
## S3 method for class 'multiPhylo'
is.rooted(phy)
Arguments
phy | an object of class |
outgroup | a vector of mode numeric or character specifying the new outgroup. |
node | alternatively, a node number where to root the tree. |
resolve.root | a logical specifying whether to resolve the new root as a bifurcating node. |
interactive | if |
edgelabel | a logical value specifying whether to treat node labels as edge labels and thus eventually switching them so that they are associated with the correct edges when using |
collapse.singles | a logical value specifying whether to call |
keep.root.edge | a logical value. If |
... | arguments passed among methods (e.g., when rooting lists of trees). |
Details
The argument outgroup can be either character or numeric. In the first case, it gives the labels of the tips of the new outgroup; in the second case the numbers of these labels in the vector phy$tip.label are given.
If outgroup is of length one (i.e., a single value), then the tree is rerooted using the node below this tip as the new root.
If outgroup is of length two or more, the most recent common ancestor (MRCA) of the ingroup is used as the new root. Note that the tree is unrooted before being rerooted, so that if outgroup is already the outgroup, then the returned tree is not the same as the original one (see examples). If outgroup is not monophyletic, the operation fails and an error message is issued.
If resolve.root = TRUE, root adds a zero-length branch below the MRCA of the ingroup.
A tree is considered rooted if either only two branches connect to the root, or if there is a root.edge element. In all other cases, is.rooted returns FALSE.
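The effect of resolve.root on the rooted status can be sketched with a small five-tip tree written for this illustration. With a single-tip outgroup, the new root keeps a basal trichotomy unless resolve.root = TRUE is used.

```r
library(ape)

## an unrooted tree: three branches connect to the basal node
tr <- read.tree(text = "((A,B),(C,D),E);")
is.rooted(tr)

## reroot on tip E: the basal trichotomy is kept
r1 <- root(tr, outgroup = "E")
is.rooted(r1)

## same outgroup, but the root is resolved as a bifurcating node
r2 <- root(tr, outgroup = "E", resolve.root = TRUE)
is.rooted(r2)
```

Only r2 has exactly two branches connected to its root, so it is the only one of the three trees that is.rooted should report as rooted.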
Value
an object of class "phylo" or "multiPhylo" for root and unroot; a logical vector for is.rooted.
Note
The use of resolve.root = TRUE together with node = gives an error if the specified node is the current root of the tree. This is because there is an ambiguity when resolving a node in an unrooted tree with no explicit outgroup. If the node is not the current root, the ambiguity is solved arbitrarily by considering the clade on the right of node (when the tree is plotted by default) as the ingroup. A detailed explanation is available here:
https://www.mail-archive.com/r-sig-phylo@r-project.org/msg03805.html.
Author(s)
Emmanuel Paradis
References
Czech, L., Huerta-Cepas, J. and Stamatakis, A. (2017) A critical review on the use of support values in tree viewers and bioinformatics toolkits. Molecular Biology and Evolution, 34, 1535–1542. doi:10.1093/molbev/msx055
See Also
bind.tree, drop.tip, nodelabels, identify.phylo
Examples
data(bird.orders)
plot(root(bird.orders, 1))
plot(root(bird.orders, 1:5))
tr <- root(bird.orders, 1)
is.rooted(bird.orders) # yes
is.rooted(tr) # no
### This is because the tree has been unrooted first before rerooting.
### You can delete the outgroup...
is.rooted(drop.tip(tr, "Struthioniformes"))
### ... or resolve the basal trichotomy in two ways:
is.rooted(multi2di(tr))
is.rooted(root(bird.orders, 1, r = TRUE))
### To keep the basal trichotomy but forcing the tree as rooted:
tr$root.edge <- 0
is.rooted(tr)
x <- setNames(rmtree(10, 10), LETTERS[1:10])
is.rooted(x)
Swapping Sister Clades
Description
For a given node, rotate exchanges the position of two clades descending from this node. It can handle dichotomies as well as polytomies. In the latter case, two clades from the polytomy are selected for swapping.
rotateConstr rotates internal branches giving a constraint on the order of the tips.
Usage
rotate(phy, node, polytom = c(1, 2))
rotateConstr(phy, constraint)
Arguments
phy | an object of class |
node | a vector of mode numeric or character specifying the number of the node. |
polytom | a vector of mode numeric and length two specifying the two clades that should be exchanged in a polytomy. |
constraint | a vector of mode character specifying the order of the tips as they should appear when plotting the tree (from bottom to top). |
Details
phy can be either rooted or unrooted, contain polytomies and lack branch lengths. In the presence of very short branch lengths it is convenient to plot the phylogenetic tree without branch lengths in order to identify the number of the node in question.
node can be any of the interior nodes of a phylogenetic tree including the root node. The numbers of the nodes can be identified with the nodelabels function. Alternatively, you can specify a vector of length two that contains either the numbers or the names of two tips that coalesce in the node of interest.
If the node subtends a polytomy, any two clades of the polytomy can be chosen with polytom. On a plotted phylogeny, the clades are numbered from bottom to top and polytom is used to index the two clades to be swapped.
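A minimal sketch of both functions on four-tip trees is given below. The helper plot_order is written for this illustration (it is not part of the package) and returns the tips in the order in which they appear in the edge matrix, which is the order used when plotting from bottom to top.

```r
library(ape)

## helper (hypothetical): tips in their plotting order
plot_order <- function(phy)
    phy$tip.label[phy$edge[phy$edge[, 2] <= Ntip(phy), 2]]

tr <- read.tree(text = "((A,B),(C,D));")
plot_order(tr)

## nodes are numbered after the tips: 5 is the root, 6 is (A,B)
rot <- rotate(tr, 6)
plot_order(rot)

## rotate a differently arranged tree to follow a given tip order
B <- read.tree(text = "(((D,C),B),A);")
Bc <- rotateConstr(B, c("A", "B", "C", "D"))
plot_order(Bc)
```

The constraint can only be met as far as the topology allows, since rotations never change which tips form a clade.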
Value
an object of class "phylo".
Author(s)
Christoph Heibl <heibl@lmu.de>, Emmanuel Paradis
See Also
plot.phylo, nodelabels, root, drop.tip
Examples
# create a random tree:
tre <- rtree(25)
# visualize labels of internal nodes:
plot(tre, use.edge.length=FALSE)
nodelabels()
# rotate clades around node 30:
tre.new <- rotate(tre, 30)
# compare the results:
par(mfrow=c(1,2)) # split graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# visualize labels of terminal nodes:
plot(tre)
tiplabels()
# rotate clades around the node where tips 12 and 21 coalesce:
tre.new <- rotate(tre, c(12, 21))
# compare the results:
par(mfrow=c(1,2)) # split graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# or you might just specify tip label names:
tre.new <- rotate(tre, c("t3", "t14"))
# compare the results:
par(mfrow=c(1,2)) # divide graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# a simple example for rotateConstr:
A <- read.tree(text = "((A,B),(C,D));")
B <- read.tree(text = "(((D,C),B),A);")
B <- rotateConstr(B, A$tip.label)
plot(A); plot(B, d = "l")
# something more interesting (from ?cophyloplot):
tr1 <- rtree(40)
## drop 20 randomly chosen tips:
tr2 <- drop.tip(tr1, sample(tr1$tip.label, size = 20))
## rotate the root and reorder the whole:
tr2 <- rotate(tr2, 21)
tr2 <- read.tree(text = write.tree(tr2))
X <- cbind(tr2$tip.label, tr2$tip.label) # association matrix
cophyloplot(tr1, tr2, assoc = X, space = 28)
## before reordering tr2 we have to find the constraint:
co <- tr2$tip.label[order(match(tr2$tip.label, tr1$tip.label))]
newtr2 <- rotateConstr(tr2, co)
cophyloplot(tr1, newtr2, assoc = X, space = 28)
Generate Random Trees
Description
These functions generate trees by splitting randomly the edges (rtree and rtopology) or randomly clustering the tips (rcoal). rtree and rtopology generate general trees, and rcoal generates coalescent trees. The algorithms are described in Paradis (2012) and in a vignette in this package.
Usage
rtree(n, rooted = TRUE, tip.label = NULL, br = runif, equiprob = FALSE, ...)
rtopology(n, rooted = FALSE, tip.label = NULL, br = runif, ...)
rcoal(n, tip.label = NULL, br = "coalescent", ...)
rmtree(N, n, rooted = TRUE, tip.label = NULL, br = runif,
       equiprob = FALSE, ...)
rmtopology(N, n, rooted = FALSE, tip.label = NULL, br = runif, ...)
Arguments
n | an integer giving the number of tips in the tree. |
rooted | a logical indicating whether the tree should be rooted (the default). |
tip.label | a character vector giving the tip labels; if not specified, the tips "t1", "t2", ..., are given. |
br | one of the following: (i) an R function used to generate the branch lengths ( |
equiprob | (new since ape 5.4-1) a logical specifying whether topologies are generated in equal frequencies. If, |
... | further argument(s) to be passed to |
N | an integer giving the number of trees to generate. |
Details
The trees generated are bifurcating. If rooted = FALSE in rtree, the tree is trifurcating at its root.
The option equiprob = TRUE generates unlabelled topologies in equal frequencies. This is more complicated for the labelled topologies (see the vignette “RandomTopologies”).
The default function to generate branch lengths in rtree is runif. If further arguments are passed to br, they need to be tagged (e.g., min = 0, max = 10).
rmtree calls rtree repeatedly and sets the class of the returned object appropriately.
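The points above can be sketched as follows; the numbers of tips and trees are arbitrary.

```r
library(ape)

set.seed(1)
## tagged arguments after 'br' are passed on to runif
tr <- rtree(10, br = runif, min = 0, max = 10)
range(tr$edge.length)

## an unrooted tree is trifurcating at its basal node
utr <- rtree(10, rooted = FALSE)
is.rooted(utr)
nrow(utr$edge)   # one edge fewer than the rooted tree
nrow(tr$edge)

## several trees at once
trs <- rmtree(5, 10)
class(trs)
length(trs)
```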
Value
An object of class "phylo" or of class "multiPhylo" in the case of rmtree or rmtopology.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.
See Also
stree, rlineage, vignette “RandomTopologies”.
Examples
layout(matrix(1:9, 3, 3))
### Nine random trees:
for (i in 1:9) plot(rtree(20))
### Nine random cladograms:
for (i in 1:9) plot(rtree(20, FALSE), type = "c")
### generate 4 random trees of bird orders:
data(bird.orders)
layout(matrix(1:4, 2, 2))
for (i in 1:4)
  plot(rcoal(23, tip.label = bird.orders$tip.label), no.margin = TRUE)
layout(1)
par(mar = c(5, 4, 4, 2))
Root a Tree by Root-to-Tip Regression
Description
This function roots a phylogenetic tree with dated tips in the location most compatible with the assumption of a strict molecular clock.
Usage
rtt(t, tip.dates, ncpu = 1, objective = correlation,
    opt.tol = .Machine$double.eps^0.25)
Arguments
t | an object of class |
tip.dates | a vector of sampling times associated to the tips of |
ncpu | number of cores to use. |
objective | one of |
opt.tol | tolerance for optimization precision. |
Details
This function duplicates part of the functionality of the program Path-O-Gen (see references). The root position is chosen to produce the best linear regression of root-to-tip distances against sampling times.
t must have branch lengths in units of expected substitutions per site.
tip.dates should be a vector of sampling times, in any time unit, with time increasing toward the present. For example, this may be in units of “days since study start” or “years since 10,000 BCE”, but not “millions of years ago”.
Setting ncpu to a value larger than 1 requires the parallel library.
objective is the measure which will be used to define the “goodness” of a regression fit. It may be one of "correlation" (strongest correlation between tip date and distance from root), "rms" (lowest root-mean-squared error), or "rsquared" (highest R-squared value).
opt.tol is used to optimize the location of the root along the best branch. By default, R's optimize function uses a precision of .Machine$double.eps^0.25, which is about 0.0001 on a 64-bit system. This should be set to a smaller value if the branch lengths of t are very short.
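As a sketch under simulated conditions, sampling dates can be derived from the root-to-tip distances of a random tree so that a clock-like signal exists. The rate of 10 time units per unit of branch length, the starting year 2000, and the noise level are arbitrary assumptions made for this illustration.

```r
library(ape)

set.seed(7)
tr <- rtree(30)

## root-to-tip distances of the tips
d <- node.depth.edgelength(tr)[1:Ntip(tr)]

## hypothetical sampling dates increasing toward the present
dates <- 2000 + 10 * d + rnorm(Ntip(tr), sd = 0.1)

## reroot, with a tighter tolerance than the default
rooted <- rtt(tr, dates, opt.tol = 1e-8)
rooted

## the default tolerance, for comparison
.Machine$double.eps^0.25
```

The rerooted tree keeps the original branch lengths in substitutions; it is not rescaled to time.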
Value
an object of class "phylo".
Note
This function only chooses the best root. It does not rescale the branch lengths to time, or perform a statistical test of the molecular clock hypothesis.
Author(s)
Rosemary McCloskey <rmccloskey@cfenet.ubc.ca>, Emmanuel Paradis
References
Rambaut, A. (2009). Path-O-Gen: temporal signal investigation tool.
Rambaut, A. (2000). Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics, 16, 395-399.
Examples
t <- rtree(100)
tip.date <- rnorm(t$tip.label)^2
rtt(t, tip.date)
Find Segregating Sites in DNA Sequences
Description
This function gives the indices of segregating (polymorphic) sites in a sample of DNA sequences.
Usage
seg.sites(x, strict = FALSE, trailingGapsAsN = TRUE)
Arguments
x | a matrix or a list which contains the DNA sequences. |
strict | a logical value; if |
trailingGapsAsN | a logical value; if |
Details
If the sequences are in a list, they must all be of the same length.
If strict = FALSE (the default), the following rule is used to determine if a site is polymorphic or not in the presence of ambiguous bases: ‘A’ and ‘R’ are not interpreted as different, ‘A’ and ‘Y’ are interpreted as different, and ‘N’ and any other base (ambiguous or not) are interpreted as not different. If strict = TRUE, all letters are considered different.
Alignment gaps are considered different from all letters except for the leading and trailing gaps if trailingGapsAsN = TRUE (which is the default).
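The treatment of ambiguous bases can be sketched with a tiny alignment built for this illustration (three sequences of four sites, with arbitrary names s1 to s3). Site 2 differs by an unambiguous base, site 3 contains ‘R’ alongside ‘G’, and site 4 contains ‘N’.

```r
library(ape)

x <- as.DNAbin(rbind(s1 = c("a", "c", "g", "t"),
                     s2 = c("a", "c", "r", "t"),
                     s3 = c("a", "t", "g", "n")))

## default: compatible ambiguity codes are not counted as differences
seg.sites(x)

## strict: every distinct letter counts as a difference
seg.sites(x, strict = TRUE)
```

Under the default rule only site 2 should be reported, whereas strict = TRUE should also report sites 3 and 4.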
Value
A numeric (integer) vector giving the indices of the segregating sites.
Author(s)
Emmanuel Paradis
See Also
base.freq, theta.s, nuc.div (last two in pegas)
Examples
data(woodmouse)
y <- seg.sites(woodmouse)
y
length(y)
Skyline Plot Estimate of Effective Population Size
Description
skyline computes the generalized skyline plot estimate of effective population size from an estimated phylogeny. The demographic history is approximated by a step-function. The number of parameters of the skyline plot (i.e. its smoothness) is controlled by a parameter epsilon.
find.skyline.epsilon searches for an optimal value of the epsilon parameter, i.e. the value that maximizes the AICc-corrected log-likelihood (logL.AICc).
Usage
skyline(x, ...)
## S3 method for class 'phylo'
skyline(x, ...)
## S3 method for class 'coalescentIntervals'
skyline(x, epsilon=0, ...)
## S3 method for class 'collapsedIntervals'
skyline(x, old.style=FALSE, ...)
find.skyline.epsilon(ci, GRID=1000, MINEPS=1e-6, ...)
Arguments
x | Either an ultrametric tree (i.e. an object of class |
epsilon | collapsing parameter that controls the amount of smoothing (allowed range: from |
old.style | Parameter to choose between two slightly different variants of the generalized skyline plot (Strimmer and Pybus, pers. comm.). The default value |
ci | coalescent intervals (i.e. an object of class |
GRID | Parameter for the grid search for |
MINEPS | Parameter for the grid search for |
... | Any of the above parameters. |
Details
skyline implements the generalized skyline plot introduced in Strimmer and Pybus (2001). For epsilon = 0 the generalized skyline plot degenerates to the classic skyline plot described in Pybus et al. (2000). The latter is in turn directly related to lineage-through-time plots (Nee et al., 1995).
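The relation between the classic and the generalized plot can be sketched on a simulated coalescent tree. The tree size and the choice of epsilon as one fifth of the total depth are arbitrary assumptions for this illustration.

```r
library(ape)

set.seed(1)
tr <- rcoal(20)                  # ultrametric coalescent tree
ci <- coalescent.intervals(tr)

## classic skyline plot: one parameter per coalescent interval
sk_classic <- skyline(ci, epsilon = 0)

## generalized skyline plot: short intervals are pooled
sk_gen <- skyline(ci, epsilon = ci$total.depth / 5)

c(intervals = ci$interval.count,
  classic = sk_classic$parameter.count,
  generalized = sk_gen$parameter.count)
```

A larger epsilon pools more intervals and so gives a smoother step function with fewer free parameters.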
Value
skyline returns an object of class "skyline" with the following entries:
time | A vector with the time at the end of each coalescent interval (i.e. the accumulated interval lengths from the beginning of the first interval to the end of an interval) |
interval.length | A vector with the length of each interval. |
population.size | A vector with the effective population size of each interval. |
parameter.count | Number of free parameters in the skyline plot. |
epsilon | The value of the underlying smoothing parameter. |
logL | Log-likelihood of skyline plot (see Strimmer and Pybus, 2001). |
logL.AICc | AICc corrected log-likelihood (see Strimmer and Pybus, 2001). |
find.skyline.epsilon returns the value of the epsilon parameter that maximizes logL.AICc.
Author(s)
Korbinian Strimmer
References
Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.
Pybus, O. G., Rambaut, A. and Harvey, P. H. (2000) An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics, 155, 1429–1437.
Nee, S., Holmes, E. C., Rambaut, A. and Harvey, P. H. (1995) Inferring population history from molecular phylogenies. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 349, 25–31.
See Also
coalescent.intervals, collapsed.intervals, skylineplot, ltt.plot.
Examples
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
#### classic skyline plot ####
sk1 <- skyline(cl1) # from collapsed intervals
sk1 <- skyline(ci) # from coalescent intervals
sk1 <- skyline(tree.hiv) # from tree
sk1
plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
#### generalized skyline plot ####
sk2 <- skyline(cl2) # from collapsed intervals
sk2 <- skyline(ci, 0.0119) # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2
plot(sk2)
# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)
sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
sk3$epsilon
Drawing Skyline Plot Graphs
Description
These functions provide various ways to draw skyline plot graphs on the current graphical device. Note that skylineplot(z, ...) is simply a shortcut for plot(skyline(z, ...)). The skyline plot itself is an estimate of effective population size through time, and is computed using the function skyline.
Usage
## S3 method for class 'skyline'
plot(x, show.years=FALSE, subst.rate, present.year, ...)
## S3 method for class 'skyline'
lines(x, show.years=FALSE, subst.rate, present.year, ...)
skylineplot(z, ...)
skylineplot.deluxe(tree, ...)
Arguments
x | skyline plot data (i.e. an object of class |
z | Either an ultrametric tree (i.e. an object of class |
tree | ultrametric tree (i.e. an object of class |
show.years | option that determines whether the time is plotted in units of substitutions (default) or in years (requires specification of substitution rate and year of present). |
subst.rate | substitution rate (see option show.years). |
present.year | present year (see option show.years). |
... | further arguments to be passed on to |
Details
See skyline for more details (incl. references) about the skyline plot method.
Author(s)
Korbinian Strimmer
See Also
plot and lines for the basic plotting functions in R, coalescent.intervals, skyline
Examples
# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree
#### classic skyline plot
skylineplot(tree.hiv) # shortcut
#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
sk.opt$epsilon
#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)
# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)
#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")
layout(mat= matrix(1:1,1,1,byrow=TRUE))
Slowinski-Guyer Test of Homogeneous Diversification
Description
This function performs the Slowinski–Guyer test that a trait or variable does not increase diversification rate.
Usage
slowinskiguyer.test(x, detail = FALSE)
Arguments
x | a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase diversification rate, and the second one the number of species in the corresponding sister-clade without the trait. Each row represents a pair of sister-clades. |
detail | if |
Details
The Slowinski–Guyer test compares a series of sister-clades where one of the two is characterized by a trait supposed to increase diversification rate. The null hypothesis is that the trait does not affect diversification. If the trait decreased diversification rate, then the null hypothesis cannot be rejected.
The present function is mainly of historical interest. The Slowinski–Guyer test generally performs poorly: see Paradis (2012) for alternatives, and the functions cited below.
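The logic of the test can be sketched in base R. This is a minimal illustration, assuming the usual formulation in which all splits of n species between two sister-clades are equiprobable under the null hypothesis and the per-pair P-values are combined with Fisher's method; sg_sketch() is a hypothetical name, not a function of ape, and it is not guaranteed to reproduce the output of slowinskiguyer.test() exactly.

```r
# Sketch only: sg_sketch() is a hypothetical helper, not part of ape.
sg_sketch <- function(x) {
    r <- x[, 1]                  # species in the clade with the trait
    n <- x[, 1] + x[, 2]         # total species in the sister pair
    p <- (n - r) / (n - 1)       # P(at least r species) if all splits are equiprobable
    chi2 <- -2 * sum(log(p))     # Fisher's method for combining P-values
    df <- 2 * length(p)
    data.frame(chisq = chi2, df = df,
               P.val = pchisq(chi2, df, lower.tail = FALSE))
}

# Data from the Examples section (Table 1 in Slowinski and Guyer 1993)
viviparous <- c(98, 8, 193, 36, 7, 128, 2, 3, 23, 70)
oviparous <- c(234, 17, 100, 4, 1, 12, 6, 1, 481, 11)
res <- sg_sketch(data.frame(viviparous, oviparous))
res
```

Because each per-pair probability only measures how large the clade with the trait is, the sketch makes the one-sided nature of the test explicit.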
Value
a data frame with the χ², the number of degrees of freedom, and the P-value. If detail = TRUE, a list is returned with the data frame and a vector of individual P-values for each pair of sister-clades.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.
Slowinski, J. B. and Guyer, C. (1993) Testing whether certain traits have caused amplified diversification: an improved method based on a model of random speciation and extinction. American Naturalist, 142, 1019–1024.
See Also
balance, mcconwaysims.test, diversity.contrast.test, richness.yule.test, rc in geiger, shift.test in apTreeshape
Examples
### from Table 1 in Slowinski and Guyer (1993):
viviparous <- c(98, 8, 193, 36, 7, 128, 2, 3, 23, 70)
oviparous <- c(234, 17, 100, 4, 1, 12, 6, 1, 481, 11)
x <- data.frame(viviparous, oviparous)
slowinskiguyer.test(x, TRUE) # 'P ~ 0.32' in the paper
xalt <- x
xalt[3, 2] <- 1
slowinskiguyer.test(xalt)
Solve Ambiguous Bases in DNA Sequences
Description
Replaces ambiguous bases in DNA sequences (R, Y, W, ...) by A, G, C, or T.
Usage
solveAmbiguousBases(x, method = "columnwise", random = TRUE)
Arguments
x | a matrix of class |
method | the method used (no other choice than the default for the moment; see details). |
random | a logical value (see details). |
Details
The replacements of ambiguous bases are done columnwise. First, the base frequencies are counted: if no ambiguous base is found in the column, nothing is done. By default (i.e., if random = TRUE), the replacements are done by random sampling using the frequencies of the observed compatible, non-ambiguous bases. For instance, if the ambiguous base is Y, it is replaced by either C or T using their observed frequencies as probabilities. If random = FALSE, the base with the greatest of these frequencies is used. If there are no compatible bases in the column, equal probabilities are used. For instance, if the ambiguous base is R, and only C and T are observed, then it is replaced by either A or G with equal probabilities.
Alignment gaps are not changed; see the function latag2n to change the leading and trailing gaps.
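The columnwise rule described above can be illustrated with a small base R sketch working on plain character vectors. The names compat and resolve_one() are hypothetical and only three ambiguity codes are listed; the behaviour when random = FALSE and no compatible base is observed is an assumption made here, since the text does not specify it.

```r
# Sketch only: compat and resolve_one() are hypothetical, not part of ape.
compat <- list(R = c("A", "G"), Y = c("C", "T"), W = c("A", "T"))

resolve_one <- function(column, code, random = TRUE) {
    cand <- compat[[code]]                              # bases compatible with the code
    freq <- sapply(cand, function(b) sum(column == b))  # observed counts in the column
    if (sum(freq) == 0) return(sample(cand, 1))         # none observed: equal probabilities
    if (random) sample(cand, 1, prob = freq)            # sample with observed frequencies
    else cand[which.max(freq)]                          # most frequent compatible base
}

column <- c("A", "G", "G", "R")
resolve_one(column, "R", random = FALSE)   # G is the most frequent compatible base
resolve_one(c("C", "T", "R"), "R")         # A or G, each with probability 1/2
```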
Value
a matrix of class "DNAbin".
Author(s)
Emmanuel Paradis
See Also
Examples
X <- as.DNAbin(matrix(c("A", "G", "G", "R"), ncol = 1))
alview(solveAmbiguousBases(X)) # R replaced by either A or G
alview(solveAmbiguousBases(X, random = FALSE)) # R always replaced by G
Species Tree Estimation
Description
This function calculates the species tree from a set of gene trees.
Usage
speciesTree(x, FUN = min)
Arguments
x | a list of trees, e.g., an object of class |
FUN | a function used to compute the divergence times of each pair of tips. |
Details
For all trees in x, the divergence time of each pair of tips is calculated: these are then ‘summarized’ with FUN to build a new distance matrix used to calculate the species tree with a single-linkage hierarchical clustering. The default for FUN computes the maximum tree (maxtree) of Liu et al. (2010). Using FUN = mean gives the shallowest divergence tree of Maddison and Knowles (2006).
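The procedure can be sketched step by step for two gene trees. This is an illustration under stated assumptions rather than the internal code of speciesTree: pairwise patristic distances from cophenetic() stand in for divergence times (for ultrametric trees the two differ by a constant factor of two), and pmin() plays the role of FUN = min.

```r
library(ape)

# Gene trees from Liu et al. (2010), as in the Examples
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr2 <- read.tree(text = "(((A:0.07,C:0.07):0.02,D:0.09):0.03,B:0.12);")

tips <- sort(tr1$tip.label)
D1 <- cophenetic(tr1)[tips, tips]   # pairwise distances, rows/columns in a common order
D2 <- cophenetic(tr2)[tips, tips]
Dmin <- pmin(D1, D2)                # element-wise minimum over the gene trees

# single-linkage clustering of the summarized distances, converted to "phylo"
sp <- as.phylo(hclust(as.dist(Dmin), method = "single"))
sp
```

Replacing pmin(D1, D2) by (D1 + D2) / 2 would mimic FUN = mean in the same way.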
Value
an object of class "phylo".
Author(s)
Emmanuel Paradis
References
Liu, L., Yu, L. and Pearl, D. K. (2010) Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology, 60, 95–106.
Maddison, W. P. and Knowles, L. L. (2006) Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21–30.
Examples
### example in Liu et al. (2010):
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr2 <- read.tree(text = "(((A:0.07,C:0.07):0.02,D:0.09):0.03,B:0.12);")
TR <- c(tr1, tr2)
TSmax <- speciesTree(TR) # MAXTREE
TSsha <- speciesTree(TR, mean) # shallowest divergence
kronoviz(c(tr1, tr2, TSmax, TSsha), horiz = FALSE, type = "c", cex = 1.5, font = 1)
mtext(c("Gene tree 1", "Gene tree 2", "Species tree - MAXTREE"), at = -c(7.5, 4, 1))
mtext("Species tree - Shallowest Divergence")
layout(1)
Generates Systematic Regular Trees
Description
This function generates trees with regular shapes.
Usage
stree(n, type = "star", tip.label = NULL)
Arguments
n | an integer giving the number of tips in the tree. |
type | a character string specifying the type of tree to generate; four choices are possible: |
tip.label | a character vector giving the tip labels; if not specified, the tips "t1", "t2", ..., are given. |
Details
The types of trees generated are:
“star”: a star (or comb) tree with a single internal node.
“balanced”: a fully balanced dichotomous rooted tree; n must be a power of 2 (2, 4, 8, ...).
“left”: a fully unbalanced rooted tree where the largest clade is on the left-hand side when the tree is plotted upwards.
“right”: same as above but in the other direction.
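A short check of the shapes listed above, counting internal nodes for a small n. The value n = 8 and the use of LETTERS as tip labels are arbitrary choices made for this illustration.

```r
library(ape)

n <- 8
star <- stree(n)                                    # type = "star" is the default
bal  <- stree(n, "balanced")                        # n must be a power of 2
left <- stree(n, "left", tip.label = LETTERS[1:n])  # custom tip labels

# a star tree has one internal node; the dichotomous rooted shapes have n - 1
sapply(list(star = star, balanced = bal, left = left), Nnode)
```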
Value
An object of class "phylo".
Author(s)
Emmanuel Paradis
See Also
Examples
layout(matrix(1:4, 2, 2))
plot(stree(100))
plot(stree(128, "balanced"))
plot(stree(100, "left"))
plot(stree(100, "right"))
Zoom on a Portion of a Phylogeny by Successive Clicks
Description
This function simultaneously plots a whole phylogenetic tree (supposedly large) and a portion of it determined by clicking on the nodes of the phylogeny. On exit, it returns the last subtree visualized.
Usage
subtreeplot(x, wait=FALSE, ...)
Arguments
x | an object of class |
wait | a logical indicating whether the node being processed should be printed (useful for big phylogenies). |
... | further arguments passed to |
Details
This function aims at easily exploring very large trees. The main argument is a phylogenetic tree, and the second one is a logical indicating whether a waiting message should be printed while the calculation is being processed.
The whole tree is plotted on the left-hand side in half of the device. The subtree is plotted on the right-hand side in the other half. The user clicks on the nodes in the complete tree and the subtree corresponding to this node is plotted on the right-hand side. There is no limit to the number of clicks that can be done. On exit, the subtree on the right-hand side is returned.
To use a subtree as the new tree in which to zoom, the user has to call the function several times. This can however be done in a single command line (see example 2).
Author(s)
Damien de Vienne damien.de-vienne@u-psud.fr
See Also
Examples
## Not run: 
#example 1: simple
tree1 <- rtree(50)
tree2 <- subtreeplot(tree1, wait = TRUE) # on exit, tree2 will be a subtree of tree1
#example 2: more than one zoom
tree1 <- rtree(60)
tree2 <- subtreeplot(subtreeplot(subtreeplot(tree1))) # allow three successive zooms
## End(Not run)
All subtrees of a Phylogenetic Tree
Description
This function returns a list of all the subtrees of a phylogenetic tree.
Usage
subtrees(tree, wait=FALSE)
Arguments
tree | an object of class |
wait | a logical indicating whether the node being processed should be printed (useful for big phylogenies). |
Value
subtrees returns a list of trees of class "phylo" and returns invisibly for each subtree a list with the following components:
tip.label | |
node.label | |
Ntip | |
Nnode |
Author(s)
Damien de Vienne damien.de-vienne@u-psud.fr
See Also
zoom, subtreeplot for functions extracting particular subtrees.
Examples
### Random tree with 12 leaves
phy<-rtree(12)
par(mfrow=c(4,3))
plot(phy, sub="Complete tree")
### Extract the subtrees
l<-subtrees(phy)
### plot all the subtrees
for (i in 1:11) plot(l[[i]], sub=paste("Node", l[[i]]$node.label[1]))
par(mfrow=c(1,1))
Print Summary of a Phylogeny
Description
The first function prints a compact summary of a phylogenetic tree (an object of class "phylo"). The three other functions return the number of tips, nodes, or edges, respectively.
Usage
## S3 method for class 'phylo'
summary(object, ...)
Ntip(phy)
## S3 method for class 'phylo'
Ntip(phy)
## S3 method for class 'multiPhylo'
Ntip(phy)
Nnode(phy, ...)
## S3 method for class 'phylo'
Nnode(phy, internal.only = TRUE, ...)
## S3 method for class 'multiPhylo'
Nnode(phy, internal.only = TRUE, ...)
Nedge(phy)
## S3 method for class 'phylo'
Nedge(phy)
## S3 method for class 'multiPhylo'
Nedge(phy)
Arguments
object,phy | an object of class |
... | further arguments passed to or from other methods. |
internal.only | a logical indicating whether to return the number of internal nodes only (the default), or of internal and terminal (tips) nodes (if |
Details
The summary includes the numbers of tips and of nodes, summary statistics of the branch lengths (if they are available) with mean, variance, minimum, first quartile, median, third quartile, and maximum, listing of the first ten tip labels, and (if available) of the first ten node labels. It is also printed whether some of these optional elements (branch lengths, node labels, and root edge) are not found in the tree.
summary simply prints its results on the standard output and is not meant for programming.
Value
A NULL value in the case of summary, a single numeric value for the three other functions.
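For programming, the three counting functions are the ones to use. The sketch below uses a random rooted binary tree with 10 tips (an arbitrary choice) to show how the counts relate to each other, including the effect of internal.only.

```r
library(ape)

set.seed(1)
tr <- rtree(10)                    # random rooted binary tree with 10 tips

Ntip(tr)                           # number of tips
Nnode(tr)                          # internal nodes only (the default)
Nnode(tr, internal.only = FALSE)   # internal nodes plus tips
Nedge(tr)                          # number of rows of tr$edge
```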
Author(s)
Emmanuel Paradis
See Also
read.tree, summary for the generic R function, multiphylo, c.phylo
Examples
data(bird.families)
summary(bird.families)
Ntip(bird.families)
Nnode(bird.families)
Nedge(bird.families)
Translation from DNA to Amino Acid Sequences
Description
trans translates DNA sequences into amino acids. complement returns the (reverse) complement sequences.
Usage
trans(x, code = 1, codonstart = 1)
complement(x)
Arguments
x | an object of class |
code | an integer value giving the genetic code to be used. Currently only the genetic codes 1 to 6 are supported. |
codonstart | an integer giving where to start the translation. This should be 1, 2, or 3, but larger values are accepted and have the effect of starting the translation further towards the 3'-end of the sequence. |
Details
With trans, if the sequence length is not a multiple of three, a warning message is printed. Alignment gaps are simply ignored (i.e., AG- returns X with no special warning or message). Base ambiguities are taken into account where relevant: for instance, GGN, GGA, GGR, etc., all return G.
See the link given in the References for details about the taxonomic coverage and alternative codons of each code.
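A minimal illustration of the handling of ambiguities, using a made-up six-base sequence (two codons, GGN and AGC) built from a character vector. The sequence itself is an assumption chosen for this example.

```r
library(ape)

x <- as.DNAbin(c("g", "g", "n", "a", "g", "c"))  # two codons: GGN and AGC

aa <- trans(x)        # standard genetic code (code = 1)
as.character(aa)      # GGN is translated like GGA, GGR, ...

rc <- complement(x)   # (reverse) complement, still of class "DNAbin"
as.character(rc)
```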
Value
an object of class "AAbin" or "DNAbin", respectively.
Note
These functions are equivalent to translate and comp in the package seqinr with the difference that there is no need to convert the sequences into character strings.
Author(s)
Emmanuel Paradis
References
https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes
See Also
Examples
data(woodmouse)
X <- trans(woodmouse) # not correct
X2 <- trans(woodmouse, 2) # using the correct code
identical(X, X2)
alview(X[1:2, 1:60]) # some 'Stop' codons (*)
alview(X2[, 1:60])
X2
Tree Popping
Description
Method for reconstructing phylogenetic trees from an object of class splits using tree popping.
Usage
treePop(obj)
Arguments
obj | an object of class |
Value
an object of class "phylo" which displays all the splits in the input object.
Author(s)
Andrei Popescu
Tree Explorer With Multiple Devices
Description
This function requires a plotted tree: the user is invited to click close to a node and the corresponding subtree (or clade) is plotted in a new window.
Usage
trex(phy, title = TRUE, subbg = "lightyellow3", return.tree = FALSE, ...)
Arguments
phy | an object of class |
title | a logical or a character string (see details). |
subbg | a character string giving the background colour for the subtree. |
return.tree | a logical: if |
... | further arguments to pass to |
Details
This function works with a tree (freshly) plotted on an interactive graphical device (i.e., not a file). After calling trex, the user clicks close to a node of the tree, then the clade from this node is plotted in a new window. The user can click as many times as needed on the main tree: the clades are plotted successively in the same new window. The process is stopped by a right-click. If the user clicks too close to the tips, a message “Try again!” is printed.
Each time trex is called, the subtree is plotted in a new window without closing or deleting those possibly already plotted. They may be distinguished with the options title and/or subbg.
In all cases, the device where phy is plotted is the active window after the operation. It should not be closed during the whole process.
If title = TRUE, a default title is printed on the new window using the node label, or the node number if there are no node labels in the tree. If title = FALSE, no title is printed. If title is a character string, it is used for the title.
Value
an object of class "phylo" if return.tree = TRUE
Author(s)
Emmanuel Paradis
See Also
Examples
## Not run: 
tr <- rcoal(1000)
plot(tr, show.tip.label = FALSE)
trex(tr) # left-click as many times as you want, then right-click
tr <- makeNodeLabel(tr)
trex(tr, subbg = "lightgreen") # id.
## generate a random colour with control on the darkness:
rRGB <- function(a, b) rgb(runif(1, a, b), runif(1, a, b), runif(1, a, b))
### with a random pale background:
trex(tr, subbg = rRGB(0.8, 1))
## the above can be called many times...
graphics.off() # close all graphical devices
## End(Not run)
Tree Reconstruction Based on the Triangles Method
Description
Fast distance-based construction method. Should only be used when distance measures are fairly reliable.
Usage
triangMtd(X)
triangMtds(X)
Arguments
X | a distance matrix. |
Value
an object of class "phylo".
Author(s)
Andrei Popescu
References
http://archive.numdam.org/ARCHIVE/RO/RO_2001__35_2/RO_2001__35_2_283_0/RO_2001__35_2_283_0.pdf
See Also
Examples
data(woodmouse)
tr <- triangMtd(dist.dna(woodmouse))
plot(tr)
Removes Duplicate Trees
Description
This function scans a list of trees, and returns a list with the duplicate trees removed. By default the labelled topologies are compared.
Usage
## S3 method for class 'multiPhylo'
unique(x, incomparables = FALSE, use.edge.length = FALSE, use.tip.label = TRUE, ...)
Arguments
x | an object of class |
incomparables | unused (for compatibility with the generic). |
use.edge.length | a logical specifying whether to consider the edge lengths in the comparisons; the default is |
use.tip.label | a logical specifying whether to consider the tip labels in the comparisons; the default is |
... | further arguments passed to or from other methods. |
Value
an object of class "multiPhylo" with an attribute "old.index" indicating which trees of the original list are similar (the tree of smaller index is taken as reference).
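A small deterministic illustration: three trees of which the first and third share the same labelled topology. The Newick strings are made up for this example.

```r
library(ape)

t1 <- read.tree(text = "((a,b),(c,d));")
t2 <- read.tree(text = "((a,c),(b,d));")
TR <- c(t1, t2, t1)        # a "multiPhylo" object with one duplicate

u <- unique(TR)
length(u)                  # two distinct labelled topologies remain
attr(u, "old.index")       # maps each original tree to its reference tree
```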
Author(s)
Emmanuel Paradis
See Also
all.equal.phylo, unique for the generic R function, read.tree, read.nexus
Examples
TR <- rmtree(50, 4)
length(unique(TR)) # not always 15...
howmanytrees(4)
Update Labels
Description
This function changes labels (names or rownames) given two vectors (old and new). It is a generic function with several methods as described below.
Usage
updateLabel(x, old, new, ...)
## S3 method for class 'character'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'DNAbin'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'AAbin'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'phylo'
updateLabel(x, old, new, exact = TRUE, nodes = FALSE, ...)
## S3 method for class 'evonet'
updateLabel(x, old, new, exact = TRUE, nodes = FALSE, ...)
## S3 method for class 'data.frame'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'matrix'
updateLabel(x, old, new, exact = TRUE, ...)
Arguments
x | an object where to change the labels. |
old,new | two vectors of mode character (must be of the same length). |
exact | a logical value (see details). |
nodes | a logical value specifying whether to also update the node labels of the tree or network. |
... | further arguments passed to and from methods. |
Details
This function can be used to change some of the labels (see examples) or all of them if their ordering is uncertain.
If exact = TRUE (the default), the values in old are matched exactly with the labels; otherwise (exact = FALSE), the values in old are considered as regular expressions and searched in the labels with grep.
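A minimal example of the default exact matching on a tree, changing only some of the labels. The four-tip tree and the label vectors are made up for this illustration.

```r
library(ape)

tr <- read.tree(text = "((a,b),(c,d));")

# only "a" and "d" are listed in 'old'; the other labels are left untouched
tr2 <- updateLabel(tr, old = c("a", "d"), new = c("A", "D"))
tr2$tip.label
tr$tip.label    # the original tree is not modified
```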
Value
an object of the same class as x.
Author(s)
Emmanuel Paradis
See Also
makeLabel, makeNodeLabel, mixedFontLabel, stripLabel, checkLabel
Examples
## Not run: 
## the tree by Nyakatura & Bininda-Emonds (2012, BMC Biology)
x <- "https://static-content.springer.com/esm/art"
y <- "3A10.1186"
z <- "2F1741-7007-10-12/MediaObjects/12915_2011_534_MOESM5_ESM.NEX"
## The command below may not print correctly in HTML because of the
## percentage symbol; see the text or PDF help page.
url <- paste(x, y, z, sep = "%")
TC <- read.nexus(url)
tr <- TC$carnivoreST_bestEstimate
old <- c("Uncia_uncia", "Felis_manul", "Leopardus_jacobitus")
new <- c("Panthera_uncia", "Otocolobus_manul", "Leopardus_jacobita")
tr.updated <- updateLabel(tr, old, new)
## End(Not run)
tr <- rtree(6)
## the order of the labels is randomized by this function
old <- paste0("t", 1:6)
new <- paste0("x", 1:6)
updateLabel(tr, old, new)
tr
Variance Components with Orthonormal Contrasts
Description
This function calls Phylip's contrast program and returns the phylogenetic and phenotypic variance-covariance components for one or several traits. There can be several observations per species.
Usage
varCompPhylip(x, phy, exec = NULL)
Arguments
x | a numeric vector, a matrix (or data frame), or a list. |
phy | an object of class |
exec | a character string giving the name of the executable contrast program (see details). |
Details
The data x can be in several forms: (i) a numeric vector if there is a single trait and one observation per species; (ii) a matrix or data frame if there are several traits (as columns) and a single observation of each trait for each species; (iii) a list of vectors if there is a single trait and several observations per species; (iv) a list of matrices or data frames: same as (iii) but with several traits and the rows are individuals.
If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.
Phylip (version 3.68 or higher) must be accessible on your computer. If you have a Unix-like operating system, the executable name is assumed to be "phylip contrast" (as in Debian); otherwise it is set to "contrast". If this doesn't suit your system, use the option exec accordingly. If the executable is not in the path, you may need to specify it, e.g., exec = "C:/Program Files/Phylip/contrast".
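The four data layouts described above can be built in base R as follows. The numbers of species, traits, and observations are arbitrary, the values are random noise, and none of these objects is passed to Phylip here; the sketch only shows the expected shapes.

```r
set.seed(10)
nsp <- 5      # number of species (tips), arbitrary
ntr <- 3      # number of traits, arbitrary
nobs <- 4     # observations per species, arbitrary

x1 <- rnorm(nsp)                                  # (i) one trait, one observation per species
x2 <- matrix(rnorm(nsp * ntr), nsp, ntr)          # (ii) several traits as columns
x3 <- replicate(nsp, rnorm(nobs), simplify = FALSE)   # (iii) list of vectors, one per species
x4 <- replicate(nsp, matrix(rnorm(nobs * ntr), nobs, ntr),
                simplify = FALSE)                 # (iv) list of matrices, rows are individuals

# naming by species lets the values be matched to the tip labels
names(x1) <- names(x3) <- names(x4) <- paste0("t", 1:nsp)
rownames(x2) <- paste0("t", 1:nsp)
str(x4[[1]])
```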
Value
a list with elements varA and varE with the phylogenetic (additive) and phenotypic (environmental) variance-covariance matrices. If a single trait is analyzed, these contain its variances.
Author(s)
Emmanuel Paradis
References
Felsenstein, J. (2004) Phylip (Phylogeny Inference Package) version 3.68. Department of Genetics, University of Washington, Seattle, USA. http://evolution.genetics.washington.edu/phylip/phylip.html.
Felsenstein, J. (2008) Comparative methods with sampling error and within-species variation: Contrasts revisited and revised. American Naturalist, 171, 713–725.
See Also
Examples
## Not run: 
tr <- rcoal(30)
### Five traits, one observation per species:
x <- replicate(5, rTraitCont(tr, sigma = 1))
varCompPhylip(x, tr) # varE is small
x <- replicate(5, rnorm(30))
varCompPhylip(x, tr) # varE is large
### Five traits, ten observations per species:
x <- replicate(30, replicate(5, rnorm(10)), simplify = FALSE)
varCompPhylip(x, tr)
## End(Not run)
Compute Variance Component Estimates
Description
Get variance component estimates from a fitted lme object.
Usage
varcomp(x, scale = FALSE, cum = FALSE)
Arguments
x | A fitted |
scale | Scale all variances so that they sum to 1. |
cum | Return cumulative variance components. |
Details
Variance computations are done as in Venables and Ripley (2002).
Value
A named vector of class varcomp with estimated variance components.
Author(s)
Julien Dutheil dutheil@evolbio.mpg.de
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S (Fourth Edition). New York: Springer-Verlag.
See Also
Examples
data(carnivora)
library(nlme)
m <- lme(log10(SW) ~ 1, random = ~ 1|Order/SuperFamily/Family/Genus, data=carnivora)
v <- varcomp(m, TRUE, TRUE)
plot(v)
Phylogenetic Variance-covariance or Correlation Matrix
Description
This function computes the expected variances and covariances of a continuous trait assuming it evolves under a given model.
This is a generic function with methods for objects of class "phylo" and "corPhyl".
Usage
vcv(phy, ...)
## S3 method for class 'phylo'
vcv(phy, model = "Brownian", corr = FALSE, ...)
## S3 method for class 'corPhyl'
vcv(phy, corr = FALSE, ...)
Arguments
phy | an object of the correct class (see above). |
model | a character giving the model used to compute the variances and covariances; only |
corr | a logical indicating whether the correlation matrix should be returned ( |
... | further arguments to be passed to or from other methods. |
Value
a numeric matrix with the names of the tips as colnames and rownames.
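Under the default Brownian model, the variance of a tip is its distance from the root, and the correlation matrix is the rescaled covariance matrix. The sketch below checks both relations on a random tree; the seed and tree size are arbitrary, and the use of node.depth.edgelength() and cov2cor() is a choice made for this illustration.

```r
library(ape)

set.seed(42)
tr <- rtree(5)
V <- vcv(tr)                                   # Brownian model by default

# root-to-tip distances; tips are numbered 1..Ntip(tr)
depth <- node.depth.edgelength(tr)[seq_len(Ntip(tr))]
all.equal(unname(diag(V)), depth)

# corr = TRUE should agree with rescaling V to a correlation matrix
C <- vcv(tr, corr = TRUE)
all.equal(unname(C), unname(cov2cor(V)))
```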
Note
Do not confuse this function with vcov which computes the variance-covariance matrix among parameters of a fitted model object.
Author(s)
Emmanuel Paradis
References
Garland, T. Jr. and Ives, A. R. (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155, 346–364.
See Also
corBrownian, corMartins, corGrafen, corPagel, corBlomberg, vcv2phylo
Examples
tr <- rtree(5)
## all are the same:
vcv(tr)
vcv(corBrownian(1, tr))
vcv(corPagel(1, tr))
Variance-Covariance Matrix to Tree
Description
This function transforms a variance-covariance matrix into a phylogenetic tree.
Usage
vcv2phylo(mat, tolerance = 1e-7)
Arguments
mat | a square symmetric (positive-definite) matrix. |
tolerance | the numeric tolerance used to compare the branch lengths. |
Details
The function tests if the matrix is symmetric and positive-definite (i.e., all its eigenvalues positive within the specified tolerance).
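The kind of check described above can be reproduced in base R before calling the function. This is a sketch under assumptions: the matrix is made up, and how exactly vcv2phylo applies its tolerance internally is not stated in this page, so the comparison below is only one plausible reading.

```r
# a small made-up symmetric matrix
V <- matrix(c(2, 1, 1,
              1, 2, 1,
              1, 1, 3), nrow = 3, byrow = TRUE)
tolerance <- 1e-7

sym <- isSymmetric(V)                                         # symmetry check
ev <- eigen(V, symmetric = TRUE, only.values = TRUE)$values   # eigenvalues only
posdef <- all(ev > tolerance)                                 # assumed use of the tolerance

c(symmetric = sym, positive.definite = posdef)
```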
Value
an object of class "phylo".
Author(s)
Simon Blomberg
See Also
Examples
tr <- rtree(10)
V <- vcv(tr) # VCV matrix assuming Brownian motion
z <- vcv2phylo(V)
identical(tr, z) # FALSE
all.equal(tr, z) # TRUE
Define Similarity Matrix
Description
weight.taxo computes a matrix whose entries [i, j] are set to 1 if x[i] == x[j], 0 otherwise.
weight.taxo2 computes a matrix whose entries [i, j] are set to 1 if x[i] == x[j] AND y[i] != y[j], 0 otherwise.
The diagonal [i, i] is always set to 0.
The returned matrix can be used as a weight matrix in Moran.I. x and y may be vectors or factors.
See further details in vignette("MoranI").
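The two definitions translate directly into base R with outer(). The sketch below is an illustration of the rules stated above, not the code of the package; the family and genus labels are made up.

```r
x <- c("Felidae", "Felidae", "Canidae", "Canidae")   # e.g., family of each species
y <- c("Panthera", "Felis", "Canis", "Canis")        # e.g., genus of each species

# rule of weight.taxo: 1 if x[i] == x[j], 0 otherwise, diagonal set to 0
w1 <- outer(x, x, "==") * 1
diag(w1) <- 0

# rule of weight.taxo2: 1 if x[i] == x[j] AND y[i] != y[j], diagonal set to 0
w2 <- (outer(x, x, "==") & outer(y, y, "!=")) * 1
diag(w2) <- 0

w1
w2
```

With these data, species 3 and 4 are linked in w1 (same family) but not in w2 (same family and same genus).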
Usage
weight.taxo(x)
weight.taxo2(x, y)
Arguments
x,y | a vector or a factor. |
Value
a square numeric matrix.
Author(s)
Emmanuel Paradis
See Also
Find Patterns in DNA Sequences
Description
This function finds patterns in a single or a set of DNA or AA sequences.
Usage
where(x, pattern)
Arguments
x | an object inheriting the class either |
pattern | a character string to be searched in |
Details
If x is a vector, the function returns a single vector giving the position(s) where the pattern was found. If x is a matrix or a list, it returns a list with the positions of the pattern for each sequence.
Patterns may be overlapping. For instance, if pattern = "tata" and the sequence starts with ‘tatata’, then the output will be c(1, 3).
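The overlapping behaviour can be mimicked on a plain character string in base R by testing every possible start position. This sketch does not use where() itself; find_overlapping() is a hypothetical helper written for this illustration.

```r
# Sketch only: find_overlapping() is a hypothetical helper, not part of ape.
find_overlapping <- function(s, pattern) {
    k <- nchar(pattern)
    if (nchar(s) < k) return(integer(0))
    starts <- seq_len(nchar(s) - k + 1)             # every candidate start position
    starts[substring(s, starts, starts + k - 1) == pattern]
}

find_overlapping("tatata", "tata")   # positions 1 and 3, as in the text
```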
Value
a vector of integers or a list of such vectors.
Author(s)
Emmanuel Paradis
See Also
Examples
data(woodmouse)
where(woodmouse, "tata")
## with AA sequences:
x <- trans(woodmouse, 2)
where(x, "irk")
Identifies Edges of a Tree
Description
This function identifies the edges that belong to a group (possibly non-monophyletic) specified as a set of tips.
Usage
which.edge(phy, group)
Arguments
phy | an object of class |
group | a vector of mode numeric or character specifying the tips for which the edges are to be identified. |
Details
The group of tips specified in ‘group’ may be non-monophyletic (paraphyletic or polyphyletic), in which case all edges from the tips to their most recent common ancestor are identified.
The identification is made with the indices of the rows of the matrix ‘edge’ of the tree.
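A typical use of the returned row indices is to colour the edges of a clade. The sketch below uses a made-up four-tip tree; the plotting call is left commented out so that the code also runs without a graphical device.

```r
library(ape)

tr <- read.tree(text = "((a,b),(c,d));")
idx <- which.edge(tr, c("a", "b"))   # row indices into tr$edge
tr$edge[idx, , drop = FALSE]         # the edges leading to tips a and b

co <- rep("grey", Nedge(tr))         # one colour per edge
co[idx] <- "red"
# plot(tr, edge.color = co)          # uncomment on a graphical device
```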
Value
a numeric vector.
Author(s)
Emmanuel Paradis
See Also
Cytochrome b Gene Sequences of Woodmice
Description
This is a set of 15 sequences of the mitochondrial gene cytochrome b of the woodmouse (Apodemus sylvaticus) which is a subset of the data analysed by Michaux et al. (2003). The full data set is available through GenBank (accession numbers AJ511877 to AJ511987).
Usage
data(woodmouse)
Format
An object of class "DNAbin".
Source
Michaux, J. R., Magnanou, E., Paradis, E., Nieberding, C. and Libois, R. (2003) Mitochondrial phylogeography of the Woodmouse (Apodemus sylvaticus) in the Western Palearctic region. Molecular Ecology, 12, 685–697.
See Also
Examples
data(woodmouse)
str(woodmouse)
Write DNA Sequences in a File
Description
These functions write in a file a list of DNA sequences in sequential, interleaved, or FASTA format. write.FASTA can write either DNA or AA sequences.
Usage
write.dna(x, file, format = "interleaved", append = FALSE, nbcol = 6, colsep = " ", colw = 10, indent = NULL, blocksep = 1)
write.FASTA(x, file, header = NULL, append = FALSE)
Arguments
x | a list or a matrix of DNA sequences, or of AA sequences for |
file | a file name specified by either a variable of mode character, or a double-quoted string. |
format | a character string specifying the format of the DNA sequences. Three choices are possible: |
append | a logical, if |
nbcol | a numeric specifying the number of columns per row (6 by default); may be negative implying that the nucleotides are printed on a single line. |
colsep | a character used to separate the columns (a single space by default). |
colw | a numeric specifying the number of nucleotides per column (10 by default). |
indent | a numeric or a character specifying how the blocks of nucleotides are indented (see details). |
blocksep | a numeric specifying the number of lines between the blocks of nucleotides (this has an effect only if 'format = "interleaved"'). |
header | a vector of mode character giving the header to be written in the FASTA file before the sequences. By default, there is no header. |
Details
Three formats are supported in the present function: see the help page of read.dna and the references below for a description.
If the sequences have no names, then they are given "1", "2", ... as labels in the file.
With the interleaved and sequential formats, the sequences must be all of the same length. The names of the sequences are not truncated.
The argument indent specifies how the rows of nucleotides are indented. In the interleaved and sequential formats, the rows with the taxon names are never indented; the subsequent rows are indented with 10 spaces by default (i.e., if indent = NULL). In the FASTA format, the rows are not indented by default. This default behaviour can be modified by specifying a value to indent: the rows are then indented with “indent” (if it is a character) or ‘indent’ spaces (if it is a numeric). For example, specifying indent = "   " or indent = 3 will have the same effect (use indent = "\t" for a tabulation).
The different options are intended to give flexibility in formatting the sequences. For instance, if the sequences are very long it may be judicious to remove all the spaces between columns (colsep = ""), in the margins (indent = 0), and between the blocks (blocksep = 0) to produce a smaller file.
write.dna(, format = "fasta") can be very slow if the sequences are long (> 10 kb). write.FASTA is much faster in this situation but the formatting is not flexible: each sequence is printed on a single line, which is OK for big files that are not intended to be opened with a text editor.
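The compact settings and the fast FASTA writer mentioned above can be tried on a tiny alignment written to temporary files. The two four-base sequences and their names are made up for this sketch.

```r
library(ape)

x <- as.DNAbin(matrix(c("a", "c", "g", "t",
                        "a", "c", "g", "a"),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("seq1", "seq2"), NULL)))

f1 <- tempfile(fileext = ".phy")
f2 <- tempfile(fileext = ".fas")

# compact interleaved output: no column separators, no indentation, no blank lines
write.dna(x, f1, format = "interleaved", colsep = "", indent = 0, blocksep = 0)
readLines(f1)

# fast FASTA output, read back to check the sequence names
write.FASTA(x, f2)
y <- read.FASTA(f2)
names(y)
```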
Value
None (invisible ‘NULL’).
Note
Specifying a negative value for ‘nbcol’ (meaning that the nucleotides are printed on a single line) gives the same output for the interleaved and sequential formats.
The names of the sequences can be truncated with the function makeLabel. In particular, Clustal is limited to 30 characters, and PHYML seems limited to 99 characters.
Author(s)
Emmanuel Paradis
References
Anonymous. FASTA format. https://en.wikipedia.org/wiki/FASTA_format
Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington. http://evolution.genetics.washington.edu/phylip/phylip.html
See Also
read.dna, read.GenBank, makeLabel
Write Tree File in Nexus Format
Description
This function writes trees in a file with the NEXUS format.
Usage
write.nexus(..., file = "", translate = TRUE, digits = 10)
Arguments
... | either (i) a single object of class |
file | a file name specified by either a variable of mode character, or a double-quoted string; if |
translate | a logical, if |
digits | a numeric giving the number of digits used for printing branch lengths. For negative numbers no branch lengths are printed. |
Details
If several trees are given, they must all have the same tip labels.
If among the objects given some are not trees of class "phylo", they are simply skipped and not written in the file.
See write.tree for details on how tip (and node) labels are checked before being printed.
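A round trip through a temporary file shows the typical use with several trees sharing the same tip labels. The number of trees, the number of tips, and the seed are arbitrary choices for this sketch.

```r
library(ape)

set.seed(7)
trs <- rmtree(3, 5)                  # three random trees on the same five tips

f <- tempfile(fileext = ".nex")
write.nexus(trs, file = f)           # written with a TRANSLATE block by default

back <- read.nexus(f)
length(back)                         # the three trees are read back
unlink(f)
```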
Value
None (invisible ‘NULL’).
Author(s)
Emmanuel Paradis
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
See Also
read.nexus, read.tree, write.tree, read.nexus.data, write.nexus.data, write.phyloXML
Write Character Data in NEXUS Format
Description
This function writes in a file a list of data in the NEXUS format. The names of the vectors of the list are used as taxon names.
For the moment, only sequence data (DNA or protein) are supported.
Usage
write.nexus.data(x, file, format = "dna", datablock = TRUE, interleaved = TRUE, charsperline = NULL, gap = NULL, missing = NULL)
Arguments
x | a matrix or a list of data each made of a single vector of mode character where each element is a character state (e.g., “A”, “C”, ...). Objects of class “DNAbin” are accepted. |
file | a file name specified by either a variable of mode character, or a double-quoted string. |
format | a character string specifying the format of the sequences. Four choices are possible: |
datablock | a logical, if |
interleaved | a logical, if |
charsperline | a numeric value specifying the number of characters per line when used with |
gap | a character specifying the symbol for gap. Default is“ |
missing | a character specifying the symbol for missingdata. Default is “ |
Details
If the sequences have no names, then they are given “1”, “2”, ..., as names in the file.
Sequences must be all of the same length.
Value
None (invisible ‘NULL’).
Author(s)
Johan Nylander nylander@scs.fsu.edu and Thomas Guillerme
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
See Also
read.nexus, write.nexus, read.nexus.data
Examples
## Not run: 
## Write interleaved DNA data with 100 characters per line in a DATA block
data(woodmouse)
write.nexus.data(woodmouse, file= "wood.ex.nex", interleaved = TRUE, charsperline = 100)
## Write sequential protein data in TAXA and CHARACTERS blocks
data(cynipids)
write.nexus.data(cynipids, file = "cyn.ex.nex", format = "protein", datablock = FALSE, interleaved = FALSE)
unlink(c("wood.ex.nex", "cyn.ex.nex"))
## End(Not run)
Write Tree File in phyloXML Format
Description
This function writes trees to a file of phyloXML format.
Usage
write.phyloXML(phy, file = "", tree.names = FALSE)
Arguments
phy | an object of class |
file | a file name specified by either a variable of mode character, or a double-quoted string; if |
tree.names | either a logical or a vector of mode character specifying whether or which tree names should be written to the file. |
Details
If several trees are given, they will be represented as multiple <phylogeny> elements. Contrary to write.nexus, the trees need not have the same tip labels.
When tree.names is TRUE, the tree names will always be added as <name> tags to each phylogeny element. If the phy object is unnamed, then the names will be automatically generated from the tree indices as "tree<index>" (e.g. tree1, tree2, ...). If tree.names is a character vector, the specified names will be used instead.
Branch lengths, labels, and rootedness are preserved in the phyloXML file.
Value
None (invisible NULL).
Author(s)
Federico Marotta
References
Han, M. V. and Zmasek, C. M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
See Also
read.tree, write.tree, read.nexus, write.nexus, read.nexus.data, write.nexus.data
Write Tree File in Parenthetic Format
Description
This function writes a tree to a file in parenthetic format using the Newick (also known as New Hampshire) format.
Usage
write.tree(phy, file = "", append = FALSE, digits = 10, tree.names = FALSE)
Arguments
phy | an object of class |
file | a file name specified by either a variable of mode character, or a double-quoted string; if |
append | a logical, if |
digits | a numeric giving the number of (significant) digits used for printing branch lengths (see details). For negative numbers no branch lengths are printed. |
tree.names | either a logical or a vector of mode character. If |
Details
The node labels and the root edge length, if available, are written in the file.
If tree.names == TRUE then a variant of the Newick format is written for which the name of a tree precedes the Newick format tree (any parentheses in the name are deleted beforehand). The tree names are taken from the names attribute if present (they are ignored if tree.names is a character vector).
The tip labels (and the node labels if present) are checked before being printed: the leading and trailing spaces, and the leading left and trailing right parentheses are deleted; the other spaces are replaced by underscores; the commas, colons, semicolons, and the other parentheses are replaced with dashes.
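The label checks described above can be sketched in base R as follows. The helper `clean_label` is hypothetical and written only to illustrate the rules; it is not presented as the function that write.tree uses internally.

```r
# Hypothetical helper illustrating the label rules with base R regular
# expressions; not the internal function used by write.tree().
clean_label <- function(x) {
    # delete leading spaces and leading left parentheses
    x <- gsub("^[[:space:](]+", "", x)
    # delete trailing spaces and trailing right parentheses
    x <- gsub("[[:space:])]+$", "", x)
    # replace the remaining spaces by underscores
    x <- gsub("[[:space:]]", "_", x)
    # replace commas, colons, semicolons, and remaining parentheses by dashes
    gsub("[,:;()]", "-", x)
}

lab1 <- clean_label(" (Homo sapiens: type) ")
lab2 <- clean_label("A,B (C)")
print(c(lab1, lab2))
```

The order of the substitutions matters: the leading and trailing characters are removed first so that they are not turned into underscores or dashes.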
The argument digits gives the number of significant digits (not rounding). For instance, if digits = 2 the branch length 1.234e-7 is printed as 1.2e-7 (not 0).
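The difference between significant digits and rounding can be checked in base R. This sketch formats a number with `sprintf` purely as an illustration; that write.tree formats branch lengths in the same way is an assumption, not something stated here.

```r
bl <- 1.234e-7

# Two significant digits: the magnitude of the number is preserved.
sig <- sprintf("%.2g", bl)

# Rounding to two decimal places: the value collapses to zero.
rnd <- round(bl, 2)

print(sig)
print(rnd)
```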
Value
a vector of mode character if file = "", none (invisible NULL) otherwise.
Author(s)
Emmanuel Paradis, Daniel Lawson <dan.lawson@bristol.ac.uk>, and Klaus Schliep <klaus.schliep@gmail.com>
References
Felsenstein, J. The Newick tree format. http://evolution.genetics.washington.edu/phylip/newicktree.html
Olsen, G. Interpretation of the "Newick's 8:45" tree format standard. http://evolution.genetics.washington.edu/phylip/newick_doc.html
See Also
read.tree, read.nexus, write.nexus, write.phyloXML
Fits the Yule Model to a Phylogenetic Tree
Description
This function fits by maximum likelihood a Yule model, i.e., a birth-only model to the branching times computed from a phylogenetic tree.
Usage
yule(phy, use.root.edge = FALSE)
Arguments
phy | an object of class |
use.root.edge | a logical specifying whether to consider the root edge in the calculations. |
Details
The tree must be fully dichotomous.
The maximum likelihood estimate of the speciation rate is the ratio of the number of speciation events to the cumulative number of species through time; these two quantities are obtained from the number of nodes in the tree and the sum of the branch lengths, respectively.
If there is a ‘root.edge’ element in the phylogenetic tree, and use.root.edge = TRUE, then it is assumed that it has a biological meaning and is counted as a branch length, and the root is counted as a speciation event; otherwise the number of speciation events is the number of nodes - 1.
The standard-error of lambda is computed with the second derivative of the log-likelihood function.
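The calculation described above can be reproduced by hand in base R. This is a sketch under stated assumptions: the tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5) is hypothetical and has no root edge, and the log-likelihood is taken to be n log(lambda) - lambda X up to a constant, with n the number of speciation events and X the sum of the branch lengths, so that its second derivative gives a standard-error of lambda / sqrt(n).

```r
# Hypothetical ultrametric tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5), described
# only by its branch lengths and its number of internal nodes.
edge_length <- c(1, 1, 1, 1.5, 1.5, 0.5)
n_node <- 3

X <- sum(edge_length)     # sum of the branch lengths
n_events <- n_node - 1    # no root edge: the root is not counted

lambda <- n_events / X    # maximum likelihood estimate

# Second derivative of n*log(lambda) - lambda*X is -n/lambda^2 (assumed
# form of the log-likelihood), hence the standard-error below.
se <- lambda / sqrt(n_events)

print(c(lambda = lambda, se = se))
```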
Value
An object of class "yule" which is a list with the following components:
lambda | the maximum likelihood estimate of the speciation (birth) rate. |
se | the standard-error of lambda. |
loglik | the log-likelihood at its maximum. |
Author(s)
Emmanuel Paradis
See Also
branching.times, diversi.gof, diversi.time, ltt.plot, birthdeath, bd.ext, yule.cov
Fits the Yule Model With Covariates
Description
This function fits by maximum likelihood the Yule model with covariates, that is a birth-only model where speciation rate is determined by a generalized linear model.
Usage
yule.cov(phy, formula, data = NULL)
Arguments
phy | an object of class |
formula | a formula specifying the model to be fitted. |
data | the name of the data frame where the variables in |
Details
The model fitted is a generalization of the Yule model where the speciation rate is determined by:
\ln\frac{\lambda_i}{1 - \lambda_i} = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \alpha
where \lambda_i is the speciation rate for species i, x_{i1}, x_{i2}, \dots are species-specific variables, and \beta_1, \beta_2, \dots, \alpha are parameters to be estimated. The term on the left-hand side above is a logit function often used in generalized linear models for binomial data (see family). The above model can be written in matrix form:
\mathrm{logit} \lambda_i = x_i' \beta
The standard-errors of the parameters are computed with the second derivatives of the log-likelihood function. (See References for other details on the estimation procedure.)
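To make the logit link concrete, the sketch below computes speciation rates from a linear predictor in base R. The parameter values, the predictor matrix, and the species names are hypothetical and chosen only for illustration.

```r
# Hypothetical parameter values and species-specific predictors.
alpha <- -3
beta <- c(0.5, -0.2)
x <- rbind(sp1 = c(1, 2),
           sp2 = c(0, 0),
           sp3 = c(4, 1))

# Linear predictor: beta_1 x_i1 + beta_2 x_i2 + alpha
eta <- drop(x %*% beta) + alpha

# Inverse logit gives the speciation rate of each species.
lambda <- 1 / (1 + exp(-eta))

print(lambda)
```

Applying the logit to `lambda`, i.e. `log(lambda / (1 - lambda))`, recovers `eta`; the base R function `plogis(eta)` gives the same rates.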
The function needs three things:
a phylogenetic tree which may contain multichotomies;
a formula which specifies the predictors of the model described above: this is given as a standard R formula and has no response (no left-hand side term), for instance: ~ x + y; it can include interactions (~ x + a * b) (see formula for details);
the predictors specified in the formula must be accessible to the function (either in the global space, or through the data option); they can be numeric vectors or factors. The length and the order of these data are important: the number of values (length) must be equal to the number of tips of the tree + the number of nodes. The order is the following: first the values for the tips in the same order as the labels, then the values for the nodes sequentially from the root to the most terminal nodes (i.e., in the order given by phy$edge).
The user must obtain the values for the nodes separately.
Note that the method in its present implementation assumes that the change in a species trait is more or less continuous between two nodes or between a node and a tip. Thus reconstructing the ancestral values with a Brownian motion model may be consistent with the present method. This can be done with the function ace.
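One possible way to assemble the predictor vector is sketched below: tip values first, then node values reconstructed with ace under a Brownian motion model. The trait values are random and purely illustrative, and it is an assumption of this sketch that the order of the values returned by ace matches the node order expected by yule.cov.

```r
library(ape)
data(bird.orders)

set.seed(1)
# Hypothetical trait values for the tips, in the order of the tip labels.
x_tips <- rnorm(Ntip(bird.orders))
names(x_tips) <- bird.orders$tip.label

# Ancestral values for the nodes under a Brownian motion model.
x_nodes <- ace(x_tips, bird.orders, type = "continuous", model = "BM")$ace

# Tips first, then nodes: the length must be number of tips + number of nodes.
x <- c(x_tips, x_nodes)
n_expected <- Ntip(bird.orders) + Nnode(bird.orders)
print(length(x) == n_expected)

yule.cov(bird.orders, ~ x)
```

Since the trait is random, no effect of `x` on the speciation rate is expected in the printed results.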
Value
A NULL value is returned; the results are simply printed. The output includes the deviance of the null (intercept-only) model and a likelihood-ratio test of the fitted model against the null model. Note that the deviance of the null model is different from the one returned by yule because of the different parametrizations.
Author(s)
Emmanuel Paradis
References
Paradis, E. (2005) Statistical analysis of diversification with species traits. Evolution, 59, 1–12.
See Also
branching.times, diversi.gof, diversi.time, ltt.plot, birthdeath, bd.ext, yule
Examples
### a simple example with some random data
data(bird.orders)
x <- rnorm(45) # the tree has 23 tips and 22 nodes
### the standard-error for x should be as large as
### the estimated parameter
yule.cov(bird.orders, ~ x)
### another example with a tree that has a multichotomy
data(bird.families)
y <- rnorm(272) # 137 tips + 135 nodes
yule.cov(bird.families, ~ y)
Fits the Time-Dependent Yule Model
Description
This function fits by maximum likelihood the time-dependent Yule model. The time is measured from the past (root.time) to the present.
Usage
yule.time(phy, birth, BIRTH = NULL, root.time = 0, opti = "nlm", start = 0.01)
Arguments
phy | an object of class |
birth | a (vectorized) function specifying how the birth (speciation) probability changes through time (see details). |
BIRTH | a (vectorized) function giving the primitive of |
root.time | a numeric value giving the time of the root node (time is measured from the past towards the present). |
opti | a character string giving the function used for optimisation of the likelihood function. Three choices are possible: |
start | the initial values used in the optimisation. |
Details
The model fitted is a straightforward extension of the Yule model with covariates (see yule.cov). Rather than having heterogeneity among lineages, the speciation probability is the same for all lineages at a given time, but can change through time.
The function birth must meet these two requirements: (i) the parameters to be estimated are the formal arguments; (ii) time is named t in the body of the function. However, this is the opposite for the primitive BIRTH: t is the formal argument, and the parameters are used in its body. See the examples.
It is recommended to use BIRTH if possible, and required if speciation probability is constant on some time interval. If this primitive cannot be provided, a numerical integration is done with integrate.
The standard-errors of the parameters are computed with the Hessian of the log-likelihood function.
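Before supplying a primitive, it can be worth checking it against a numerical integration. The base R sketch below does this for a logistic speciation probability; the parameter values `a` and `b` and the time interval are hypothetical, and both functions are written with t as the formal argument so that they can be called directly, which differs from the convention required for the birth argument.

```r
# Hypothetical parameter values, chosen only for illustration.
a <- 0.05
b <- -2

# Logistic speciation probability as a function of time ...
birth <- function(t) 1 / (1 + exp(-a * t - b))
# ... and its primitive, with t as the formal argument and the
# parameters used in the body.
BIRTH <- function(t) log(exp(-a * t) + exp(b)) / a + t

# The integral of birth over [0, 50] should match BIRTH(50) - BIRTH(0).
numeric_area <- integrate(birth, lower = 0, upper = 50)$value
analytic_area <- BIRTH(50) - BIRTH(0)
print(all.equal(numeric_area, analytic_area, tolerance = 1e-6))
```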
Value
An object of class "yule" (see yule).
Author(s)
Emmanuel Paradis
References
Hubert, N., Paradis, E., Bruggemann, H. and Planes, S. (2011) Community assembly and diversification in Indo-Pacific coral reef fishes. Ecology and Evolution, 1, 229–277.
See Also
branching.times, ltt.plot, birthdeath, yule, yule.cov, bd.time
Examples
### define two models...
birth.logis <- function(a, b) 1/(1 + exp(-a*t - b)) # logistic
birth.step <- function(l1, l2, Tcl) { # 2 rates with one break-point
    ans <- rep(l1, length(t))
    ans[t > Tcl] <- l2
    ans
}
### ... and their primitives:
BIRTH.logis <- function(t) log(exp(-a*t) + exp(b))/a + t
BIRTH.step <- function(t)
{
    out <- numeric(length(t))
    sel <- t <= Tcl
    if (any(sel)) out[sel] <- t[sel] * l1
    if (any(!sel)) out[!sel] <- Tcl * l1 + (t[!sel] - Tcl) * l2
    out
}
data(bird.families)
### fit both models:
yule.time(bird.families, birth.logis)
yule.time(bird.families, birth.logis, BIRTH.logis) # same but faster
## Not run: yule.time(bird.families, birth.step) # fails
yule.time(bird.families, birth.step, BIRTH.step, opti = "nlminb", start = c(.01, .01, 100))
Zoom on a Portion of a Phylogeny
Description
This function plots simultaneously a whole phylogenetic tree (supposedly large) and a portion of it.
Usage
zoom(phy, focus, subtree = FALSE, col = rainbow, ...)
Arguments
phy | an object of class |
focus | a vector, either numeric or character, or a list of vectors specifying the tips to be focused on. |
subtree | a logical indicating whether to show the context of the extracted subtrees. |
col | a vector of colours used to show where the subtrees are in the main tree, or a function. |
... | further arguments passed to |
Details
This function aims at exploring very large trees. The main argument is a phylogenetic tree, and the second one is a vector or a list of vectors specifying the tips to be focused on. The vector(s) can be either numeric and thus taken as the indices of the tip labels, or character in which case it is taken as the corresponding tip labels.
The whole tree is plotted on the left-hand side in a narrower sub-window (about a quarter of the device) without tip labels. The subtrees consisting of the tips in ‘focus’ are extracted and plotted on the right-hand side starting from the top left corner and successively column-wise.
If the argument ‘col’ is a vector of colours, as many colours as the number of subtrees must be given. The alternative is to give a function that will create colours or grey levels from the number of subtrees: see rainbow for some possibilities with colours.
Author(s)
Emmanuel Paradis
See Also
plot.phylo, drop.tip, layout, rainbow, grey
Examples
## Not run:
data(chiroptera)
zoom(chiroptera, 1:20, subtree = TRUE)
zoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
zoom(chiroptera, list(grep("Plecotus", chiroptera$tip.label), grep("Pteropus", chiroptera$tip.label)))
## End(Not run)