
Version:

5.8-1

Date:

2024-12-10

Title:

Analyses of Phylogenetics and Evolution

Depends:

R (≥ 3.2.0)

Suggests:

gee, expm, igraph, phangorn, xml2

Imports:

nlme, lattice, graphics, methods, stats, utils, parallel, Rcpp (≥ 0.12.0), digest

LinkingTo:

Rcpp

ZipData:

Description:

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

License:

GPL-2 | GPL-3

URL:

https://github.com/emmanuelparadis/ape

BugReports:

https://github.com/emmanuelparadis/ape/issues

Encoding:

UTF-8

NeedsCompilation:

yes

Packaged:

2024-12-10 17:29:50 UTC; paradis

Author:

Emmanuel Paradis [aut, cre, cph], Simon Blomberg [aut, cph], Ben Bolker [aut, cph], Joseph Brown [aut, cph], Santiago Claramunt [aut, cph], Julien Claude [aut, cph], Hoa Sien Cuong [aut, cph], Richard Desper [aut, cph], Gilles Didier [aut, cph], Benoit Durand [aut, cph], Julien Dutheil [aut, cph], RJ Ewing [aut, cph], Olivier Gascuel [aut, cph], Thomas Guillerme [aut, cph], Christoph Heibl [aut, cph], Anthony Ives [aut, cph], Bradley Jones [aut, cph], Franz Krah [aut, cph], Daniel Lawson [aut, cph], Vincent Lefort [aut, cph], Pierre Legendre [aut, cph], Jim Lemon [aut, cph], Guillaume Louvel [aut, cph], Federico Marotta [aut, cph], Eric Marcon [aut, cph], Rosemary McCloskey [aut, cph], Johan Nylander [aut, cph], Rainer Opgen-Rhein [aut, cph], Andrei-Alin Popescu [aut, cph], Manuela Royer-Carenzi [aut, cph], Klaus Schliep [aut, cph], Korbinian Strimmer [aut, cph], Damien de Vienne [aut, cph]

Maintainer:

Emmanuel Paradis <Emmanuel.Paradis@ird.fr>

Repository:

CRAN

Date/Publication:

2024-12-16 00:00:02 UTC

Analyses of Phylogenetics and Evolution

Description

ape provides functions for reading, writing, manipulating, analysing, and simulating phylogenetic trees and DNA sequences, computing DNA distances, translating into AA sequences, estimating trees with distance-based methods, and a range of methods for comparative analyses and analysis of diversification. Functionalities are also provided for programming new phylogenetic methods.

The complete list of functions can be displayed with library(help = ape).

More information on ape can be found at https://emmanuelparadis.github.io.

Author(s)

Emmanuel Paradis, Ben Bolker, Julien Claude, Hoa Sien Cuong, Richard Desper, Benoit Durand, Julien Dutheil, Olivier Gascuel, Christoph Heibl, Daniel Lawson, Vincent Lefort, Pierre Legendre, Jim Lemon, Yvonnick Noel, Johan Nylander, Rainer Opgen-Rhein, Andrei-Alin Popescu, Klaus Schliep, Korbinian Strimmer, Damien de Vienne

Maintainer: Emmanuel Paradis <Emmanuel.Paradis@ird.fr>

References

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Paradis, E., Claude, J. and Strimmer, K. (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289–290.

Popescu, A.-A., Huber, K. T. and Paradis, E. (2012) ape 3.0: new tools for distance based phylogenetics and evolutionary analysis in R. Bioinformatics, 28, 1536–1537.

Paradis, E. and Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics, 35, 526–528.

Amino Acid Sequences

Description

These functions help to create and manipulate AA sequences.

Usage

## S3 method for class 'AAbin'
print(x, ...)
## S3 method for class 'AAbin'
x[i, j, drop = FALSE]
## S3 method for class 'AAbin'
c(..., recursive = FALSE)
## S3 method for class 'AAbin'
rbind(...)
## S3 method for class 'AAbin'
cbind(..., check.names = TRUE, fill.with.Xs = FALSE,
      quiet = FALSE)
## S3 method for class 'AAbin'
as.character(x, ...)
## S3 method for class 'AAbin'
labels(object, ...)
## S3 method for class 'AAbin'
image(x, what, col, bg = "white", xlab = "", ylab = "",
      show.labels = TRUE, cex.lab = 1, legend = TRUE, grid = FALSE,
      show.aa = FALSE, aa.cex = 1, aa.font = 1, aa.col = "black",
      scheme = "Ape_AA", ...)
as.AAbin(x, ...)
## S3 method for class 'character'
as.AAbin(x, ...)
## S3 method for class 'list'
as.AAbin(x, ...)
## S3 method for class 'AAString'
as.AAbin(x, ...)
## S3 method for class 'AAStringSet'
as.AAbin(x, ...)
## S3 method for class 'AAMultipleAlignment'
as.AAbin(x, ...)
## S3 method for class 'AAbin'
as.list(x, ...)
## S3 method for class 'AAbin'
as.matrix(x, ...)
## S3 method for class 'AAbin'
as.phyDat(x, ...)
dist.aa(x, pairwise.deletion = FALSE, scaled = FALSE)
AAsubst(x)

Arguments

x,object

an object of class "AAbin" (or else depending on the function).

i,j

indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects).

drop

logical; if TRUE, the returned object is of the lowest possible dimension.

recursive

logical; whether to go down lists and concatenate its elements.

check.names

a logical specifying whether to check the rownames before binding the columns (see details).

fill.with.Xs

a logical indicating whether to keep all possible individuals as indicated by the rownames, possibly filling the missing data with insertion gaps (ignored if check.names = FALSE).

quiet

a logical to switch off warning messages when some rows are dropped.

what

a vector of characters specifying the amino acids to visualize. Currently, the only possible choice is to show the three categories hydrophobic, small, and hydrophilic.

col

a vector of colours. If missing, this is set to “red”, “yellow” and “blue”.

bg

the colour used for AA codes not among what (typically X and *).

xlab

the label for the x-axis; none by default.

ylab

the same for the y-axis. Note that by default, the labels of the sequences are printed on the y-axis (see next option).

show.labels

a logical controlling whether the sequence labels are printed (TRUE by default).

cex.lab

a single numeric controlling the size of the sequence labels. Use cex.axis to control the size of the annotations on the x-axis.

legend

a logical controlling whether the legend is plotted (TRUE by default).

grid

a logical controlling whether to draw a grid (FALSE by default).

show.aa

a logical controlling whether to show the AA symbols (FALSE by default).

aa.cex,aa.font,aa.col

control the aspect of the AA symbols (ignored if show.aa is FALSE).

scheme

a predefined color scheme. For amino acids, the options are "Ape_AA", "Zappo_AA", "Clustal" and "Hydrophobicity"; for nucleotides, "Ape_NT" and "RY_NT".

pairwise.deletion

a logical indicating whether to delete the sites with missing data in a pairwise way. The default is to delete the sites with at least one missing data for all sequences.

scaled

a logical value specifying whether to scale the number of AA differences by the sequence length.

...

further arguments to be passed to or from other methods.

Details

These functions help to manipulate amino acid sequences of class "AAbin". These objects are stored in vectors, matrices, or lists which can be manipulated with the usual [ operator.

There is a conversion function to and from characters.

The function dist.aa computes the number of AA differences between each pair of sequences in a matrix; this can be scaled by the sequence length. See the function dist.ml in phangorn for evolutionary distances with AA sequences.

The function AAsubst returns the indices of the polymorphic sites (similar to seg.sites for DNA sequences; see examples below).

The two functions cbind.AAbin and rbind.AAbin work in the same way as the similar methods for the class "DNAbin": see cbind.DNAbin for more explanations about their respective behaviours.

Value

an object of class "AAbin", "character", "dist", or "numeric", depending on the function.

Author(s)

Emmanuel Paradis, Franz Krah

Examples

data(woodmouse)
AA <- trans(woodmouse, 2)
seg.sites(woodmouse)
AAsubst(AA)
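As a further sketch of dist.aa and of the conversion to and from characters mentioned in the Details, the following reuses the woodmouse data and the translation with genetic code 2 from the example above. The object names (d.raw, d.prop, AA.chr, AA2) are illustrative only.

```r
library(ape)

data(woodmouse)
AA <- trans(woodmouse, 2)      # translate with genetic code 2, as in the example above

## number of AA differences between each pair of sequences
d.raw <- dist.aa(AA)

## the same distances scaled by the sequence length
d.prop <- dist.aa(AA, scaled = TRUE)

## conversion to characters and back to class "AAbin"
AA.chr <- as.character(AA)
AA2 <- as.AAbin(AA.chr)
```

The scaled version is convenient when alignments of different lengths are compared, since it gives a proportion of differing sites rather than a raw count.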

Tree Estimation Based on an Improved Version of the NJ Algorithm

Description

This function performs the BIONJ algorithm of Gascuel (1997).

Usage

bionj(X)

Arguments

X

a distance matrix; may be an object of class "dist".

Value

an object of class "phylo".

Author(s)

original C code by Hoa Sien Cuong and Olivier Gascuel; adapted and ported to R by Vincent Lefort <vincent.lefort@lirmm.fr>

References

Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14, 685–695.

Examples

### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13,
       10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12,
       5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- bionj(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- bionj(dist.dna(woodmouse))
plot(trw)

Congruence among distance matrices

Description

Function CADM.global computes and tests the coefficient of concordance among several distance matrices through a permutation test.

Function CADM.post carries out a posteriori permutation tests of the contributions of individual distance matrices to the overall concordance of the group.

Use in phylogenetic analysis: to identify congruence among distance matrices (D) representing different genes or different types of data. Congruent D matrices correspond to data tables that can be used together in a combined phylogenetic or other type of multivariate analysis.

Usage

CADM.global(Dmat, nmat, n, nperm=99, make.sym=TRUE, weights=NULL,
            silent=FALSE)
CADM.post  (Dmat, nmat, n, nperm=99, make.sym=TRUE, weights=NULL,
            mult="holm", mantel=FALSE, silent=FALSE)

Arguments

Dmat

A text file listing the distance matrices one after the other, with or without blank lines in-between. Each matrix is in the form of a square distance matrix with 0's on the diagonal.

nmat

Number of distance matrices in file Dmat.

n

Number of objects in each distance matrix. All matrices must have the same number of objects.

nperm

Number of permutations for the tests of significance.

make.sym

TRUE: turn asymmetric matrices into symmetric matrices by averaging the two triangular portions. FALSE: analyse asymmetric matrices as they are.

weights

A vector of positive weights for the distance matrices. Example: weights = c(1,2,3). NULL (default): all matrices have same weight in the calculation of W.

mult

Method for correcting P-values in multiple testing. The methods are "holm" (default), "sidak", and "bonferroni". The Bonferroni correction is overly conservative; it is not recommended. It is included to allow comparisons with the other methods.

mantel

TRUE: Mantel statistics will be computed from ranked distances, as well as permutational P-values. FALSE (default): Mantel statistics and tests will not be computed.

silent

TRUE: informative messages will not be printed, but stopping messages will. Option useful for simulation work. FALSE: informative messages will be printed.

Details

Dmat must contain two or more distance matrices, listed one after the other, all of the same size, and corresponding to the same objects in the same order. Raw data tables can be transformed into distance matrices before comparison with other such distance matrices, or with data that have been obtained as distance matrices, e.g. serological or DNA hybridization data. The distances will be transformed to ranks before computation of the coefficient of concordance and other statistics.

CADM.global tests the global null hypothesis that all matrices are incongruent. If the global null is rejected, function CADM.post can be used to identify the concordant (H0 rejected) and discordant matrices (H0 not rejected) in the group. If a distance matrix has a negative value for the Mantel.mean statistic, that matrix clearly does not belong to the group. Remove that matrix (if there is more than one, first remove the matrix that has the most strongly negative value for Mantel.mean) and run the analysis again.

The corrections used for multiple testing are applied to the list of P-values (P) produced in the a posteriori tests; they take into account the number of tests (k) carried out simultaneously (number of matrices, parameter nmat).

The Holm correction is computed after ordering the P-values in a list with the smallest value to the left. Compute adjusted P-values as:

P_{corr} = (k-i+1)*P

where i is the position in the ordered list. Final step: from left to right, if an adjusted P_{corr} in the ordered list is smaller than the one occurring at its left, make the smaller one equal to the larger one.

The Sidak correction is:

P_{corr} = 1 - (1 - P)^k

The Bonferroni correction is:

P_{corr} = k*P
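The three corrections described above can be sketched in base R as follows. The helper adjust_p is hypothetical and is not part of ape; it takes k to be the number of P-values supplied, and it caps the Holm and Bonferroni results at 1, which is an assumption not stated in the text above.

```r
## Hypothetical helper (not part of ape): adjust a vector of P-values
## with the Holm, Sidak, or Bonferroni formulas given above.
adjust_p <- function(P, method = c("holm", "sidak", "bonferroni")) {
    method <- match.arg(method)
    k <- length(P)                         # number of simultaneous tests
    if (method == "sidak") return(1 - (1 - P)^k)
    if (method == "bonferroni") return(pmin(1, k * P))  # capped at 1 (assumption)
    ## Holm: order the P-values with the smallest to the left
    o <- order(P)
    i <- seq_len(k)                        # position in the ordered list
    adj <- (k - i + 1) * P[o]
    adj <- cummax(adj)                     # never smaller than the value at its left
    adj <- pmin(1, adj)                    # capped at 1 (assumption)
    adj[order(o)]                          # back to the original order
}

P <- c(0.01, 0.04, 0.03, 0.20)
adjust_p(P, "holm")
adjust_p(P, "sidak")
adjust_p(P, "bonferroni")
```

With these definitions, the Holm and Bonferroni results should agree with those of p.adjust() from the stats package, which can serve as a cross-check.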

Value

CADM.global produces a small table containing the W, Chi2, and Prob.perm statistics described in the following list. CADM.post produces a table stored in element A_posteriori_tests, containing Mantel.mean, Prob, and Corrected.prob statistics in rows; the columns correspond to the k distance matrices under study, labeled Dmat.1 to Dmat.k. If parameter mantel is TRUE, tables of Mantel statistics and P-values are computed among the matrices.

W

Kendall's coefficient of concordance, W (Kendall and Babington Smith 1939; see also Legendre 2010).

Chi2

Friedman's chi-square statistic (Friedman 1937) used in the permutation test of W.

Prob.perm

Permutational probability.

Mantel.mean

Mean of the Mantel correlations, computed on rank-transformed distances, between the distance matrix under test and all the other matrices in the study.

Prob

Permutational probabilities, uncorrected.

Corrected prob

Permutational probabilities corrected using the method selected in parameter mult.

Mantel.cor

Matrix of Mantel correlations, computed on rank-transformed distances, among the distance matrices.

Mantel.prob

One-tailed P-values associated with the Mantel correlations of the previous table. The probabilities are computed in the right-hand tail. H0 is tested against the alternative one-tailed hypothesis that the Mantel correlation under test is positive. No correction is made for multiple testing.

Author(s)

Pierre Legendre, Universite de Montreal

References

Campbell, V., Legendre, P. and Lapointe, F.-J. (2009) Assessing congruence among ultrametric distance matrices. Journal of Classification, 26, 103–117.

Campbell, V., Legendre, P. and Lapointe, F.-J. (2011) The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis. BMC Evolutionary Biology, 11, 64.

Friedman, M. (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.

Kendall, M. G. and Babington Smith, B. (1939) The problem of m rankings. Annals of Mathematical Statistics, 10, 275–287.

Lapointe, F.-J., Kirsch, J. A. W. and Hutcheon, J. M. (1999) Total evidence, consensus, and bat phylogeny: a distance-based approach. Molecular Phylogenetics and Evolution, 11, 55–66.

Legendre, P. (2010) Coefficient of concordance. Pp. 164–169 in: Encyclopedia of Research Design, Vol. 1. N. J. Salkind, ed. SAGE Publications, Inc., Los Angeles.

Legendre, P. and Lapointe, F.-J. (2004) Assessing congruence among distance matrices: single malt Scotch whiskies revisited. Australian and New Zealand Journal of Statistics, 46, 615–629.

Legendre, P. and Lapointe, F.-J. (2005) Congruence entre matrices de distance. Pp. 178–181 in: Makarenkov, V., G. Cucumel et F.-J. Lapointe [eds] Comptes rendus des 12emes Rencontres de la Societe Francophone de Classification, Montreal, 30 mai - 1er juin 2005.

Siegel, S. and Castellan, N. J., Jr. (1988) Nonparametric statistics for the behavioral sciences. 2nd edition. New York: McGraw-Hill.

Examples

# Examples 1 and 2: 5 genetic distance matrices computed from simulated DNA
# sequences representing 50 taxa having evolved along additive trees with
# identical evolutionary parameters (GTR + Gamma + I). Distance matrices were
# computed from the DNA sequence matrices using a p distance corrected with the
# same parameters as those used to simulate the DNA sequences. See Campbell et
# al. (2009) for details.
# Example 1: five independent additive trees. Data provided by V. Campbell.
data(mat5Mrand)
res.global <- CADM.global(mat5Mrand, 5, 50)
# Example 2: three partly similar trees, two independent trees.
# Data provided by V. Campbell.
data(mat5M3ID)
res.global <- CADM.global(mat5M3ID, 5, 50)
res.post   <- CADM.post(mat5M3ID, 5, 50, mantel=TRUE)
# Example 3: three matrices respectively representing Serological
# (asymmetric), DNA hybridization (asymmetric) and Anatomical (symmetric)
# distances among 9 families. Data from Lapointe et al. (1999).
data(mat3)
res.global <- CADM.global(mat3, 3, 9, nperm=999)
res.post   <- CADM.post(mat3, 3, 9, nperm=999, mantel=TRUE)
# Example 4, showing how to bind two D matrices (cophenetic matrices
# in this example) into a file using rbind(), then run the global test.
a <- rtree(5)
b <- rtree(5)
A <- cophenetic(a)
B <- cophenetic(b)
x <- rownames(A)
B <- B[x, x]
M <- rbind(A, B)
CADM.global(M, 2, 5)

Manipulate DNA Sequences in Bit-Level Format

Description

These functions help to manipulate DNA sequences coded in the bit-level coding scheme.

Usage

## S3 method for class 'DNAbin'
print(x, printlen = 6, digits = 3, ...)
## S3 method for class 'DNAbin'
rbind(...)
## S3 method for class 'DNAbin'
cbind(..., check.names = TRUE, fill.with.gaps = FALSE,
      quiet = FALSE)
## S3 method for class 'DNAbin'
x[i, j, drop = FALSE]
## S3 method for class 'DNAbin'
as.matrix(x, ...)
## S3 method for class 'DNAbin'
c(..., recursive = FALSE)
## S3 method for class 'DNAbin'
as.list(x, ...)
## S3 method for class 'DNAbin'
labels(object, ...)

Arguments

x,object

an object of class "DNAbin".

...

either further arguments to be passed to or from other methods in the case of print, as.matrix, and labels, or a series of objects of class "DNAbin" in the case of rbind, cbind, and c.

printlen

the number of labels to print (6 by default).

digits

the number of digits to print (3 by default).

check.names

a logical specifying whether to check the rownames before binding the columns (see details).

fill.with.gaps

a logical indicating whether to keep all possible individuals as indicated by the rownames, possibly filling the missing data with insertion gaps (ignored if check.names = FALSE).

quiet

a logical to switch off warning messages when some rows are dropped.

i,j

indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects).

drop

logical; if TRUE, the returned object is of the lowest possible dimension.

recursive

for compatibility with the generic (unused).

Details

These are all ‘methods’ of generic functions which are here applied to DNA sequences stored as objects of class "DNAbin". They are used in the same way as the standard R functions to manipulate vectors, matrices, and lists. Additionally, the operators [[ and $ may be used to extract a vector from a list. Note that the default of drop is not the same as for the generic operator: this is to avoid dropping rownames when selecting a single sequence.

These functions are provided to manipulate easily DNA sequences coded with the bit-level coding scheme. The latter allows much faster comparisons of sequences, as well as storing them in less memory compared to the format used before ape 1.10.

For cbind, the default behaviour is to keep only individuals (as indicated by the rownames) for which there are no missing data. If fill.with.gaps = TRUE, a ‘complete’ matrix is returned, possibly with insertion gaps as missing data. If check.names = TRUE (the default), the rownames of each matrix are checked, and the rows are reordered if necessary (if some rownames are duplicated, an error is returned). If check.names = FALSE, the matrices must all have the same number of rows, and are simply bound; the rownames of the first matrix are used. See the examples.

as.matrix may be used to convert DNA sequences (of the same length) stored in a list into a matrix while keeping the names and the class. as.list does the reverse operation.

Value

an object of class "DNAbin" in the case of rbind, cbind, and [.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2007) A Bit-Level Coding Scheme for Nucleotides. https://emmanuelparadis.github.io/misc/BitLevelCodingScheme_20April2007.pdf

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Examples

data(woodmouse)
woodmouse
print(woodmouse, 15, 6)
print(woodmouse[1:5, 1:300], 15, 6)
### Just to show how distances could be influenced by sampling:
dist.dna(woodmouse[1:2, ])
dist.dna(woodmouse[1:3, ])
### cbind and its options:
x <- woodmouse[1:2, 1:5]
y <- woodmouse[2:4, 6:10]
as.character(cbind(x, y)) # gives warning
as.character(cbind(x, y, fill.with.gaps = TRUE))
## Not run: 
as.character(cbind(x, y, check.names = FALSE)) # gives an error
## End(Not run)

Recode Blocks of Indels

Description

This function scans a set of aligned DNA sequences and returns a matrix with information on the locations and lengths of alignment gaps.

Usage

DNAbin2indel(x)

Arguments

x

an object of class "DNAbin".

Details

The output matrix has the same dimensions as the input one, with either a numeric value where an alignment gap starts, giving the length of the gap, or zero. The rownames are kept.

Value

a numeric matrix.

Author(s)

Emmanuel Paradis
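Examples

This help page has no example of its own; the following is a minimal sketch on a toy alignment built by hand. The sequences and the names s1, s2 and res are made up for illustration. Following the Details above, the gap of length 2 in s1 is expected to be recorded at the column where it starts.

```r
library(ape)

## a toy alignment of two sequences; s1 has a single gap of length 2
x <- as.DNAbin(rbind(s1 = c("a", "-", "-", "g", "t"),
                     s2 = c("a", "c", "g", "g", "t")))

res <- DNAbin2indel(x)
res    # same dimensions as x; the gap length is stored where the gap starts
```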

Tree Estimation Based on the Minimum Evolution Algorithm

Description

The two FastME functions (balanced and OLS) perform the minimum evolution algorithm of Desper and Gascuel (2002).

Usage

fastme.bal(X, nni = TRUE, spr = TRUE, tbr = FALSE)
fastme.ols(X, nni = TRUE)

Arguments

X

a distance matrix; may be an object of class "dist".

nni

a logical value; TRUE to perform NNIs (default).

spr

ditto for SPRs.

tbr

ignored (see details).

Details

The code to perform topology searches based on TBR (tree bisection and reconnection) did not run correctly and was removed after the release of ape 5.3. A warning is issued if tbr = TRUE.

Value

an object of class "phylo".

Author(s)

original C code by Richard Desper; adapted and ported to R by Vincent Lefort <vincent.lefort@lirmm.fr>

References

Desper, R. and Gascuel, O. (2002) Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. Journal of Computational Biology, 9, 687–705.

Examples

### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13,
       10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12,
       5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- fastme.bal(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- fastme.bal(dist.dna(woodmouse))
plot(trw)

Initialize a ‘corPhyl’ Structure Object

Description

Initialize a corPhyl correlation structure object. Does the same as Initialize.corStruct, but also checks the row names of data and builds an index.

Usage

## S3 method for class 'corPhyl'
Initialize(object, data, ...)

Arguments

object

An object inheriting from class corPhyl.

data

The data to use. If it contains rownames, they are matched with the tree tip labels; otherwise the data are assumed to be in the same order as the tip labels and a warning is issued.

...

some methods for this generic require additional arguments. None are used in this method.

Value

An initialized object of the same class as object.

Author(s)

Julien Dutheil <dutheil@evolbio.mpg.de>

Theoretical Lineage-Through Time Plots

Description

This function draws the lineage-through time (LTT) plots predicted under a speciation-extinction model (aka birth-death model) with specified values of speciation and extinction rates (which may vary with time).

A prediction interval is plotted by default, which requires defining a sample size (100 by default), and different curves can be combined.

Usage

LTT(birth = 0.1, death = 0, N = 100, Tmax = 50, PI = 95,
    scaled = TRUE, eps = 0.1, add = FALSE, backward = TRUE,
    ltt.style = list("black", 1, 1), pi.style = list("blue", 1, 2), ...)

Arguments

birth

the speciation rate; this may be either a numeric value or a function of time (named t in the code of the function).

death

the same for the extinction rate.

N

the size of the tree.

Tmax

the age of the root of the tree.

PI

the percentage value of the prediction interval; set this value to 0 to not draw this interval.

scaled

a logical value specifying whether to scale the y-axis between 0 and 1.

eps

a numerical value giving the resolution of the time axis.

add

a logical value specifying whether to add the curve to an existing plot; by default (FALSE), a new plot is made.

backward

a logical value: should the time axis be traced from the present (the default), or from the root of the tree?

ltt.style

a list with three elements giving the style of the LTT curve with, respectively, the colour ("col"), the line thickness ("lwd"), and the line type ("lty").

pi.style

the same for the prediction interval.

...

arguments passed to plot (e.g., log="y").

Details

For the moment, this works well when birth and death are constant. Some improvements are in progress for time-dependent rates (but see below for an example).

Author(s)

Emmanuel Paradis

References

Hallinan, N. (2012) The generalized time variable reconstructed birth–death process. Journal of Theoretical Biology, 300, 265–276.

Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.

Paradis, E. (2015) Random phylogenies and the distribution of branching times. Journal of Theoretical Biology, 387, 39–45.

Examples

### predicted LTT plot under a Yule model with lambda = 0.1
### and 50 species after 50 units of time...
LTT(N = 50)
### ... and with a birth-death model with the same rate of
### diversification (try with N = 500):
LTT(0.2, 0.1, N = 50, PI = 0, add = TRUE, ltt.style = list("red", 2, 1))
### predictions under different tree sizes:
layout(matrix(1:4, 2, 2, byrow = TRUE))
for (N in c(50, 100, 500, 1000)) {
    LTT(0.2, 0.1, N = N)
    title(paste("N =", N))
}
layout(1)
## Not run: 
### speciation rate decreasing with time
birth.logis <- function(t) 1/(1 + exp(0.02 * t + 4))
LTT(birth.logis)
LTT(birth.logis, 0.05)
LTT(birth.logis, 0.1)
## End(Not run)

Most Parsimonious Reconstruction

Description

This function does ancestral character reconstruction by parsimony as described in Hanazawa et al. (1995) and modified by Narushima and Hanazawa (1997).

Usage

MPR(x, phy, outgroup)

Arguments

x

a vector of integers.

phy

an object of class "phylo"; the tree must be unrooted and fully dichotomous.

outgroup

an integer or a character string giving the tip of phy used as outgroup.

Details

Hanazawa et al. (1995) and Narushima and Hanazawa (1997) used Farris's (1970) and Swofford and Maddison's (1987) framework to reconstruct ancestral states using parsimony. The character is assumed to take integer values. The algorithm finds the sets of values for each node as intervals with lower and upper values.

It is recommended to root the tree with the outgroup before the analysis, so plotting the values with nodelabels is simple.

Value

a matrix of integers with two columns named “lower” and “upper” giving the lower and upper values of the reconstructed sets for each node.

Author(s)

Emmanuel Paradis

References

Farris, J. M. (1970) Methods for computing Wagner trees. Systematic Zoology, 19, 83–92.

Hanazawa, M., Narushima, H. and Minaka, N. (1995) Generating most parsimonious reconstructions on a tree: a generalization of the Farris–Swofford–Maddison method. Discrete Applied Mathematics, 56, 245–265.

Narushima, H. and Hanazawa, M. (1997) A more efficient algorithm for MPR problems in phylogeny. Discrete Applied Mathematics, 80, 231–238.

Swofford, D. L. and Maddison, W. P. (1987) Reconstructing ancestral character states under Wagner parsimony. Mathematical Biosciences, 87, 199–229.

Examples

## the example in Narushima and Hanazawa (1997):
tr <- read.tree(text = "(((i,j)c,(k,l)b)a,(h,g)e,f)d;")
x <- c(1, 3, 0, 6, 5, 2, 4)
names(x) <- letters[6:12]
(o <- MPR(x, tr, "f"))
plot(tr)
nodelabels(paste0("[", o[, 1], ",", o[, 2], "]"))
tiplabels(x[tr$tip.label], adj = -2)
## some random data:
x <- rpois(30, 1)
tr <- rtree(30, rooted = FALSE)
MPR(x, tr, "t1")

Moran's I Autocorrelation Index

Description

This function computes Moran's I autocorrelation coefficient of x given a matrix of weights using the method described by Gittleman and Kot (1990).

Usage

Moran.I(x, weight, scaled = FALSE, na.rm = FALSE,
        alternative = "two.sided")

Arguments

x

a numeric vector.

weight

a matrix of weights.

scaled

a logical indicating whether the coefficient should be scaled so that it varies between -1 and +1 (defaults to FALSE).

na.rm

a logical indicating whether missing values should be removed.

alternative

a character string specifying the alternative hypothesis that is tested against the null hypothesis of no phylogenetic correlation; must be one of "two.sided", "less", or "greater", or any unambiguous abbreviation of these.

Details

The matrix weight is used as “neighbourhood” weights, and Moran's I coefficient is computed using the formula:

I = \frac{n}{S_0} \frac{\sum_{i=1}^n \sum_{j=1}^n w_{i,j} (y_i - \overline{y})(y_j - \overline{y})}{\sum_{i=1}^n (y_i - \overline{y})^2}

with

y_i = observations
w_{i,j} = distance weight
n = number of observations
S_0 = \sum_{i=1}^n \sum_{j=1}^n w_{i,j}

The null hypothesis of no phylogenetic correlation is tested assuming normality of I under this null hypothesis. If the observed value of I is significantly greater than the expected value, then the values of x are positively autocorrelated, whereas if the observed I is smaller than the expected I, this indicates negative autocorrelation.
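To make the formula concrete, here is a minimal base R sketch that evaluates it term by term on a toy data set. The helper moran_formula is hypothetical and is not part of ape; it applies the formula literally to the weights it is given, and Moran.I may process the weights differently internally, so the two are not guaranteed to return the same value.

```r
## Hypothetical helper (not part of ape): Moran's I computed directly
## from the formula above, without any rescaling of the weights.
moran_formula <- function(y, w) {
    n <- length(y)
    S0 <- sum(w)                       # S_0: sum of all the weights
    dev <- y - mean(y)                 # deviations from the mean
    num <- sum(w * outer(dev, dev))    # sum_i sum_j w[i,j] * dev[i] * dev[j]
    (n / S0) * num / sum(dev^2)
}

## four observations on a chain: only neighbours have weight 1
y <- c(1, 2, 3, 4)
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w[cbind(2:4, 1:3)] <- 1

moran_formula(y, w)
```

Here S_0 = 6, the double sum in the numerator is 2.5, and the sum of squared deviations is 5, so the result is (4/6) * (2.5/5) = 1/3: neighbouring values are similar, hence a positive coefficient.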

Value

A list containing the elements:

observed

the computed Moran's I.

expected

the expected value of I under the null hypothesis.

sd

the standard deviation of I under the null hypothesis.

p.value

the P-value of the test of the null hypothesis against the alternative hypothesis specified in alternative.

Author(s)

Julien Dutheil <dutheil@evolbio.mpg.de> and Emmanuel Paradis

References

Gittleman, J. L. and Kot, M. (1990) Adaptation: statistics and a null model for estimating phylogenetic effects. Systematic Zoology, 39, 227–241.

Examples

tr <- rtree(30)
x <- rnorm(30)
## weights w[i,j] = 1/d[i,j]:
w <- 1/cophenetic(tr)
## set the diagonal w[i,i] = 0 (instead of Inf...):
diag(w) <- 0
Moran.I(x, w)
Moran.I(x, w, alt = "l")
Moran.I(x, w, alt = "g")
Moran.I(x, w, scaled = TRUE) # usually the same

Construction of Consensus Distance Matrix With SDM

Description

This function implements the SDM method of Criscuolo et al. (2006) for a set of n distance matrices.

Usage

SDM(...)

Arguments

...

2n elements (with n > 1), the first n elements are thedistance matrices: these can be (symmetric) matrices, objects ofclass"dist", or a mix of both. The next n elements are thesequence length from which the matrices have been estimated (can beseen as a degree of confidence in matrices).

Details

Reconstructs a consensus distance matrix from a set of input distancematrices on overlapping sets of taxa. Potentially missing values inthe supermatrix are represented byNA. An error is returned ifthe input distance matrices can not resolve to a consensus matrix.

Value

a 2-element list containing a distance matrix labelled by the union of the set of taxa of the input distance matrices, and a variance matrix associated with the returned distance matrix.

Author(s)

Andrei Popescu

References

Criscuolo, A., Berry, V., Douzery, E. J. P., and Gascuel, O. (2006) SDM: A fast distance-based approach for (super)tree building in phylogenomics. Systematic Biology, 55, 740–755.

Ancestral Character Estimation

Description

ace estimates ancestral character states, and the associated uncertainty, for continuous and discrete characters. If marginal = TRUE, a marginal estimation procedure is used. With this method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node.

The present implementation of marginal reconstruction for discrete characters does not calculate the most likely state for each node, integrating over all the possible states, over all the other nodes in the tree, in proportion to their probability. For more details, see the Note below.

logLik, deviance, and AIC are generic functions used to extract the log-likelihood, the deviance, or the Akaike information criterion of a fitted object. If no such values are available, NULL is returned.

anova is another generic function which is used to compare nested models: the significance of the additional parameter(s) is tested with likelihood ratio tests. You must ensure that the models are effectively nested (if they are not, the results will be meaningless). It is better to list the models from the smallest to the largest.
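The arithmetic behind a likelihood ratio test between two nested models can be sketched in base R. The log-likelihoods and parameter counts below are made-up numbers chosen only to show the calculation; they are not the output of any ace fit.

```r
# Hypothetical log-likelihoods of two nested models (illustrative values only)
loglik_small <- -12.0   # smaller model, 1 rate parameter
loglik_large <- -10.0   # larger model, 2 rate parameters

lr_stat <- 2 * (loglik_large - loglik_small)   # likelihood ratio statistic
df_diff <- 2 - 1                               # number of additional parameters
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

With these numbers the statistic is 4 on one degree of freedom, giving a P-value slightly below 0.05. The chi-squared reference distribution is only meaningful when the models are truly nested, as stressed above.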

Usage

ace(x, phy, type = "continuous", method = if (type == "continuous")
   "REML" else "ML", CI = TRUE,
    model = if (type == "continuous") "BM" else "ER",
    scaled = TRUE, kappa = 1, corStruct = NULL, ip = 0.1,
    use.expm = FALSE, use.eigen = TRUE, marginal = FALSE)
## S3 method for class 'ace'
print(x, digits = 4, ...)
## S3 method for class 'ace'
logLik(object, ...)
## S3 method for class 'ace'
deviance(object, ...)
## S3 method for class 'ace'
AIC(object, ..., k = 2)
## S3 method for class 'ace'
anova(object, ...)

Arguments

x

a vector or a factor; an object of class "ace" in the case of print.

phy

an object of class "phylo".

type

the variable type; either "continuous" or "discrete" (or an abbreviation of these).

method

a character specifying the method used for estimation. Four choices are possible: "ML", "REML", "pic", or "GLS".

CI

a logical specifying whether to return the 95% confidence intervals of the ancestral state estimates (for continuous characters) or the likelihood of the different states (for discrete ones).

model

a character specifying the model (ignored if method = "GLS"), or a numeric matrix if type = "discrete" (see details).

scaled

a logical specifying whether to scale the contrast estimate (used only if method = "pic").

kappa

a positive value giving the exponent transformation of the branch lengths (see details).

corStruct

if method = "GLS", specifies the correlation structure to be used (this also gives the assumed model).

ip

the initial value(s) used for the ML estimation procedure when type == "discrete" (possibly recycled).

use.expm

a logical specifying whether to use the package expm to compute the matrix exponential (relevant only if type = "d"). If FALSE, the function matexpo from ape is used (see details). This option is ignored if use.eigen = TRUE (see next).

use.eigen

a logical (relevant if type = "d"); if TRUE then the probability matrix is computed with an eigen decomposition instead of a matrix exponential (see details).

marginal

a logical (relevant if type = "d"). By default, the joint reconstruction of the ancestral states is done. Set this option to TRUE if you want the marginal reconstruction (see details).

digits

the number of digits to be printed.

object

an object of class "ace".

k

a numeric value giving the penalty per estimated parameter; the default is k = 2 which is the classical Akaike information criterion.

...

further arguments passed to or from other methods.

Details

If type = "continuous", the default model is Brownian motion where characters evolve randomly following a random walk. This model can be fitted by residual maximum likelihood (the default), maximum likelihood (Felsenstein 1973, Schluter et al. 1997), least squares (method = "pic", Felsenstein 1985), or generalized least squares (method = "GLS", Martins and Hansen 1997, Cunningham et al. 1998). In the last case, the specifications of phy and model are actually ignored: the model is instead given through a correlation structure with the option corStruct.

In the setting method = "ML" and model = "BM" (this used to be the default until ape 3.0-7), the maximum likelihood estimation is done simultaneously on the ancestral values and the variance of the Brownian motion process; these estimates are then used to compute the confidence intervals in the standard way. The REML method first estimates the ancestral value at the root (aka the phylogenetic mean), then the variance of the Brownian motion process is estimated by optimizing the residual log-likelihood. The ancestral values are finally inferred from the likelihood function given these two parameters. If method = "pic" or "GLS", the confidence intervals are computed using the expected variances under the model, so they depend only on the tree.

It can be shown that, with a continuous character, REML results in unbiased estimates of the variance of the Brownian motion process while ML gives a downward bias. Therefore the former is recommended.
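The direction of the bias can be seen in the simplest possible case. The sketch below is an analogy in base R that does not call ace: it assumes a star phylogeny with unit branch lengths, under which the tip values are independent draws with variance equal to the Brownian rate, so the ML and REML estimators reduce to the familiar divide-by-n and divide-by-(n - 1) variance estimators. The data vector is arbitrary.

```r
# Assumed setting: star phylogeny, unit branch lengths (tips are i.i.d.)
x <- c(1, 2, 3, 4)
n <- length(x)
ss <- sum((x - mean(x))^2)     # sum of squares around the estimated root value

sigma2_ml <- ss / n            # ML estimate: biased downward
sigma2_reml <- ss / (n - 1)    # REML estimate: unbiased, equals var(x)
```

The ML estimate is smaller by the factor (n - 1)/n, which matters most for small trees.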

For discrete characters (type = "discrete"), only maximum likelihood estimation is available (Pagel 1994) (see MPR for an alternative method). The model is specified through a numeric matrix with integer values taken as indices of the parameters. The numbers of rows and of columns of this matrix must be equal, and are taken to give the number of states of the character. For instance, matrix(c(0, 1, 1, 0), 2) will represent a model with two character states and equal rates of transition, matrix(c(0, 1, 2, 0), 2) a model with unequal rates, matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3) a model with three states and equal rates of transition (the diagonal is always ignored). There are short-cuts to specify these models: "ER" is an equal-rates model (e.g., the first and third examples above), "ARD" is an all-rates-different model (the second example), and "SYM" is a symmetrical model (e.g., matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), 3)). If a short-cut is used, the number of states is determined from the data.
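The index matrices quoted above can be built and inspected in base R. In this sketch, n_rates is a hypothetical helper (not an ape function) that counts the distinct off-diagonal indices, i.e. the number of rate parameters each model would estimate.

```r
er2  <- matrix(c(0, 1, 1, 0), 2)                    # two states, equal rates
ard2 <- matrix(c(0, 1, 2, 0), 2)                    # two states, unequal rates
er3  <- matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3)     # three states, equal rates
sym3 <- matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), 3)     # three states, symmetrical

# Hypothetical helper: number of distinct off-diagonal parameter indices
n_rates <- function(m) length(unique(m[row(m) != col(m)]))

counts <- c(ER2 = n_rates(er2), ARD2 = n_rates(ard2),
            ER3 = n_rates(er3), SYM3 = n_rates(sym3))
```

The counts are 1, 2, 1 and 3 respectively, and isSymmetric(sym3) is TRUE, which is what distinguishes a "SYM" design from an "ARD" one.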

By default, the likelihoods of the different ancestral states of discrete characters are computed with a joint estimation procedure similar to the one described in Pupko et al. (2000). If marginal = TRUE, a marginal estimation procedure is used (this was the only choice until ape 3.1-1). With this method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node. With the joint estimation, all information is used for each node. The difference between these two methods is further explained in Felsenstein (2004, pp. 259-260) and in Yang (2006, pp. 121-126). The present implementation of the joint estimation uses a “two-pass” algorithm which is much faster than stochastic mapping while the estimates of both methods are very close.

With discrete characters it is necessary to compute the exponential of the rate matrix. The only possibility until ape 3.0-7 was the function matexpo in ape. If use.expm = TRUE and use.eigen = FALSE, the function expm, in the package of the same name, is used. matexpo is faster but quite inaccurate for large and/or asymmetric matrices. In case of doubt, use expm. Since ape 3.0-10, it is possible to use an eigen decomposition avoiding the need to compute the matrix exponential; see details in Lebl (2013, sect. 3.8.3). This is much faster and is now the default.
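The idea of replacing the matrix exponential by an eigen decomposition can be illustrated in base R. This is a sketch of the general technique, not the code used inside ace; the rate, the branch length and the variable names are assumptions chosen for the example.

```r
# Two-state equal-rates matrix; rows sum to zero
rate <- 0.5
Q <- matrix(c(-rate, rate, rate, -rate), 2)
brlen <- 2                                   # branch length

# If Q = V diag(lambda) V^{-1}, then exp(Q * brlen) = V diag(exp(lambda * brlen)) V^{-1}
e <- eigen(Q)
P_eigen <- e$vectors %*% diag(exp(e$values * brlen)) %*% solve(e$vectors)

# Closed form for this particular model, used as an independent check
p_same <- 0.5 + 0.5 * exp(-2 * rate * brlen)
```

P_eigen[1, 1] agrees with p_same, and each row of P_eigen sums to 1. The decomposition is computed once per rate matrix and reused for every branch length, which is where the speed-up comes from.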

Since version 5.2 of ape, ace can take state uncertainty for discrete characters into account: this should be coded with R's NA only. More details:

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg05286.html

Value

an object of class "ace" with the following elements:

ace

if type = "continuous", the estimates of the ancestral character values.

CI95

if type = "continuous", the estimated 95% confidence intervals.

sigma2

if type = "continuous", model = "BM", and method = "ML", the maximum likelihood estimate of the Brownian parameter.

rates

if type = "discrete", the maximum likelihood estimates of the transition rates.

se

if type = "discrete", the standard-errors of estimated rates.

index.matrix

if type = "discrete", gives the indices of the rates in the rate matrix.

loglik

if method = "ML", the maximum log-likelihood.

lik.anc

if type = "discrete", the scaled likelihoods of each ancestral state.

call

the function call.

Note

Liam Revell points out that for discrete characters the ancestral likelihood values returned with marginal = FALSE are actually the marginal estimates, while setting marginal = TRUE returns the conditional (scaled) likelihoods of the subtree:

http://blog.phytools.org/2015/05/about-how-acemarginaltrue-does-not.html

Author(s)

Emmanuel Paradis, Ben Bolker

References

Cunningham, C. W., Omland, K. E. and Oakley, T. H. (1998) Reconstructing ancestral character states: a critical reappraisal. Trends in Ecology & Evolution, 13, 361–366.

Felsenstein, J. (1973) Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25, 471–492.

Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.

Lebl, J. (2013) Notes on Diffy Qs: Differential Equations for Engineers. https://www.jirka.org/diffyqs/.

Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.

Pagel, M. (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London. Series B. Biological Sciences, 255, 37–45.

Pupko, T., Pe'er, I., Shamir, R., and Graur, D. (2000) A fast algorithm for joint reconstruction of ancestral amino acid sequences. Molecular Biology and Evolution, 17, 890–896.

Schluter, D., Price, T., Mooers, A. O. and Ludwig, D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699–1711.

Yang, Z. (2006) Computational Molecular Evolution. Oxford: Oxford University Press.

Examples

### Some random data...
data(bird.orders)
x <- rnorm(23)
### Compare the three methods for continuous characters:
ace(x, bird.orders)
ace(x, bird.orders, method = "pic")
ace(x, bird.orders, method = "GLS",
    corStruct = corBrownian(1, bird.orders))
### For discrete characters:
x <- factor(c(rep(0, 5), rep(1, 18)))
ans <- ace(x, bird.orders, type = "d")
#### Showing the likelihoods on each node:
plot(bird.orders, type = "c", FALSE, label.offset = 1)
co <- c("blue", "yellow")
tiplabels(pch = 22, bg = co[as.numeric(x)], cex = 2, adj = 1)
nodelabels(thermo = ans$lik.anc, piecol = co, cex = 0.75)

Add a Scale Bar to a Phylogeny Plot

Description

This function adds a horizontal bar giving the scale of the branch lengths to a plot of a phylogenetic tree on the current graphical device.

Usage

add.scale.bar(x, y, length = NULL, ask = FALSE,
              lwd = 1, lcol = "black", ...)

Arguments

x

x location of the bar (can be left missing).

y

y location of the bar (can be left missing).

length

a numeric value giving the length of the scale bar. If none is supplied, a value is calculated from the data.

ask

a logical; if TRUE the user is asked to click where to draw the bar. The default is FALSE.

lwd

the width of the bar.

lcol

the colour of the bar (use col for the colour of the text).

...

further arguments to be passed to text.

Details

By default, the bar is placed in a corner of the graph depending on the direction of the tree. Otherwise both x and y must be specified (if only one is given it is ignored).

The further arguments (...) are used to format the text. They may be font, cex, col, and so on (see examples below, and the help page on text).

The function locator may be used to determine the x and y arguments.

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(10)
layout(matrix(1:2, 2, 1))
plot(tr)
add.scale.bar()
plot(tr)
add.scale.bar(cex = 0.7, font = 2, col = "red")
layout(1)

Incomplete Distance Matrix Filling

Description

Fills the missing entries of an incomplete distance matrix using the additive or the ultrametric procedure (see reference for details).

Usage

additive(X)
ultrametric(X)

Arguments

X

a distance matrix or an object of class "dist".

Value

a distance matrix.

Author(s)

Andrei Popescu

References

Makarenkov, V. and Lapointe, F.-J. (2004) A weighted least-squares approach for inferring phylogenies from incomplete distance matrices. Bioinformatics, 20, 2113–2121.

Alignment Explorer With Multiple Devices

Description

This function helps to explore DNA alignments by zooming in. The user clicks twice to define the opposite corners of the portion which is extracted and drawn in a new window.

Usage

alex(x, ...)

Arguments

x

an object of class "DNAbin".

...

further arguments to pass to image.DNAbin.

Details

This function works with a DNA alignment (freshly) plotted on an interactive graphical device (i.e., not a file) with image. After calling alex, the user clicks twice defining a rectangle in the alignment, then this portion of the alignment is extracted and plotted on a new window. The user can click on the alignment as many times as needed. The process is stopped by a right-click. If the user clicks twice outside the alignment, a message “Try again!” is printed.

Each time alex is called, the alignment is plotted on a new window without closing or deleting those possibly already plotted.

In all cases, the device where x is plotted is the active window after the operation. It should not be closed during the whole process.

Value

NULL

Author(s)

Emmanuel Paradis

Examples

## Not run: 
data(woodmouse)
image(woodmouse)
alex(woodmouse)
## End(Not run)

Compare DNA Sets

Description

Comparison of DNA sequence sets, particularly when aligned.

Usage

## S3 method for class 'DNAbin'
all.equal(target, current, plot = FALSE, ...)

Arguments

target,current

the two sets of sequences to be compared.

plot

a logical value specifying whether to plot the sites that are different (only if the labels of both alignments are the same).

...

further arguments passed to image.DNAbin.

Details

If the two sets of DNA sequences are exactly identical, this function returns TRUE. Otherwise, a detailed comparison is made only if the labels (i.e., rownames) of target and current are the same (possibly in different orders). In all other cases, a brief description of the differences is returned (sometimes with recommendations to make further comparisons).

This function can be used for testing in programs using isTRUE (see examples below).
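The reason for wrapping the call in isTRUE is that all.equal methods return a description of the differences, not FALSE, when the objects differ. The sketch below shows the idiom with plain numeric vectors so that it runs without any sequence data; the same pattern applies to objects of class "DNAbin".

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3.5)

res <- all.equal(a, b)        # a character description of the difference
same <- isTRUE(res)           # FALSE: safe to use inside if()
same_self <- isTRUE(all.equal(a, a))   # TRUE
```

Writing if (all.equal(a, b)) directly would fail here because res is a character vector, whereas if (isTRUE(all.equal(a, b))) always receives a single logical value.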

Value

TRUE if the two sets are identical; a list with two elements (message and different.sites) if a detailed comparison is done; or a vector of mode character.

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
woodm2 <- woodmouse
woodm2[1, c(1:5, 10:12, 30:40)] <- as.DNAbin("g")
res <- all.equal(woodmouse, woodm2, plot = TRUE)
str(res)
## if used for testing in R programs:
isTRUE(all.equal(woodmouse, woodmouse)) # TRUE
isTRUE(all.equal(woodmouse, woodm2)) # FALSE
all.equal(woodmouse, woodmouse[15:1, ])
all.equal(woodmouse, woodmouse[-1, ])
all.equal(woodmouse, woodmouse[, -1])
## Not run: 
## To run the following you need internet access and Clustal and MUSCLE
## correctly installed.
## Data from Johnson et al. (2006, Science)
refs <- paste("DQ082", 505:545, sep = "")
DNA <- read.GenBank(refs)
DNA.clustal <- clustal(DNA)
DNA.muscle <- muscle(DNA)
isTRUE(all.equal(DNA.clustal, DNA.muscle)) # FALSE
all.equal(DNA.clustal, DNA.muscle, TRUE)
## End(Not run)

Global Comparison of two Phylogenies

Description

This function makes a global comparison of two phylogenetic trees.

Usage

## S3 method for class 'phylo'
all.equal(target, current, use.edge.length = TRUE,
                   use.tip.label = TRUE, index.return = FALSE,
                   tolerance = .Machine$double.eps ^ 0.5,
                   scale = NULL, ...)

Arguments

target

an object of class "phylo".

current

an object of class "phylo".

use.edge.length

if FALSE only the topologies are compared; the default is TRUE.

use.tip.label

if FALSE the unlabelled trees are compared; the default is TRUE.

index.return

if TRUE the function returns a two-column matrix giving the correspondence between the nodes of both trees.

tolerance

the numeric tolerance used to compare the branch lengths.

scale

a positive number, comparison of branch lengths is made after scaling (i.e., dividing) them by this number.

...

further arguments passed to or from other methods.

Details

This function is meant to be an adaptation of the generic function all.equal for the comparison of phylogenetic trees.

A single phylogenetic tree may have several representations in the Newick format and in the "phylo" class of objects used in ‘ape’. One aim of the present function is to be able to identify whether two objects of class "phylo" represent the same phylogeny.

Value

A logical value, or a two-column matrix.

Note

The algorithm used here does not work correctly for the comparison of topologies (i.e., ignoring tip labels) of unrooted trees. This also affects unique.multiPhylo which calls the present function. See:

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg01445.html.

Author(s)

Benoît Durand b.durand@alfort.AFSSA.FR

Examples

### maybe the simplest example of two representations
### for the same rooted tree...:
t1 <- read.tree(text = "(a:1,b:1);")
t2 <- read.tree(text = "(b:1,a:1);")
all.equal(t1, t2)
### ... compare with this:
identical(t1, t2)
### one just slightly more complicated...:
t3 <- read.tree(text = "((a:1,b:1):1,c:2);")
t4 <- read.tree(text = "(c:2,(a:1,b:1):1);")
all.equal(t3, t4) # == all.equal.phylo(t3, t4)
### ... here we force the comparison as lists:
all.equal.list(t3, t4)

Print DNA or AA Sequence Alignment

Description

This function displays in the console or a file an alignment of DNA or AA sequences. The first sequence is printed on the first row and the bases of the other sequences are replaced by dots if they are identical with the first sequence.

Usage

alview(x, file = "", uppercase = TRUE, showpos = TRUE)

Arguments

x

a matrix or a list of DNA sequences (class "DNAbin") or a matrix of AA sequences (class "AAbin").

file

a character string giving the name of the file where to print the sequences; by default, they are printed in the console.

uppercase

a logical specifying whether to print the bases as uppercase letters.

showpos

either a logical value specifying whether to display the site positions, or a numeric vector giving these positions (see examples).

Details

The first line of the output shows the position of the last column of the printed alignment.
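The display rule (dots for bases identical to the first sequence) can be mimicked in base R on plain character strings. This is only a sketch of the rule on three made-up sequences; it is not how alview is implemented and it does not use the "DNAbin" class.

```r
# Three hypothetical aligned sequences of equal length
seqs <- c(s1 = "ACGTAC", s2 = "ACGTTC", s3 = "ACCTAC")
m <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence, one column per site

ref <- m[1, ]                             # the first sequence is the reference
masked <- m
for (i in seq_len(nrow(m))[-1]) {
  masked[i, m[i, ] == ref] <- "."         # identical bases become dots
}
out <- apply(masked, 1, paste, collapse = "")
```

The result keeps the first sequence intact and shows only the sites where s2 and s3 differ from it ("....T." and "..C..."), which makes variable sites easy to spot.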

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
alview(woodmouse[, 1:50])
alview(woodmouse[, 1:50], uppercase = FALSE)
## display only some sites:
j <- c(10, 49, 125, 567) # just random
x <- woodmouse[, j]
alview(x, showpos = FALSE) # no site position displayed
alview(x, showpos = j)
## Not run: 
alview(woodmouse, file = "woodmouse.txt")
## End(Not run)

Internal Ape Functions

Description

Internal ape functions which are undocumented but still exported because they are called by other packages. Use with care!

Tools to Explore Files

Description

These functions help to find files on the local disk.

Usage

Xplorefiles(from = "HOME", recursive = TRUE, ignore.case = TRUE)
editFileExtensions()
bydir(x)
Xplor(from = "HOME")

Arguments

from

the directory where to start the file search; by default, the ‘HOME’ directory. Use from = getwd() to start from the current working directory.

recursive

whether to search the subdirectories; TRUE by default.

ignore.case

whether to ignore the case of the file extensions; TRUE by default.

x

a list returned by Xplorefiles.

Details

Xplorefiles looks for all files with a specified extension in their names. The default is to look for the following file types: CLUSTAL (.aln), FASTA (.fas, .fasta), FASTQ (.fq, .fastq), NEWICK (.nwk, .newick, .tre, .tree), NEXUS (.nex, .nexus), and PHYLIP (.phy). This list can be modified with editFileExtensions.

bydir sorts the list of files by directories.

Xplor combines the other operations and opens the results in a Web browser with clickable links to the directories and files.
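The kind of search performed by Xplorefiles (matching file extensions, optionally ignoring case and descending into subdirectories) can be approximated with base R's list.files. The sketch below is an illustration of the idea only, not the implementation of Xplorefiles; it works in a temporary directory with made-up file names so that nothing on the user's disk is touched.

```r
# Create a throw-away directory with a few empty files
d <- tempfile("xplore_demo")
dir.create(d)
invisible(file.create(file.path(d, c("a.fas", "b.FASTA", "c.nwk", "notes.txt"))))

# FASTA files: extensions .fas or .fasta, case ignored, subdirectories included
fasta <- list.files(d, pattern = "\\.(fas|fasta)$",
                    recursive = TRUE, ignore.case = TRUE)

# NEWICK files, using the extensions listed above
newick <- list.files(d, pattern = "\\.(nwk|newick|tre|tree)$",
                     recursive = TRUE, ignore.case = TRUE)
```

With ignore.case = TRUE the file b.FASTA is found alongside a.fas, mirroring the default behaviour described for the ignore.case argument.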

Value

Xplorefiles returns a list. bydir prints the file listings on the console.

Author(s)

Emmanuel Paradis

Examples

## Not run: 
x <- Xplorefiles()
x # all data files on your disk
bydir(x) # sorted by directories
bydir(x["fasta"]) # only the FASTA files
Xplorefiles(getwd(), recursive = FALSE) # look only in current dir
Xplor()
## End(Not run)

Conversion Among DNA Sequence Internal Formats

Description

These functions transform a set of DNA sequences among various internal formats.

Usage

as.alignment(x)
as.DNAbin(x, ...)
## S3 method for class 'character'
as.DNAbin(x, ...)
## S3 method for class 'list'
as.DNAbin(x, ...)
## S3 method for class 'alignment'
as.DNAbin(x, ...)
## S3 method for class 'DNAString'
as.DNAbin(x, ...)
## S3 method for class 'DNAStringSet'
as.DNAbin(x, ...)
## S3 method for class 'PairwiseAlignmentsSingleSubject'
as.DNAbin(x, ...)
## S3 method for class 'DNAMultipleAlignment'
as.DNAbin(x, ...)
## S3 method for class 'DNAbin'
as.character(x, ...)

Arguments

x

a matrix or a list containing the DNA sequences, or an object of class "alignment".

...

further arguments to be passed to or from other methods.

Details

For as.alignment, the sequences given as argument should be stored as matrices or lists of single-character strings (the format used in ape before version 1.10). The returned object is in the format used in the package seqinr to store aligned sequences.

as.DNAbin is a generic function with methods so that it works with sequences stored into vectors, matrices, or lists. It can convert some S4 classes from the package Biostrings in BioConductor. For consistency within ape, this uses an S3-style syntax. To convert objects of class "DNAStringSetList", see the examples.

as.character is a generic function: the present method converts objects of class "DNAbin" into the format used before ape 1.10 (matrix of single characters, or list of vectors of single characters). This function must be used first to convert objects of class "DNAbin" into the class "alignment".

Value

an object of class "alignment" in the case of "as.alignment"; an object of class "DNAbin" in the case of "as.DNAbin"; a matrix of mode character or a list containing vectors of mode character in the case of "as.character".

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
x <- as.character(woodmouse)
x[, 1:20]
str(as.alignment(x))
identical(as.DNAbin(x), woodmouse)
### conversion from BioConductor:
## Not run: 
if (require(Biostrings)) {
data(phiX174Phage)
X <- as.DNAbin(phiX174Phage)
## base frequencies:
base.freq(X) # from ape
alphabetFrequency(phiX174Phage) # from Biostrings
### for objects of class "DNAStringSetList"
X <- lapply(x, as.DNAbin) # a list of lists
### to put all sequences in a single list:
X <- unlist(X, recursive = FALSE)
class(X) <- "DNAbin"
}
## End(Not run)

Split Frequencies and Conversion Among Split Classes

Description

bitsplits returns the bipartitions (aka splits) for a single tree or a list of trees. If at least one tree is rooted, an error is returned.

countBipartitions returns the frequencies of the bipartitions from a reference tree (phy) observed in a list of trees (X), all unrooted.

as.bitsplits and as.prop.part are generic functions for converting between the "bitsplits" and "prop.part" classes.

Usage

bitsplits(x)
countBipartitions(phy, X)
as.bitsplits(x)
## S3 method for class 'prop.part'
as.bitsplits(x)
## S3 method for class 'bitsplits'
print(x, ...)
## S3 method for class 'bitsplits'
sort(x, decreasing = FALSE, ...)
as.prop.part(x, ...)
## S3 method for class 'bitsplits'
as.prop.part(x, include.trivial = FALSE, ...)

Arguments

x

an object of the appropriate class.

phy

an object of class "phylo".

X

an object of class "multiPhylo".

decreasing

a logical value to sort the bipartitions in increasing (the default) or decreasing order of their frequency.

include.trivial

a logical value specifying whether to include the trivial split with all tips in the returned object.

...

further arguments passed to or from other methods.

Details

These functions count bipartitions as defined by internal branches, so they work only with unrooted trees. The structure of the class "bitsplits" is described in a separate document on ape's website.

This data structure has a memory requirement proportional to n^2, so it can be inefficient with large trees (> 1000 tips), particularly if they are very different (i.e., with few shared splits). In any case, an error occurs if the product of the number of tips by the number of nodes is greater than 2^{31}-1 (~2.1 billion). A warning message is given if the tree(s) has (have) more than 46,341 tips. It may happen that the search for splits is interrupted if the data structure is full (with a warning message).
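The figure of 46,341 tips follows from the 2^{31}-1 limit, as a short calculation in base R shows. The sketch assumes a fully resolved unrooted tree, which has n - 2 internal nodes for n tips; fits is a hypothetical helper written for this illustration, not an ape function.

```r
limit <- .Machine$integer.max            # 2^31 - 1 = 2147483647

# Does tips x nodes stay within the limit? (doubles are used to avoid overflow)
fits <- function(n_tips, n_nodes = n_tips - 2) {
  as.numeric(n_tips) * as.numeric(n_nodes) <= limit
}

threshold <- ceiling(sqrt(limit))        # smallest n with n^2 above the limit
ok_at_threshold <- fits(46341)
ok_above <- fits(46342)
```

Under this assumption a tree with 46,341 tips still fits, whereas one with 46,342 tips exceeds the limit, which is consistent with the threshold quoted above.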

Value

bitsplits, as.bitsplits, and sort return an object of class "bitsplits".

countBipartitions returns a vector of integers.

as.prop.part returns an object of class "prop.part".

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(20)
pp <- prop.part(tr)
as.bitsplits(pp)
## works only with unrooted trees (ape 5.5):
countBipartitions(rtree(10, rooted = FALSE), rmtree(100, 10, rooted = FALSE))

Conversion Between Phylo and Matching Objects

Description

These functions convert objects between the classes "phylo" and "matching".

Usage

as.matching(x, ...)
## S3 method for class 'phylo'
as.matching(x, labels = TRUE, ...)
## S3 method for class 'matching'
as.phylo(x, ...)

Arguments

x

an object to convert as an object of class "matching" or of class "phylo".

labels

a logical specifying whether the tip and node labels should be included in the returned matching.

...

further arguments to be passed to or from other methods.

Details

A matching is a representation where each tip and each node are given a number, and sibling groups are grouped in a “matching pair” (see Diaconis and Holmes 1998, for details). This coding system can be used only for binary (fully dichotomous) trees.

Diaconis and Holmes (1998) gave some conventions to ensure that a given tree has a unique representation as a matching. I have tried to follow them in the present functions.

Value

as.matching returns an object of class "matching" with the following components:

matching

a two-column numeric matrix where the columns represent the sibling pairs.

tip.label

(optional) a character vector giving the tip labels where the ith element is the label of the tip numbered i in matching.

node.label

(optional) a character vector giving the node labels in the same order as in matching (i.e. the ith element is the label of the node numbered i + n in matching, with n the number of tips).

as.phylo.matching returns an object of class "phylo".

Note

Branch lengths are not supported in the present version.

Author(s)

Emmanuel Paradis

References

Diaconis, P. W. and Holmes, S. P. (1998) Matchings and phylogenetic trees. Proceedings of the National Academy of Sciences USA, 95, 14600–14602.

Examples

data(bird.orders)
m <- as.matching(bird.orders)
str(m)
m
tr <- as.phylo(m)
all.equal(tr, bird.orders, use.edge.length = FALSE)

Conversion Among Tree and Network Objects

Description

as.phylo is a generic function which converts an object into a tree of class "phylo". There are currently two methods for objects of class "hclust" and of class "phylog" (implemented in the package ade4). The default method is for any object inheriting the class "phylo" which is returned unchanged.

as.hclust.phylo is a method of the generic as.hclust which converts an object of class "phylo" into one of class "hclust". This can be used to convert an object of class "phylo" into one of class "dendrogram" (see examples).

as.network and as.igraph convert trees of class "phylo" into these respective classes defined in the packages of the same names (where the generics are defined).

old2new.phylo and new2old.phylo are utility functions for converting between the old and new coding of the class "phylo".

Usage

as.phylo(x, ...)
## Default S3 method:
as.phylo(x, ...)
## S3 method for class 'hclust'
as.phylo(x, ...)
## S3 method for class 'phylog'
as.phylo(x, ...)
## S3 method for class 'phylo'
as.hclust(x, ...)
old2new.phylo(phy)
new2old.phylo(phy)
## S3 method for class 'phylo'
as.network(x, directed = is.rooted(x), ...)
## S3 method for class 'phylo'
as.igraph(x, directed = is.rooted(x), use.labels = TRUE, ...)

Arguments

x

an object to be converted into another class.

directed

a logical value: should the network be directed? By default, this depends on whether the tree is rooted or not.

use.labels

a logical specifying whether to use labels to build the network of class "igraph". If TRUE and the tree has no node labels, then some default labels are created first. If FALSE, the network is built with integers.

...

further arguments to be passed to or from other methods.

phy

an object of class "phylo".

Value

An object of class "hclust", "phylo", "network", or "igraph".

Note

In an object of class "hclust", the height gives the distance between the two sets that are being agglomerated. So these distances are divided by two when setting the branch lengths of a phylogenetic tree.
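The halving can be checked on a tiny clustering with base R alone. This sketch uses three made-up points on a line and the default (complete linkage) method of hclust; it illustrates the relation between merge heights and branch lengths and does not call as.phylo.

```r
# Three points on a line; pairwise distances are 4, 6 and 10
pts <- c(a = 0, b = 4, c = 10)
hc <- hclust(dist(pts))

hc$height                    # merge heights: 4 (a with b), then 10
tip_branch <- hc$height[1] / 2   # length of the branches leading to a and b
```

The tips a and b are 4 units apart, so in an ultrametric tree each of them sits at the end of a branch of length 2, half the height at which they merge.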

Author(s)

Emmanuel Paradis

Examples

data(bird.orders)hc <- as.hclust(bird.orders)tr <- as.phylo(hc)all.equal(bird.orders, tr) # TRUE### shows the three plots for tree objects:dend <- as.dendrogram(hc)layout(matrix(c(1:3, 3), 2, 2))plot(bird.orders, font = 1)plot(hc)par(mar = c(8, 0, 0, 0)) # leave space for the labelsplot(dend)### how to get identical plots with### plot.phylo and plot.dendrogram:layout(matrix(1:2, 2, 1))plot(bird.orders, font = 1, no.margin = TRUE, label.offset = 0.4)par(mar = c(0, 0, 0, 8))plot(dend, horiz = TRUE)layout(1)## Not run: ### convert into networks:if (require(network)) {    x <- as.network(rtree(10))    print(x)    plot(x, vertex.cex = 1:4)    plot(x, displaylabels = TRUE)}tr <- rtree(5)if (require(igraph)) {    print((x <- as.igraph(tr)))    plot(x)    print(as.igraph(tr, TRUE, FALSE))    print(as.igraph(tr, FALSE, FALSE))}## End(Not run)

Conversion from Taxonomy Variables to Phylogenetic Trees

Description

The function as.phylo.formula (short form as.phylo) builds a phylogenetic tree (an object of class phylo) from a set of nested taxonomic variables.

Usage

## S3 method for class 'formula'
as.phylo(x, data = parent.frame(), collapse = TRUE, ...)

Arguments

x

a right-side formula describing the taxonomic relationship: ~C1/C2/.../Cn.

data

the data.frame where to look for the variables (default to user's workspace).

collapse

a logical value specifying whether to collapse single nodes in the returned tree (see details).

...

further arguments to be passed from other methods.

Details

Taxonomic variables must be nested and passed in the correct order: the higher clade must be on the left of the formula, for instance ~Order/Family/Genus/Species. In most cases, the resulting tree will be unresolved and will contain polytomies.

The option collapse = FALSE adds single nodes in the tree when a given higher level has only one element in the level below (e.g., a monospecific genus); see the example below.

Value

an object of class "phylo".

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de, Eric Marcon and Klaus Schliep

Examples

data(carnivora)
frm <- ~SuperFamily/Family/Genus/Species
tr <- as.phylo(frm, data = carnivora, collapse=FALSE)
tr$edge.length <- rep(1, nrow(tr$edge))
plot(tr, show.node.label=TRUE)
Nnode(tr)
## compare with:
Nnode(as.phylo(frm, data = carnivora, collapse = TRUE))

Axis on Side of Phylogeny

Description

This function adds a scaled axis on the side of a phylogeny plot.

Usage

axisPhylo(side = NULL, root.time = NULL, backward = TRUE, ...)

Arguments

side

a numeric value specifying the side where the axis is plotted: 1: below, 2: left, 3: above, 4: right. By default, this is taken from the direction of the plot.

root.time

the time assigned to the root node of the tree. By default, this is taken from the root.time element of the tree. If it is absent, this is determined from the next option.

backward

a logical value; if TRUE, the most distant tip from the root is considered as the origin of the time scale; if FALSE, this is the root node.

...

further arguments to be passed to axis.

Details

The further arguments (...) are used to format the axis. They may be font, cex, col, las, and so on (see the help pages on axis and par).
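The following sketch shows the two kinds of arguments together: backward chooses the origin of the time scale, while las and cex.axis are simply handed over to axis. The seed and tree size are arbitrary.

```r
library(ape)

set.seed(1)
ch <- rcoal(10)   # random ultrametric (coalescent) tree

plot(ch)
## Time scale running forward from the root (backward = FALSE);
## las and cex.axis are passed on to axis() through '...'.
axisPhylo(backward = FALSE, las = 1, cex.axis = 0.8)
```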

Author(s)

Emmanuel Paradis, Klaus Schliep

Examples

tr <- rtree(30)
ch <- rcoal(30)
plot(ch)
axisPhylo()
plot(tr, "c", FALSE, direction = "u")
axisPhylo(las = 1)

Balance of a Dichotomous Phylogenetic Tree

Description

This function computes the balance of a phylogenetic tree, that is, for each node of the tree, the numbers of descendants (i.e. tips) on each of its daughter branches. The tree must be fully dichotomous.

Usage

balance(phy)

Arguments

phy

an object of class "phylo".

Value

a numeric matrix with two columns and one row for each node of the tree. The columns give the numbers of descendants on each daughter branch (the order of the two columns being arbitrary). If the phylogeny phy has an element node.label, this is used as rownames for the returned matrix; otherwise the node numbers (of mode character) from the matrix edge of phy are used as rownames.
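A small worked case may help to read the returned matrix. The five-tip tree below is made up for illustration; the Colless index computed at the end is a common summary of such a matrix and is not something balance returns itself.

```r
library(ape)

tr <- read.tree(text = "(((a,b),c),(d,e));")
bal <- balance(tr)
bal   # one row per node; each row sums to the tips below that node

## One common summary (not computed by balance() itself) is the Colless
## index: the sum over nodes of the absolute difference between columns.
colless <- sum(abs(bal[, 1] - bal[, 2]))
colless
```

Here the root splits the five tips 3 versus 2 and the node above (a,b),c splits 2 versus 1, so only those two nodes contribute to the index.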

Author(s)

Emmanuel Paradis

References

Aldous, D. J. (2001) Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statistical Science, 16, 23–34.

Base frequencies from DNA Sequences

Description

base.freq computes the frequencies (absolute or relative) of the four DNA bases (adenine, cytosine, guanine, and thymine) from a sample of sequences.

GC.content computes the proportion of G+C (using the previous function). All missing or unknown sites are ignored.

Ftab computes the contingency table with the absolute frequencies of the DNA bases from a pair of sequences.

Usage

base.freq(x, freq = FALSE, all = FALSE)
GC.content(x)
Ftab(x, y = NULL)

Arguments

x

a vector, a matrix, or a list which contains the DNA sequences.

y

a vector with a single DNA sequence.

freq

a logical specifying whether to return the proportions (the default) or the absolute frequencies (counts).

all

a logical; by default only the counts of A, C, G, and T are returned. If all = TRUE, all counts of bases, ambiguous codes, missing data, and alignment gaps are returned.

Details

The base frequencies are computed over all sequences in the sample.

For Ftab, if the argument y is given then both x and y are coerced as vectors and must be of equal length. If y is not given, x must be a matrix or a list and only the first two sequences are used.
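The behaviour described above can be checked by hand on a tiny alignment. The two six-site sequences below are invented for the purpose; they contain one unknown base and one gap, so that the effect of ignoring such sites is visible.

```r
library(ape)

## Two short made-up sequences; 'n' is an unknown base and '-' a gap.
x <- as.DNAbin(rbind(s1 = c("a", "c", "g", "t", "a", "n"),
                     s2 = c("a", "c", "g", "g", "a", "-")))

base.freq(x)                           # proportions of a, c, g, t
base.freq(x, freq = TRUE)              # the same as counts
base.freq(x, freq = TRUE, all = TRUE)  # adds ambiguity codes, n, - and ?
GC.content(x)
Ftab(x)                                # the two sequences, site by site
```

Ten of the twelve sites are unambiguous (4 a, 2 c, 3 g, 1 t), which gives the proportions returned by the first call; the last site is left out of the table returned by Ftab because neither sequence has a plain base there.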

Value

A numeric vector with names c("a", "c", "g", "t") (and possibly "r", "m", ...), a single numeric value, or a four-by-four matrix with similar dimnames.

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
base.freq(woodmouse)
base.freq(woodmouse, TRUE)
base.freq(woodmouse, TRUE, TRUE)
GC.content(woodmouse)
Ftab(woodmouse)
Ftab(woodmouse[1, ], woodmouse[2, ]) # same as above
Ftab(woodmouse[14:15, ]) # between the last two

Extended Version of the Birth-Death Models to Estimate Speciation and Extinction Rates

Description

This function fits by maximum likelihood a birth-death model to the combined phylogenetic and taxonomic data of a given clade. The phylogenetic data are given by a tree, and the taxonomic data by the number of species for each of its tips.

Usage

bd.ext(phy, S, conditional = TRUE)

Arguments

phy

an object of class "phylo".

S

a numeric vector giving the number of species for each tip.

conditional

whether probabilities should be conditioned on no extinction (mainly to compare results with previous analyses; see details).

Details

The function uses a re-parametrization of the birth-death model studied by Kendall (1948) so that the likelihood is maximized over d/b and b - d, where b is the birth rate and d the death rate.

The standard errors of the estimated parameters are computed using a normal approximation of the maximum likelihood estimates.

If the argument S has names, then they are matched to the tip labels of phy. The user must be careful here since the function requires that both series of names match perfectly, so this operation may fail if there is a typing or syntax error. If the two series of names do not match, the values of S are taken to be in the same order as the tip labels of phy, and a warning message is issued.

Note that the function does not check that the tree is effectively ultrametric, so if it is not, the returned result may not be meaningful.

If conditional = TRUE, the probabilities of the taxonomic data are calculated conditioned on no extinction (Rabosky et al. 2007). In previous versions of the present function (until ape 2.6-1), unconditional probabilities were used, resulting in underestimated extinction rates. Though it does not make much sense to use conditional = FALSE, this option is provided to compare results with previous analyses: if the species richnesses are relatively low, both versions will give similar results (see examples).
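To illustrate the matching of names, the sketch below attaches the tip labels to the species counts and then shuffles the vector. It assumes that the counts are listed in the order of the tip labels of bird.orders, as the example of this help page implies by passing them unnamed.

```r
library(ape)
data(bird.orders)

S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
## Assumption: S is listed in the order of the tip labels, as in the
## example of this help page.
names(S) <- bird.orders$tip.label

## Shuffle the vector: the names let bd.ext() restore the right order.
set.seed(1)
S.shuffled <- sample(S)
bd.ext(bird.orders, S.shuffled)
```

Because every name matches a tip label, the shuffled vector should be analysed exactly like the ordered one; a misspelled name would instead trigger the warning described above.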

Author(s)

Emmanuel Paradis

References

Paradis, E. (2003) Analysis of diversification: combining phylogenetic and taxonomic data. Proceedings of the Royal Society of London. Series B. Biological Sciences, 270, 2499–2505.

Rabosky, D. L., Donnellan, S. C., Talaba, A. L. and Lovette, I. J. (2007) Exceptional among-lineage variation in diversification rates during the radiation of Australia's most diverse vertebrate clade. Proceedings of the Royal Society of London. Series B. Biological Sciences, 274, 2915–2923.

Examples

### An example from Paradis (2003) using the avian orders:
data(bird.orders)
### Number of species in each order from Sibley and Monroe (1990):
S <- c(10, 47, 69, 214, 161, 17, 355, 51, 56, 10, 39, 152,
       6, 143, 358, 103, 319, 23, 291, 313, 196, 1027, 5712)
bd.ext(bird.orders, S)
bd.ext(bird.orders, S, FALSE) # same as older versions

Time-Dependent Birth-Death Models

Description

This function fits a user-defined time-dependent birth-death model.

Usage

bd.time(phy, birth, death, BIRTH = NULL, DEATH = NULL,
        ip, lower, upper, fast = FALSE, boot = 0, trace = 0)

Arguments

phy

an object of class "phylo".

birth

either a numeric (if speciation rate is assumed constant), or a (vectorized) function specifying how the birth (speciation) probability changes through time (see details).

death

same as birth, but for the extinction probability.

BIRTH

(optional) a vectorized function giving the primitive of birth.

DEATH

same as BIRTH, but for death.

ip

a numeric vector used as initial values for the estimation procedure. If missing, these values are guessed.

lower, upper

the lower and upper bounds of the parameters. If missing, these values are guessed too.

fast

a logical value specifying whether to use faster integration (see details).

boot

the number of bootstrap replicates to assess the confidence intervals of the parameters. Not run by default.

trace

an integer value. If non-zero, the fitting procedure is printed every trace steps. This can be helpful if convergence is particularly slow.

Details

Details on how to specify the birth and death functions and their primitives can be found in the help page of yule.time.

The model is fitted by minimizing the least squares deviation between the observed and the predicted distributions of branching times. These computations rely heavily on numerical integrations. If fast = FALSE, integrations are done with R's integrate function. If fast = TRUE, a faster but less accurate function provided in ape is used. If fitting a complex model to a large phylogeny, a strategy might be to first use the latter option, and then to use the estimates as starting values with fast = FALSE.
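The two-step strategy suggested above can be sketched as follows. The constant-rate model is used only to keep the run short, and with such a simple model the two settings may well give the same estimates; the pattern is meant for time-dependent functions on large trees.

```r
library(ape)

set.seed(3)
tr <- rbdtree(0.1, 0.02)

## Step 1: quick, less accurate fit (constant birth and death rates here).
fit.fast <- bd.time(tr, 0, 0, ip = c(0.1, 0.01), fast = TRUE)

## Step 2: refit with integrate(), starting from the first estimates.
fit <- bd.time(tr, 0, 0, ip = unname(fit.fast$par), fast = FALSE)
fit$par
fit$SS
```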

Value

A list with the following components:

par: a vector of estimates with names taken from the parameters in the specified functions.
SS: the minimized sum of squares.
convergence: output convergence criterion from nlminb.
message: id.
iterations: id.
evaluations: id.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.

Examples

set.seed(3)
tr <- rbdtree(0.1, 0.02)
bd.time(tr, 0, 0) # fits a simple BD model
bd.time(tr, 0, 0, ip = c(.1, .01)) # 'ip' is useful here
## the classic logistic:
birth.logis <- function(a, b) 1/(1 + exp(-a*t - b))
## Not run: 
bd.time(tr, birth.logis, 0, ip = c(0, -2, 0.01))
## slow to get:
## $par
##            a            b        death
## -0.003486961 -1.995983179  0.016496454
##
## $SS
## [1] 20.73023

## End(Not run)

Phylogenetic Generalized Linear Mixed Model for Binary Data

Description

binaryPGLMM performs linear regression for binary phylogenetic data, estimating regression coefficients with approximate standard errors. It simultaneously estimates the strength of phylogenetic signal in the residuals and gives an approximate conditional likelihood ratio test for the hypothesis that there is no signal. Therefore, when applied without predictor (independent) variables, it gives a test for phylogenetic signal for binary data. The method uses a GLMM approach, alternating between penalized quasi-likelihood (PQL) to estimate the "mean components" and restricted maximum likelihood (REML) to estimate the "variance components" of the model.

binaryPGLMM.sim is a companion function that simulates binary phylogenetic data of the same structure analyzed by binaryPGLMM.

Usage

binaryPGLMM(formula, data = list(), phy, s2.init = 0.1,
            B.init = NULL, tol.pql = 10^-6, maxit.pql = 200,
            maxit.reml = 100)
binaryPGLMM.sim(formula, data = list(), phy, s2 = NULL, B = NULL, nrep = 1)
## S3 method for class 'binaryPGLMM'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

formula

a two-sided linear formula object describing the fixed-effects of the model; for example, Y ~ X.

data

a data frame containing the variables named in formula.

phy

a phylogenetic tree as an object of class "phylo".

s2.init

an initial estimate of s2, the scaling component of the variance in the PGLMM. A value of s2 = 0 implies no phylogenetic signal. Note that the variance-covariance matrix given by the phylogeny phy is scaled to have determinant = 1.

B.init

initial estimates of B, the matrix containing regression coefficients in the model. This matrix must have dim(B.init) = c(p+1, 1), where p is the number of predictor (independent) variables; the first element of B corresponds to the intercept, and the remaining elements correspond in order to the predictor (independent) variables in the model.

tol.pql

a control parameter dictating the tolerance for convergence for the PQL optimization.

maxit.pql

a control parameter dictating the maximum number of iterations for the PQL optimization.

maxit.reml

a control parameter dictating the maximum number of iterations for the REML optimization.

x

an object of class "binaryPGLMM".

s2

in binaryPGLMM.sim, value of s2. See s2.init.

B

in binaryPGLMM.sim, value of B, the matrix containing regression coefficients in the model. See B.init.

nrep

in binaryPGLMM.sim, number of complete data sets produced.

digits

the number of digits to print.

...

further arguments passed to print.

Details

The function estimates parameters for the model

Pr(Y = 1) = q

q = inverse.logit(b0 + b1 * x1 + b2 * x2 + ... + epsilon)

epsilon ~ Gaussian(0, s2 * V)

where V is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Although mathematically there is no requirement for V to be ultrametric, forcing V into ultrametric form can aid in the interpretation of the model, because in regression for binary dependent variables, only the off-diagonal elements (i.e., covariances) of matrix V are biologically meaningful (see Ives & Garland 2014).

The function converts a phylo tree object into a variance-covariance matrix, and further standardizes this matrix to have determinant = 1. This in effect standardizes the interpretation of the scalar s2. Although mathematically not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).
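The model and the two standardizations can be written out directly in a few lines. This is only a sketch of the equations above under arbitrary choices (the seed, the coefficient values, and an unstructured predictor x are all assumptions); it is not meant to reproduce how binaryPGLMM.sim is implemented.

```r
library(ape)

set.seed(10)
n <- 20
phy <- compute.brlen(rtree(n), method = "Grafen", power = 1)

## Phylogenetic covariance matrix, rescaled so that det(V) = 1
V <- vcv(phy)
V <- V / exp(as.numeric(determinant(V)$modulus) / n)

## Standardized continuous predictor (mean 0, variance 1); x and the
## coefficient values below are arbitrary.
x <- rnorm(n)
x <- (x - mean(x)) / sd(x)
b0 <- 0; b1 <- 0.5; s2 <- 0.4

## epsilon ~ Gaussian(0, s2 * V), then q = inverse.logit(b0 + b1*x + epsilon)
eps <- drop(t(chol(s2 * V)) %*% rnorm(n))
q <- plogis(b0 + b1 * x + eps)   # plogis() is the inverse logit
Y <- rbinom(n, size = 1, prob = q)
```

Dividing by sd(x), not var(x), is what gives the predictor unit variance; a 0-1 dummy variable would be added to the linear predictor as it is, without this step.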

The estimation method alternates between PQL to obtain estimates of the mean components of the model (this is the standard approach to estimating GLMs) and REML to obtain estimates of the variance components. This method gives relatively fast and robust estimation. Nonetheless, the estimates of the coefficients B will generally be biased upwards, as is typical of estimation for binary data. The standard errors of B are computed from the PQL results conditional on the estimate of s2 and therefore should tend to be too small. The function returns an approximate P-value for the hypothesis of no phylogenetic signal in the residuals (i.e., H0: s2 = 0) using an approximate likelihood ratio test based on the conditional REML likelihood (rather than the marginal likelihood). Simulations have shown that these P-values tend to be high (giving type II errors: failing to identify variances that in fact are statistically significantly different from zero).

It is a good idea to confirm statistical inferences using parametric bootstrapping, and the companion function binaryPGLMM.sim gives a simple tool for this. See Examples below.
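A parametric bootstrap along these lines might look like the sketch below: data are simulated from the fitted values of s2 and B, the model is refitted, and the spread of the refitted slope is summarized. The sample size, the number of replicates (kept very small here), and the use of tryCatch to skip a failed refit are all choices made for this illustration.

```r
library(ape)

set.seed(42)
n <- 50
phy <- compute.brlen(rtree(n), method = "Grafen", power = 1)
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1)) / sd(X1)
dat <- data.frame(Y = rep(0, n), X1 = X1, row.names = phy$tip.label)
dat$Y <- binaryPGLMM.sim(Y ~ X1, phy = phy, data = dat, s2 = 0.5,
                         B = matrix(c(0, 0.25), nrow = 2, ncol = 1),
                         nrep = 1)$Y
fit <- binaryPGLMM(Y ~ X1, phy = phy, data = dat)

## Simulate from the fitted model and refit; nboot is kept small here.
nboot <- 20
boot.B1 <- rep(NA_real_, nboot)
for (i in seq_len(nboot)) {
    dat.i <- dat
    dat.i$Y <- binaryPGLMM.sim(Y ~ X1, phy = phy, data = dat, s2 = fit$s2,
                               B = fit$B, nrep = 1)$Y
    ## a failed refit is recorded as NA rather than stopping the loop
    refit <- tryCatch(binaryPGLMM(Y ~ X1, phy = phy, data = dat.i),
                      error = function(e) NULL)
    if (!is.null(refit)) boot.B1[i] <- refit$B[2]
}
quantile(boot.B1, c(0.025, 0.975), na.rm = TRUE)
```

In practice many more replicates would be needed, and the convergeflag of each refit is worth inspecting before the replicate is used.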

Value

An object of class "binaryPGLMM".

formula

formula specifying the regression model.

B

estimates of the regression coefficients.

B.se

approximate PQL standard errors of the regression coefficients.

B.cov

approximate PQL covariance matrix for the regression coefficients.

B.zscore

approximate PQL Z scores for the regression coefficients.

B.pvalue

approximate PQL tests for the regression coefficients being different from zero.

s2

phylogenetic signal measured as the scalar magnitude of the phylogenetic variance-covariance matrix s2 * V.

P.H0.s2

approximate likelihood ratio test of the hypothesis H0 that s2 = 0. This test is based on the conditional REML (keeping the regression coefficients fixed); as noted in Details, simulations indicate that its P-values tend to be too high, so the test is conservative.

mu

for each data point y, the estimate of the probability p that y = 1.

b

for each data point y, the estimate of logit(p), that is, the linear predictor on the logit scale.

X

the predictor (independent) variables returned in matrix form (including 1s in the first column).

H

residuals of the form b + (Y - mu)/(mu * (1 - mu)).

B.init

the user-provided initial estimates of B. If B.init is not provided, these are estimated using glm() assuming no phylogenetic signal. The glm() estimates can generate convergence problems, so using small values (e.g., 0.01) is more robust but slower.

VCV

the standardized phylogenetic variance-covariance matrix.

V

estimate of the covariance matrix of H.

convergeflag

flag for cases when convergence failed.

iteration

number of total iterations performed.

converge.test.B

final tolerance for B.

converge.test.s2

final tolerance for s2.

rcondflag

number of times B is reset to 0.01. This is done when rcond(V) < 10^(-10), which implies that V cannot be inverted.

Y

in binaryPGLMM.sim, the simulated values of Y.

Author(s)

Anthony R. Ives

References

Ives, A. R. and Helmus, M. R. (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs, 81, 511–525.

Ives, A. R. and Garland, T., Jr. (2014) Phylogenetic regression for binary dependent variables. Pages 231–261 in L. Z. Garamszegi, editor. Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer-Verlag, Berlin Heidelberg.

Examples

## Illustration of binaryPGLMM() with simulated data

# Generate random phylogeny
n <- 100
phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)

# Generate random data and standardize to have mean 0 and variance 1
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1))/sd(X1)

# Simulate binary Y
sim.dat <- data.frame(Y=array(0, dim=n), X1=X1, row.names=phy$tip.label)
sim.dat$Y <- binaryPGLMM.sim(Y ~ X1, phy=phy, data=sim.dat, s2=.5,
                             B=matrix(c(0,.25),nrow=2,ncol=1), nrep=1)$Y

# Fit model
binaryPGLMM(Y ~ X1, phy=phy, data=sim.dat)

## Not run: 
# Compare with phyloglm()
library(phylolm)
summary(phyloglm(Y ~ X1, phy=phy, data=sim.dat))

# Compare with glm() that does not account for phylogeny
summary(glm(Y ~ X1, data=sim.dat, family="binomial"))

# Compare with logistf() that does not account
# for phylogeny but is less biased than glm()
library(logistf)
logistf(Y ~ X1, data=sim.dat)

# Compare with MCMCglmm
library(MCMCglmm)

V <- vcv(phy)
V <- V/max(V)
detV <- exp(determinant(V)$modulus[1])
V <- V/detV^(1/n)

invV <- Matrix(solve(V), sparse = TRUE)
sim.dat$species <- phy$tip.label
rownames(invV) <- sim.dat$species

nitt <- 43000
thin <- 10
burnin <- 3000

prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)))
summary(MCMCglmm(Y ~ X1, random=~species, ginverse=list(species=invV),
    data=sim.dat, slice=TRUE, nitt=nitt, thin=thin, burnin=burnin,
    family="categorical", prior=prior, verbose=FALSE))

## Examine bias in estimates of B1 and s2 from binaryPGLMM with
# simulated data. Note that this will take a while.

Reps <- 1000

s2 <- 0.4
B1 <- 1

meanEsts <- data.frame(n = Inf, B1 = B1, s2 = s2, Pr.s2 = 1, propconverged = 1)

for (n in c(160, 80, 40, 20)) {

  # the last column records whether each fit converged (1) or not (0)
  meanEsts.n <- data.frame(B1 = 0, s2 = 0, Pr.s2 = 0, converged = 0)
  for (rep in 1:Reps) {
    phy <- compute.brlen(rtree(n = n), method = "Grafen", power = 1)
    X <- rTraitCont(phy, model = "BM", sigma = 1)
    X <- (X - mean(X))/var(X)

    sim.dat <- data.frame(Y = array(0, dim = n), X = X, row.names = phy$tip.label)
    sim <- binaryPGLMM.sim(Y ~ 1 + X, phy = phy, data = sim.dat, s2 = s2,
                           B = matrix(c(0,B1), nrow = 2, ncol = 1), nrep = 1)
    sim.dat$Y <- sim$Y

    z <- binaryPGLMM(Y ~ 1 + X, phy = phy, data = sim.dat)

    meanEsts.n[rep, ] <- c(z$B[2], z$s2, z$P.H0.s2, z$convergeflag == "converged")
  }
  converged <- meanEsts.n[,4]
  meanEsts <- rbind(meanEsts,
                    c(n, mean(meanEsts.n[converged==1,1]),
                      mean(meanEsts.n[converged==1,2]),
                      mean(meanEsts.n[converged==1, 3] < 0.05),
                      mean(converged)))
}
meanEsts

# Results output for B1 = 1, s2 = 0.4; n = Inf gives the values used to
# simulate the data
#    n       B1        s2      Pr.s2 propconverged
# 1 Inf 1.000000 0.4000000 1.00000000         1.000
# 2 160 1.012719 0.4479946 0.36153072         0.993
# 3  80 1.030876 0.5992027 0.24623116         0.995
# 4  40 1.110201 0.7425203 0.13373860         0.987
# 5  20 1.249886 0.8774708 0.05727377         0.873

## Examine type I errors for estimates of B0 and s2 from binaryPGLMM()
# with simulated data. Note that this will take a while.

Reps <- 1000

s2 <- 0
B0 <- 0
B1 <- 0

H0.tests <- data.frame(n = Inf, B0 = B0, s2 = s2, Pr.B0 = .05,
                       Pr.s2 = .05, propconverged = 1)
for (n in c(160, 80, 40, 20)) {

  # the last column records whether each fit converged (1) or not (0)
  ests.n <- data.frame(B0 = 0, s2 = 0, Pr.B0 = 0, Pr.s2 = 0, converged = 0)
  for (rep in 1:Reps) {
    phy <- compute.brlen(rtree(n = n), method = "Grafen", power = 1)
    X <- rTraitCont(phy, model = "BM", sigma = 1)
    X <- (X - mean(X))/var(X)

    sim.dat <- data.frame(Y = array(0, dim = n), X = X, row.names = phy$tip.label)
    sim <- binaryPGLMM.sim(Y ~ 1, phy = phy, data = sim.dat, s2 = s2,
                           B = matrix(B0, nrow = 1, ncol = 1), nrep = 1)
    sim.dat$Y <- sim$Y

    z <- binaryPGLMM(Y ~ 1, phy = phy, data = sim.dat)

    ests.n[rep, ] <- c(z$B[1], z$s2, z$B.pvalue, z$P.H0.s2, z$convergeflag == "converged")
  }
  converged <- ests.n[,5]
  H0.tests <- rbind(H0.tests,
                    c(n, mean(ests.n[converged==1,1]),
                      mean(ests.n[converged==1,2]),
                      mean(ests.n[converged==1, 3] < 0.05),
                      mean(ests.n[converged==1, 4] < 0.05),
                      mean(converged)))
}
H0.tests

# Results for type I errors for B0 = 0 and s2 = 0; n = Inf gives the values
# used to simulate the data. These results show that binaryPGLMM() tends to
# have lower-than-nominal type I error rates; fewer than 0.05 of the simulated
# data sets have H0:B0=0 and H0:s2=0 rejected at the alpha=0.05 level.
#     n            B0         s2      Pr.B0      Pr.s2 propconverged
# 1 Inf  0.0000000000 0.00000000 0.05000000 0.05000000         1.000
# 2 160 -0.0009350357 0.07273163 0.02802803 0.04804805         0.999
# 3  80 -0.0085831477 0.12205876 0.04004004 0.03403403         0.999
# 4  40  0.0019303847 0.25486307 0.02206620 0.03711133         0.997
# 5  20  0.0181394905 0.45949266 0.02811245 0.03313253         0.996

## End(Not run)

Binds Trees

Description

This function binds together two phylogenetic trees to give a single object of class "phylo".

Usage

bind.tree(x, y, where = "root", position = 0, interactive = FALSE)
x + y

Arguments

x

an object of class "phylo".

y

an object of class "phylo".

where

an integer giving the number of the node or tip of the tree x where the tree y is bound ("root" is a shortcut for the root).

position

a numeric value giving the position from the tip or node given by where at which the tree y is bound; negative values are ignored.

interactive

if TRUE the user is asked to choose the tip or node of x by clicking on the tree which must be plotted.

Details

The argument x can be seen as the receptor tree, whereas y is the donor tree. The root of y is then grafted on a location of x specified by where and, possibly, position. If y has a root edge, this is added as an internal branch in the resulting tree.

x + y is a shortcut for:

    bind.tree(x, y, position = if (is.null(x$root.edge)) 0 else x$root.edge)

If only one of the trees has no branch length, the branch lengths of the other one are ignored with a warning.

If one (or both) of the trees has no branch length, it is possible to specify a value of 'position' to graft 'y' below the node of 'x' specified by 'where'. In this case, the exact value of 'position' is not important as long as it is greater than zero. The new node will be multichotomous if 'y' has no root edge. This can be solved by giving an arbitrary root edge to 'y' beforehand (e.g., y$root.edge <- 1): it will be deleted during the binding operation.
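The roles of where and position can be seen on two very small trees. The Newick strings below are made up for this sketch; the tip C of x sits on a terminal branch of length 2, so grafting at position = 1 places y halfway along that branch.

```r
library(ape)

x <- read.tree(text = "((A:1,B:1):1,C:2);")
y <- read.tree(text = "(D:0.5,E:0.5);")

## y grafted at the root of x (the default)
z.root <- bind.tree(x, y)

## y grafted halfway along the terminal branch leading to tip C
tipC <- which(x$tip.label == "C")
z.mid <- bind.tree(x, y, where = tipC, position = 1)

Ntip(z.root)
Ntip(z.mid)
plot(z.mid)
```

In both results all five tips are present; what changes is the place in x where the clade (D, E) is attached.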

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis

Examples

### binds the two clades of bird orders
treefile1 <- tempfile("tree", fileext = ".tre")
treefile2 <- tempfile("tree", fileext = ".tre")
cat("((Struthioniformes:21.8,Tinamiformes:21.8):4.1,",
    "((Craciformes:21.6,Galliformes:21.6):1.3,Anseriformes:22.9):3.0):2.1;",
    file = treefile1, sep = "\n")
cat("(Turniciformes:27.0,(Piciformes:26.3,((Galbuliformes:24.4,",
    "((Bucerotiformes:20.8,Upupiformes:20.8):2.6,",
    "(Trogoniformes:22.1,Coraciiformes:22.1):1.3):1.0):0.6,",
    "(Coliiformes:24.5,(Cuculiformes:23.7,(Psittaciformes:23.1,",
    "(((Apodiformes:21.3,Trochiliformes:21.3):0.6,",
    "(Musophagiformes:20.4,Strigiformes:20.4):1.5):0.6,",
    "((Columbiformes:20.8,(Gruiformes:20.1,Ciconiiformes:20.1):0.7):0.8,",
    "Passeriformes:21.6):0.9):0.6):0.6):0.8):0.5):1.3):0.7):1.0;",
    file = treefile2, sep = "\n")
tree.bird1 <- read.tree(treefile1)
tree.bird2 <- read.tree(treefile2)
unlink(c(treefile1, treefile2)) # clean-up
(birds <- tree.bird1 + tree.bird2)
layout(matrix(c(1, 2, 3, 3), 2, 2))
plot(tree.bird1)
plot(tree.bird2)
plot(birds)

### examples with random trees
x <- rtree(4, tip.label = LETTERS[1:4])
y <- rtree(4, tip.label = LETTERS[5:8])
x <- makeNodeLabel(x, prefix = "x_")
y <- makeNodeLabel(y, prefix = "y_")
x$root.edge <- y$root.edge <- .2

z <- bind.tree(x, y, po=.2)
plot(y, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("y")
plot(x, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("x")
plot(z, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("z <- bind.tree(x, y, po=.2)")

## make sure the terminal branch length is long enough:
x$edge.length[x$edge[, 2] == 2] <- 0.2

z <- bind.tree(x, y, 2, .1)
plot(y, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("y")
plot(x, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("x")
plot(z, show.node.label = TRUE, font = 1, root.edge = TRUE)
title("z <- bind.tree(x, y, 2, .1)")

x <- rtree(50)
y <- rtree(50)
x$root.edge <- y$root.edge <- .2
z <- x + y
plot(y, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("y")
plot(x, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("x")
plot(z, show.tip.label = FALSE, root.edge = TRUE); axisPhylo()
title("z <- x + y")
layout(1)

Phylogeny of the Families of Birds From Sibley and Ahlquist

Description

This data set describes the phylogenetic relationships of the families of birds as reported by Sibley and Ahlquist (1990). Sibley and Ahlquist inferred this phylogeny from an extensive number of DNA/DNA hybridization experiments. The “tapestry” reported by these two authors (more than 1000 species out of the ca. 9000 extant bird species) generated a lot of debate.

The present tree is based on the relationships among families. A few families were not included in the figures in Sibley and Ahlquist, and thus are not included here either. The branch lengths were calculated from the values of ΔT50H as found in Sibley and Ahlquist (1990, figs. 354, 355, 356, and 369).

Usage

data(bird.families)

Format

The data are stored as an object of class "phylo" whose structure is described in the help page of the function read.tree.

Source

Sibley, C. G. and Ahlquist, J. E. (1990) Phylogeny and classification of birds: a study in molecular evolution. New Haven: Yale University Press.

Examples

data(bird.families)
op <- par(cex = 0.3)
plot(bird.families)
par(op)

Phylogeny of the Orders of Birds From Sibley and Ahlquist

Description

This data set describes the phylogenetic relationships of the orders of birds as reported by Sibley and Ahlquist (1990). Sibley and Ahlquist inferred this phylogeny from an extensive number of DNA/DNA hybridization experiments. The “tapestry” reported by these two authors (more than 1000 species out of the ca. 9000 extant bird species) generated a lot of debate.

The present tree is based on the relationships among orders. The branch lengths were calculated from the values of ΔT50H as found in Sibley and Ahlquist (1990, fig. 353).

Usage

data(bird.orders)

Format

The data are stored as an object of class "phylo" whose structure is described in the help page of the function read.tree.

Source

Sibley, C. G. and Ahlquist, J. E. (1990) Phylogeny and classification of birds: a study in molecular evolution. New Haven: Yale University Press.

Examples

data(bird.orders)
plot(bird.orders)

Estimation of Speciation and Extinction Rates With Birth-Death Models

Description

This function fits by maximum likelihood a birth-death model to the branching times computed from a phylogenetic tree using the method of Nee et al. (1994).

Usage

birthdeath(phy)
## S3 method for class 'birthdeath'
print(x, ...)

Arguments

phy

an object of class "phylo".

x

an object of class "birthdeath".

...

further arguments passed to the print function.

Details

Nee et al. (1994) used a re-parametrization of the birth-death model studied by Kendall (1948) so that the likelihood has to be maximized over d/b and b - d, where b is the birth rate, and d the death rate. This is the approach used by the present function.

This function computes the standard errors of the estimated parameters using a normal approximation of the maximum likelihood estimates: this is likely to be inaccurate because of asymmetries of the likelihood function (Nee et al. 1995). In addition, 95% confidence intervals of both parameters are computed using profile likelihood: they are particularly useful if the estimate of d/b is at the boundary of the parameter space (i.e. 0, which is often the case).

Note that the function does not check that the tree is effectively ultrametric, so if it is not, the returned result may not be meaningful.
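Since the fit is expressed in terms of d/b and b - d, a short sketch of how to check the tree and recover the two rates may be useful. It assumes that the element para holds the estimates in the order d/b, b - d, as listed in the Details above.

```r
library(ape)
data(bird.orders)

## birthdeath() does not check this itself, so check it beforehand:
is.ultrametric(bird.orders)

fit <- birthdeath(bird.orders)
fit$para   # estimates of d/b and b - d (order assumed, see above)
fit$CI     # profile-likelihood confidence intervals

## Back-transform to the rates themselves: with a = d/b and r = b - d,
## b = r / (1 - a) and d = b - r.
a <- unname(fit$para[1])
r <- unname(fit$para[2])
b <- r / (1 - a)
d <- b - r
c(birth = b, death = d)
```

When the estimate of d/b sits at the boundary 0, the birth rate equals b - d and the death rate is 0, which is the case where the profile-likelihood intervals are most informative.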

Value

An object of class "birthdeath" which is a list with the following components:

tree

the name of the tree analysed.

N

the number of species.

dev

the deviance (= -2 log lik) at its minimum.

para

the estimated parameters.

se

the corresponding standard-errors.

CI

the 95% profile-likelihood confidence intervals.

Author(s)

Emmanuel Paradis

References

Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.

Nee, S., May, R. M. and Harvey, P. H. (1994) The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 344, 305–311.

Nee, S., Holmes, E. C., May, R. M. and Harvey, P. H. (1995) Estimating extinctions from molecular phylogenies. In Extinction Rates, eds. Lawton, J. H. and May, R. M., pp. 164–182, Oxford University Press.

Tree Bipartition and Bootstrapping Phylogenies

Description

These functions analyse bipartitions found in a series of trees.

prop.part counts the number of bipartitions found in a series of trees given as .... If a single tree is passed, the returned object is a list of vectors with the tips descending from each node (i.e., clade compositions indexed by node number).

prop.clades counts the number of times the bipartitions present in phy are present in a series of trees given as ... or in the list previously computed and given with part.

boot.phylo performs a bootstrap analysis.

Usage

boot.phylo(phy, x, FUN, B = 100, block = 1,
           trees = FALSE, quiet = FALSE,
           rooted = is.rooted(phy), jumble = TRUE,
           mc.cores = 1)
prop.part(..., check.labels = TRUE)
prop.clades(phy, ..., part = NULL, rooted = FALSE)
## S3 method for class 'prop.part'
print(x, ...)
## S3 method for class 'prop.part'
summary(object, ...)
## S3 method for class 'prop.part'
plot(x, barcol = "blue", leftmar = 4, col = "red", ...)

Arguments

phy

an object of class "phylo".

x

in the case of boot.phylo: a taxa (rows) by characters (columns) matrix; in the case of print and plot: an object of class "prop.part".

FUN

the function used to estimate phy (see details).

B

the number of bootstrap replicates.

block

the number of columns in x that will be resampled together (see details).

trees

a logical specifying whether to return the bootstrapped trees (FALSE by default).

quiet

a logical: if FALSE (the default), a progress bar is displayed.

rooted

a logical specifying whether the trees should be treated as rooted or not.

jumble

a logical value. By default, the rows of x are randomized to avoid artificially too large bootstrap values associated with very short branches.

mc.cores

the number of cores (CPUs) to be used (passed to parallel).

...

either (i) a single object of class "phylo", (ii) a series of such objects separated by commas, or (iii) a list containing such objects. In the case of plot further arguments for the plot (see details).

check.labels

a logical specifying whether to check the labels of each tree. If FALSE, it is assumed that all trees have the same tip labels, and that they are in the same order (see details).

part

a list of partitions as returned by prop.part; if this is used then ... is ignored.

object

an object of class "prop.part".

barcol

the colour used for the bars displaying the number of partitions in the upper panel.

leftmar

the size of the margin on the left to display the tip labels.

col

the colour used to visualise the bipartitions.

Details

The argument FUN in boot.phylo must be the function used to estimate the tree from the original data matrix. Thus, if the tree was estimated with neighbor-joining (see nj), one may want something like FUN = function(xx) nj(dist.dna(xx)).

block in boot.phylo specifies the number of columns to be resampled together. For instance, if one wants to resample at the codon level, then block = 3 must be used.
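
As a rough illustration of what resampling by blocks means, the base R sketch below draws the column indices of one bootstrap replicate so that whole blocks of consecutive columns stay together. The helper resample_blocks is hypothetical and is not part of ape; it also assumes that the number of columns is a multiple of block, which boot.phylo itself may or may not require.

```r
# Hypothetical helper, not part of ape: column indices for one bootstrap
# replicate, resampling whole blocks of `block` consecutive columns.
resample_blocks <- function(ncol_x, block = 1) {
    stopifnot(ncol_x %% block == 0)   # assumption made by this sketch
    nblock <- ncol_x / block
    # first column of each sampled block (blocks drawn with replacement)
    start <- (sample(nblock, replace = TRUE) - 1) * block + 1
    # expand each start into its `block` consecutive columns
    as.vector(outer(seq_len(block) - 1, start, "+"))
}

set.seed(1)
idx <- resample_blocks(12, block = 3)
matrix(idx, nrow = 3)  # each column of this matrix is one intact codon
```

With block = 1 the helper reduces to ordinary resampling of single columns.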

Using check.labels = FALSE in prop.part decreases computing times. This requires that (i) all trees have the same tip labels, and (ii) these labels are ordered similarly in all trees (in other words, the elements tip.label are identical in all trees).

The plot function represents a contingency table of the different partitions (on the x-axis) in the lower panel, and their observed numbers in the upper panel. Any further arguments (...) are used to change the aspects of the points in the lower panel: these may be pch, col, bg, cex, etc. This function works only if there is an attribute labels in the object.

The print method displays the partitions and their numbers. The summary method extracts the numbers only.

Value

prop.part returns an object of class "prop.part" which is a list with an attribute "number". The elements of this list are the observed clades, and the attribute their respective numbers. If check.labels = FALSE is used, an attribute "labels" is added, and the vectors of the returned object contain the indices of these labels instead of the labels themselves.

prop.clades and boot.phylo return a numeric vector whose ith element is the number associated with the ith node of phy. If trees = TRUE, boot.phylo returns a list whose first element (named "BP") is as described above, and whose second element ("trees") is a list with the bootstrapped trees.

summary returns a numeric vector.

Note

prop.clades internally calls prop.part with the option check.labels = TRUE, which may be very slow. If the trees passed as ... fulfill conditions (i) and (ii) above, then it might be faster to first call, e.g., pp <- prop.part(...), then use the option part: prop.clades(phy, part = pp).

Since ape 3.5, prop.clades should return sensible results for all values of rooted: if FALSE, the numbers of bipartitions (or splits); if TRUE, the number of clades (of hopefully rooted trees).

Author(s)

Emmanuel Paradis

References

Efron, B., Halloran, E. and Holmes, S. (1996) Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences USA, 93, 13429–13434.

Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Examples

data(woodmouse)
f <- function(x) nj(dist.dna(x))
tr <- f(woodmouse)
### Are bootstrap values stable?
for (i in 1:5)
  print(boot.phylo(tr, woodmouse, f, quiet = TRUE))
### How many partitions in 100 random trees of 10 labels?...
TR <- rmtree(100, 10)
pp10 <- prop.part(TR)
length(pp10)
### ... and in 100 random trees of 20 labels?
TR <- rmtree(100, 20)
pp20 <- prop.part(TR)
length(pp20)
plot(pp10, pch = "x", col = 2)
plot(pp20, pch = "x", col = 2)

set.seed(2)
tr <- rtree(10) # rooted
## the following used to return a wrong result with ape <= 3.4:
prop.clades(tr, tr)
prop.clades(tr, tr, rooted = TRUE)
tr <- rtree(10, rooted = FALSE)
prop.clades(tr, tr) # correct

### an illustration of the use of prop.clades with bootstrap trees:
fun <- function(x) as.phylo(hclust(dist.dna(x), "average")) # upgma() in phangorn
tree <- fun(woodmouse)
## get 100 bootstrap trees:
bstrees <- boot.phylo(tree, woodmouse, fun, trees = TRUE)$trees
## get proportions of each clade:
clad <- prop.clades(tree, bstrees, rooted = TRUE)
## get proportions of each bipartition:
boot <- prop.clades(tree, bstrees)
layout(1)
par(mar = rep(2, 4))
plot(tree, main = "Bipartition vs. Clade Support Values")
drawSupportOnEdges(boot)
nodelabels(clad)
legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,
       pt.bg = c("green", "lightblue"), pt.cex = 2.5)

## Not run: 
## an example of double bootstrap:
tr <- f(woodmouse) # re-estimate the NJ tree: 'tr' was overwritten by rtree() above
nrep1 <- 100
nrep2 <- 100
p <- ncol(woodmouse)
DB <- 0
for (b in 1:nrep1) {
    X <- woodmouse[, sample(p, p, TRUE)]
    DB <- DB + boot.phylo(tr, X, f, nrep2, quiet = TRUE)
}
DB
## to compare with:
boot.phylo(tr, woodmouse, f, 1e4)

## End(Not run)

Branching Times of a Phylogenetic Tree

Description

This function computes the branching times of a phylogenetic tree, that is the distance from each node to the tips, under the assumption that the tree is ultrametric. Note that the function does not check that the tree is actually ultrametric, so if it is not, the returned result may not be meaningful.

Usage

branching.times(phy)

Arguments

phy

an object of class "phylo".

Value

a numeric vector with the branching times. If the phylogeny phy has an element node.label, this is used as names for the returned vector; otherwise the numbers (of mode character) of the matrix edge of phy are used as names.

Author(s)

Emmanuel Paradis

Building Lists of Trees

Description

These functions help to build lists of trees of class "multiPhylo".

Usage

## S3 method for class 'phylo'
c(..., recursive = TRUE)
## S3 method for class 'multiPhylo'
c(..., recursive = TRUE)

.compressTipLabel(x, ref = NULL)
.uncompressTipLabel(x)

Arguments

...

one or several objects of class "phylo" and/or "multiPhylo".

recursive

see details.

x

an object of class "phylo" or "multiPhylo".

ref

an optional vector of mode character to constrain the order of the tips. By default, the order from the first tree is used.

Details

These c methods check all the arguments, and return by default a list of single trees unless some objects are not trees or lists of trees, in which case recursive is switched to FALSE and a warning message is given. If recursive = FALSE, the objects are simply concatenated into a list. Before ape 4.0, recursive was always set to FALSE.

.compressTipLabel transforms an object of class "multiPhylo" by checking that all trees have the same tip labels and renumbering the tips in the edge matrix so that the tip numbers are also the same, taking the first tree as the reference (duplicated labels are not allowed). The returned object has a unique vector of tip labels (attr(x, "TipLabel")).

.uncompressTipLabel does the reverse operation.

Value

An object of class "multiPhylo".

Author(s)

Emmanuel Paradis

Examples

x <- c(rtree(4), rtree(2))
x
y <- c(rtree(4), rtree(4))
z <- c(x, y)
z
print(z, TRUE)
try(.compressTipLabel(x)) # error
a <- .compressTipLabel(y)
.uncompressTipLabel(a) # back to y
## eventually compare str(a) and str(y)

Carnivora body sizes and life history traits

Description

Dataset adapted from Gittleman (1986), including 2 morphological variables (body and brain sizes), 8 life history traits variables and 4 taxonomic variables.

Usage

data(carnivora)

Format

A data frame with 112 observations on 17 variables.

[,1]	Order	factor	Carnivora order
[,2]	SuperFamily	factor	Super family (Caniformia or Feliformia)
[,3]	Family	factor	Carnivora family
[,4]	Genus	factor	Carnivora genus
[,5]	Species	factor	Carnivora species
[,6]	FW	numeric	Female body weight (kg)
[,7]	SW	numeric	Average body weight of adult male and adult female (kg)
[,8]	FB	numeric	Female brain weight (g)
[,9]	SB	numeric	Average brain weight of adult male and adult female (g)
[,10]	LS	numeric	Litter size
[,11]	GL	numeric	Gestation length (days)
[,12]	BW	numeric	Birth weight (g)
[,13]	WA	numeric	Weaning age (days)
[,14]	AI	numeric	Age of independence (days)
[,15]	LY	numeric	Longevity (months)
[,16]	AM	numeric	Age of sexual maturity (days)
[,17]	IB	numeric	Inter-birth interval (months)

Source

Gittleman, J. L. (1986) Carnivore life history patterns: allometric, phylogenetic and ecological associations. American Naturalist, 127: 744–771.

Examples

data(carnivora)
## Fig. 1 in Gittleman (1986):
plot(carnivora$BW ~ carnivora$FW, pch = (1:8)[carnivora$Family], log = "xy",
     xlab = "Female body weight (kg)", ylab = "Birth weight (g)",
     ylim = c(1, 2000))
legend("bottomright", legend = levels(carnivora$Family), pch = 1:8)
plot(carnivora$BW ~ carnivora$FB, pch = (1:8)[carnivora$Family], log = "xy",
     xlab = "Female brain weight (g)", ylab = "Birth weight (g)",
     ylim = c(1, 2000))
legend("bottomright", legend = levels(carnivora$Family), pch = 1:8)

Check DNA Alignments

Description

This function performs a series of diagnostics on a DNA alignment.

Usage

checkAlignment(x, check.gaps = TRUE, plot = TRUE, what = 1:4)

Arguments

x

an object of class "DNAbin".

check.gaps

a logical value specifying whether to check the distribution of alignment gaps.

plot

a logical value specifying whether to do the plots.

what

an integer value giving the plot to be done. By default, four plots are done on the same figure.

Details

This function prints on the console a series of diagnostics on the set of aligned DNA sequences. If alignment gaps are present, their width distribution is analysed, as well as the width of contiguous base segments. The pattern of nucleotide diversity on each site is also analysed, and a relevant table is printed.

If plot = TRUE, four plots are done: an image of the alignment, the distribution of gap widths (if present), the Shannon index of nucleotide diversity along the sequence, and the number of observed bases along the sequence.
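
To give an idea of the Shannon index mentioned above, the base R sketch below computes H = -sum(p * log(p)) for each column of a small character matrix. The helper shannon_site, the toy alignment, and the use of natural logarithms are assumptions made for illustration; checkAlignment may treat gaps and ambiguous bases differently.

```r
# Hypothetical helper, not part of ape: Shannon index of one alignment column
shannon_site <- function(column) {
    p <- table(column) / length(column)  # observed base frequencies
    -sum(p * log(p))                     # natural log is an assumption
}

# toy alignment: 4 sequences (rows) by 3 sites (columns)
aln <- matrix(c("a", "a", "a", "a",
                "a", "a", "c", "c",
                "a", "c", "g", "t"), nrow = 4)
H <- apply(aln, 2, shannon_site)
H  # zero for the invariant site, larger for the more variable sites
```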

If the sequences contain many gaps, it might be better to set check.gaps = FALSE to skip the analysis of contiguous segments.

Value

NULL

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
checkAlignment(woodmouse)
layout(1)

Checking Labels

Description

Checking and correcting character strings, particularly before writing a Newick tree.

Usage

checkLabel(x)

Arguments

x

a vector of mode character.

Details

This function deletes the leading and trailing spaces (including tabulations, new lines, and left or right parentheses at the beginning or end of the strings), substitutes the spaces inside the strings by underscores, and substitutes commas, colons, semicolons, and parentheses inside the strings by dashes.
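
The three steps described above can be approximated with regular expressions in base R. The function clean_label below is a hypothetical re-implementation written only to illustrate the rules; it is not ape's code, and checkLabel may differ in details.

```r
# Hypothetical re-implementation for illustration; not ape's checkLabel
clean_label <- function(x) {
    # 1. trim whitespace and parentheses at both ends
    x <- gsub("^[[:space:]()]+|[[:space:]()]+$", "", x)
    # 2. each remaining whitespace character becomes an underscore
    x <- gsub("[[:space:]]", "_", x)
    # 3. characters with a meaning in Newick become dashes
    gsub("[,:;()]", "-", x)
}

lab <- clean_label(" Homo sapiens\t(Primates; World)   ")
lab
```

The cleaned string contains none of the characters that delimit taxa, branch lengths, or clades in the Newick format.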

Value

a vector of mode character.

Author(s)

Emmanuel Paradis

Examples

checkLabel(" Homo sapiens\t(Primates; World)   ")

Check the Structure of a "phylo" Object

Description

This function takes as single argument an object (phy), checks its elements, and prints a diagnostic. All problems are printed with a label: FATAL (will likely cause an error or a crash) or MODERATE (may cause some problems).

This function is mainly intended for developers creating "phylo" objects from scratch.

Usage

checkValidPhylo(phy)

Arguments

phy

an object of class "phylo".

Value

NULL.

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(3)
checkValidPhylo(tr)
tr$edge[1] <- 0
checkValidPhylo(tr)

Number of Cherries and Null Models of Trees

Description

This function calculates the number of cherries (see definition below) on a phylogenetic tree, and tests the null hypotheses that this number agrees with those predicted from two null models of trees (the Yule model, and the uniform model).

Usage

cherry(phy)

Arguments

phy

an object of class "phylo".

Details

A cherry is a pair of adjacent tips on a tree. The tree can be either rooted or unrooted, but the present function considers only rooted trees. The probability distribution function of the number of cherries on a tree depends on the speciation/extinction model that generated the tree.

McKenzie and Steel (2000) derived the probability distribution function of the number of cherries for two models: the Yule model and the uniform model. Broadly, in the Yule model, each extant species is equally likely to split into two daughter-species; in the uniform model, a branch is added to the tree on any of the already existing branches with a uniform probability.

The probabilities are computed using recursive formulae; however, for both models, the probability density function converges to a normal law with increasing number of tips in the tree. The function uses these normal approximations for a number of tips greater than or equal to 20.
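
Counting cherries only needs the edge matrix of the tree. The base R sketch below assumes a rooted, fully dichotomous tree coded in the usual "phylo" way (tips numbered 1 to ntip, one row per edge with the ancestor in the first column); the helper count_cherries is hypothetical and is not the code used by cherry.

```r
# Hypothetical helper, not part of ape: count cherries from an edge matrix
count_cherries <- function(edge, ntip) {
    is_tip <- edge[, 2] <= ntip
    # number of tip children of each node that has at least one
    tip_children <- table(edge[is_tip, 1])
    # a node with two tip children subtends one cherry
    sum(tip_children == 2)
}

# balanced tree ((A,B),(C,D)): tips 1-4, root 5, internal nodes 6 and 7
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
# ladder-like tree (((A,B),C),D): tips 1-4, root 5, internal nodes 6 and 7
ladder <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))

n_balanced <- count_cherries(balanced, ntip = 4)
n_ladder <- count_cherries(ladder, ntip = 4)
c(balanced = n_balanced, ladder = n_ladder)
```

The two toy trees show the extremes for four tips: the balanced tree has two cherries and the ladder-like tree has one.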

Value

A NULL value is returned, the results are simply printed.

Author(s)

Emmanuel Paradis

References

McKenzie, A. and Steel, M. (2000) Distributions of cherries for two models of trees. Mathematical Biosciences, 164, 81–92.

Bat Phylogeny

Description

This phylogeny of bats (Mammalia: Chiroptera) is a supertree (i.e. a composite phylogeny constructed from several sources; see source for details).

Usage

data(chiroptera)

Format

The data are stored in RData (binary) format.

Source

Jones, K. E., Purvis, A., MacLarnon, A., Bininda-Emonds, O. R. P. and Simmons, N. B. (2002) A phylogenetic supertree of the bats (Mammalia: Chiroptera). Biological Reviews of the Cambridge Philosophical Society, 77, 223–259.

Examples

data(chiroptera)
str(chiroptera)
op <- par(cex = 0.3)
plot(chiroptera, type = "c")
par(op)

Molecular Dating With Mean Path Lengths

Description

This function estimates the node ages of a tree using the mean path lengths method of Britton et al. (2002). The branch lengths of the input tree are interpreted as (mean) numbers of substitutions.

Usage

chronoMPL(phy, se = TRUE, test = TRUE)

Arguments

phy

an object of class "phylo".

se

a logical specifying whether to compute the standard-errors of the node ages (TRUE by default).

test

a logical specifying whether to test the molecular clock at each node (TRUE by default).

Details

The mean path lengths (MPL) method estimates the age of a node with the mean of the distances from this node to all tips descending from it. Under the assumption of a molecular clock, standard-errors of the estimated node ages can be computed (Britton et al. 2002).
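
The definition above can be worked through by hand on a three-tip tree. The base R sketch below uses a hypothetical helper, tip_distances, on a hand-coded edge matrix; it only illustrates the averaging step and does not reproduce the standard-errors or the tests computed by chronoMPL.

```r
# Hypothetical helper, not part of ape: distances from `node` to each of the
# tips descending from it (tips are numbered 1..ntip)
tip_distances <- function(node, edge, len, ntip) {
    if (node <= ntip) return(0)  # a tip is at distance 0 from itself
    rows <- which(edge[, 1] == node)
    unlist(lapply(rows, function(i)
        len[i] + tip_distances(edge[i, 2], edge, len, ntip)))
}

# ((A:1,B:3):2,C:4); tips 1-3, root 4, internal node 5
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
len <- c(2, 1, 3, 4)

age5 <- mean(tip_distances(5, edge, len, ntip = 3))     # mean of 1 and 3
age_root <- mean(tip_distances(4, edge, len, ntip = 3)) # mean of 3, 5 and 4
c(node5 = age5, root = age_root)
```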

The test performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user.
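
For a statistic with a standard normal null distribution, the two-sided P-value and a subsequent correction for multiple testing can be sketched in base R as follows. The values in z are made up, and Holm's method is only one of the corrections offered by p.adjust; the choice is left to the user.

```r
z <- c(0.4, -2.5, 1.1, 3.2)             # made-up test statistics, one per node
pval <- 2 * pnorm(-abs(z))              # two-sided P-values under N(0, 1)
padj <- p.adjust(pval, method = "holm") # one possible correction
cbind(z, pval, padj)
```

With real data, pval would be replaced by the "Pval" attribute of the object returned by chronoMPL.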

Absolute dating can be done by multiplying the edge lengths found by calibrating one node age.

Value

an object of class "phylo" with branch lengths as estimated by the function. There are, by default, two attributes:

stderr

the standard-errors of the node ages.

Pval

the P-value of the test of the molecular clock for each node.

Note

The present version requires a dichotomous tree.

Author(s)

Emmanuel Paradis

References

Britton, T., Oxelman, B., Vinnersten, A. and Bremer, K. (2002) Phylogenetic dating with confidence intervals using mean path lengths. Molecular Phylogenetics and Evolution, 24, 58–65.

Examples

tr <- rtree(10)
tr$edge.length <- 5*tr$edge.length
chr <- chronoMPL(tr)
layout(matrix(1:4, 2, 2, byrow = TRUE))
plot(tr)
title("The original tree")
plot(chr)
axisPhylo()
title("The dated MPL tree")
plot(chr)
nodelabels(round(attr(chr, "stderr"), 3))
title("The standard-errors")
plot(tr)
nodelabels(round(attr(chr, "Pval"), 3))
title("The tests")
layout(1)

Molecular Dating With Penalized Likelihood

Description

This function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002). The branch lengths of the input tree are interpreted as mean numbers of substitutions (i.e., per site).

Usage

chronopl(phy, lambda, age.min = 1, age.max = NULL,
         node = "root", S = 1, tol = 1e-8,
         CV = FALSE, eval.max = 500, iter.max = 500, ...)

Arguments

phy

an object of class "phylo".

lambda

value of the smoothing parameter.

age.min

numeric values specifying the fixed node ages (if age.max = NULL) or the youngest bound of the nodes known to be within an interval.

age.max

numeric values specifying the oldest bound of the nodes known to be within an interval.

node

the numbers of the nodes whose ages are given by age.min; "root" is a short-cut for the root.

S

the number of sites in the sequences; leave the default if branch lengths are in mean number of substitutions.

tol

the value below which branch lengths are considered effectively zero.

CV

whether to perform cross-validation.

eval.max

the maximal number of evaluations of the penalized likelihood function.

iter.max

the maximal number of iterations of the optimization algorithm.

...

further arguments passed to control nlminb.

Details

The idea of this method is to use a trade-off between a parametric formulation where each branch has its own rate, and a nonparametric term where changes in rates are minimized between contiguous branches. A smoothing parameter (lambda) controls this trade-off. If lambda = 0, then the parametric component dominates and rates vary as much as possible among branches, whereas for increasing values of lambda, the variation is smoother and tends towards a clock-like model (same rate for all branches).

lambda must be given. The known ages are given in age.min, and the corresponding node numbers in node. These two arguments must be of the same length. By default, an age of 1 is assumed for the root, and the ages of the other nodes are estimated.

If age.max = NULL (the default), it is assumed that age.min gives exactly known ages. Otherwise, age.max and age.min must be of the same length and give the intervals for each node. Some nodes may be known exactly while the others are known within some bounds: the values will be identical in both arguments for the former (e.g., age.min = c(10, 5), age.max = c(10, 6), node = c(15, 18) means that the age of node 15 is 10 units of time, and the age of node 18 is between 5 and 6).

If two nodes are linked (i.e., one is the ancestor of the other) and have the same values of age.min and age.max (say, 10 and 15) this will result in an error because the medians of these values are used as initial times (here 12.5) giving initial branch length(s) equal to zero. The easiest way to solve this is to change slightly the given values, for instance use age.max = 14.9 for the youngest node, or age.max = 15.1 for the oldest one (or similarly for age.min).

The input tree may have multichotomies. If some internal branches are of zero-length, they are collapsed (with a warning), and the returned tree will have fewer nodes than the input one. The presence of zero-length terminal branches results in an error since it makes little sense to have zero-rate branches.

The cross-validation used here is different from the one proposed by Sanderson (2002). Here, each tip is dropped successively and the analysis is repeated with the reduced tree: the estimated dates for the remaining nodes are compared with the estimates from the full data. For the ith tip the following is calculated:

\sum_{j=1}^{n-2}{\frac{(t_j - t_j^{-i})^2}{t_j}}

where t_j is the estimated date for the jth node with the full phylogeny, t_j^{-i} is the estimated date for the jth node after removing tip i from the tree, and n is the number of tips.
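
The sum above is simple to evaluate once the two sets of dates are available. The base R sketch below uses a hypothetical helper, cv_influence, and made-up dates; it only evaluates the formula and does not perform the re-estimation done by chronopl when CV = TRUE.

```r
# Hypothetical helper: value of the formula above for one dropped tip
cv_influence <- function(t_full, t_drop) {
    sum((t_full - t_drop)^2 / t_full)
}

t_full <- c(10, 6, 3)   # made-up dates of the remaining nodes, full tree
t_drop <- c(9, 6, 3.5)  # made-up dates of the same nodes after dropping tip i
d2 <- cv_influence(t_full, t_drop)
d2
```

A value close to zero means that removing the tip hardly changes the estimated dates of the remaining nodes.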

The present version uses nlminb to optimise the penalized likelihood function: see its help page for details on parameters controlling the optimisation procedure.

Value

an object of class "phylo" with branch lengths as estimated by the function. There are three or four further attributes:

ploglik

the maximum penalized log-likelihood.

rates

the estimated rates for each branch.

message

the message returned by nlminb indicating whether the optimisation converged.

D2

the influence of each observation on overall date estimates (if CV = TRUE).

Note

The new function chronos replaces the present one, which is no longer maintained.

Author(s)

Emmanuel Paradis

References

Sanderson, M. J. (2002) Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19, 101–109.

Molecular Dating by Penalised Likelihood and Maximum Likelihood

Description

chronos is the main function fitting a chronogram to a phylogenetic tree whose branch lengths are in number of substitutions per site.

makeChronosCalib is a tool to prepare data frames with the calibration points of the phylogenetic tree.

chronos.control creates a list of parameters to be passed to chronos.

Usage

chronos(phy, lambda = 1, model = "correlated", quiet = FALSE,
        calibration = makeChronosCalib(phy),
        control = chronos.control())
## S3 method for class 'chronos'
print(x, ...)
makeChronosCalib(phy, node = "root", age.min = 1,
   age.max = age.min, interactive = FALSE, soft.bounds = FALSE)
chronos.control(...)

Arguments

phy

an object of class "phylo".

lambda

value of the smoothing parameter.

model

a character string specifying the model of substitution rate variation among branches. The possible choices are: “correlated”, “relaxed”, “discrete”, “clock”, or an unambiguous abbreviation of these.

quiet

a logical value; by default the calculation progress is displayed.

calibration

a data frame (see details).

control

a list of parameters controlling the optimisation procedure (see details).

x

an object of class c("chronos", "phylo").

node

a vector of integers giving the node numbers for which a calibration point is given. The default is a short-cut for the root.

age.min, age.max

vectors of numerical values giving the minimum and maximum ages of the nodes specified in node.

interactive

a logical value. If TRUE, then phy is plotted and the user is asked to click close to a node and enter the ages on the keyboard.

soft.bounds

(currently unused)

...

in the case of chronos.control: one or more of the parameters controlling optimisation listed in the details (unused in the case of print.chronos).

Details

chronos replaces chronopl but with a different interface and some extensions (see References).

The known dates (argument calibration) must be given in a data frame with the following column names: node, age.min, age.max, and soft.bounds (the last one is yet unused). For each row, these are, respectively: the number of the node in the “phylo” coding standard, the minimum age for this node, the maximum age, and a logical value specifying whether the bounds are soft. If age.min = age.max, this means that the age is exactly known. This data frame can be built with makeChronosCalib which returns by default a data frame with a single row giving age = 1 for the root. The data frame can be built interactively by clicking on the plotted tree.
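
A data frame with the layout described above can also be written by hand, as in the base R sketch below. The node numbers and ages are made up for illustration and do not refer to any particular tree; in practice makeChronosCalib builds the same kind of object.

```r
# node numbers and ages are made up for illustration
calib <- data.frame(node = c(11, 14),
                    age.min = c(10, 5),
                    age.max = c(10, 6),   # first row: equal bounds, exact age
                    soft.bounds = FALSE)  # recycled to both rows; yet unused
calib
```

Here the first node is given an exactly known age of 10, and the second one an age between 5 and 6.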

The argument control allows one to change some parameters of the optimisation procedure. This must be a list with names. The available options with their default values are:

tol = 1e-8: tolerance for the estimation of the substitution rates.
iter.max = 1e4: the maximum number of iterations at each optimization step.
eval.max = 1e4: the maximum number of function evaluations at each optimization step.
nb.rate.cat = 10: the number of rate categories if model = "discrete" (set this parameter to 1 to fit a strict clock model).
dual.iter.max = 20: the maximum number of alternating iterations between rates and dates.
epsilon = 1e-6: the convergence diagnostic criterion.

Using model = "clock" is actually a short-cut to model = "discrete" and setting nb.rate.cat = 1 in the list passed to control.

The command chronos.control() returns a list with the default values of these parameters. They may be modified by passing them to this function, or directly in the list.

Value

chronos returns an object of class c("chronos", "phylo"). There is a print method for it. There are additional attributes which can be visualised with str or extracted with attr.

makeChronosCalib returns a data frame.

chronos.control returns a list.

Author(s)

Emmanuel Paradis, Santiago Claramunt, Guillaume Louvel

References

Kim, J. and Sanderson, M. J. (2008) Penalized likelihood phylogenetic inference: bridging the parsimony-likelihood gap. Systematic Biology, 57, 665–674.

Paradis, E. (2013) Molecular dating of phylogenies by likelihood methods: a comparison of models and a new information criterion. Molecular Phylogenetics and Evolution, 67, 436–444.

Sanderson, M. J. (2002) Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19, 101–109.

Examples

library(ape)
tr <- rtree(10)
### the default is the correlated rate model:
chr <- chronos(tr)
### strict clock model:
ctrl <- chronos.control(nb.rate.cat = 1)
chr.clock <- chronos(tr, model = "discrete", control = ctrl)
### How different are the rates?
attr(chr, "rates")
attr(chr.clock, "rates")
## Not run: 
cal <- makeChronosCalib(tr, interactive = TRUE)
cal
### if you made mistakes, you can edit the data frame with:
### fix(cal)
chr <- chronos(tr, calibration = cal)

## End(Not run)

Multiple Sequence Alignment with External Applications

Description

These functions call their respective program from R to align a set of nucleotide sequences of class "DNAbin" or "AAbin". The application(s) must be installed separately and it is highly recommended to do this so that the executables are in a directory located on the PATH of the system.

This version includes an experimental version of muscle5 which calls MUSCLE5 (see the link to the documentation in the References below); muscle still calls MUSCLE version 3. Note that the executable of MUSCLE5 is also named ‘muscle’ by the default compilation setting.

The functions efastats and letterconf require MUSCLE5.

Usage

clustal(x, y, guide.tree, pw.gapopen = 10, pw.gapext = 0.1,
        gapopen = 10, gapext = 0.2, exec = NULL, MoreArgs = "",
        quiet = TRUE, original.ordering = TRUE, file)
clustalomega(x, y, guide.tree, exec = NULL, MoreArgs = "",
             quiet = TRUE, original.ordering = TRUE, file)
muscle(x, y, guide.tree, exec, MoreArgs = "",
       quiet = TRUE, original.ordering = TRUE, file)
muscle5(x, exec = "muscle", MoreArgs = "", quiet = FALSE,
        file, super5 = FALSE, mc.cores = 1)
tcoffee(x, exec = "t_coffee", MoreArgs = "", quiet = TRUE,
        original.ordering = TRUE)
efastats(X, exec = "muscle", quiet = FALSE)
letterconf(X, exec = "muscle")

Arguments

x

an object of class "DNAbin" or "AAbin" (can be missing).

y

an object of class "DNAbin" or "AAbin" used for profile alignment (can be missing).

guide.tree

guide tree, an object of class "phylo" (can be missing).

pw.gapopen, pw.gapext

gap opening and gap extension penalties used by Clustal during pairwise alignments.

gapopen, gapext

the same for global alignment.

exec

a character string giving the name of the program, with its path if necessary. clustal tries to guess this argument depending on the operating system (see details).

MoreArgs

a character string giving additional options.

quiet

a logical: the default is to not print on R's console the messages from the external program.

original.ordering

a logical specifying whether to return the aligned sequences in the same order as in x (TRUE by default).

file

a file with its path if results should be stored (can be missing).

super5

a logical value. By default, the PPP algorithm is used.

mc.cores

the number of cores to be used by MUSCLE5.

X

a list of several alignments of the same sequences, all with the same row order.

Details

It is highly recommended to install the executables properly so that they are in a directory located on the PATH (i.e., accessible from any other directory). Alternatively, the full path to the executable may be given (e.g., exec = "~/muscle/muscle"), or a (symbolic) link may be copied in the working directory. For Debian and its derivatives (e.g., Ubuntu), it is recommended to use the binaries distributed by Debian.

clustal tries to guess the name of the executable program depending on the operating system. Specifically, the following are used: “clustalw” under Linux, “clustalw2” under MacOS, and “clustalw2.exe” under Windows. For clustalomega, “clustalo[.exe]” is the default on all systems (with no specific path).
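
The defaults listed above can be mimicked in base R to check in advance whether an executable would be found. The helper guess_clustalw below is hypothetical and only mirrors the names given in this section; it is not the code used by clustal, and it assumes that Sys.info() reports "Linux", "Darwin" (MacOS), or "Windows".

```r
# Hypothetical helper mirroring the defaults described above; not ape's code
guess_clustalw <- function(sysname = Sys.info()[["sysname"]]) {
    switch(sysname,
           Linux = "clustalw",
           Darwin = "clustalw2",       # MacOS
           Windows = "clustalw2.exe",
           stop("unknown system: give 'exec' explicitly"))
}

exec <- guess_clustalw("Linux")
exec
nzchar(Sys.which(exec))  # TRUE only if the executable is on the PATH
```

If the last line returns FALSE, the full path to the program has to be passed to the exec argument.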

When called without arguments (i.e., clustal(), ...), the function prints the options of the program which may be passed to MoreArgs.

Since ape 5.1, clustal, clustalomega, and muscle can align AA sequences as well as DNA sequences.

Value

an object of class "DNAbin" or "AAbin" with the aligned sequences.

efastats returns a data frame.

letterconf opens the default Web browser.

Author(s)

Emmanuel Paradis, Franz Krah

References

Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G. and Thompson, J. D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research, 31, 3497–3500. http://www.clustal.org/

Edgar, R. C. (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797. http://www.drive5.com/muscle/muscle_userguide3.8.html

Notredame, C., Higgins, D. and Heringa, J. (2000) T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology, 302, 205–217. https://tcoffee.org/

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J. D. and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539. http://www.clustal.org/

https://drive5.com/muscle5/

Examples

## Not run: 
### display the options:
clustal()
clustalomega()
muscle()
tcoffee()

data(woodmouse)
### open gaps more easily:
clustal(woodmouse, pw.gapopen = 1, pw.gapext = 1)
### T-Coffee requires negative values (quite slow; muscle() is much faster):
tcoffee(woodmouse,  MoreArgs = "-gapopen=-10 -gapext=-2")

## End(Not run)

Coalescent Intervals

Description

This function extracts or generates information about coalescent intervals (number of lineages, interval lengths, interval count, total depth) from a phylogenetic tree or a list of internode distances. The input tree needs to be ultra-metric (i.e. clock-like).

Usage

coalescent.intervals(x)

Arguments

x

either an ultra-metric phylogenetic tree (i.e. an object of class "phylo") or, alternatively, a vector of interval lengths.

Value

An object of class "coalescentIntervals" with the following entries:

lineages

A vector with the number of lineages at the start of each coalescent interval.

interval.length

A vector with the length of each coalescent interval.

interval.count

The total number of coalescent intervals.

total.depth

The sum of the lengths of all coalescent intervals.
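
When the input is a vector of interval lengths, the four entries follow directly from that vector. The base R sketch below is a hypothetical illustration, not the code of coalescent.intervals: it assumes that the intervals are ordered from the present towards the root and that each interval ends with exactly one coalescence, so that k intervals start with k + 1 lineages.

```r
# Hypothetical sketch, assuming one coalescence at the end of each interval
lengths_to_ci <- function(len) {
    list(lineages = (length(len) + 1):2,  # lineages at the start of each interval
         interval.length = len,
         interval.count = length(len),
         total.depth = sum(len))
}

ci <- lengths_to_ci(c(0.1, 0.3, 0.2))
ci$lineages     # 4 3 2
ci$total.depth
```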

Author(s)

Korbinian Strimmer

Examples

data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree

ci <- coalescent.intervals(tree.hiv) # from tree
ci

data("hivtree.table") # same tree, but in table format
ci <- coalescent.intervals(hivtree.table$size) # from vector of interval lengths
ci

Collapse Single Nodes

Description

collapse.singles deletes the single nodes (i.e., with a single descendant) in a tree.

has.singles tests for the presence of single node(s) in a tree.

Usage

collapse.singles(tree, root.edge = FALSE)
has.singles(tree)

Arguments

tree

an object of class "phylo".

root.edge

whether to get the singleton edges from the root until the first bifurcating node and put them as root.edge of the returned tree. This is ignored by default, or if the tree has no edge lengths (see examples).

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis, Klaus Schliep

Examples

## a tree with 3 tips and 3 nodes:
e <- c(4L, 6L, 6L, 5L, 5L, 6L, 1L, 5L, 3L, 2L)
dim(e) <- c(5, 2)
tr <- structure(list(edge = e, tip.label = LETTERS[1:3], Nnode = 3L),
                class = "phylo")
tr
has.singles(tr)
## the following shows that node #4 (ie, the root) is a singleton
## and node #6 is the first bifurcating node
tr$edge
## A bifurcating tree has less nodes than it has tips:
## the following used to fail with ape 4.1 or lower:
plot(tr)
collapse.singles(tr) # only 2 nodes
## give branch lengths to use the 'root.edge' option:
tr$edge.length <- runif(5)
str(collapse.singles(tr, TRUE)) # has a 'root.edge'

Collapsed Coalescent Intervals

Description

This function takes a "coalescentIntervals" object and collapses neighbouring coalescent intervals into a single combined interval so that every collapsed interval is larger than epsilon. Collapsed coalescent intervals are used, e.g., to obtain the generalized skyline plot (skyline). For epsilon = 0 no interval is collapsed.

Usage

collapsed.intervals(ci, epsilon=0)

Arguments

ci

coalescent intervals (i.e. an object of class"coalescentIntervals").

epsilon

collapsing parameter that controls the amount of smoothing(allowed range: from0 toci$total.depth)

Details

Proceeding from the tips to the root of the tree each small interval is pooled with the neighboring interval closer to the root. If the neighboring interval is also small, then pooling continues until the composite interval is larger than epsilon. Note that this approach prevents the occurrence of zero-length intervals at the present. For more details see Strimmer and Pybus (2001).
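The pooling rule described above can be sketched in base R. The helper pool_intervals() is hypothetical and written for illustration only: it is not the code used by collapsed.intervals, and its treatment of a too-small remainder at the root end (merged back into the preceding collapsed interval) is an assumption.

```r
## Illustrative sketch only: 'pool_intervals' is a hypothetical helper,
## not part of ape.
pool_intervals <- function(len, epsilon = 0) {
    group <- integer(length(len))
    g <- 1L     # index of the current collapsed interval
    acc <- 0    # accumulated length of the current collapsed interval
    for (i in seq_along(len)) {
        group[i] <- g
        acc <- acc + len[i]
        if (acc > epsilon) { # composite interval is large enough: close it
            g <- g + 1L
            acc <- 0
        }
    }
    ## assumption: a too-small remainder at the root end is merged
    ## into the preceding collapsed interval
    last <- group[length(len)]
    if (acc > 0 && acc <= epsilon && last > 1L)
        group[group == last] <- last - 1L
    group
}

## interval lengths ordered from the tips to the root (made-up values)
len <- c(0.005, 0.004, 0.02, 0.001, 0.03, 0.002)
grp <- pool_intervals(len, epsilon = 0.01)
grp
tapply(len, grp, sum) # lengths of the collapsed intervals
```

With epsilon = 0 and strictly positive lengths, every interval stays in its own group, which matches the statement that no interval is collapsed in that case.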

Value

An object of class "collapsedIntervals" with the following entries:

lineages

A vector with the number of lineages at the start of each coalescent interval.

interval.length

A vector with the length of each coalescent interval.

collapsed.interval

A vector indicating for each coalescent interval to which collapsed interval it belongs.

interval.count

The total number of coalescent intervals.

collapsed.interval.count

The number of collapsed intervals.

total.depth

The sum of the lengths of all coalescent intervals.

epsilon

The value of the underlying smoothing parameter.

Author(s)

Korbinian Strimmer

References

Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.

Examples

data("hivtree.table") # example tree
# coalescent intervals from vector of interval lengths
ci <- coalescent.intervals(hivtree.table$size)
ci
# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)
cl1
cl2

Cheverud's Comparative Method

Description

This function computes the phylogenetic variance component and the residual deviation for continuous characters, taking into account the phylogenetic relationships among species, following the comparative method described in Cheverud et al. (1985). The correction proposed by Rohlf (2001) is used.

Usage

compar.cheverud(y, W, tolerance = 1e-06, gold.tol = 1e-04)

Arguments

y

A vector containing the data to analyse.

W

The phylogenetic connectivity matrix. All diagonal elements will be ignored.

tolerance

Minimum difference allowed to consider eigenvalues as distinct.

gold.tol

Precision to use in golden section search algorithm.

Details

Model:

y = \rho W y + e

where e is the error term, assumed to be normally distributed. \rho is estimated by the maximum likelihood procedure given in Rohlf (2001), using a golden section search algorithm. The code of this function is adapted from the MATLAB code given in the appendix of Rohlf's article, which corrects a mistake in Cheverud's original paper.
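For a given value of \rho, the model above determines the residuals directly. The following base R sketch assumes that W is row-normalized after zeroing its diagonal and that y is expressed as deviations from its mean; both points are assumptions about the preprocessing, and rho = 0.5 is an arbitrary illustrative value, not an estimate.

```r
## Sketch under stated assumptions; rho is NOT estimated here.
y <- c(10, 8, 3, 4)
W <- matrix(c(1,   1/6, 1/6, 1/6,
              1/6, 1,   1/2, 1/2,
              1/6, 1/2, 1,   1,
              1/6, 1/2, 1,   1), 4)
diag(W) <- 0                    # diagonal elements are ignored
Wnorm <- W / rowSums(W)         # assumption: each row is scaled to sum to 1
yc <- y - mean(y)               # assumption: y taken as deviations from its mean
rho <- 0.5                      # arbitrary value, for illustration only
e <- yc - rho * (Wnorm %*% yc)  # residuals, from y = rho W y + e
drop(e)
```

In compar.cheverud the value of \rho is found by maximum likelihood rather than fixed by hand; the sketch only shows how the three returned components relate to each other.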

Value

A list with the following components:

rhohat

The maximum likelihood estimate of \rho

Wnorm

The normalized version of W

residuals

Error terms (e)

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Cheverud, J. M., Dow, M. M. and Leutenegger, W. (1985) The quantitative assessment of phylogenetic constraints in comparative analyses: sexual dimorphism in body weight among primates. Evolution, 39, 1335–1351.

Rohlf, F. J. (2001) Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution, 55, 2143–2160.

Harvey, P. H. and Pagel, M. D. (1991) The Comparative Method in Evolutionary Biology. Oxford University Press.

Examples

### Example from Harvey and Pagel's book:
y<-c(10,8,3,4)
W <- matrix(c(1,1/6,1/6,1/6,1/6,1,1/2,1/2,1/6,1/2,1,1,1/6,1/2,1,1), 4)
compar.cheverud(y,W)
### Example from Rohlf's 2001 article:
W<- matrix(c(
  0,1,1,2,0,0,0,0,
  1,0,1,2,0,0,0,0,
  1,1,0,2,0,0,0,0,
  2,2,2,0,0,0,0,0,
  0,0,0,0,0,1,1,2,
  0,0,0,0,1,0,1,2,
  0,0,0,0,1,1,0,2,
  0,0,0,0,2,2,2,0
),8)
W <- 1/W
W[W == Inf] <- 0
y<-c(-0.12,0.36,-0.1,0.04,-0.15,0.29,-0.11,-0.06)
compar.cheverud(y,W)

Comparative Analysis with GEEs

Description

compar.gee performs the comparative analysis using generalized estimating equations as described by Paradis and Claude (2002).

drop1 tests single effects of a fitted model output from compar.gee.

predict returns the predicted (fitted) values of the model.

Usage

compar.gee(formula, data = NULL, family = "gaussian", phy, corStruct,
          scale.fix = FALSE, scale.value = 1)
## S3 method for class 'compar.gee'
drop1(object, scope, quiet = FALSE, ...)
## S3 method for class 'compar.gee'
predict(object, newdata = NULL, type = c("link", "response"), ...)

Arguments

formula

a formula giving the model to be fitted.

data

the name of the data frame where the variables in formula are to be found; by default, the variables are looked for in the global environment.

family

a function specifying the distribution assumed for the response; by default a Gaussian distribution (with link identity) is assumed (see ?family for details on specifying the distribution, and on changing the link function).

phy

an object of class "phylo" (ignored if corStruct is used).

corStruct

a (phylogenetic) correlation structure.

scale.fix

logical, indicates whether the scale parameter should be fixed (TRUE) or estimated (FALSE, the default).

scale.value

if scale.fix = TRUE, gives the value for the scale (default: scale.value = 1).

object

an object of class "compar.gee" resulting from fitting compar.gee.

scope

<unused>.

quiet

a logical specifying whether to display a warning message about a possible “marginality principle violation”.

newdata

a data frame with column names matching the variables in the formula of the fitted object (see predict for details).

type

a character string specifying the type of predicted values. By default, the linear (link) prediction is returned.

...

further arguments to be passed to drop1.

Details

If a data frame is specified for the argument data, then its rownames are matched to the tip labels of phy. The user must be careful here since the function requires that both series of names match perfectly, so this operation may fail if there is a typing or syntax error. If the two series of names do not match, the values in the data frame are taken to be in the same order as the tip labels of phy, and a warning message is issued.

If data = NULL, then it is assumed that the variables are in the same order as the tip labels of phy.
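Because a mismatch in names only triggers a warning, it can be worth checking the names before fitting. The sketch below uses only read.tree and base R; the object names tr, dat and ok are arbitrary, and the data are the primate values used elsewhere in this manual, deliberately entered in a different order from the tips.

```r
library(ape)
tr <- read.tree(text = "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
## data deliberately entered in a different order from the tips
dat <- data.frame(X = c(-1.46968, 2.02815, 2.37024, 3.61092, 4.09434),
                  Y = c(2.30259, 2.89037, 3.36730, 3.33220, 4.74493),
                  row.names = c("Galago", "Ateles", "Macaca", "Pongo", "Homo"))
## check that both sets of names match exactly ...
ok <- setequal(rownames(dat), tr$tip.label)
ok
## ... and, if so, reorder the rows to follow the tip labels
if (ok) dat <- dat[tr$tip.label, ]
rownames(dat)
```

Reordering is not required when the names match, since the function matches them itself; it simply makes the correspondence visible when inspecting the data.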

Value

compar.gee returns an object of class "compar.gee" with the following components:

call

the function call, including the formula.

effect.assign

a vector of integers assigning the coefficients to the effects (used by drop1).

nobs

the number of observations.

QIC

the quasilikelihood information criterion as defined by Pan (2001).

coefficients

the estimated coefficients (or regression parameters).

residuals

the regression residuals.

family

a character string, the distribution assumed for the response.

link

a character string, the link function used for the mean function.

scale

the scale (or dispersion parameter).

W

the variance-covariance matrix of the estimated coefficients.

dfP

the phylogenetic degrees of freedom (see Paradis and Claude for details on this).

drop1 returns an object of class "anova".

predict returns a vector or a data frame if newdata is used.

Note

The calculation of the phylogenetic degrees of freedom is likely to be approximate for non-Brownian correlation structures (this will be refined soon).

The calculation of the quasilikelihood information criterion (QIC) needs to be tested.

Author(s)

Emmanuel Paradis

References

Pan, W. (2001) Akaike's information criterion in generalized estimating equations. Biometrics, 57, 120–125.

Paradis, E. and Claude J. (2002) Analysis of comparative data using generalized estimating equations. Journal of theoretical Biology, 218, 175–185.

Examples

### The example in Phylip 3.5c (originally from Lynch 1991)
### (the same analysis as in help(pic)...)
tr <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = tr)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
### Both regressions... the results are quite close to those obtained
### with pic().
compar.gee(X ~ Y, phy = tree.primates)
compar.gee(Y ~ X, phy = tree.primates)
### Now do the GEE regressions through the origin: the results are quite
### different!
compar.gee(X ~ Y - 1, phy = tree.primates)
compar.gee(Y ~ X - 1, phy = tree.primates)

Lynch's Comparative Method

Description

This function computes the heritable additive value and the residual deviation for continuous characters, taking into account the phylogenetic relationships among species, following the comparative method described in Lynch (1991).

Usage

compar.lynch(x, G, eps = 1e-4)

Arguments

x

either a matrix, a vector, or a data.frame containing the data with species as rows and variables as columns.

G

a matrix that can be interpreted as an among-species correlation matrix.

eps

a numeric value to detect convergence of the EM algorithm.

Details

The parameter estimates are computed with the EM (expectation-maximization) algorithm. This algorithm usually converges but may reach a local optimum of the likelihood function. It is recommended to run the function several times in order to detect these potential local optima. The ‘optimal’ value for eps actually depends on the range of the data and may be changed by the user in order to check the stability of the parameter estimates. Convergence occurs when the differences in both the residual and the additive values between two successive iterations of the EM algorithm are less than or equal to eps.

Value

A list with the following components:

vare

estimated residual variance-covariance matrix.

vara

estimated additive effect variance-covariance matrix.

u

estimates of the phylogeny-wide means.

A

additive value estimates.

E

residual value estimates.

lik

logarithm of the likelihood for the entire set of observed taxon-specific means.

Note

The present function does not perform the estimation of ancestral phenotypes as proposed by Lynch (1991). This will be implemented in a future version.

Author(s)

Julien Claude julien.claude@umontpellier.fr

References

Lynch, M. (1991) Methods for the analysis of comparative data in evolutionary biology. Evolution, 45, 1065–1080.

Examples

### The example in Lynch (1991)
x <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = x)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
compar.lynch(cbind(X, Y),
             G = vcv.phylo(tree.primates, cor = TRUE))

Ornstein–Uhlenbeck Model for Continuous Characters

Description

This function fits an Ornstein–Uhlenbeck model given a phylogenetic tree and a continuous character. The user specifies the node(s) where the optimum changes. The parameters are estimated by maximum likelihood; their standard-errors are computed assuming normality of these estimates.

Usage

compar.ou(x, phy, node = NULL, alpha = NULL)

Arguments

x

a numeric vector giving the values of a continuous character.

phy

an object of class "phylo".

node

a vector giving the number(s) of the node(s) where the parameter ‘theta’ (the trait optimum) is assumed to change. The node(s) can be specified with their labels if phy has node labels. By default there is no change (same optimum throughout lineages).

alpha

the value of \alpha to be used when fitting the model. By default, this parameter is estimated (see details).

Details

The Ornstein–Uhlenbeck (OU) process can be seen as a generalization of the Brownian motion process. In the latter, characters are assumed to evolve randomly under a random walk, that is change is equally likely in any direction. In the OU model, change is more likely towards the direction of an optimum (denoted \theta) with a strength controlled by a parameter denoted \alpha.

The present function fits a model where the optimum parameter \theta is allowed to vary throughout the tree. This is specified with the argument node: \theta changes after each node whose number is given there. Note that the optimum changes only for the lineages which are descendants of this node.

Hansen (1997) recommends not estimating \alpha together with the other parameters. The present function allows this by giving a numeric value to the argument alpha. By default, this parameter is estimated, but this seems to yield very large standard-errors, thus validating Hansen's recommendation. In practice, a “poor man estimation” of \alpha can be done by repeating the function call with different values of alpha, and selecting the one that minimizes the deviance (see Hansen 1997 for an example).
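The “poor man estimation” of \alpha can be sketched as a simple loop over candidate values. The grid of values below is arbitrary and chosen only for illustration, the data are simulated, and the object names (alphas, dev) are not part of ape.

```r
library(ape)
data(bird.orders)
set.seed(1)
x <- rnorm(23)                       # simulated trait, one value per tip
alphas <- c(0.05, 0.1, 0.2, 0.5)     # arbitrary grid, for illustration only
## refit the model with alpha fixed at each candidate value
dev <- sapply(alphas, function(a)
    compar.ou(x, bird.orders, alpha = a)$deviance)
data.frame(alpha = alphas, deviance = dev)
alphas[which.min(dev)]               # candidate with the smallest deviance
```

A finer grid around the selected value can then be used to refine the choice.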

If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.

The user must be careful here since the function requires that both series of names match perfectly, so this operation may fail if there is a typing or syntax error. If the two series of names do not match, the values in x are taken to be in the same order as the tip labels of phy, and a warning message is issued.

Value

an object of class "compar.ou" which is a list with the following components:

deviance

the deviance (= -2 * loglik).

para

a data frame with the maximum likelihood estimates and their standard-errors.

call

the function call.

Note

The inversion of the variance-covariance matrix in the likelihood function proved somewhat problematic. The present implementation uses a Cholesky decomposition with the function chol2inv instead of the usual function solve.

Author(s)

Emmanuel Paradis

References

Hansen, T. F. (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution, 51, 1341–1351.

Examples

data(bird.orders)
### This is likely to give you estimates close to 0, 1, and 0
### for alpha, sigma^2, and theta, respectively:
compar.ou(x <- rnorm(23), bird.orders)
### Much better with a fixed alpha:
compar.ou(x, bird.orders, alpha = 0.1)
### Let us 'mimic' the effect of different optima
### for the two clades of birds...
x <- c(rnorm(5, 0), rnorm(18, 5))
### ... the model with two optima:
compar.ou(x, bird.orders, node = 25, alpha = .1)
### ... and the model with a single optimum:
compar.ou(x, bird.orders, node = NULL, alpha = .1)
### => Compare both models with the difference in deviances
##     which follows a chi^2 with df = 1.

Compare Two "phylo" Objects

Description

This function compares two phylogenetic trees, rooted or unrooted, and returns a detailed report of this comparison.

Usage

comparePhylo(x, y, plot = FALSE, force.rooted = FALSE,
             use.edge.length = FALSE, commons = TRUE,
             location = "bottomleft", ...)
## S3 method for class 'comparePhylo'
print(x, ...)

Arguments

x,y

two objects of class "phylo".

plot

a logical value. If TRUE, the two trees are plotted on the same device and their similarities are shown.

force.rooted

a logical value. If TRUE, the trees are considered rooted even if is.rooted returns FALSE.

use.edge.length

a logical value passed to plot.phylo (see below).

commons

whether to show the splits common to both trees (the default), or the splits specific to each tree (applies only for unrooted trees).

location

location of where to position the legend.

...

further parameters used by plot.phylo; unused in print.comparePhylo.

Details

In all cases, the numbers of tips and of nodes and the tip labels are compared.

If both trees are rooted, or if force.rooted = TRUE, the clade compositions of each tree are compared. If both trees are also ultrametric, their branching times are compared.

If both trees are unrooted and have the same number of nodes, the bipartitions (aka splits) are compared.

If plot = TRUE, the edge lengths are not used by default because in some situations with unrooted trees, some splits might not be visible if the corresponding internal edge length is very short. To use edge lengths, set use.edge.length = TRUE.

Value

an object of class "comparePhylo" which is a list with messages from the comparison and, optionally, tables comparing branching times.

Author(s)

Emmanuel Paradis, Klaus Schliep

Examples

## two unrooted trees but force comparison as rooted:
a <- read.tree(text = "(a,b,(c,d));")
b <- read.tree(text = "(a,c,(b,d));")
comparePhylo(a, b, plot = TRUE, force.rooted = TRUE)
## two random unrooted trees:
c <- rtree(5, rooted = FALSE)
d <- rtree(5, rooted = FALSE)
comparePhylo(c, d, plot = TRUE)

Branch Lengths Computation

Description

This function computes branch lengths of a tree using different methods.

Usage

compute.brlen(phy, method = "Grafen", power = 1, ...)

Arguments

phy

an object of class phylo representing the tree.

method

the method to be used to compute the branch lengths; this must be one of the following: (i) "Grafen" (the default), (ii) a numeric vector, or (iii) a function.

power

The power at which heights must be raised (see below).

...

further argument(s) to be passed to method if it is a function.

Details

Grafen's (1989) computation of branch lengths: each node is given a ‘height’, namely the number of leaves of the subtree minus one, 0 for leaves. Each height is scaled so that root height is 1, and then raised to the power 'rho' (> 0), which is given by the argument power. The length of each branch is then computed as the difference between the heights of the two nodes at its ends.
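The computation can be followed by hand on a small tree. The five-tip topology below is made up for illustration; the expected distances in the comments are worked out from the rule stated above.

```r
library(ape)
tr <- read.tree(text = "((a,b),(c,(d,e)));")
## heights before scaling: root = 4, (a,b) = 1, (c,(d,e)) = 2, (d,e) = 1;
## after scaling by the root height: 1, 0.25, 0.5 and 0.25
g1 <- compute.brlen(tr, method = "Grafen", power = 1)
cophenetic(g1)["a", "b"]   # 2 * 0.25 = 0.5
cophenetic(g1)["c", "d"]   # 2 * 0.5  = 1
cophenetic(g1)["a", "c"]   # 2 * 1    = 2
## with power = 2 the scaled heights are squared before differencing
g2 <- compute.brlen(tr, power = 2)
cophenetic(g2)["a", "b"]   # 2 * 0.25^2 = 0.125
```

Since all tips have height 0 and the root has height 1 for any power, the resulting tree is ultrametric with a total depth of 1.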

If one or several numeric values are provided as method, they are recycled if necessary. If a function is given instead, further arguments are given in place of ... (they must be named, see examples).

Zero-length branches are not treated as multichotomies, and thus may need to be collapsed (see di2multi).

Value

An object of class phylo with branch lengths.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de and Emmanuel Paradis

References

Grafen, A. (1989) The phylogenetic regression. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 326, 119–157.

Examples

data(bird.orders)
plot(compute.brlen(bird.orders, 1))
plot(compute.brlen(bird.orders, runif, min = 0, max = 5))
layout(matrix(1:4, 2, 2))
plot(compute.brlen(bird.orders, power=1), main=expression(rho==1))
plot(compute.brlen(bird.orders, power=3), main=expression(rho==3))
plot(compute.brlen(bird.orders, power=0.5), main=expression(rho==0.5))
plot(compute.brlen(bird.orders, power=0.1), main=expression(rho==0.1))
layout(1)

Compute and Set Branching Times

Description

This function computes the branch lengths of a tree given its branching times (aka node ages or heights).

Usage

compute.brtime(phy, method = "coalescent", force.positive = NULL)

Arguments

phy

an object of class "phylo".

method

either "coalescent" (the default), or a numeric vector giving the branching times.

force.positive

a logical value (see details).

Details

By default, a set of random branching times is generated from a simple coalescent, and the option force.positive is set to TRUE so that no branch length is negative.

If a numeric vector is passed to method, it is taken as the branching times of the nodes with respect to their numbers (i.e., the first element of method is the branching time of the node numbered n + 1 [= the root], the second element of the node numbered n + 2, and so on), and force.positive is set to FALSE. This may result in negative branch lengths. To avoid this, one should use force.positive = TRUE, in which case the branching times are reordered if necessary.
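A small worked case may help to see how the elements of method map onto node numbers. The four-tip tree and the branching times below are made up for illustration; with this Newick string, read.tree numbers the root 5, the node (a,b) 6 and the node (c,d) 7.

```r
library(ape)
tr <- read.tree(text = "((a,b),(c,d));")   # n = 4 tips; nodes 5 (root), 6, 7
## branching times given in node order: node 5 = 3, node 6 = 1, node 7 = 2
tt <- compute.brtime(tr, method = c(3, 1, 2))
tt$edge.length       # each branch: age of its ancestor minus age of its descendant
branching.times(tt)  # recovers the values passed to 'method'
```

Here every node is younger than its ancestor, so no branch length is negative and force.positive is not needed.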

Value

An ultrametric tree of class "phylo" with branch lengths.

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(10)
layout(matrix(1:4, 2))
plot(compute.brtime(tr)); axisPhylo()
plot(compute.brtime(tr, force.positive = FALSE)); axisPhylo()
plot(compute.brtime(tr, 1:9)); axisPhylo() # a bit nonsense
plot(compute.brtime(tr, 1:9, TRUE)); axisPhylo()
layout(1)

Consensus Trees

Description

Given a series of trees, this function returns the consensus tree. By default, the strict-consensus tree is computed. To get the majority-rule consensus tree, use p = 0.5. Any value between 0.5 and 1 can be used.

Usage

consensus(..., p = 1, check.labels = TRUE, rooted = FALSE)

Arguments

...

either (i) a single object of class "phylo", (ii) a series of such objects separated by commas, or (iii) a list containing such objects.

p

a numeric value between 0.5 and 1 giving the proportion for a clade to be represented in the consensus tree.

check.labels

a logical specifying whether to check the labels of each tree (TRUE by default). If FALSE, it is assumed that all trees have the same tip labels, and that they are in the same order (see details).

rooted

a logical specifying whether the trees should be treated asrooted or not.

Details

Using check.labels = FALSE results in a considerable decrease in computing times. This requires that all trees have the same tip labels, and these labels are ordered similarly in all trees (in other words, the element tip.label is identical in all trees).

Until ape 5.6-2, the trees passed to this function were implicitly treated as rooted, even when the option rooted = FALSE was used. This is now fixed (see PR65 on GitHub) so that, by default, the trees are explicitly treated as unrooted (even if is.rooted returns TRUE). Thus, results may now differ from previous analyses (setting rooted = TRUE might help to replicate previous results).
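The effect of p can be seen on three small unrooted trees. The topologies below are made up for illustration: the split separating d and e from the other tips occurs in all three trees, whereas the split separating a and b occurs in only two of them.

```r
library(ape)
t1 <- read.tree(text = "((a,b),c,(d,e));")
t2 <- read.tree(text = "((a,b),c,(d,e));")
t3 <- read.tree(text = "((a,c),b,(d,e));")
strict <- consensus(t1, t2, t3)            # p = 1: strict consensus
major  <- consensus(t1, t2, t3, p = 0.5)   # majority-rule consensus
## the majority-rule tree keeps the split found in two trees out of three,
## so it is more resolved than the strict consensus
c(strict = strict$Nnode, majority = major$Nnode)
```

The same trees can also be passed as a single list, as described for the argument '...'.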

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis

References

Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.

Pairwise Distances from a Phylogenetic Tree

Description

cophenetic.phylo computes the pairwise distances between the pairs of tips from a phylogenetic tree using its branch lengths.

dist.nodes does the same but between all nodes, internal and terminal, of the tree.

Usage

## S3 method for class 'phylo'
cophenetic(x)
dist.nodes(x, fail.if.no.length = FALSE)

Arguments

x

an object of class "phylo".

fail.if.no.length

a logical value. If the tree has no branch lengths, these are all fixed to one (with a warning) so the computation is done. If you prefer to catch the case of no branch lengths with an error, set this option to TRUE.

Value

a numeric matrix with colnames and rownames set to the names of the tips (as given by the element tip.label of the argument x), or, in the case of dist.nodes, the numbers of the tips and the nodes (as given by the element edge).
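A minimal illustration of both functions on a three-tip tree; the tree and its branch lengths are made up so that the distances can be checked by hand.

```r
library(ape)
tr <- read.tree(text = "((a:1,b:2):3,c:4);")
cophenetic(tr)        # tip-to-tip distances, labelled by tip names
## a-b: 1 + 2 = 3;  a-c: 1 + 3 + 4 = 8;  b-c: 2 + 3 + 4 = 9
dn <- dist.nodes(tr)  # 5 x 5 matrix: tips 1 to 3 and nodes 4 to 5
dn[4, 5]              # from the root (node 4) to node 5: 3
```

The upper-left 3 x 3 block of the dist.nodes matrix contains the same values as the cophenetic matrix, indexed by tip numbers instead of tip labels.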

Author(s)

Emmanuel Paradis

Plots two phylogenetic trees face to face with links between the tips.

Description

This function plots two trees face to face with the links if specified. It is possible to rotate the branches of each tree around the nodes by clicking.

Usage

cophyloplot(x, y, assoc = NULL, use.edge.length = FALSE, space = 0,
       length.line = 1, gap = 2, type = "phylogram", rotate = FALSE,
       col = par("fg"), lwd = par("lwd"), lty = par("lty"),
       show.tip.label = TRUE, font = 3, ...)

Arguments

x,y

two objects of class "phylo".

assoc

a matrix with 2 columns specifying the associations between the tips. If NULL, no links will be drawn.

use.edge.length

a logical indicating whether the branch lengths should be used to plot the trees; default is FALSE.

space

a positive value that specifies the distance between the two trees.

length.line

a positive value that specifies the length of the horizontal line associated to each taxon. Default is 1.

gap

a value specifying the distance between the tips of the phylogeny and the lines.

type

a character string specifying the type of phylogeny to be drawn; it must be one of "phylogram" (the default) or "cladogram".

rotate

a logical indicating whether the nodes of the phylogeny can be rotated by clicking. Default is FALSE.

col

a character vector indicating the color to be used for the links; recycled as necessary.

lwd

id. for the width.

lty

id. for the line type.

show.tip.label

a logical indicating whether to show the tip labels on the phylogeny (defaults to 'TRUE', i.e. the labels are shown).

font

an integer specifying the type of font for the labels: 1 (plain text), 2 (bold), 3 (italic, the default), or 4 (bold italic).

...

(unused)

Details

The aim of this function is to plot simultaneously two phylogenetic trees with associated taxa. The two trees do not necessarily have the same number of tips and more than one tip in one phylogeny can be associated with a tip in the other.

The association matrix used to draw the links has to be a matrix with two columns containing the names of the tips. One row in the matrix represents one link on the plot. The first column of the matrix has to contain tip labels of the first tree (x) and the second column of the matrix, tip labels of the second tree (y). There is no limit (low or high) for the number of rows in the matrix. A matrix with two columns and one row will give a plot with one link.
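As a sketch of how such a matrix can be built, the following uses two small made-up trees whose tip labels differ (named host and parasite purely for illustration); one tip of the second tree is linked to two tips of the first.

```r
library(ape)
host     <- read.tree(text = "((h1,h2),(h3,h4));")
parasite <- read.tree(text = "((p1,p2),p3);")
## one row per link: column 1 = tip of the first tree,
## column 2 = tip of the second tree
assoc <- cbind(c("h1", "h2", "h3", "h4"),
               c("p1", "p1", "p2", "p3"))
## every name should be a tip label of the corresponding tree
all(assoc[, 1] %in% host$tip.label) && all(assoc[, 2] %in% parasite$tip.label)
cophyloplot(host, parasite, assoc = assoc, space = 10)
```

The value of space is arbitrary here and usually needs adjusting to the size of the trees and the length of the labels.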

Arguments gap, length.line and space have to be changed to get a nice plot of the two phylogenies. Note that the function takes into account the length of the character strings corresponding to the names at the tips, so that the lines do not overwrite those names.

The rotate argument can be used to transform both phylogenies in order to get a more readable plot (typically by decreasing the number of crossing lines). This can be done by clicking on the nodes. Pressing the escape key or right-clicking returns to the console.

Author(s)

Damien de Vienne damien.de-vienne@u-psud.fr

Examples

#two random trees
tree1 <- rtree(40)
tree2 <- rtree(20)
#creation of the association matrix:
association <- cbind(tree2$tip.label, tree2$tip.label)
cophyloplot(tree1, tree2, assoc = association,
            length.line = 4, space = 28, gap = 3)
#plot with rotations
## Not run: 
cophyloplot(tree1, tree2, assoc=association, length.line=4, space=28, gap=3, rotate=TRUE)
## End(Not run)

Blomberg et al.'s Correlation Structure

Description

The “ACDC” (accelerated/decelerated) model assumes that continuous traits evolve under a Brownian motion model whose rate accelerates (if g < 1) or decelerates (if g > 1) through time. If g = 1, then the model reduces to a Brownian motion model.

Usage

corBlomberg(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corBlomberg'
corMatrix(object, covariate = getCovariate(object),
                   corr = TRUE, ...)
## S3 method for class 'corBlomberg'
coef(object, unconstrained = TRUE, ...)

Arguments

value

the (initial) value of the parameter g.

phy

an object of class "phylo".

form

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

fixed

a logical specifying whether gls should estimate the parameter g (the default) or keep it fixed.

object

an (initialized) object of class "corBlomberg".

covariate

an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object).

corr

a logical value specifying whether to return the correlation matrix (the default) or the variance-covariance matrix.

unconstrained

a logical value. If TRUE (the default), the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If FALSE the coefficients are returned in “natural”, possibly constrained, form.

...

further arguments passed to or from other methods.

Value

an object of class "corBlomberg", the coefficients from an object of this class, or the correlation matrix of an initialized object of this class. In most situations, only corBlomberg will be called by the user.

Author(s)

Emmanuel Paradis

References

Blomberg, S. P., Garland, Jr, T., and Ives, A. R. (2003) Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution, 57, 717–745.

Brownian Correlation Structure

Description

Expected covariance under a Brownian model (Felsenstein 1985, Martins and Hansen 1997)

V_{ij} = \gamma \times t_a

where t_a is the distance on the phylogeny between the root and the most recent common ancestor of taxa i and j and \gamma is a constant.
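The formula can be checked on a small tree with the function vcv, which returns the Brownian motion covariances with \gamma = 1. The three-tip tree below is made up for illustration.

```r
library(ape)
tr <- read.tree(text = "((a:1,b:1):2,c:3);")
V <- vcv(tr)   # expected covariances under Brownian motion
V["a", "b"]    # root to the MRCA of a and b: 2
V["a", "c"]    # the MRCA of a and c is the root: 0
V["a", "a"]    # root to tip a: 2 + 1 = 3
vcv(tr, corr = TRUE)["a", "b"]   # as a correlation: 2 / 3
```

Multiplying V by a constant \gamma rescales all variances and covariances but leaves the correlation matrix unchanged.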

Usage

corBrownian(value=1, phy, form=~1)
## S3 method for class 'corBrownian'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corBrownian'
corMatrix(object, covariate = getCovariate(object), corr = TRUE, ...)

Arguments

value

The \gamma parameter (defaults to 1). The exact value has no effect on model fitting with PGLS.

phy

An object of class phylo representing the phylogeny (with branch lengths) to consider.

object

An (initialized) object of class corBrownian.

corr

a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix.

form

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

covariate

an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object).

unconstrained

a logical value. If 'TRUE' the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If 'FALSE' the coefficients are returned in "natural", possibly constrained, form. Defaults to 'TRUE'

...

some methods for these generics require additional arguments. None are used in these methods.

Value

An object of class corBrownian, or the coefficient from an object of this class (actually returns numeric(0)), or the correlation matrix of an initialized object of this class.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.

Phylogenetic Correlation Structures

Description

Classes of phylogenetic correlation structures ("corPhyl") available in ape.

corBrownian: Brownian motion model (Felsenstein 1985)
corMartins: The covariance matrix defined in Martins and Hansen (1997)
corGrafen: The covariance matrix defined in Grafen (1989)
corPagel: The covariance matrix defined in Freckleton et al. (2002)
corBlomberg: The covariance matrix defined in Blomberg et al. (2003)

See the help page of each class for references and detailed description.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de, Emmanuel Paradis

Examples

library(nlme)
txt <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = txt)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
dat <- data.frame(Species = Species, X = X, Y = Y)
m1 <- gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species))
summary(m1)
m2 <- gls(Y ~ X, dat, correlation=corMartins(1, tree.primates, form = ~Species))
summary(m2)
corMatrix(m2$modelStruct$corStruct)
m3 <- gls(Y ~ X, dat, correlation=corGrafen(1, tree.primates, form = ~Species))
summary(m3)
corMatrix(m3$modelStruct$corStruct)

Grafen's (1989) Correlation Structure

Description

Grafen's (1989) covariance structure. Branch lengths are computed using Grafen's method (see compute.brlen). The covariance matrix is then the traditional variance-covariance matrix for a phylogeny.

Usage

corGrafen(value, phy, form=~1, fixed = FALSE)
## S3 method for class 'corGrafen'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corGrafen'
corMatrix(object,
                  covariate = getCovariate(object), corr = TRUE, ...)

Arguments

value

The \rho parameter

phy

An object of class phylo representing the phylogeny (branch lengths are ignored) to consider

object

An (initialized) object of class corGrafen

corr

a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix.

fixed

an optional logical value indicating whether the coefficients should be allowed to vary in the optimization, or kept fixed at their initial value. Defaults to 'FALSE', in which case the coefficients are allowed to vary.

form

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

covariate

an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object).

unconstrained

a logical value. If 'TRUE' the coefficients are returned in unconstrained form (the same used in the optimization algorithm). If 'FALSE' the coefficients are returned in "natural", possibly constrained, form. Defaults to 'TRUE'.

...

some methods for these generics require additional arguments. None are used in these methods.

Value

An object of class corGrafen or the rho coefficient from an object of this class or the correlation matrix of an initialized object of this class.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Grafen, A. (1989) The phylogenetic regression. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 326, 119–157.

Martins's (1997) Correlation Structure

Description

Martins and Hansen's (1997) covariance structure:

V_{ij} = \gamma \times e^{-\alpha t_{ij}}

where t_{ij} is the phylogenetic distance between taxa i and j and \gamma is a constant.
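As a minimal sketch of the formula above in base R: the distance matrix, the value of \alpha, and the name gamma_const are illustrative assumptions and are not part of ape.

```r
# Hypothetical phylogenetic distances t_ij between three taxa;
# these values are made up for illustration.
taxa <- c("A", "B", "C")
t_ij <- matrix(c(0.0, 0.4, 1.0,
                 0.4, 0.0, 1.0,
                 1.0, 1.0, 0.0),
               nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))

alpha <- 1.5        # assumed value of the alpha parameter
gamma_const <- 2    # assumed value of the constant gamma

# V_ij = gamma * exp(-alpha * t_ij)
V <- gamma_const * exp(-alpha * t_ij)

# Rescaling to a correlation matrix removes the constant gamma
R <- cov2cor(V)
print(V)
print(R)
```

Larger values of \alpha make the covariance decay faster with phylogenetic distance.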

Usage

corMartins(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corMartins'
coef(object, unconstrained = TRUE, ...)
## S3 method for class 'corMartins'
corMatrix(object,
          covariate = getCovariate(object), corr = TRUE, ...)

Arguments

value

The \alpha parameter

phy

An object of class phylo representing the phylogeny (with branch lengths) to consider

object

An (initialized) object of class corMartins

corr

a logical value. If 'TRUE' the function returns the correlation matrix, otherwise it returns the variance/covariance matrix.

fixed

an optional logical value indicating whether the coefficients should be allowed to vary in the optimization, or kept fixed at their initial value. Defaults to 'FALSE', in which case the coefficients are allowed to vary.

form

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

covariate

an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object).

unconstrained

...

some methods for these generics require additional arguments. None are used in these methods.

Value

An object of class corMartins or the alpha coefficient from an object of this class or the correlation matrix of an initialized object of this class.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.

Pagel's “lambda” Correlation Structure

Description

The correlation structure from the present model is derived from the Brownian motion model by multiplying the off-diagonal elements (i.e., the covariances) by \lambda. The variances are thus the same as for a Brownian motion model.
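The transformation can be sketched in base R as follows. The Brownian motion covariance matrix and the value of \lambda below are made-up assumptions for illustration; this is not the code used by corPagel.

```r
# Hypothetical Brownian-motion covariance matrix for three taxa
# (shared branch lengths off the diagonal, root-to-tip distances on it).
taxa <- c("A", "B", "C")
V_bm <- matrix(c(1.0, 0.6, 0.0,
                 0.6, 1.0, 0.0,
                 0.0, 0.0, 1.0),
               nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))

lambda <- 0.5   # assumed value of the lambda parameter

# Scale every element by lambda, then restore the diagonal (the variances)
V_lambda <- lambda * V_bm
diag(V_lambda) <- diag(V_bm)
print(V_lambda)
```

With lambda = 1 the Brownian motion matrix is recovered unchanged; with lambda = 0 all covariances vanish and the taxa are treated as independent.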

Usage

corPagel(value, phy, form = ~1, fixed = FALSE)
## S3 method for class 'corPagel'
corMatrix(object, covariate = getCovariate(object),
          corr = TRUE, ...)
## S3 method for class 'corPagel'
coef(object, unconstrained = TRUE, ...)

Arguments

value

the (initial) value of the parameter \lambda.

phy

an object of class "phylo".

form

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.

fixed

a logical specifying whether gls should estimate \lambda (the default) or keep it fixed.

object

an (initialized) object of class "corPagel".

covariate

an optional covariate vector (matrix), or list of covariate vectors (matrices), at which values the correlation matrix, or list of correlation matrices, are to be evaluated. Defaults to getCovariate(object).

corr

a logical value specifying whether to return the correlation matrix (the default) or the variance-covariance matrix.

unconstrained

...

further arguments passed to or from other methods.

Value

an object of class "corPagel", the coefficients from an object of this class, or the correlation matrix of an initialized object of this class. In most situations, only corPagel will be called by the user.

Author(s)

Emmanuel Paradis

References

Freckleton, R. P., Harvey, P. H. and Pagel, M. (2002) Phylogenetic analysis and comparative data: a test and review of evidence. American Naturalist, 160, 712–726.

Pagel, M. (1999) Inferring the historical patterns of biological evolution. Nature, 401, 877–884.

Correlations among Multiple Traits with Phylogenetic Signal

Description

This function calculates Pearson correlation coefficients for multiple continuous traits that may have phylogenetic signal, allowing users to specify measurement error as the standard error of trait values at the tips of the phylogenetic tree. Phylogenetic signal for each trait is estimated from the data assuming that trait evolution is given by an Ornstein-Uhlenbeck process. Thus, the function allows the estimation of phylogenetic signal in multiple traits while incorporating correlations among traits. It is also possible to include independent variables (covariates) for each trait to remove possible confounding effects. corphylo() returns the correlation matrix for trait values, estimates of phylogenetic signal for each trait, and regression coefficients for independent variables affecting each trait.

Usage

corphylo(X, U = list(), SeM = NULL, phy = NULL, REML = TRUE,
method = c("Nelder-Mead", "SANN"), constrain.d = FALSE, reltol = 10^-6,
maxit.NM = 1000, maxit.SA = 1000, temp.SA = 1, tmax.SA = 1, verbose = FALSE)
## S3 method for class 'corphylo'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

X

a n x p matrix with p columns containing the values for the n taxa. Rows of X should have rownames matching the taxon names in phy.

U

a list of p matrices corresponding to the p columns of X, with each matrix containing independent variables for the corresponding column of X. The rownames of each matrix within U must be the same as X, or alternatively, the order of values in rows must match those in X. If U is omitted, only the mean (aka intercept) for each column of X is estimated. If U[[i]] is NULL, only an intercept is estimated for X[, i]. If all values of U[[i]][j] are the same, this variable is automatically dropped from the analysis (i.e., there is no offset in the regression component of the model).

SeM

a n x p matrix with p columns containing standard errors of the trait values in X. The rownames of SeM must be the same as X, or alternatively, the order of values in rows must match those in X. If SeM is omitted, the trait values are assumed to be known without error. If only some traits have measurement errors, the remaining traits can be given zero-valued standard errors.

phy

a phylo object giving the phylogenetic tree. The tip labels of phy must be the same as the rownames of X, or alternatively, the order of the tips must match the order of the rows in X.

REML

whether REML or ML is used for model fitting.

method

in optim(), either Nelder-Mead simplex minimization or SANN (simulated annealing) minimization is used. If SANN is used, it is followed by Nelder-Mead minimization.

constrain.d

if constrain.d is TRUE, the estimates of d are constrained to be between zero and 1. This can make estimation more stable and can be tried if convergence is problematic. This does not necessarily lead to loss of generality of the results, because before using corphylo, branch lengths of phy can be transformed so that the "starter" tree has strong phylogenetic signal.

reltol

a control parameter dictating the relative tolerance for convergence in the optimization; see optim().

maxit.NM

a control parameter dictating the maximum number of iterations in the optimization with Nelder-Mead minimization; see optim().

maxit.SA

a control parameter dictating the maximum number of iterations in the optimization with SANN minimization; see optim().

temp.SA

a control parameter dictating the starting temperature in the optimization with SANN minimization; see optim().

tmax.SA

a control parameter dictating the number of function evaluations at each temperature in the optimization with SANN minimization; see optim().

verbose

if TRUE, the model logLik and running estimates of the correlation coefficients and values of d are printed each iteration during optimization.

x

an object of class corphylo.

digits

the number of digits to be printed.

...

arguments passed to and from other methods.

Details

For the case of two variables, the function estimates parameters for the model of the form, for example,

X[1] = B[1,0] + B[1,1] * u[1,1] + \epsilon[1]

X[2] = B[2,0] + B[2,1] * u[2,1] + \epsilon[2]

\epsilon ~ Gaussian(0, V)

where B[1,0], B[1,1], B[2,0], and B[2,1] are regression coefficients, and V is a variance-covariance matrix containing the correlation coefficient r, parameters of the OU process d1 and d2, and diagonal matrices M1 and M2 of measurement standard errors for X[1] and X[2]. The matrix V is 2n x 2n, with n x n blocks given by

V[1,1] = C[1,1](d1) + M1

V[1,2] = C[1,2](d1,d2)

V[2,1] = C[2,1](d1,d2)

V[2,2] = C[2,2](d2) + M2

where C[i,j](d1,d2) are derived from phy under the assumption of joint OU evolutionary processes for each trait (see Zheng et al. 2009). This formulation extends in the obvious way to more than two traits.

Value

An object of class "corphylo".

cor.matrix

the p x p matrix of correlation coefficients.

d

values of d from the OU process for each trait.

B

estimates of the regression coefficients, including intercepts. Coefficients are named according to the list U. For example, B1.2 is the coefficient corresponding to U[[1]][, 2], and if column 2 in U[[1]] is named "colname2", then the coefficient will be B1.colname2. Intercepts have the form B1.0.

B.se

standard errors of the regression coefficients.

B.cov

covariance matrix for regression coefficients.

B.zscore

Z scores for the regression coefficients.

B.pvalue

tests for the regression coefficients being different from zero.

logLik

the log likelihood for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE).

AIC

AIC for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE).

BIC

BIC for either the restricted likelihood (REML = TRUE) or the overall likelihood (REML = FALSE).

REML

whether REML is used rather than ML (TRUE or FALSE).

constrain.d

whether or not values of d were constrained to be between 0 and 1 (TRUE or FALSE).

XX

values of X in vectorized form, with each trait X[, i] standardized to have mean zero and standard deviation one.

UU

design matrix with values in UU corresponding to XX; each variable U[[i]][, j] is standardized to have mean zero and standard deviation one.

MM

vector of measurement standard errors corresponding to XX, with the standard errors suitably standardized.

Vphy

the phylogenetic covariance matrix computed from phy and standardized to have determinant equal to one.

R

covariance matrix of trait values relative to the standardized values of XX.

V

overall estimated covariance matrix of residuals for XX including trait correlations, phylogenetic signal, and measurement error variances. This matrix can be used to simulate data for parametric bootstrapping. See examples.

C

matrix V excluding measurement error variances.

convcode

the convergence code provided by optim().

niter

number of iterations performed by optim().

Author(s)

Anthony R. Ives

References

Zheng, L., A. R. Ives, T. Garland, B. R. Larget, Y. Yu, and K. F. Cao. 2009. New multivariate tests for phylogenetic signal and trait correlations applied to ecophysiological phenotypes of nine Manglietia species. Functional Ecology 23:1059–1069.

Examples

## Simple example using data without correlations or phylogenetic
## signal. This illustrates the structure of the input data.

phy <- rcoal(10, tip.label = 1:10)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)
rownames(X) <- phy$tip.label
U <- list(NULL, matrix(rnorm(10, mean = 10, sd = 4), nrow = 10, ncol = 1))
rownames(U[[2]]) <- phy$tip.label
SeM <- matrix(c(0.2, 0.4), nrow = 10, ncol = 2)
rownames(SeM) <- phy$tip.label

corphylo(X = X, SeM = SeM, U = U, phy = phy, method = "Nelder-Mead")

## Not run: 
## Simulation example for the correlation between two variables. The
## example compares the estimates of the correlation coefficients from
## corphylo when measurement error is incorporated into the analyses with
## three other cases: (i) when measurement error is excluded, (ii) when
## phylogenetic signal is ignored (assuming a "star" phylogeny), and (iii)
## neither measurement error nor phylogenetic signal are included.

## In the simulations, variable 2 is associated with a single
## independent variable. This requires setting up a list U that has 2
## elements: element U[[1]] is NULL and element U[[2]] is a n x 1 vector
## containing simulated values of the independent variable.

# Set up parameter values for simulating data
n <- 50
phy <- rcoal(n, tip.label = 1:n)

R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)
d <- c(0.3, .95)
B2 <- 1

Se <- c(0.2, 1)
SeM <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE)
rownames(SeM) <- phy$tip.label

# Set up needed matrices for the simulations
p <- length(d)

star <- stree(n)
star$edge.length <- array(1, dim = c(n, 1))
star$tip.label <- phy$tip.label

Vphy <- vcv(phy)
Vphy <- Vphy/max(Vphy)
Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n)

tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy
C <- matrix(0, nrow = p * n, ncol = p * n)
for (i in 1:p) for (j in 1:p) {
  Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j])
  C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd
}
MM <- matrix(SeM^2, ncol = 1)
V <- C + diag(as.numeric(MM))

## Perform a Cholesky decomposition of Vphy. This is used to generate
## phylogenetic signal: a vector of independent normal random variables,
## when multiplied by the transpose of the Cholesky decomposition of Vphy will
## have covariance matrix equal to Vphy.
iD <- t(chol(V))

# Perform Nrep simulations and collect the results
Nrep <- 100
cor.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1)
cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1)
d.list <- matrix(0, nrow = Nrep, ncol = 2)
d.noM.list <- matrix(0, nrow = Nrep, ncol = 2)
B.list <- matrix(0, nrow = Nrep, ncol = 3)
B.noM.list <- matrix(0, nrow = Nrep, ncol = 3)
B.noP.list <- matrix(0, nrow = Nrep, ncol = 3)
for (rep in 1:Nrep) {
  XX <- iD %*% rnorm(2 * n)
  X <- matrix(XX, nrow = n, ncol = 2)
  rownames(X) <- phy$tip.label

  U <- list(NULL, matrix(rnorm(n, mean = 2, sd = 10), nrow = n, ncol = 1))
  rownames(U[[2]]) <- phy$tip.label
  colnames(U[[2]]) <- "V1"
  X[,2] <- X[,2] + B2[1] * U[[2]][,1] - B2[1] * mean(U[[2]][,1])

  z <- corphylo(X = X, SeM = SeM, U = U, phy = phy, method = "Nelder-Mead")
  z.noM <- corphylo(X = X, U = U, phy = phy, method = "Nelder-Mead")
  z.noP <- corphylo(X = X, SeM = SeM, U = U, phy = star, method = "Nelder-Mead")

  cor.list[rep] <- z$cor.matrix[1, 2]
  cor.noM.list[rep] <- z.noM$cor.matrix[1, 2]
  cor.noP.list[rep] <- z.noP$cor.matrix[1, 2]
  cor.noMP.list[rep] <- cor(cbind(lm(X[,1] ~ 1)$residuals, lm(X[,2] ~ U[[2]])$residuals))[1,2]

  d.list[rep, ] <- z$d
  d.noM.list[rep, ] <- z.noM$d

  B.list[rep, ] <- z$B
  B.noM.list[rep, ] <- z.noM$B
  B.noP.list[rep, ] <- z.noP$B

  show(c(rep, z$convcode, z$cor.matrix[1, 2], z$d))
}
correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list),
                     mean(cor.noP.list), mean(cor.noMP.list))
rownames(correlation) <- c("True", "With SeM and Phy", "Without SeM",
                           "Without Phy", "Without Phy or SeM")
correlation

signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list))
rownames(signal.d) <- c("True", "With SeM and Phy", "Without SeM")
signal.d

est.B <- rbind(c(0, 0, B2), colMeans(B.list), colMeans(B.noM.list),
               colMeans(B.noP.list))
rownames(est.B) <- c("True", "With SeM and Phy", "Without SeM", "Without Phy")
colnames(est.B) <- rownames(z$B)
est.B

# Example simulation output

# correlation
#                         [,1]
# True               0.7000000
# With SeM and Phy   0.7055958
# Without SeM        0.3125253
# Without Phy        0.4054043
# Without Phy or SeM 0.3476589

# signal.d
#                      [,1]      [,2]
# True             0.300000 0.9500000
# With SeM and Phy 0.301513 0.9276663
# Without SeM      0.241319 0.4872675

# est.B
#                          B1.0      B2.0     B2.V1
# True              0.00000000 0.0000000 1.0000000
# With SeM and Phy -0.01285834 0.2807215 0.9963163
# Without SeM       0.01406953 0.3059110 0.9977796
# Without Phy       0.02139281 0.3165731 0.9942140

## End(Not run)

Phylogenetic Correlogram

Description

This function computes a correlogram from taxonomic levels.

Usage

  correlogram.formula(formula, data = NULL, use = "all.obs")

Arguments

formula

a formula of the type y1+..+yn ~ g1/../gn, where the y's are the data to analyse and the g's are the taxonomic levels.

data

a data frame containing the variables specified in the formula. If NULL, the variables are sought in the user's workspace.

use

a character string specifying how to handle missing values (i.e., NA). This must be one of "all.obs", "complete.obs", or "pairwise.complete.obs", or any unambiguous abbreviation of these. In the first case, the presence of missing values produces an error. In the second case, all rows with missing values will be removed before computation. In the last case, missing values are removed on a case-by-case basis.

Details

See the vignette in R: vignette("MoranI").

Value

An object of class correlogram which is a data frame with three columns:

obs

the computed Moran's I

p.values

the corresponding P-values

labels

the names of each level

or an object of class correlogramList containing a list of objects of class correlogram if several variables are given as response in formula.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de and Emmanuel Paradis

Examples

data(carnivora)
### Using the formula interface:
co <- correlogram.formula(SW ~ Order/SuperFamily/Family/Genus,
      data=carnivora)
co
plot(co)
### Several correlograms on the same plot:
cos <- correlogram.formula(SW + FW ~ Order/SuperFamily/Family/Genus,
      data=carnivora)
cos
plot(cos)

NEXUS Data Example

Description

Example of protein data in NEXUS format (Maddison et al., 1997). Data is written in interleaved format using a single DATA block. Original data from Rokas et al. (2002).

Usage

data(cynipids)

Format

ASCII text in NEXUS format

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.

Rokas, A., Nylander, J. A. A., Ronquist, F. and Stone, G. N. (2002) A maximum likelihood analysis of eight phylogenetic markers in Gallwasps (Hymenoptera: Cynipidae): implications for insect phylogenetic studies. Molecular Phylogenetics and Evolution, 22, 206–219.

Probability Density Under Birth–Death Models

Description

These functions compute the probability density under some birth–death models, that is the probability of obtaining x species after a time t given how speciation and extinction probabilities vary through time (these may be constant, or even equal to zero for extinction).

Usage

dyule(x, lambda = 0.1, t = 1, log = FALSE)
dbd(x, lambda, mu, t, conditional = FALSE, log = FALSE)
dbdTime(x, birth, death, t, conditional = FALSE,
        BIRTH = NULL, DEATH = NULL, fast = FALSE)

Arguments

x

a numeric vector of species numbers (see Details).

lambda

a numerical value giving the probability of speciation; can be a vector with several values for dyule.

mu

id. for extinction.

t

id. for the time(s).

log

a logical value specifying whether the probabilities should be returned log-transformed; the default is FALSE.

conditional

a logical specifying whether the probabilities should be computed conditional on the assumption of no extinction after time t.

birth,death

a (vectorized) function specifying how the speciation or extinction probability changes through time (see yule.time and below).

BIRTH,DEATH

a (vectorized) function giving the primitive of birth or death.

fast

a logical value specifying whether to use faster integration (see bd.time).

Details

These three functions compute the probabilities to observe x species starting from a single one after time t (assumed to be continuous). The first function is a short-cut for the second one with mu = 0 and with default values for the two other arguments. dbdTime is for time-varying lambda and mu specified as R functions.

dyule is vectorized simultaneously on its three arguments x, lambda, and t, according to R's rules of recycling arguments. dbd is vectorized simultaneously on x and t (to make likelihood calculations easy), and dbdTime is vectorized only on x; the other arguments are possibly shortened with a warning if necessary.

The returned value is, logically, zero for values of x out of range, i.e., negative or zero for dyule or if conditional = TRUE. However, positive non-integer values of x are not checked: the probabilities are still computed and returned.

The details on the form of the arguments birth, death, BIRTH, DEATH, and fast can be found in the links below.

Value

a numeric vector.

Note

If you use these functions to calculate a likelihood function, it is strongly recommended to compute the log-likelihood with, for instance in the case of a Yule process, sum(dyule( , log = TRUE)) (see examples).
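The base R sketch below illustrates why the log scale matters. It does not call dyule: it assumes the standard geometric form of the Yule distribution for a process started from a single lineage, P(x) = exp(-lambda t) (1 - exp(-lambda t))^(x - 1), and the helper name yule_density is hypothetical.

```r
# Hypothetical stand-in for the Yule density, assuming the geometric form
# P(x) = exp(-lambda * t) * (1 - exp(-lambda * t))^(x - 1) for x >= 1.
yule_density <- function(x, lambda, t, log = FALSE) {
  p <- exp(-lambda * t)
  logd <- log(p) + (x - 1) * log1p(-p)
  logd[x <= 0] <- -Inf            # out-of-range values have zero probability
  if (log) logd else exp(logd)
}

set.seed(1)
lambda <- 0.05
tt <- 50
# Simulate 1000 species counts: a geometric variable on 1, 2, 3, ...
x <- rgeom(1000, prob = exp(-lambda * tt)) + 1

# Summing log-densities stays finite
ll_sum <- sum(yule_density(x, lambda, tt, log = TRUE))

# Multiplying 1000 small probabilities underflows to zero, so the log is -Inf
ll_prod <- log(prod(yule_density(x, lambda, tt)))

print(ll_sum)
print(ll_prod)
```

Each individual probability is perfectly representable, but their product is far below the smallest positive double, which is why the product-then-log route fails while the sum of logs does not.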

Author(s)

Emmanuel Paradis

References

Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.

Examples

x <- 0:10
plot(x, dyule(x), type = "h", main = "Density of the Yule process")
text(7, 0.85, expression(list(lambda == 0.1, t == 1)))

y <- dbd(x, 0.1, 0.05, 10)
z <- dbd(x, 0.1, 0.05, 10, conditional = TRUE)
d <- rbind(y, z)
colnames(d) <- x
barplot(d, beside = TRUE, ylab = "Density", xlab = "Number of species",
        legend = c("unconditional", "conditional on\nno extinction"),
        args.legend = list(bty = "n"))
title("Density of the birth-death process")
text(17, 0.4, expression(list(lambda == 0.1, mu == 0.05, t == 10)))

## Not run: 
### generate 1000 values from a Yule process with lambda = 0.05
x <- replicate(1e3, Ntip(rlineage(0.05, 0)))

### the correct way to calculate the log-likelihood...:
sum(dyule(x, 0.05, 50, log = TRUE))

### ... and the wrong way:
log(prod(dyule(x, 0.05, 50)))

### a third, less preferred, way:
sum(log(dyule(x, 0.05, 50)))

## End(Not run)

Definition of Vectors for Plotting or Annotating

Description

This function can be used to define vectors to annotate a set of taxon names, labels, etc. It should facilitate the (re)definition of colours or similar attributes for plotting trees or other graphics.

Usage

def(x, ..., default = NULL, regexp = FALSE)

Arguments

x

a vector of mode character.

...

a series of statements defining the attributes.

default

the default to be used (see details).

regexp

a logical value specifying whether the statements defined in ... should be taken as regular expressions.

Details

The idea of this function is to make the definition of colours, etc., simpler than what is done usually. A typical use is:

def(tr$tip.label, Homo_sapiens = "blue")

which will return a vector of character strings all "black" except one matching the tip label "Homo_sapiens" which will be "blue". Another use could be:

def(tr$tip.label, Homo_sapiens = 2)

which will return a vector of numerical values all 1 except for "Homo_sapiens" which will be 2. Several definitions can be done, e.g.:

def(tr$tip.label, Homo_sapiens = "blue", Pan_paniscus = "red")

The default value is determined with respect to the mode of the values given with ... (either "black" or 1).

If regexp = TRUE is used, then the names of the statements must be quoted, e.g.:

def(tr$tip.label, "^Pan_" = "red", regexp = TRUE)

will return "red" for all labels starting with "Pan_".

Value

a vector of the same length as x.

Author(s)

Emmanuel Paradis

Examples

data(bird.orders)
a <- def(bird.orders$tip.label, Galliformes = 2)
str(a) # numeric
plot(bird.orders, font = a)
co <- def(bird.orders$tip.label, Passeriformes = "red", Trogoniformes = "blue")
str(co) # character
plot(bird.orders, tip.color = co)
### use of a regexp (so we need to quote it) to colour all orders
### with names starting with "C" (and change the default):
co2 <- def(bird.orders$tip.label, "^C" = "gold", default = "grey", regexp = TRUE)
plot(bird.orders, tip.color = co2)

Vertex Degrees in Trees and Networks

Description

degree is a generic function to calculate the degree of all nodes in a tree or in a network.

Usage

degree(x, ...)
## S3 method for class 'phylo'
degree(x, details = FALSE, ...)
## S3 method for class 'evonet'
degree(x, details = FALSE, ...)

Arguments

x

an object (tree, network, ...).

details

whether to return the degree of each node in the tree, or a summary table (the default).

...

arguments passed to methods.

Details

The degree of a node (or vertex) in a network is defined by the number of branches (or edges) that connect to this node. In a phylogenetic tree, the tips (or terminal nodes) are of degree one, and the (internal) nodes are of degree two or more.

There are currently two methods for the classes "phylo" and "evonet". The default of these functions is to return a summary table with the degrees observed in the tree or network in the first column, and the number of nodes in the second column. If details = TRUE, a vector giving the degree of each node (as numbered in the edge matrix) is returned.

The validity of the object is not checked, so degree can be used to check problems with badly conformed trees.
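The definition above can be sketched in base R by counting how often each node number appears in an edge matrix. The small edge matrix below is a made-up example, and the code is an illustration of the definition rather than the implementation used by degree.

```r
# Hypothetical edge matrix of a rooted tree with 3 tips (numbered 1-3)
# and 2 internal nodes (4 = root, 5); each row is one branch (parent, child).
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3),
               ncol = 2, byrow = TRUE)

# Degree of each node = number of times it appears in the edge matrix
deg <- tabulate(as.vector(edge), nbins = max(edge))
print(deg)

# Summary: how many nodes have each observed degree
print(table(deg))
```

Here the three tips have degree one, the root has degree two, and the other internal node has degree three.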

Value

a data frame if details = FALSE, or a vector of integers otherwise.

Author(s)

Emmanuel Paradis

Examples

data(bird.orders)
degree(bird.orders)
degree(bird.orders, details = TRUE)

data(bird.families)
degree(bird.families)

degree(rtree(10)) # 10, 1, 8
degree(rtree(10, rooted = FALSE)) # 10, 0, 8
degree(stree(10)) # 10 + 1 node of degree 10

Delete Alignment Gaps in DNA or AA Sequences

Description

These functions remove gaps ("-") in a sample of DNA sequences.

Usage

del.gaps(x)
del.colgapsonly(x, threshold = 1, freq.only = FALSE)
del.rowgapsonly(x, threshold = 1, freq.only = FALSE)

Arguments

x

a matrix, a list, or a vector containing the DNA or AA sequences; only matrices for del.colgapsonly and for del.rowgapsonly.

threshold

the largest gap proportion to delete the column or row.

freq.only

if TRUE, returns only the numbers of gaps for each column or row.

Details

del.gaps removes all gaps, so the returned sequences may not all have the same length and are therefore returned in a list.

del.colgapsonly removes the columns with a proportion at least threshold of gaps. Thus by default, only the columns with gaps only are removed (useful when a small matrix is extracted from a large alignment). del.rowgapsonly does the same for the rows.

The class of the input sequences is respected and kept unchanged, unless it contains neither "DNAbin" nor "AAbin" in which case the object is first converted into the class "DNAbin".
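The column rule described above can be sketched in base R on a plain character matrix. The alignment and the helper name drop_gap_columns are hypothetical, and the sketch does not handle "DNAbin" or "AAbin" objects.

```r
# Hypothetical alignment stored as a plain character matrix
# (rows = sequences, columns = sites); not a "DNAbin" object.
aln <- rbind(s1 = c("a", "-", "c", "-"),
             s2 = c("a", "-", "g", "t"),
             s3 = c("t", "-", "c", "-"))

# Proportion of gaps in each column
gap_prop <- colMeans(aln == "-")

# Drop the columns whose gap proportion is at least the threshold
drop_gap_columns <- function(m, threshold = 1) {
  m[, colMeans(m == "-") < threshold, drop = FALSE]
}

print(gap_prop)
print(drop_gap_columns(aln))                   # removes only the all-gap column
print(drop_gap_columns(aln, threshold = 0.5))  # also removes the last column
```

With the default threshold of 1, only columns made entirely of gaps are dropped; lowering the threshold removes progressively more gap-rich columns.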

Value

del.gaps returns a vector (if there is only one input sequence) or a list of sequences; del.colgapsonly and del.rowgapsonly return a matrix of sequences or a numeric vector (with names for the second function) if freq.only = TRUE.

Author(s)

Emmanuel Paradis

Delta Plots

Description

This function makes a \delta plot following Holland et al. (2002).

Usage

delta.plot(X, k = 20, plot = TRUE, which = 1:2)

Arguments

X

a distance matrix, may be an object of class “dist”.

k

an integer giving the number of intervals in the plot.

plot

a logical specifying whether to draw the \delta plot (the default).

which

a numeric vector indicating which plots are done; 1: the histogram of the \delta_q values, 2: the plot of the individual \bar{\delta} values. By default, both plots are done.

Details

See Holland et al. (2002) for details and interpretation.

The computing time of this function is proportional to the fourth power of the number of observations (O(n^4)), so calculations may be very long with only a slight increase in sample size.
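To give a feel for this growth, the short sketch below counts the quartets (sets of four observations) for a few sample sizes. It assumes that the cost is dominated by a pass over all quartets, which is consistent with the O(n^4) statement but is not a description of the actual implementation.

```r
# Number of quartets (unordered sets of four observations) for several sample sizes
n <- c(10, 20, 40, 80)
quartets <- choose(n, 4)
print(data.frame(n = n, quartets = quartets))

# Ratio between successive counts when n doubles; it approaches 2^4 = 16 as n grows
print(quartets[-1] / quartets[-length(quartets)])
```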

Value

This function returns invisibly a named list with two components:

counts: the counts for the histogram of \delta_q values
delta.bar: the mean \delta value for each observation

Author(s)

Emmanuel Paradis

References

Holland, B. R., Huber, K. T., Dress, A. and Moulton, V. (2002) Delta plots: a tool for analyzing phylogenetic distance data. Molecular Biology and Evolution, 19, 2051–2059.

Examples

data(woodmouse)
d <- dist.dna(woodmouse)
delta.plot(d)
layout(1)
delta.plot(d, 40, which = 1)

Pairwise Distances from DNA Sequences

Description

This function computes a matrix of pairwise distances from DNA sequences using a model of DNA evolution. Eleven substitution models (and the raw distance) are currently available.

Usage

dist.dna(x, model = "K80", variance = FALSE,
         gamma = FALSE, pairwise.deletion = FALSE,
         base.freq = NULL, as.matrix = FALSE)

Arguments

x

a matrix or a list containing the DNA sequences; this must be of class "DNAbin" (use as.DNAbin if they are stored as character).

model

a character string specifying the evolutionary model to be used; must be one of "raw", "N", "TS", "TV", "JC69", "K80" (the default), "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", or "indelblock".

variance

a logical indicating whether to compute the variances of the distances; defaults to FALSE so the variances are not computed.

gamma

a value for the gamma parameter possibly used to apply a correction to the distances (by default no correction is applied).

pairwise.deletion

a logical indicating whether to delete the sites with missing data in a pairwise way. The default is to delete the sites with at least one missing data for all sequences (ignored if model = "indel" or "indelblock").

base.freq

the base frequencies to be used in the computations (if applicable). By default, the base frequencies are computed from the whole set of sequences.

as.matrix

a logical indicating whether to return the results as a matrix. The default is to return an object of class dist.

Details

The molecular evolutionary models available through the option model have been extensively described in the literature. A brief description is given below; more details can be found in the references.

raw, N: This is simply the proportion or the number of sites that differ between each pair of sequences. This may be useful to draw “saturation plots”. The options variance and gamma have no effect, but pairwise.deletion may have.
TS, TV: These are the numbers of transitions and transversions, respectively.
JC69: This model was developed by Jukes and Cantor (1969). It assumes that all substitutions (i.e. a change of a base by another one) have the same probability. This probability is the same for all sites along the DNA sequence. This last assumption can be relaxed by assuming that the substitution rate varies among sites following a gamma distribution whose parameter must be given by the user. By default, no gamma correction is applied. Another assumption is that the base frequencies are balanced and thus equal to 0.25.
K80: The distance derived by Kimura (1980), sometimes referred to as “Kimura's 2-parameters distance”, has the same underlying assumptions as the Jukes–Cantor distance except that two kinds of substitutions are considered: transitions (A <-> G, C <-> T), and transversions (A <-> C, A <-> T, C <-> G, G <-> T). They are assumed to have different probabilities. A transition is the substitution of a purine (A, G) by another one, or the substitution of a pyrimidine (C, T) by another one. A transversion is the substitution of a purine by a pyrimidine, or vice-versa. Both transition and transversion rates are the same for all sites along the DNA sequence. Jin and Nei (1990) modified the Kimura model to allow for variation among sites following a gamma distribution. As for the Jukes–Cantor model, the gamma parameter must be given by the user. By default, no gamma correction is applied.
F81: Felsenstein (1981) generalized the Jukes–Cantor model by relaxing the assumption of equal base frequencies. The formulae used in this function were taken from McGuire et al. (1999).
K81: Kimura (1981) generalized his model (Kimura 1980) by assuming different rates for two kinds of transversions: A <-> C and G <-> T on one side, and A <-> T and C <-> G on the other. This is what Kimura called his “three substitution types model” (3ST), and is sometimes referred to as “Kimura's 3-parameters distance”.
F84: This model generalizes K80 by relaxing the assumption of equal base frequencies. It was first introduced by Felsenstein in 1984 in Phylip, and is fully described by Felsenstein and Churchill (1996). The formulae used in this function were taken from McGuire et al. (1999).
BH87: Barry and Hartigan (1987) developed a distance based on the observed proportions of changes among the four bases. This distance is not symmetric.
T92: Tamura (1992) generalized the Kimura model by relaxing the assumption of equal base frequencies. This is done by taking into account the bias in G+C content in the sequences. The substitution rates are assumed to be the same for all sites along the DNA sequence.
TN93: Tamura and Nei (1993) developed a model which assumes distinct rates for both kinds of transition (A <-> G versus C <-> T), and transversions. The base frequencies are not assumed to be equal and are estimated from the data. A gamma correction of the inter-site variation in substitution rates is possible.
GG95: Galtier and Gouy (1995) introduced a model where the G+C content may change through time. Different rates are assumed for transitions and transversions.
logdet: The Log-Det distance, developed by Lockhart et al. (1994), is related to BH87. However, this distance is symmetric. Formulae from Gu and Li (1996) are used. dist.logdet in phangorn uses a different implementation that gives substantially different distances for low-diverging sequences.
paralin: Lake (1994) developed the paralinear distance which can be viewed as another variant of the Barry–Hartigan distance.
indel: this counts the number of sites where there is an insertion/deletion gap in one sequence and not in the other.
indelblock: same as the previous one, but contiguous gaps are counted as a single unit. Note that the distance between -A- and A-- is 3 because there are three different blocks of gaps, whereas the “indel” distance will be 2.
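To make the two simplest corrections concrete, the standard textbook formulae for the Jukes–Cantor and Kimura two-parameter distances (without gamma correction) can be written in a few lines of base R. This is only a sketch: the function names jc69 and k80 are made up for illustration, and the code is not claimed to be how dist.dna computes its results internally.

```r
# p: observed proportion of differing sites (the "raw" distance)
jc69 <- function(p) -3 / 4 * log(1 - 4 * p / 3)

# P: observed proportion of transitions, Q: observed proportion of transversions
k80 <- function(P, Q) -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))

# Hypothetical proportions, for illustration only.
d_jc  <- jc69(0.10)
d_k80 <- k80(P = 0.06, Q = 0.04)
print(c(JC69 = d_jc, K80 = d_k80))

# When transitions make up one third of the differences, K80 reduces to JC69.
print(all.equal(k80(0.10 / 3, 0.20 / 3), jc69(0.10)))

# Beyond p = 0.75 the logarithm has a negative argument and the distance is undefined.
print(suppressWarnings(jc69(0.80)))
```

Both corrected distances are larger than the raw proportion because they account for multiple substitutions at the same site.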

Value

an object of class dist (by default), or a numeric matrix if as.matrix = TRUE. If model = "BH87", a numeric matrix is returned because the Barry–Hartigan distance is not symmetric.

If variance = TRUE an attribute called "variance" is given to the returned object.

Note

If the sequences are very different, most evolutionary distances are undefined and a non-finite value (Inf or NaN) is returned. You may do dist.dna(, model = "raw") to check whether some values are higher than 0.75.
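The check suggested in this note might look as follows with the woodmouse data shipped with the package. The variable names are arbitrary, and the 0.75 threshold is the one quoted above.

```r
library(ape)

data(woodmouse)

# Uncorrected proportions of differing sites between all pairs of sequences.
p <- dist.dna(woodmouse, model = "raw")

# Pairs above 0.75 are the ones for which corrected distances may be undefined.
too_divergent <- any(p > 0.75)
print(too_divergent)

# A corrected distance; non-finite entries would flag problematic pairs.
d <- dist.dna(woodmouse, model = "JC69")
print(all(is.finite(d)))
```

If some pairs do exceed the threshold, inspecting which(as.matrix(p) > 0.75, arr.ind = TRUE) is one way to locate them.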

Author(s)

Emmanuel Paradis

References

Barry, D. and Hartigan, J. A. (1987) Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261–276.

Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Felsenstein, J. and Churchill, G. A. (1996) A Hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93–104.

Galtier, N. and Gouy, M. (1995) Inferring phylogenies from DNA sequences of unequal base compositions. Proceedings of the National Academy of Sciences USA, 92, 11317–11321.

Gu, X. and Li, W.-H. (1996) Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Molecular Biology and Evolution, 13, 1375–1383.

Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Mammalian Protein Metabolism, ed. Munro, H. N., pp. 21–132, New York: Academic Press.

Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.

Kimura, M. (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA, 78, 454–458.

Jin, L. and Nei, M. (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Molecular Biology and Evolution, 7, 82–102.

Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proceedings of the National Academy of Sciences USA, 91, 1455–1459.

Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution, 11, 605–612.

McGuire, G., Prentice, M. J. and Wright, F. (1999) Improved error bounds for genetic distances from DNA sequences. Biometrics, 55, 1064–1070.

Tamura, K. (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Molecular Biology and Evolution, 9, 678–687.

Tamura, K. and Nei, M. (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10, 512–526.

Pairwise Distances from Genetic Data

Description

This function computes a matrix of distances between pairs of individuals from a matrix or a data frame of genetic data.

Usage

dist.gene(x, method = "pairwise", pairwise.deletion = FALSE,
          variance = FALSE)

Arguments

x

a matrix or a data frame (will be coerced as a matrix).

method

a character string specifying the method used to compute the distances; two choices are available: "pairwise" and "percentage", or any unambiguous abbreviation of these.

pairwise.deletion

a logical indicating whether to delete the columns with missing data on a pairwise basis. The default is to delete the columns with at least one missing observation.

variance

a logical indicating whether the variance of the distances should be returned (defaults to FALSE).

Details

This function is meant to be very general and accepts different kinds of data (alleles, haplotypes, SNP, DNA sequences, ...). The rows of the data matrix represent the individuals, and the columns the loci.

In the case of the pairwise method, the distance d between two individuals is the number of loci for which they differ, and the associated variance is d(L - d)/L, where L is the number of loci.

In the case of the percentage method, this distance is divided by L, and the associated variance is d(1 - d)/L.
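The two definitions above can be checked by hand on a toy data set. The following base R sketch uses a made-up matrix of three individuals scored at five loci; it applies the formulae as stated in the text and does not call dist.gene itself.

```r
# Hypothetical genotypes: rows are individuals, columns are loci.
x <- rbind(ind1 = c("a", "b", "a", "c", "a"),
           ind2 = c("a", "b", "b", "c", "b"),
           ind3 = c("b", "b", "b", "a", "b"))
L <- ncol(x)

# Pairwise method: number of loci at which ind1 and ind2 differ.
d12 <- sum(x["ind1", ] != x["ind2", ])
var_pairwise <- d12 * (L - d12) / L

# Percentage method: the same count divided by the number of loci.
p12 <- d12 / L
var_percentage <- p12 * (1 - p12) / L

print(c(d = d12, var_d = var_pairwise, p = p12, var_p = var_percentage))
```

With the package loaded, dist.gene(x) and dist.gene(x, method = "percentage", variance = TRUE) are expected to give matching figures for this pair.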

For more elaborate distances with DNA sequences, see the function dist.dna.

Value

an object of class dist. If variance = TRUE an attribute called "variance" is given to the returned object.

Note

Missing data (NA) are coded and treated in R's usual way.

Author(s)

Emmanuel Paradis

Topological Distances Between Two Trees

Description

This function computes the topological distance between two phylogenetic trees or among trees in a list (if y = NULL) using different methods.

Usage

dist.topo(x, y = NULL, method = "PH85", mc.cores = 1)

Arguments

x

an object of class "phylo" or of class "multiPhylo".

y

an (optional) object of class "phylo".

method

a character string giving the method to be used: either "PH85", or "score".

mc.cores

the number of cores (CPUs) to be used (passed to parallel).

Details

Two methods are available: the one by Penny and Hendy (1985, originally from Robinson and Foulds 1981), and the branch length score by Kuhner and Felsenstein (1994). The trees are always considered as unrooted.

The topological distance is defined as twice the number of internal branches defining different bipartitions of the tips (Robinson and Foulds 1981; Penny and Hendy 1985). Rzhetsky and Nei (1992) proposed a modification of the original formula to take multifurcations into account.

The branch length score may be seen as similar to the previous distance but taking branch lengths into account. Kuhner and Felsenstein (1994) proposed to calculate the square root of the sum of the squared differences of the (internal) branch lengths defining similar bipartitions (or splits) in both trees.
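A small worked case may help to fix ideas. In the sketch below, two hypothetical four-tip trees each have a single internal branch, and these branches define different bipartitions, so the topological distance is twice one, that is 2. The tree strings and object names are invented for illustration.

```r
library(ape)

# Two unrooted four-tip trees that group the tips differently:
# t1 separates {C, D} from {A, B}; t2 separates {B, D} from {A, C}.
t1 <- read.tree(text = "(A:1,B:1,(C:1,D:1):1);")
t2 <- read.tree(text = "(A:1,C:1,(B:1,D:1):1);")

d_same  <- dist.topo(t1, t1)                     # identical topologies
d_ph85  <- dist.topo(t1, t2)                     # method = "PH85" is the default
d_score <- dist.topo(t1, t2, method = "score")   # also uses the branch lengths

print(c(same = d_same, PH85 = d_ph85, score = d_score))
```

The "score" method requires branch lengths in both trees, which is why they are given explicitly in the Newick strings.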

Value

a single numeric value if both x and y are used, an object of class "dist" otherwise.

Note

The geodesic distance of Billera et al. (2001) has been disabled: see the package distory on CRAN.

Author(s)

Emmanuel Paradis

References

Billera, L. J., Holmes, S. P. and Vogtmann, K. (2001) Geometry of the space of phylogenetic trees. Advances in Applied Mathematics, 27, 733–767.

Kuhner, M. K. and Felsenstein, J. (1994) Simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11, 459–468.

Nei, M. and Kumar, S. (2000) Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.

Penny, D. and Hendy, M. D. (1985) The use of tree comparison metrics. Systematic Zoology, 34, 75–82.

Robinson, D. F. and Foulds, L. R. (1981) Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131–147.

Rzhetsky, A. and Nei, M. (1992) A simple method for estimating and testing minimum-evolution trees. Molecular Biology and Evolution, 9, 945–967.

Examples

ta <- rtree(30, rooted = FALSE)
tb <- rtree(30, rooted = FALSE)
dist.topo(ta, ta) # 0
dist.topo(ta, tb) # unlikely to be 0
## rmtopology() simulates unrooted trees by default:
TR <- rmtopology(100, 10)
## these trees have 7 internal branches, so the maximum distance
## between two of them is 14:
DTR <- dist.topo(TR)
table(DTR)

Tests of Constant Diversification Rates

Description

This function computes two tests of the distribution of branching times using the Cramér–von Mises and Anderson–Darling goodness-of-fit tests. By default, it is assumed that the diversification rate is constant, and an exponential distribution is assumed for the branching times. In this case, the expected distribution under this model is computed with a rate estimated from the data. Alternatively, the user may specify an expected cumulative distribution function (z): in this case, x and z must be of the same length. See the examples for how to compute the latter from a sample of expected branching times.

Usage

diversi.gof(x, null = "exponential", z = NULL)

Arguments

x

a numeric vector with the branching times.

null

a character string specifying the null distribution for the branching times. Only two choices are possible: either "exponential", or "user".

z

used if null = "user"; gives the expected distribution under the model.

Details

The Cramér–von Mises and Anderson–Darling tests compare the empirical distribution function (EDF) of the observations to an expected cumulative distribution function. In contrast to the Kolmogorov–Smirnov test, where only the greatest difference between these two functions is used, both tests take all differences into account.

The distributions of both test statistics depend on the null hypothesis, and on whether or not some parameters were estimated from the data. However, these distributions are not known precisely and critical values were determined by Stephens (1974) using simulations. These critical values were used for the present function.
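For readers who want to see what the two statistics look like, here is a base R sketch of their usual textbook definitions, computed from the values of the expected cumulative distribution function at the sorted observations. The function names cvm_stat and ad_stat are invented, the data are simulated, and no claim is made that diversi.gof uses exactly this code or applies the same small-sample adjustments.

```r
# u: values of the expected cumulative distribution function at the observations
cvm_stat <- function(u) {
  u <- sort(u)
  n <- length(u)
  i <- seq_len(n)
  # Cramér–von Mises W2: squared differences summed over all observations
  1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2)
}

ad_stat <- function(u) {
  u <- sort(u)
  n <- length(u)
  i <- seq_len(n)
  # Anderson–Darling A2: gives more weight to the tails of the distribution
  -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
}

set.seed(1)
x <- rexp(50, rate = 2)            # simulated "branching times"
u <- pexp(x, rate = 1 / mean(x))   # exponential null with a rate estimated from the data
print(c(W2 = cvm_stat(u), A2 = ad_stat(u)))
```

Note that A2 becomes infinite as soon as one value of u equals 0 or 1, which is why a user-supplied expected distribution should stay strictly inside that interval at the observed points.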

Value

A NULL value is returned, the results are simply printed.

Author(s)

Emmanuel Paradis

References

Paradis, E. (1998) Testing for constant diversification rates using molecular phylogenies: a general approach based on statistical tests for goodness of fit. Molecular Biology and Evolution, 15, 476–479.

Stephens, M. A. (1974) EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730–737.

Examples

data(bird.families)
x <- branching.times(bird.families)
### suppose we have a sample of expected branching times `y';
### for simplicity, take them from a uniform distribution:
y <- runif(500, 0, max(x) + 1) # + 1 to avoid A2 = Inf
### now compute the expected cumulative distribution:
x <- sort(x)
N <- length(x)
ecdf <- numeric(N)
for (i in 1:N) ecdf[i] <- sum(y <= x[i])/500
### finally do the test:
diversi.gof(x, "user", z = ecdf)

Analysis of Diversification with Survival Models

Description

This function fits survival models to a set of branching times, some of which may be known only approximately (censored). Three models are fitted: Model A assuming constant diversification, Model B assuming that diversification follows a Weibull law, and Model C assuming that diversification changes with a breakpoint at time ‘Tc’. The models are fitted by maximum likelihood.

Usage

diversi.time(x, census = NULL, censoring.codes = c(1, 0), Tc = NULL)

Arguments

x

a numeric vector with the branching times.

census

a vector of the same length as ‘x’ used as an indicator variable; thus, it must have only two values, one coding for accurately known branching times, and the other for censored branching times. This argument can be of any mode (numeric, character, logical), or can even be a factor.

censoring.codes

a vector of length two giving the codes used for census: by default 1 (accurately known times) and 0 (censored times). The mode must be the same as that of census.

Tc

a single numeric value specifying the break-point time to fit Model C. If none is provided, then it is set arbitrarily to the mean of the analysed branching times.

Details

The principle of the method is to consider each branching time as an event: if the branching time is accurately known, then it is a failure event; if it is approximately known then it is a censoring event. An analogy is thus made between the failure (or hazard) rate estimated by the survival models and the diversification rate of the lineage. Time is here considered from present to past.

Model B assumes a monotonically changing diversification rate. The parameter that controls the change of this rate is called beta. If beta is greater than one, then the diversification rate decreases through time; if it is less than one, then the rate increases through time. If beta is equal to one, then Model B reduces to Model A.

Value

A NULL value is returned, the results are simply printed.

Author(s)

Emmanuel Paradis

References

Paradis, E. (1997) Assessing temporal variations in diversification rates from phylogenies: estimation and hypothesis testing. Proceedings of the Royal Society of London. Series B. Biological Sciences, 264, 1141–1147.

Diversity Contrast Test

Description

This function performs the diversity contrast test comparing pairs of sister-clades.

Usage

diversity.contrast.test(x, method = "ratiolog",
        alternative = "two.sided", nrep = 0, ...)

Arguments

x

a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades.

method

a character string specifying the kind of test: "ratiolog" (default), "proportion", "difference", "logratio", or any unambiguous abbreviation of these.

alternative

a character string defining the alternative hypothesis: "two.sided" (default), "less", "greater", or any unambiguous abbreviation of these.

nrep

the number of replications of the randomization test; by default, a Wilcoxon test is done.

...

arguments passed to the function wilcox.test.

Details

If method = "ratiolog", the test described in Barraclough et al. (1996) is performed. If method = "proportion", the version in Barraclough et al. (1995) is used. If method = "difference", the signed difference is used (Sargent 2004). If method = "logratio", then this is Wiegmann et al.'s (1993) version. These four tests are essentially different versions of the same test (Vamosi and Vamosi 2005, Vamosi 2007). See Paradis (2012) for a comparison of their statistical performance with other tests.

If nrep = 0, a Wilcoxon test is done on the species diversity contrasts with the null hypothesis that they are distributed around zero. If nrep > 0, a randomization procedure is done where the signs of the diversity contrasts are randomly chosen. This is used to create a distribution of the test statistic which is compared with the observed value (the sum of the diversity contrasts).
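The randomization idea can be sketched in base R for the simplest contrast, the signed difference. The species counts below are hypothetical, and the way the P-value is derived from the randomized sums (here, the proportion of sums at least as large as the observed one) is an assumption made for this illustration rather than a description of the function's internals.

```r
set.seed(42)

# Hypothetical species counts for six pairs of sister-clades.
with_trait    <- c(12, 40, 7, 150, 33, 9)
without_trait <- c(5, 22, 10, 60, 30, 2)

# Signed-difference contrasts and the observed test statistic (their sum).
contrasts <- with_trait - without_trait
observed <- sum(contrasts)

# Randomization: keep the magnitudes, draw the signs at random.
nrep <- 1e4
signs <- matrix(sample(c(-1, 1), nrep * length(contrasts), replace = TRUE),
                nrow = nrep)
null_sums <- as.vector(signs %*% abs(contrasts))

# One-sided P-value for the alternative that the trait increases diversity.
p_greater <- mean(null_sums >= observed)
print(c(observed = observed, p_greater = p_greater))
```

With only six pairs there are just 64 possible sign patterns, so the randomized distribution is coarse and more sister-clade pairs are needed for a fine-grained P-value.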

Value

a single numeric value with the P-value.

Author(s)

Emmanuel Paradis

References

Barraclough, T. G., Harvey, P. H. and Nee, S. (1995) Sexual selection and taxonomic diversity in passerine birds. Proceedings of the Royal Society of London. Series B. Biological Sciences, 259, 211–215.

Barraclough, T. G., Harvey, P. H. and Nee, S. (1996) Rate of rbcL gene sequence evolution and species diversification in flowering plants (angiosperms). Proceedings of the Royal Society of London. Series B. Biological Sciences, 263, 589–591.

Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.

Sargent, R. D. (2004) Floral symmetry affects speciation rates in angiosperms. Proceedings of the Royal Society of London. Series B. Biological Sciences, 271, 603–608.

Vamosi, S. M. (2007) Endless tests: guidelines for analysing non-nested sister-group comparisons. An addendum. Evolutionary Ecology Research, 9, 717.

Vamosi, S. M. and Vamosi, J. C. (2005) Endless tests: guidelines for analysing non-nested sister-group comparisons. Evolutionary Ecology Research, 7, 567–579.

Wiegmann, B., Mitter, C. and Farrell, B. (1993) Diversification of carnivorous parasitic insects: extraordinary radiation or specialized dead end? American Naturalist, 142, 737–754.

Examples

### data from Vamosi & Vamosi (2005):
fleshy <- c(1, 1, 1, 1, 1, 3, 3, 5, 9, 16, 33, 40, 50, 100, 216, 393, 850, 947,
            1700)
dry <- c(2, 64, 300, 89, 67, 4, 34, 10, 150, 35, 2, 60, 81, 1, 3, 1, 11, 1, 18)
x <- cbind(fleshy, dry)
diversity.contrast.test(x)
diversity.contrast.test(x, alt = "g")
diversity.contrast.test(x, alt = "g", nrep = 1e4)
slowinskiguyer.test(x)
mcconwaysims.test(x)

dN/dS Ratio

Description

This function computes the pairwise ratios dN/dS for a set of aligned DNA sequences using Li's (1993) method.

Usage

dnds(x, code = 1, codonstart = 1, quiet = FALSE,
     details = FALSE, return.categories = FALSE)

Arguments

x

an object of class "DNAbin" (matrix or list) with the aligned sequences.

code

an integer value giving the genetic code to be used. Currently, the codes 1 to 6 are supported.

codonstart

an integer giving where to start the translation. This should be 1, 2, or 3, but larger values are accepted and have the effect of starting the translation further within the sequence.

quiet

single logical value: whether to indicate progress of calculations.

details

single logical value (see details).

return.categories

a logical value: if TRUE, a matrix of the same size as x is returned giving the degeneracy category of each base in the original alignment.

Details

Since ape 5.6, the degeneracy of each codon is calculated directly from the genetic code using the function trans. A consequence is that ambiguous bases are ignored (see solveAmbiguousBases).

If details = TRUE, a table is printed for each pair of sequences giving the numbers of transitions and transversions for each category of degeneracy (nondegenerate, twofold, and fourfold). This is helpful when non-meaningful values are returned (e.g., NaN, Inf, negative values).

Value

an object of class "dist", or a numeric matrix if return.categories = TRUE.

Author(s)

Emmanuel Paradis

References

Li, W.-H. (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of Molecular Evolution, 36, 96–99.

Examples

data(woodmouse)
res <- dnds(woodmouse, quiet = TRUE) # NOT correct
res2 <- dnds(woodmouse, code = 2, quiet = TRUE) # using the correct code
identical(res, res2) # FALSE...
cor(res, res2) # ... but very close
## There are a few N's in the woodmouse data, but this does not
## greatly affect the results:
res3 <- dnds(solveAmbiguousBases(woodmouse), code = 2, quiet = TRUE)
cor(res, res3)
## a simple example showing the usefulness of 'details = TRUE'
X <- as.DNAbin(matrix(c("C", "A", "G", "G", "T", "T"), 2, 3))
alview(X)
dnds(X, quiet = TRUE) # NaN
dnds(X, details = TRUE) # only a TV at a nondegenerate site

Remove Tips in a Phylogenetic Tree

Description

drop.tip removes the terminal branches of a phylogenetic tree, possibly removing the corresponding internal branches. keep.tip does the opposite operation (i.e., returns the induced tree).

extract.clade keeps all the tips descending from a given node, and deletes all the other tips.

Usage

drop.tip(phy, tip, ...)
## S3 method for class 'phylo'
drop.tip(phy, tip, trim.internal = TRUE, subtree = FALSE,
         root.edge = 0, rooted = is.rooted(phy), collapse.singles = TRUE,
         interactive = FALSE, ...)
## S3 method for class 'multiPhylo'
drop.tip(phy, tip, ...)
keep.tip(phy, tip, ...)
## S3 method for class 'phylo'
keep.tip(phy, tip, ...)
## S3 method for class 'multiPhylo'
keep.tip(phy, tip, ...)
extract.clade(phy, node, root.edge = 0, collapse.singles = TRUE,
              interactive = FALSE)

Arguments

phy

an object of class "phylo".

tip

a vector of mode numeric or character specifying the tips to delete.

trim.internal

a logical specifying whether to delete the corresponding internal branches.

subtree

a logical specifying whether to output in the tree how many tips have been deleted and where.

root.edge

an integer giving the number of internal branches to be used to build the new root edge. This has no effect if trim.internal = FALSE.

rooted

a logical indicating whether the tree must be treated as rooted or not. This makes it possible to force the tree to be considered as unrooted (see examples). See details about a possible root.edge element in the tree.

collapse.singles

a logical specifying whether to delete the internal nodes of degree 2.

node

a node number or label.

interactive

if TRUE the user is asked to select the tips or the node by clicking on the tree which must be plotted.

...

arguments passed from and to methods.

Details

The argument tip can be either character or numeric. In the first case, it gives the labels of the tips to be deleted; in the second case the numbers of these labels in the vector phy$tip.label are given.

This also applies to node, but if this argument is character and the tree has no node label, this results in an error. If more than one value is given with node (i.e., a vector of length two or more), only the first one is used with a warning.

If trim.internal = FALSE, the new tips are given "NA" as labels, unless there are node labels in the tree in which case they are used.

If subtree = TRUE, the returned tree has one or several terminal branches named with node labels if available. Otherwise it is indicated how many tips have been removed (with a label "[x_tips]"). This is done for as many monophyletic groups as have been deleted.

Note that subtree = TRUE implies trim.internal = TRUE.

To understand how the option root.edge works, see the examples below. If rooted = FALSE and the tree has a root edge, the latter is removed in the output.
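The relationship between the three functions can be seen on a small tree. The following is a minimal sketch with a made-up five-tip tree; getMRCA is used here only to find the node number of a clade without relying on a particular node numbering.

```r
library(ape)

# A hypothetical rooted tree with five tips.
tr <- read.tree(text = "((A,B),(C,(D,E)));")

# Removing A and B, or keeping C, D and E, should leave the same set of tips.
dropped <- drop.tip(tr, c("A", "B"))
kept <- keep.tip(tr, c("C", "D", "E"))

# Extract the clade descending from the most recent common ancestor of D and E.
node_DE <- getMRCA(tr, c("D", "E"))
clade_DE <- extract.clade(tr, node_DE)

print(dropped$tip.label)
print(kept$tip.label)
print(clade_DE$tip.label)
```

Selecting tips by label, as done here, is usually less error-prone than selecting them by number because it does not depend on the order of phy$tip.label.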

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis, Klaus Schliep, Joseph Brown

Examples

data(bird.families)
tip <- c(
"Eopsaltriidae", "Acanthisittidae", "Pittidae", "Eurylaimidae",
"Philepittidae", "Tyrannidae", "Thamnophilidae", "Furnariidae",
"Formicariidae", "Conopophagidae", "Rhinocryptidae", "Climacteridae",
"Menuridae", "Ptilonorhynchidae", "Maluridae", "Meliphagidae",
"Pardalotidae", "Petroicidae", "Irenidae", "Orthonychidae",
"Pomatostomidae", "Laniidae", "Vireonidae", "Corvidae",
"Callaeatidae", "Picathartidae", "Bombycillidae", "Cinclidae",
"Muscicapidae", "Sturnidae", "Sittidae", "Certhiidae",
"Paridae", "Aegithalidae", "Hirundinidae", "Regulidae",
"Pycnonotidae", "Hypocoliidae", "Cisticolidae", "Zosteropidae",
"Sylviidae", "Alaudidae", "Nectariniidae", "Melanocharitidae",
"Paramythiidae","Passeridae", "Fringillidae")
plot(drop.tip(bird.families, tip))
plot(drop.tip(bird.families, tip, trim.internal = FALSE))
data(bird.orders)
plot(drop.tip(bird.orders, 6:23, subtree = TRUE))
plot(drop.tip(bird.orders, c(1:5, 20:23), subtree = TRUE))
plot(drop.tip(bird.orders, c(1:20, 23), subtree = TRUE))
plot(drop.tip(bird.orders, c(1:20, 23), subtree = TRUE, rooted = FALSE))
### Examples of the use of `root.edge'
tr <- read.tree(text = "(A:1,(B:1,(C:1,(D:1,E:1):1):1):1):1;")
drop.tip(tr, c("A", "B"), root.edge = 0) # = (C:1,(D:1,E:1):1);
drop.tip(tr, c("A", "B"), root.edge = 1) # = (C:1,(D:1,E:1):1):1;
drop.tip(tr, c("A", "B"), root.edge = 2) # = (C:1,(D:1,E:1):1):2;
drop.tip(tr, c("A", "B"), root.edge = 3) # = (C:1,(D:1,E:1):1):3;

Draw Additional Edges on a Plotted Tree

Description

edges draws edges on a plotted tree. fancyarrows enhances arrows with triangle and harpoon heads; it can be called from edges.

Usage

edges(nodes0, nodes1, arrows = 0, type = "classical", ...)
fancyarrows(x0, y0, x1, y1, length = 0.25, angle = 30, code = 2,
            col = par("fg"), lty = par("lty"), lwd = par("lwd"),
            type = "triangle", ...)

Arguments

nodes0, nodes1

vectors of integers giving the tip and/or node numbers where to start and to end the edges (recycled if necessary).

arrows

an integer between 0 and 3; 0: lines (the default); 1: an arrow head is drawn at nodes0; 2: at nodes1; 3: both.

type

if the previous argument is not 0, the type of arrow head: "classical" (just lines, the default), "triangle", "harpoon", or any unambiguous abbreviations of these. For fancyarrows only the last two are available.

x0, y0, x1, y1

the coordinates of the start and end points for fancyarrows (these are not recycled and so should be vectors of the same length).

length, angle, code, col, lty, lwd

default options similar to those of arrows.

...

further arguments passed to segments.

Details

The first function is helpful when drawing reticulations on a phylogeny,especially if computed from the edge matrix.

fancyarrows does not work with log-transformed scale(s).

Author(s)

Emmanuel Paradis

Examples

set.seed(2)
tr <- rcoal(6)
plot(tr, "c")
edges(10, 9, col = "red", lty = 2)
edges(10:11, 8, col = c("blue", "green")) # recycling of 'nodes1'
edges(1, 2, lwd = 2, type = "h", arrows = 3, col = "green")
nodelabels()

Evolutionary Networks

Description

evonet builds a network from a tree of class "phylo". There are print, plot, and reorder methods as well as a few conversion functions.

Usage

evonet(phy, from, to = NULL)
## S3 method for class 'evonet'
print(x, ...)
## S3 method for class 'evonet'
plot(x, col = "blue", lty = 1, lwd = 1, alpha = 0.5,
              arrows = 0, arrow.type = "classical", ...)
## S3 method for class 'evonet'
Nedge(phy)
## S3 method for class 'evonet'
reorder(x, order = "cladewise", index.only = FALSE, ...)
## S3 method for class 'evonet'
as.phylo(x, ...)
## S3 method for class 'evonet'
as.networx(x, weight = NA, ...)
## S3 method for class 'evonet'
as.network(x, directed = TRUE, ...)
## S3 method for class 'evonet'
as.igraph(x, directed = TRUE, use.labels = TRUE, ...)
as.evonet(x, ...)
## S3 method for class 'phylo'
as.evonet(x, ...)
read.evonet(file = "", text = NULL, comment.char = "", ...)
write.evonet(x, file = "", ...)

Arguments

phy

an object of class "phylo".

x

an object of class "evonet".

from

a vector (or a matrix if to = NULL) giving the node or tip numbers involved in the reticulations.

to

a vector of the same length as from.

col, lty, lwd

colors, line type and width of the reticulations (recycled if necessary).

alpha

a value between 0 and 1 specifying the transparency of the reticulations.

arrows, arrow.type

see fancyarrows.

order, index.only

see reorder.phylo.

weight

a numeric vector giving the weights for the reticulations when converting to the class "networx" (recycled or shortened if needed).

directed

a logical: should the network be considered as directed? TRUE by default.

use.labels

a logical specifying whether to use the tip and node labels when building the network of class "igraph".

file, text, comment.char

see read.tree.

...

arguments passed to other methods.

Details

evonet is a constructor function that checks the arguments.

The classes "networx", "network", and "igraph" are defined in the packages phangorn, network, and igraph, respectively.

read.evonet reads networks from files in extended Newick format (Cardona et al. 2008).

Value

an object of class c("evonet", "phylo") which is made of an object of class "phylo" plus an element reticulation coding additional edges among nodes and uses the same coding rules as the edge matrix.

The conversion functions return an object of the appropriate class.
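To see how this structure looks in practice, the sketch below builds a small network from a random tree and inspects its components. The tree is simulated and the choice of reticulation nodes (6 to 8 and 7 to 9) is arbitrary; it merely mirrors the call used in the examples.

```r
library(ape)

set.seed(1)
tr <- rcoal(5)   # 5 tips (numbered 1 to 5) and 4 internal nodes (6 to 9)

# Add two reticulations: from node 6 to node 8, and from node 7 to node 9.
net <- evonet(tr, from = 6:7, to = 8:9)

print(class(net))          # both "evonet" and "phylo"
print(net$reticulation)    # one row per reticulation, coded like the edge matrix
print(dim(net$edge))       # the underlying tree edges are kept in 'edge'

# Converting back to a tree gives an object that is no longer of class "evonet".
back <- as.phylo(net)
print(class(back))
```

Because the object still inherits from "phylo", functions written for trees generally continue to work on the tree part of the network.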

Author(s)

Emmanuel Paradis, Klaus Schliep

References

Cardona, G., Rosselló, F. and Valiente, G. (2008) Extended Newick: it is time for a standard representation of phylogenetic networks. BMC Bioinformatics, 9, 532.

Examples

tr <- rcoal(5)
(x <- evonet(tr, 6:7, 8:9))
plot(x)
## simple example of extended Newick format:
(enet <- read.evonet(text = "((a:2,(b:1)#H1:1):1,(#H1,c:1):2);"))
plot(enet, arrows=1)
## from Fig. 2 in Cardona et al. 2008:
z <- read.evonet(text = "((1,((2,(3,(4)Y#H1)g)e,(((Y#H1, 5)h,6)f)X#H2)c)a,((X#H2,7)d,8)b)r;")
z
plot(z)
## Not run: 
if (require(igraph)) {
    plot(as.igraph(z))
}
## End(Not run)

Incomplete distances and edge weights of unrooted topology

Description

This function implements a method for checking whether an incomplete set of distances satisfies certain conditions that might make it uniquely determine the edge weights of a given topology, T. It prints information about whether the graph with vertex set the set of leaves, denoted by X, and edge set the set of non-missing distance pairs, denoted by L, is connected or strongly non-bipartite. It then also checks whether L is a triplet cover for T.

Usage

ewLasso(X, phy)

Arguments

X

a distance matrix.

phy

an unrooted tree of class "phylo".

Details

Missing values must be represented by either NA or a negative value.

This implements a method for checking whether an incomplete set of distances satisfies certain conditions that might make it uniquely determine the edge weights of a given topology, T. It prints information about whether the graph, G, with vertex set the set of leaves, denoted by X, and edge set the set of non-missing distance pairs, denoted by L, is connected or strongly non-bipartite. It also checks whether L is a triplet cover for T. If G is not connected, then T does not need to be the only topology satisfying the input incomplete distances. If G is not strongly non-bipartite then the edge-weights of the edges of T are not the unique ones for which the input distance is satisfied. If L is a triplet cover, then the input distance matrix uniquely determines the edge weights of T. See Dress et al. (2012) for details.
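The first of these conditions, connectedness of the graph G, is easy to picture with a small base R sketch. The distance matrix below is hypothetical, the helper is_connected is an illustrative name, and the breadth-first search is a generic way of testing connectivity, not a description of how ewLasso is implemented.

```r
# Hypothetical incomplete distance matrix on four leaves; NA marks a missing pair.
labs <- c("a", "b", "c", "d")
D <- matrix(NA_real_, 4, 4, dimnames = list(labs, labs))
diag(D) <- 0
D["a", "b"] <- D["b", "a"] <- 0.3
D["b", "c"] <- D["c", "b"] <- 0.5
D["c", "d"] <- D["d", "c"] <- 0.4

# Edges of G are the non-missing pairs (NA or a negative value means missing).
adj <- !is.na(D) & D >= 0

# Breadth-first search from the first leaf: G is connected if every leaf is reached.
is_connected <- function(adj) {
  seen <- rep(FALSE, nrow(adj))
  seen[1] <- TRUE
  queue <- 1
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nb <- which(adj[v, ] & !seen)
    seen[nb] <- TRUE
    queue <- c(queue, nb)
  }
  all(seen)
}

print(is_connected(adj))   # the known pairs form the path a-b-c-d

# Dropping the b-c distance splits the leaves into {a, b} and {c, d}.
adj_split <- adj
adj_split["b", "c"] <- adj_split["c", "b"] <- FALSE
print(is_connected(adj_split))
```

Connectedness alone is the weakest of the three conditions; the strongly non-bipartite and triplet cover checks described above are more demanding and are not reproduced here.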

Value

NULL, the results are printed in the console.

Author(s)

Andrei Popescu

References

Dress, A. W. M., Huber, K. T., and Steel, M. (2012) ‘Lassoing’ a phylogenetic tree I: basic properties, shellings and covers. Journal of Mathematical Biology, 65(1), 77–105.

Gamma-Statistic of Pybus and Harvey

Description

This function computes the gamma-statistic which summarizes the information contained in the inter-node intervals of a phylogeny. It is assumed that the tree is ultrametric. Note that the function does not check that the tree is actually ultrametric, so if it is not, the returned result may not be meaningful.

Usage

gammaStat(phy)

Arguments

phy

an object of class "phylo".

Details

The gamma-statistic is a summary of the information contained in the inter-node intervals of a phylogeny; under the assumption that the clade diversified with constant rates, it follows a normal distribution with mean zero and standard deviation unity (Pybus and Harvey 2000). Thus, the null hypothesis that the clade diversified with constant rates may be tested with 2*(1 - pnorm(abs(gammaStat(phy)))) for a two-tailed test, or 1 - pnorm(abs(gammaStat(phy))) for a one-tailed test, both returning the corresponding P-value.
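The test described above can be sketched as follows. This is only an illustration on a simulated coalescent tree; the seed, the tree size, and the variable names are arbitrary choices, not part of the original help page.

```r
library(ape)

set.seed(1)
tr <- rcoal(30)  # simulated ultrametric tree, for illustration only

g <- gammaStat(tr)

## P-values under the standard normal null distribution
p_two_tailed <- 2 * (1 - pnorm(abs(g)))
p_one_tailed <- 1 - pnorm(abs(g))

c(gamma = g, two.tailed = p_two_tailed, one.tailed = p_one_tailed)
```

Since gammaStat() does not check ultrametricity, it may be prudent to call is.ultrametric() on real data before interpreting the P-values.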

Value

a numeric vector of length one.

Author(s)

Emmanuel Paradis

References

Pybus, O. G. and Harvey, P. H. (2000) Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London. Series B. Biological Sciences, 267, 2267–2272.

Read Annotations from GenBank

Description

This function connects to the GenBank database and reads sequence annotations using accession number(s) given as argument.

Usage

getAnnotationsGenBank(access.nb, quiet = TRUE)

Arguments

access.nb

a vector of mode character giving the accession numbers.

quiet

a logical value indicating whether to show the progress of the downloads.

Details

The sequence annotations (a.k.a. feature list) are returned in a data frame with five or six columns: start, end, type, product, others, and gene (the last being optional). This is the same information that can be downloaded from NCBI's Web interface by clicking on ‘Send to:’, ‘File’, and then selecting ‘Feature Table’ under ‘Format’.

A warning is given if some features are incomplete (this information is then dropped from the returned object).

A warning is given if some accession numbers are not found on GenBank.

Value

One of the following: (i) a data frame if access.nb contains a single accession number; (ii) a list of data frames if access.nb contains several accession numbers, with names set from access.nb (if some accession numbers are not found on GenBank, the corresponding entries are set to NULL); (iii) NULL if none of the accession numbers is found on GenBank.

Author(s)

Emmanuel Paradis

References

https://www.ncbi.nlm.nih.gov/Sequin/table.html (Note: it seems this URL is broken; 2022-01-03)

Examples

## The 8 sequences of tanagers (Ramphocelus):
ref <- c("U15717", "U15718", "U15719", "U15720",
         "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
## Not run: 
annot.rampho <- getAnnotationsGenBank(ref)
annot.rampho
## check all annotations are the same:
unique(do.call(rbind, annot.rampho)[, -5])
## End(Not run)

Phylogenetic Tree of 193 HIV-1 Sequences

Description

This data set describes an estimated clock-like phylogeny of 193 HIV-1 group M sequences sampled in the Democratic Republic of Congo.

Usage

data(hivtree.newick)
data(hivtree.table)

Format

hivtree.newick is a string with the tree in Newick format. The data frame hivtree.table contains the corresponding internode distances.
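As a minimal sketch of how the two objects might be used, the Newick string can be parsed with read.tree() through its text argument. The object name tree.hiv is an arbitrary choice for this illustration.

```r
library(ape)

## The Newick string is turned into a tree of class "phylo"
data(hivtree.newick)
tree.hiv <- read.tree(text = hivtree.newick)
Ntip(tree.hiv)

## The table of internode distances is a plain data frame
data(hivtree.table)
str(hivtree.table)
```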

Source

This is a data example from Strimmer and Pybus (2001).

References

Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.

Calculate Numbers of Phylogenetic Trees

Description

howmanytrees calculates the number of possible phylogenetic trees for a given number of tips.

LargeNumber is a utility function to compute (approximately) large numbers from the power a^b.

Usage

howmanytrees(n, rooted = TRUE, binary = TRUE,
             labeled = TRUE, detail = FALSE)
LargeNumber(a, b)
## S3 method for class 'LargeNumber'
print(x, latex = FALSE, digits = 1, ...)

Arguments

n

a positive numeric integer giving the number of tips.

rooted

a logical indicating whether the trees are rooted (default is TRUE).

binary

a logical indicating whether the trees are bifurcating (default is TRUE).

labeled

a logical indicating whether the trees have tips labeled (default is TRUE).

detail

a logical indicating whether the intermediate calculations, if any, should be returned (default is FALSE). This applies only to the multifurcating trees, and the bifurcating, rooted, unlabeled trees (aka tree shapes).

a,b

two numbers.

x

an object of class "LargeNumber".

latex

a logical value specifying whether to print the number in LaTeX code in addition to returning it.

digits

the number of digits printed for the real part of the large number (unused if latex = FALSE).

...

(unused).

Details

In the cases of labeled binary trees, the calculation is done directly and a single numeric value is returned (or an object of class "LargeNumber").

For multifurcating trees, and bifurcating, rooted, unlabeled trees, the calculation is done iteratively for 1 to n tips. Thus the user can print all the intermediate values if detail = TRUE, or only a single value if detail = FALSE (the default).

For multifurcating trees, if detail = TRUE, a matrix is returned with the number of tips as rows (named from 1 to n), and the number of nodes as columns (named from 1 to n - 1). For bifurcating, rooted, unlabeled trees, a vector is returned with names equal to the number of tips (from 1 to n).

The number of unlabeled trees (aka tree shapes) can be computed only for the rooted binary cases.

Note that if an infinite value (Inf) is returned this does not mean that there is an infinite number of trees (this cannot be if the number of tips is finite), but that the calculation is beyond the limits of the computer. Only for the cases of rooted, binary, labeled topologies is an approximate number returned in the form of a "LargeNumber" object.
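For the directly computed case (labeled binary trees), the result can be checked against the well-known double-factorial counts, (2n - 3)!! for rooted and (2n - 5)!! for unrooted trees. These closed forms are not stated in this help page and are used here as an assumption about what "done directly" refers to; the helper double_factorial_odd is hypothetical and not part of ape.

```r
library(ape)

## Product of the odd integers 1, 3, ..., m (hypothetical helper, not in ape)
double_factorial_odd <- function(m) if (m < 1) 1 else prod(seq(1, m, by = 2))

n <- 5
rooted_count   <- double_factorial_odd(2 * n - 3)  # (2n - 3)!!
unrooted_count <- double_factorial_odd(2 * n - 5)  # (2n - 5)!!

## Compare with the values returned by howmanytrees()
c(closed.form = rooted_count, howmanytrees = howmanytrees(n))
c(closed.form = unrooted_count, howmanytrees = howmanytrees(n, rooted = FALSE))
```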

Value

a single numeric value, an object of class "LargeNumber", or in the case where detail = TRUE is used, a named vector or matrix.

Author(s)

Emmanuel Paradis

References

Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.

Examples

### Table 3.1 in Felsenstein 2004:
for (i in c(1:20, 30, 40, 50))
  cat(paste(i, howmanytrees(i), sep = "\t"), sep = "\n")
### Table 3.6:
howmanytrees(8, binary = FALSE, detail = TRUE)

Graphical Identification of Nodes and Tips

Description

This function identifies a clade on a plotted tree by clicking on the plot with the mouse. The tree, specified in the argument x, must be plotted beforehand.

Usage

## S3 method for class 'phylo'
identify(x, nodes = TRUE, tips = FALSE,
                  labels = FALSE, quiet = FALSE, ...)

Arguments

x

an object of class "phylo".

nodes

a logical specifying whether to identify the node.

tips

a logical specifying whether to return the tip information.

labels

a logical specifying whether to return the labels; by default only the numbers are returned.

quiet

a logical controlling whether to print a message inviting the user to click on the tree.

...

further arguments to be passed to or from other methods.

Details

By default, the clade is identified by its number as found in the ‘edge’ matrix of the tree. If tips = TRUE, the tips descending from the identified node are returned, possibly together with the node. If labels = TRUE, the labels are returned (if the tree has no node labels, then the node number is returned).

The node is identified by the shortest distance from where the click occurs. If the click occurs close to a tip, the function returns its information.

Value

A list with one or two vectors named "tips" and/or "nodes" with the identification of the tips and/or of the nodes.

Note

This function does not add anything on the plot, but it can be wrapped with, e.g., nodelabels (see example), or its results can be sent to, e.g., drop.tip.

Author(s)

Emmanuel Paradis

Examples

## Not run: 
tr <- rtree(20)
f <- function(col) {
    o <- identify(tr)
    nodelabels(node=o$nodes, pch = 19, col = col)
}
plot(tr)
f("red") # click close to a node
f("green")
## End(Not run)

Plot of DNA Sequence Alignment

Description

This function plots an image of an alignment of nucleotide sequences.

Usage

## S3 method for class 'DNAbin'
image(x, what, col, bg = "white", xlab = "", ylab = "",
      show.labels = TRUE, cex.lab = 1, legend = TRUE,
      grid = FALSE, show.bases = FALSE, base.cex = 1,
      base.font = 1, base.col = "black", scheme = "Ape_NT", ...)

Arguments

x

a matrix of DNA sequences (class "DNAbin").

what

a vector of characters specifying the bases to visualize. If missing, this is set to “a”, “g”, “c”, “t”, “n”, and “-” (in this order).

col

a vector of colours. If missing, this is set to “red”, “yellow”, “green”, “blue”, “grey”, and “black”. If it is shorter (or longer) than what, it is recycled (or shortened).

bg

the colour used for nucleotides whose base is not among what.

xlab

the label for the x-axis; none by default.

ylab

Idem for the y-axis. Note that by default, the labels of the sequences are printed on the y-axis (see next option).

show.labels

a logical controlling whether the sequence labels are printed (TRUE by default).

cex.lab

a single numeric controlling the size of the sequence labels. Use cex.axis to control the size of the annotations on the x-axis.

legend

a logical controlling whether the legend is plotted (TRUE by default).

grid

a logical controlling whether to draw a grid (FALSE by default).

show.bases

a logical controlling whether to show the base symbols (FALSE by default).

base.cex,base.font,base.col

control the aspect of the base symbols (ignored if the previous is FALSE).

scheme

a predefined color scheme. For amino acids, the options are "Ape_AA", "Zappo_AA", "Clustal", "Polarity" and "Transmembrane_tendency"; for nucleotides, "Ape_NT" and "RY_NT".

...

further arguments passed to image.default (e.g., xlab, cex.axis).

Details

The idea of this function is to allow flexible plotting and colouring of a nucleotide alignment. By default, the most common bases (a, g, c, t, and n) and alignment gap are plotted using a standard colour scheme.

It is possible to plot only one base specified as what with a chosen colour: this might be useful to check, for instance, the distribution of alignment gaps (image(x, "-")) or missing data (see examples).

Author(s)

Emmanuel Paradis, Klaus Schliep

Examples

data(woodmouse)
image(woodmouse)
rug(seg.sites(woodmouse), -0.02, 3, 1)
image(woodmouse, "n", "blue") # show missing data
image(woodmouse, c("g", "c"), "green") # G+C
par(mfcol = c(2, 2))
### barcoding style:
for (x in c("a", "g", "c", "t"))
    image(woodmouse, x, "black", cex.lab = 0.5, cex.axis = 0.7)
par(mfcol = c(1, 1))
### zoom on a portion of the data:
image(woodmouse[11:15, 1:50], c("a", "n"), c("blue", "grey"))
grid(50, 5, col = "black")
### see the guanines on a black background:
image(woodmouse, "g", "yellow", "black")
### Amino acid
X <- trans(woodmouse, 2)
image(X) # default ape colors
image(X, scheme="Clustal") # Clustal coloring

Test for Binary Tree

Description

This function tests whether a phylogenetic tree is binary.

Usage

is.binary(phy)
## S3 method for class 'phylo'
is.binary(phy)
## S3 method for class 'multiPhylo'
is.binary(phy)
## S3 method for class 'tree'
is.binary(phy)

Arguments

phy

an object of class "phylo" or "multiPhylo".

Details

The test differs depending on whether the tree is rooted or not. An unrooted tree is considered binary if all its nodes are of degree three (i.e., three edges connect to each node). A rooted tree is considered binary if all nodes (including the root node) have exactly two descendant nodes, so that they are of degree three except the root which is of degree 2.

The test ignores branch lengths. Consider using di2multi if you want to treat zero-branch lengths as resulting from multichotomies.

is.binary.tree is deprecated and will be removed soon: currently it calls is.binary.
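The degree rule can be inspected by hand from the ‘edge’ matrix, whose first column holds the parent node of each edge. The sketch below is only an illustration of that rule, not the implementation used by is.binary; the helper n_descendants is hypothetical.

```r
library(ape)

set.seed(1)
rooted   <- rtree(10)                  # random rooted binary tree
unrooted <- rtree(10, rooted = FALSE)  # random unrooted binary tree
star     <- stree(10)                  # star tree: a single multichotomy

## Number of descendants of each internal node, counted from the parent
## column of the edge matrix (hypothetical helper, not part of ape)
n_descendants <- function(phy) {
    counts <- tabulate(phy$edge[, 1])
    counts[counts > 0]
}

n_descendants(rooted)    # two descendants for every node, root included
n_descendants(unrooted)  # three at the basal node, two elsewhere
n_descendants(star)      # a single node with ten descendants

c(is.binary(rooted), is.binary(unrooted), is.binary(star))
```

In the unrooted tree, each non-basal node has two descendants plus one ancestor, which gives the degree three mentioned above.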

Value

a logical vector.

Author(s)

Emmanuel Paradis

Examples

is.binary(rtree(10))
is.binary(rtree(10, rooted = FALSE))
is.binary(stree(10))
x <- setNames(rmtree(10, 10), LETTERS[1:10])
is.binary(x)

Check Compatibility of Splits

Description

is.compatible is a generic function with a method for the class "bitsplits". It checks whether a set of splits is compatible using the arecompatible function.

Usage

is.compatible(obj)
## S3 method for class 'bitsplits'
is.compatible(obj)
arecompatible(x, y, n)

Arguments

obj

an object of class "bitsplits".

x,y

a vector of mode raw.

n

the number of taxa in the splits.

Value

TRUE if the splits are compatible, FALSE otherwise.

Author(s)

Andrei Popescu

Is Group Monophyletic

Description

This function tests whether a list of tip labels is monophyletic on a given tree.

Usage

is.monophyletic(phy, tips, reroot = !is.rooted(phy), plot = FALSE, ...)

Arguments

phy

a phylogenetic tree description of class "phylo".

tips

a vector of mode numeric or character specifying the tips to be tested.

reroot

a logical. If FALSE, then the input tree is not unrooted before the test.

plot

a logical. If TRUE, then the tree is plotted with the specified group tips highlighted.

...

further arguments passed to plot.

Details

If phy is rooted, the test is done on the rooted tree, otherwise the tree is first unrooted, then arbitrarily rerooted, in order to be independent of the current position of the root. That is, the test asks if tips could be monophyletic given any favourable rooting of phy.

If phy is unrooted the test is done on an unrooted tree, unless reroot = FALSE is specified.

If tip labels in the list tips are given as characters, they need to be spelled as in the object phy.
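A minimal sketch of the rooted case, using a small hand-written tree rather than real data. The tree and its tip labels are made up for this illustration.

```r
library(ape)

## A small rooted tree with two pairs of sister tips: (a, b) and (c, d)
tr <- read.tree(text = "((a,b),(c,d));")

## a and b share an ancestor that has no other descendants
ab <- is.monophyletic(tr, c("a", "b"))

## the common ancestor of a and c is the root, which also leads to b and d
ac <- is.monophyletic(tr, c("a", "c"))

c(ab = ab, ac = ac)
```

Because the tree is rooted, the default reroot = !is.rooted(phy) evaluates to FALSE here and the root position is respected.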

Value

TRUE or FALSE.

Author(s)

Johan Nylander jnylander@users.sourceforge.net

Examples

    ## Test one monophyletic and one paraphyletic group on the bird.orders tree
    ## Not run: data("bird.orders")
    ## Not run: is.monophyletic(phy = bird.orders, tips = c("Ciconiiformes", "Gruiformes"))
    ## Not run: is.monophyletic(bird.orders, c("Passeriformes", "Ciconiiformes", "Gruiformes"))

Test if a Tree is Ultrametric

Description

This function tests whether a tree is ultrametric using the distances from each tip to the root.

Usage

is.ultrametric(phy, ...)
## S3 method for class 'phylo'
is.ultrametric(phy, tol = .Machine$double.eps^0.5, option = 1, ...)
## S3 method for class 'multiPhylo'
is.ultrametric(phy, tol = .Machine$double.eps^0.5, option = 1, ...)

Arguments

phy

an object of class "phylo" or "multiPhylo".

tol

a numeric >= 0; variations below this value are considered non-significant.

option

an integer (1 or 2; see details).

...

arguments passed among methods.

Details

The test is based on the distances from each tip to the root and a criterion: if option = 1, the criterion is the scaled range ((max - min)/max), if option = 2, the variance is used (this was the method used until ape 3.5). The default criterion is invariant to linear changes of the branch lengths.
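The default criterion can be reproduced by hand, reading the scaled range as the range of root-to-tip distances divided by their maximum. The sketch below assumes that reading and uses node.depth.edgelength() to obtain the distances; it illustrates the criterion and is not the internal code of is.ultrametric.

```r
library(ape)

set.seed(42)
tr <- rcoal(10)  # coalescent trees are ultrametric by construction

## Root-to-tip distances: node.depth.edgelength() returns one value per
## node, the tips coming first
d <- node.depth.edgelength(tr)[seq_len(Ntip(tr))]

scaled_range <- (max(d) - min(d)) / max(d)
scaled_range < .Machine$double.eps^0.5  # same threshold as the default 'tol'

## Multiplying all branch lengths by a constant leaves the scaled range
## unchanged, so the verdict is the same
tr_scaled <- tr
tr_scaled$edge.length <- 1000 * tr$edge.length

c(is.ultrametric(tr), is.ultrametric(tr_scaled))
```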

Value

a logical vector.

Author(s)

Emmanuel Paradis

Examples

is.ultrametric(rtree(10))
is.ultrametric(rcoal(10))

Plot Multiple Chronograms on the Same Scale

Description

The main argument is a list of (rooted) trees which are plotted on the same scale.

Usage

kronoviz(x, layout = length(x), horiz = TRUE, ...,
         direction = ifelse(horiz, "rightwards", "upwards"), side = 2)

Arguments

x

a list of (rooted) trees of class "phylo".

layout

an integer giving the number of trees plotted simultaneously; by default all.

horiz

a logical specifying whether the trees should be plotted rightwards (the default) or upwards.

...

further arguments passed to plot.phylo.

direction

a character string specifying the direction of the tree. Four values are possible: "rightwards" (the default), "leftwards", "upwards", and "downwards".

side

Where to put the axis, see example.

Details

The size of the individual plots is proportional to the size of the trees.

Value

NULL

Author(s)

Emmanuel Paradis, Klaus Schliep

Examples

TR <- replicate(10, rcoal(sample(11:20, size = 1)), simplify = FALSE)
kronoviz(TR)
kronoviz(TR, side = 1)
kronoviz(TR, horiz = FALSE, type = "c", show.tip.label = FALSE)
kronoviz(TR, direction = "d", side = c(1,2))

Label Management

Description

These functions work on a vector of character strings storing bi- or trinomial species names, typically “Genus_species_subspecies”.

Usage

label2table(x, sep = NULL, as.is = FALSE)
stripLabel(x, species = FALSE, subsp = TRUE, sep = NULL)
abbreviateGenus(x, genus = TRUE, species = FALSE, sep = NULL)

Arguments

x

a vector of mode character.

sep

the separator (a single character) between the taxonomic levels (see details).

as.is

a logical specifying whether to convert characters into factors (like in read.table).

species,subsp,genus

a logical specifying whether the taxonomic level is concerned by the operation.

Details

label2table returns a data frame with three columns named “genus”, “species”, and “subspecies” (with NA if the level is missing).

stripLabel deletes the subspecies names from the input. If species = TRUE, the species names are also removed, thus returning only the genus names.

abbreviateGenus abbreviates the genus names keeping only the first letter. If species = TRUE, the species names are abbreviated.

By default, these functions try to guess what is the separator between the genus, species and/or subspecies names. If an underscore is present in the input, then this character is assumed to be the separator; otherwise, a space. If this does not work, you can set sep to the appropriate value.

Value

A vector of mode character or a data frame.

Author(s)

Emmanuel Paradis

Examples

x <- c("Panthera_leo", "Panthera_pardus", "Panthera_onca", "Panthera_uncia",
       "Panthera_tigris_altaica", "Panthera_tigris_amoyensis")
label2table(x)
stripLabel(x)
stripLabel(x, TRUE)
abbreviateGenus(x)
abbreviateGenus(x, species = TRUE)
abbreviateGenus(x, genus = FALSE, species = TRUE)

Ladderize a Tree

Description

This function reorganizes the internal structure of the tree to get the ladderized effect when plotted.

Usage

ladderize(phy, right = TRUE)

Arguments

phy

an object of class "phylo".

right

a logical specifying whether the smallest clade is on the right-hand side (when the tree is plotted upwards), or the opposite (if FALSE).

Author(s)

Emmanuel Paradis

Examples

tr <- rcoal(50)
layout(matrix(1:4, 2, 2))
plot(tr, main = "normal")
plot(ladderize(tr), main = "right-ladderized")
plot(ladderize(tr, FALSE), main = "left-ladderized")
layout(matrix(1, 1))

Leading and Trailing Alignment Gaps to N

Description

Substitutes leading and trailing alignment gaps in aligned sequences into N (i.e., A, C, G, or T). The gaps in the middle of the sequences are left unchanged.

Usage

latag2n(x)

Arguments

x

an object of class "DNAbin" with the aligned sequences.

Details

This function is called by others in ape and in pegas. It is documented here in case it needs to be called by other packages.

Value

an object of class "DNAbin".

Author(s)

Emmanuel Paradis

Examples

x <- as.DNAbin(matrix(c("-", "A", "G", "-", "T", "C"), 2, 3))
y <- latag2n(x)
alview(x)
alview(y)

Multiple regression through the origin

Description

Function lmorigin computes a multiple linear regression and performs tests of significance of the equation parameters (F-test of R-square and t-tests of regression coefficients) using permutations.

The regression line can be forced through the origin. Testing the significance in that case requires a special permutation procedure. This option was developed for the analysis of independent contrasts, which requires regression through the origin. A permutation test, described by Legendre & Desdevises (2009), is needed to analyze contrasts that are not normally distributed.

Usage

lmorigin(formula, data, origin=TRUE, nperm=999, method=NULL, silent=FALSE)

Arguments

formula

A formula specifying the bivariate model, as in lm and aov.

data

A data frame containing the two variables specified in the formula.

origin

origin = TRUE (default) to compute regression through the origin; origin = FALSE to compute multiple regression with estimation of the intercept.

nperm

Number of permutations for the tests. If nperm = 0, permutation tests will not be computed. The default value is nperm = 999. For large data files, the permutation test is rather slow since the permutation procedure is not compiled.

method

method = "raw" computes t-tests of the regression coefficients by permutation of the raw data. method = "residuals" computes t-tests of the regression coefficients by permutation of the residuals of the full model. If method = NULL, permutation of the raw data is used to test the regression coefficients in regression through the origin; permutation of the residuals of the full model is used to test the regression coefficients in ordinary multiple regression.

silent

Informative messages and the time to compute the tests will not be written to the R console if silent=TRUE. Useful when the function is called by a numerical simulation function.

Details

The permutation F-test of R-square is always done by permutation of the raw data. When there is a single explanatory variable, permutation of the raw data is used for the t-test of the single regression coefficient, whatever the method chosen by the user. The rationale is found in Anderson & Legendre (1999).

The print.lmorigin function prints out the results of the parametric tests (in all cases) and the results of the permutational tests (when nperm > 0).

Value

reg

The regression output object produced by function lm.

p.param.t.2tail

Parametric probabilities for 2-tailed tests of the regression coefficients.

p.param.t.1tail

Parametric probabilities for 1-tailed tests of the regression coefficients. Each test is carried out in the direction of the sign of the coefficient.

p.perm.t.2tail

Permutational probabilities for 2-tailed tests of the regression coefficients.

p.perm.t.1tail

Permutational probabilities for 1-tailed tests of the regression coefficients. Each test is carried out in the direction of the sign of the coefficient.

p.perm.F

Permutational probability for the F-test of R-square.

origin

TRUE if regression through the origin has been computed, FALSE if multiple regression with estimation of the intercept has been used.

nperm

Number of permutations used in the permutation tests.

method

Permutation method for the t-tests of the regression coefficients: method = "raw" or method = "residuals".

var.names

Vector containing the names of the variables used in the regression.

call

The function call.

Author(s)

Pierre Legendre, Universite de Montreal

References

Anderson, M. J. and Legendre, P. (1999) An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation, 62, 271–303.

Legendre, P. and Desdevises, Y. (2009) Independent contrasts and regression through the origin. Journal of Theoretical Biology, 259, 727–743.

Sokal, R. R. and Rohlf, F. J. (1995) Biometry - The principles and practice of statistics in biological research. Third edition. New York: W. H. Freeman.

Examples

## Example 1 from Sokal & Rohlf (1995) Table 16.1
## SO2 air pollution in 41 cities of the USA
data(lmorigin.ex1)
out <- lmorigin(SO2 ~ ., data=lmorigin.ex1, origin=FALSE, nperm=99)
out
## Example 2: Contrasts computed on the phylogenetic tree of Lamellodiscus
## parasites. Response variable: non-specificity index (NSI); explanatory
## variable: maximum host size. Data from Table 1 of Legendre & Desdevises
## (2009).
data(lmorigin.ex2)
out <- lmorigin(NSI ~ MaxHostSize, data=lmorigin.ex2, origin=TRUE, nperm=99)
out
## Example 3: random numbers
y <- rnorm(50)
X <- as.data.frame(matrix(rnorm(250),50,5))
out <- lmorigin(y ~ ., data=X, origin=FALSE, nperm=99)
out

Lineages Through Time Plot

Description

These functions provide tools for plotting the numbers of lineages through time from phylogenetic trees.

Usage

ltt.plot(phy, xlab = "Time", ylab = "N",
         backward = TRUE, tol = 1e-6, ...)
ltt.lines(phy, backward = TRUE, tol = 1e-6, ...)
mltt.plot(phy, ..., dcol = TRUE, dlty = FALSE, legend = TRUE,
          xlab = "Time", ylab = "N", log = "", backward = TRUE,
          tol = 1e-6)
ltt.coplot(phy, backward = TRUE, ...)
ltt.plot.coords(phy, backward = TRUE, tol = 1e-6, type = "S")

Arguments

phy

an object of class "phylo"; this could be an object of class "multiPhylo" in the case of mltt.plot.

xlab

a character string (or a variable of mode character) giving the label for the x-axis (default is "Time").

ylab

idem for the y-axis (default is "N").

backward

a logical value: should the time axis be traced from the present (the default), or from the root of the tree?

tol

a numeric value (see details).

...

in the cases of ltt.plot(), ltt.lines(), or ltt.coplot() these are further (graphical) arguments to be passed to plot(), lines(), or plot.phylo(), respectively (see details on how to transform the axes); in the case of mltt.plot() these are additional trees to be plotted (see details).

dcol

a logical specifying whether the different curves should be differentiated with colors (default is TRUE).

dlty

a logical specifying whether the different curves should be differentiated with patterns of dots and dashes (default is FALSE).

legend

a logical specifying whether a legend should be plotted.

log

a character string specifying which axis(es) to be log-transformed; must be one of the following: "", "x", "y", or "xy".

type

either "S" or "s", the preferred type of step function, corresponding to argument type of base function plot(). See section "Value" below.

Details

ltt.plot does a simple lineages through time (LTT) plot. Additional arguments (...) may be used to change, for instance, the limits on the axes (with xlim and/or ylim) or other graphical settings (col for the color, lwd for the line thickness, lty for the line type may be useful; see par for an exhaustive listing of graphical parameters). The y-axis can be log-transformed by adding the following option: log = "y".

The option tol is used as follows: first the most distant tip from the root is found, then all tips whose distance to the root does not differ from the previous one by more than tol are considered to be contemporaneous with it.

If the tree is not ultrametric, the plot is done assuming the tips, except the most distant from the root, represent extinction events. If a root edge is present, it is taken into account.

ltt.lines adds a LTT curve to an existing plot. Additional arguments (...) may be used to change the settings of the added line.

mltt.plot does a multiple LTT plot taking as arguments one or several trees. These trees may be given as objects of class "phylo" (single trees) and/or "multiPhylo" (multiple trees). Any number of objects may be given. This function is mainly for exploratory analyses with the advantages that the axes are set properly to view all lines, and the legend is plotted by default. The plot will certainly make sense if all trees have their most-distant-from-the-root tips contemporaneous (i.e., trees with only extinct lineages will not be represented properly). For more flexible settings of line drawings, it may be better to combine ltt.plot() with successive calls of ltt.lines() (see examples).

ltt.coplot is meant to show how to set a tree and a LTT plot on the same scales. All extra arguments modify only the appearance of the tree. The code can be easily edited and tailored.

Value

ltt.plot.coords returns a two-column matrix with the time points and the number of lineages, respectively. type = "S" returns the number of lineages to the left of (or "up to") the corresponding point in time, while type = "s" returns the number of lineages to the right of this point (i.e., between that time and the next).
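A short sketch of what the returned matrix looks like for a small simulated tree, using the defaults only. The seed and tree size are arbitrary, and the final plot() call is just one possible way to draw the coordinates by hand.

```r
library(ape)

set.seed(7)
tr <- rcoal(5)  # small ultrametric tree with 5 tips

## Default settings: time is counted backwards from the present
xy <- ltt.plot.coords(tr)
xy

## The coordinates can be drawn directly as a step function
plot(xy, type = "S", xlab = "Time", ylab = "N")
```

With backward = TRUE the time values are zero or negative, and the number of lineages rises to the number of tips at the present.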

Author(s)

Emmanuel Paradis

References

Harvey, P. H., May, R. M. and Nee, S. (1994) Phylogenies without fossils. Evolution, 48, 523–529.

Nee, S., Holmes, E. C., Rambaut, A. and Harvey, P. H. (1995) Inferring population history from molecular phylogenies. Philosophical Transactions of the Royal Society of London. Series B. Biological Sciences, 349, 25–31.

Examples

data(bird.families)
opar <- par(mfrow = c(2, 1))
ltt.plot(bird.families)
title("Lineages Through Time Plot of the Bird Families")
ltt.plot(bird.families, log = "y")
title(main = "Lineages Through Time Plot of the Bird Families",
      sub = "(with logarithmic transformation of the y-axis)")
par(opar)
### to plot the tree and the LTT plot together
data(bird.orders)
layout(matrix(1:4, 2, 2))
plot(bird.families, show.tip.label = FALSE)
ltt.plot(bird.families, main = "Bird families")
plot(bird.orders, show.tip.label = FALSE)
ltt.plot(bird.orders, main = "Bird orders")
layout(1)
### better with ltt.coplot():
ltt.coplot(bird.families, show.tip.label = FALSE, x.lim = 27.5)
data(chiroptera)
chiroptera <- compute.brlen(chiroptera)
ltt.coplot(chiroptera, show.tip.label = FALSE, type = "c")
### with extinct lineages and a root edge:
omar <- par("mar")
set.seed(31)
tr <- rlineage(0.2, 0.15)
tr$root.edge <- 5
ltt.coplot(tr, show.tip.label = FALSE, x.lim = 55)
## compare with:
## ltt.coplot(drop.fossil(tr), show.tip.label = FALSE)
layout(1)
par(mar = omar)
mltt.plot(bird.families, bird.orders)
### Generates 10 random trees with 23 tips:
TR <- replicate(10, rcoal(23), FALSE)
### Give names to each tree:
names(TR) <- paste("random tree", 1:10)
### And specify the class of the list so that mltt.plot()
### does not trash it!
class(TR) <- "multiPhylo"
mltt.plot(TR, bird.orders)
### And now for something (not so) completely different:
ltt.plot(bird.orders, lwd = 2)
for (i in 1:10) ltt.lines(TR[[i]], lty = 2)
legend(-20, 10, lwd = c(2, 1), lty = c(1, 2), bty = "n",
       legend = c("Bird orders", "Random (coalescent) trees"))

Label Management

Description

This is a generic function with methods for character vectors, trees of class "phylo", lists of trees of class "multiPhylo", and DNA sequences of class "DNAbin". All options for the class character may be used in the other methods.

Usage

makeLabel(x, ...)
## S3 method for class 'character'
makeLabel(x, len = 99, space = "_", make.unique = TRUE,
          illegal = "():;,[]", quote = FALSE, ...)
## S3 method for class 'phylo'
makeLabel(x, tips = TRUE, nodes = TRUE, ...)
## S3 method for class 'multiPhylo'
makeLabel(x, tips = TRUE, nodes = TRUE, ...)
## S3 method for class 'DNAbin'
makeLabel(x, ...)

Arguments

x

a vector of mode character or an object for which labels are to be changed.

len

the maximum length of the labels: those longer than ‘len’ will be truncated.

space

the character to replace spaces, tabulations, and linebreaks.

make.unique

a logical specifying whether duplicate labels should be made unique by appending numerals; TRUE by default.

illegal

a string specifying the characters to be deleted.

quote

a logical specifying whether to quote the labels; FALSE by default.

tips

a logical specifying whether tip labels are to be modified; TRUE by default.

nodes

a logical specifying whether node labels are to be modified; TRUE by default.

...

further arguments to be passed to or from other methods.

Details

The option make.unique does not work exactly in the same way as the function of the same name: numbers are suffixed to all labels that are identical (without separator). See the examples.

If there are 10–99 identical labels, the labels returned are "xxx01", "xxx02", etc, or "xxx001", "xxx002", etc, if there are 100–999, and so on. The number of digits added preserves the option ‘len’.

The default for ‘len’ makes labels short enough to be read by PhyML. Clustal accepts labels up to 30 characters long.

Value

An object of the appropriate class.

Note

The current version does not perform well when trying to make very short unique labels (e.g., less than 5 characters long).

Author(s)

Emmanuel Paradis

Examples

x <- rep("a", 3)
makeLabel(x)
make.unique(x) # <- from R's base
x <- rep("aaaaa", 2)
makeLabel(x, len = 3) # made unique and of length 3
makeLabel(x, len = 3, make.unique = FALSE)

Makes Node Labels

Description

This function makes node labels in a tree in a flexible way.

Usage

makeNodeLabel(phy, ...)

## S3 method for class 'phylo'
makeNodeLabel(phy, method = "number",
              prefix = "Node", nodeList = list(), ...)

## S3 method for class 'multiPhylo'
makeNodeLabel(phy, method = "number",
              prefix = "Node", nodeList = list(), ...)

Arguments

phy

an object of class "phylo".

method

a character string giving the method used to create the labels. Three choices are possible: "number" (the default), "md5sum", and "user", or any unambiguous abbreviation of these.

prefix

the prefix used if method = "number".

nodeList

a named list specifying how nodes are named if method = "user" (see details and examples).

...

further arguments passed to grep.

Details

The three methods are described below:

“number”: The labels are created with 1, 2, ... prefixed with the argument prefix; thus the default is to have Node1, Node2, ... Set prefix = "" to have only numbers.
“md5sum”: For each node, the labels of the tips descendant from this node are extracted, sorted alphabetically, and written into a temporary file, then the md5sum of this file is extracted and used as label. This results in a 32-character string which is unique (even across trees) for a given set of tip labels.
“user”: the argument nodeList must be a list with names, the latter will be used as node labels. For each element of nodeList, the tip labels of the tree are searched for patterns present in this element: this is done using grep. Then the most recent common ancestor of the matching tips is given the corresponding name as label. This is repeated for each element of nodeList.

The method "user" can be used in combination with either of the two others (see examples). Note that this method only modifies the specified node labels (so that if the other nodes already have labels they are not modified) while the two others change all labels.
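
The claim that "md5sum" labels depend only on the set of descendant tip labels can be checked with a short sketch, assuming ape is installed. The two Newick strings are made-up toy trees: they differ in topology but share the clade (a,b) and the full set of tips.

```r
library(ape)

# Two toy trees with different topologies but a shared clade (a,b)
t1 <- read.tree(text = "((a,b),(c,d));")
t2 <- read.tree(text = "(((a,b),c),d);")

l1 <- makeNodeLabel(t1, method = "md5sum")$node.label
l2 <- makeNodeLabel(t2, method = "md5sum")$node.label

nchar(l1)          # length of each label
intersect(l1, l2)  # labels found in both trees
```

Nodes subtending the same set of tips should receive the same label in both trees, which is what makes this method convenient for matching clades across trees.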

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis

Examples

tr <-
"((Pan_paniscus,Pan_troglodytes),((Homo_sapiens,Homo_erectus),Homo_abilis));"
tr <- read.tree(text = tr)
tr <- makeNodeLabel(tr, "u", nodeList = list(Pan = "Pan", Homo = "Homo"))
plot(tr, show.node.label = TRUE)
### does not erase the previous node labels:
tr <- makeNodeLabel(tr, "u", nodeList = list(Hominid = c("Pan","Homo")))
plot(tr, show.node.label = TRUE)
### the two previous commands could be combined:
L <- list(Pan = "Pan", Homo = "Homo", Hominid = c("Pan","Homo"))
tr <- makeNodeLabel(tr, "u", nodeList = L)
### combining different methods:
tr <- makeNodeLabel(tr, c("n", "u"), prefix = "#", nodeList = list(Hominid = c("Pan","Homo")))
plot(tr, show.node.label = TRUE)

Mantel Test for Similarity of Two Matrices

Description

This function computes Mantel's permutation test for similarity of two matrices. It permutes the rows and columns of the second matrix randomly and calculates a Z-statistic.

Usage

mantel.test(m1, m2, nperm = 999, graph = FALSE,
            alternative = "two.sided", ...)

Arguments

m1

a numeric matrix giving a measure of pairwise distances, correlations, or similarities among observations.

m2

a second numeric matrix giving another measure of pairwise distances, correlations, or similarities among observations.

nperm

the number of times to permute the data.

graph

a logical indicating whether to produce a summary graph (by default the graph is not plotted).

alternative

a character string defining the alternative hypothesis: "two.sided" (default), "less", "greater", or any unambiguous abbreviation of these.

...

further arguments to be passed to plot() (to add a title, change the axis labels, and so on).

Details

The function calculates a Z-statistic for the Mantel test, equal to the sum of the pairwise product of the lower triangles of the permuted matrices, for each permutation of rows and columns. It compares the permuted distribution with the Z-statistic observed for the actual data.

The present implementation can analyse symmetric as well as (since version 5.1 of ape) asymmetric matrices (see Mantel 1967, Sects. 4 and 5). The diagonals of both matrices are ignored.

If graph = TRUE, the function plots the density estimate of the permutation distribution along with the observed Z-statistic as a vertical line.

The ... argument allows the user to give further options to the plot function: the title may be changed with main =, the axis labels with xlab = and ylab =, and so on.
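
The permutation scheme described above can be sketched in base R for the symmetric case. This is an illustration only and not the code used by mantel.test: the helpers z_stat and permute are hypothetical names, and the one-sided P-value formula is a common convention assumed here rather than taken from the package.

```r
# Hypothetical helper: Z-statistic as the sum of products of the
# lower triangles of two symmetric matrices
z_stat <- function(m1, m2) sum((m1 * m2)[lower.tri(m1)])

# Hypothetical helper: apply the same random permutation to rows and columns
permute <- function(m) {
  i <- sample(nrow(m))
  m[i, i]
}

set.seed(1)
d1 <- as.matrix(dist(matrix(runif(12), 6)))
d2 <- as.matrix(dist(matrix(runif(12), 6)))

z_obs <- z_stat(d1, d2)                            # observed statistic
z_perm <- replicate(999, z_stat(d1, permute(d2)))  # permutation distribution

# One-sided P-value (assumed convention: the observed value counts once)
p_greater <- (sum(z_perm >= z_obs) + 1) / (999 + 1)
p_greater
```

Permuting rows and columns together keeps each observation's distances intact, so only the correspondence between the two matrices is randomized.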

Value

z.stat

the Z-statistic (sum of rows*columns of lower triangle) of the data matrices.

p

P-value (quantile of the observed Z-statistic in the permutation distribution).

alternative

the alternative hypothesis.

Author(s)

Original code in S by Ben Bolker, ported to R by Julien Claude

References

Mantel, N. (1967) The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220.

Manly, B. F. J. (1986) Multivariate statistical methods: a primer. London: Chapman & Hall.

Examples

q1 <- matrix(runif(36), nrow = 6)
q2 <- matrix(runif(36), nrow = 6)
diag(q1) <- diag(q2) <- 0
mantel.test(q1, q2, graph = TRUE,
            main = "Mantel test: a random example with 6 X 6 matrices
representing asymmetric relationships",
            xlab = "z-statistic", ylab = "Density",
            sub = "The vertical line shows the observed z-statistic")

Three Matrices

Description

Three matrices respectively representing Serological (asymmetric), DNA hybridization (asymmetric) and Anatomical (symmetric) distances among 9 families.

Usage

data(mat3)

Format

A data frame with 27 observations and 9 variables.

Source

Lapointe, F.-J., J. A. W. Kirsch and J. M. Hutcheon. 1999. Total evidence, consensus, and bat phylogeny: a distance-based approach. Molecular Phylogenetics and Evolution 11: 55-66.

Five Trees

Description

Three partly similar trees, two independent trees.

Usage

data(mat5M3ID)

Format

A data frame with 250 observations and 50 variables.

Source

Data provided by V. Campbell.

Five Independent Trees

Description

Five independent additive trees.

Usage

data(mat5Mrand)

Format

A data frame with 250 observations and 50 variables.

Source

Data provided by V. Campbell.

Matrix Exponential

Description

This function computes the exponential of a square matrix using a spectral decomposition.

Usage

matexpo(x)

Arguments

x

a square matrix of mode numeric.

Value

a numeric matrix of the same dimensions as ‘x’.
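
The idea of a matrix exponential by spectral decomposition can be sketched in base R as exp(A) = V diag(exp(λ)) V⁻¹, where the columns of V are the eigenvectors of A and λ its eigenvalues. The function expm_eigen below is a hypothetical illustration that assumes a diagonalizable matrix; it is not the compiled routine behind matexpo.

```r
# Hypothetical helper: matrix exponential through an eigendecomposition
# (assumes 'x' is square and diagonalizable)
expm_eigen <- function(x) {
  e <- eigen(x)
  e$vectors %*% diag(exp(e$values)) %*% solve(e$vectors)
}

# A simple rate matrix whose rows sum to zero
m <- matrix(0.1, 4, 4)
diag(m) <- -0.3

p <- expm_eigen(m * 5)  # transition probabilities after t = 5
rowSums(p)              # each row of a transition matrix sums to one
```

Because each row of the rate matrix sums to zero, each row of its exponential sums to one, which gives a quick sanity check on the result.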

Author(s)

Emmanuel Paradis

Examples

### a simple rate matrix:
m <- matrix(0.1, 4, 4)
diag(m) <- -0.3
### towards equilibrium:
for (t in c(1, 5, 10, 50)) print(matexpo(m*t))

McConway-Sims Test of Homogeneous Diversification

Description

This function performs the McConway–Sims test that a trait or variable does not affect diversification rate.

Usage

mcconwaysims.test(x)

Arguments

x

a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades.

Details

The McConway–Sims test compares a series of sister-clades where one of the two is characterized by a trait supposed to affect diversification rate. The null hypothesis is that the trait does not affect diversification. The alternative hypothesis is that diversification rate is increased or decreased by the trait (by contrast to the Slowinski–Guyer test). The test is a likelihood-ratio of a null Yule model and an alternative model with two parameters.

Value

a data frame with the \chi^2, the number of degrees of freedom, and the P-value.

Author(s)

Emmanuel Paradis

References

McConway, K. J. and Sims, H. J. (2004) A likelihood-based method for testing for nonstochastic variation of diversification rates in phylogenies. Evolution, 58, 12–23.

Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.

Examples

### simulate 10 clades with lambda = 0.1 and mu = 0.09:
n0 <- replicate(10, balance(rbdtree(.1, .09, Tmax = 35))[1])
### simulate 10 clades with lambda = 0.15 and mu = 0.1:
n1 <- replicate(10, balance(rbdtree(.15, .1, Tmax = 35))[1])
x <- cbind(n1, n0)
mcconwaysims.test(x)
slowinskiguyer.test(x)
richness.yule.test(x, 35)

Reversible Jump MCMC to Infer Demographic History

Description

These functions implement a reversible jump MCMC framework to infer the demographic history, as well as corresponding confidence bands, from a genealogical tree. The computed demographic history is a continuous and smooth function in time. mcmc.popsize runs the actual MCMC chain and outputs information about the sampling steps, extract.popsize generates from this MCMC output a table of population size in time, and plot.popsize and lines.popsize provide utility functions to plot the corresponding demographic functions.

Usage

mcmc.popsize(tree, nstep, thinning = 1, burn.in = 0, progress.bar = TRUE,
    method.prior.changepoints = c("hierarchical", "fixed.lambda"),
    max.nodes = 30, lambda = 0.5, gamma.shape = 0.5, gamma.scale = 2,
    method.prior.heights = c("skyline", "constant", "custom"),
    prior.height.mean, prior.height.var)

extract.popsize(mcmc.out, credible.interval = 0.95, time.points = 200,
    thinning = 1, burn.in = 0)

## S3 method for class 'popsize'
plot(x, show.median = TRUE, show.years = FALSE,
     subst.rate, present.year, xlab = NULL,
     ylab = "Effective population size", log = "y", ...)

## S3 method for class 'popsize'
lines(x, show.median = TRUE, show.years = FALSE,
      subst.rate, present.year, ...)

Arguments

tree

Either an ultrametric tree (i.e. an object of class "phylo"), or coalescent intervals (i.e. an object of class "coalescentIntervals").

nstep

Number of MCMC steps, i.e. length of the Markov chain (suggested value: 10,000-50,000).

thinning

Thinning factor (suggested value: 10-100).

burn.in

Number of steps dropped from the chain to allow for a burn-in phase (suggested value: 1000).

progress.bar

Show progress bar during the MCMC run.

method.prior.changepoints

If hierarchical is chosen (the default) then the smoothing parameter lambda is drawn from a gamma distribution with some specified shape and scale parameters. Alternatively, for fixed.lambda the value of lambda is a given constant.

max.nodes

Upper limit for the number of internal nodes of the approximating spline (default: 30).

lambda

Smoothing parameter. For method.prior.changepoints = "fixed.lambda" the specified value of lambda determines the mean of the prior distribution for the number of internal nodes of the approximating spline for the demographic function (suggested value: 0.1-1.0).

gamma.shape

Shape parameter of the gamma distribution from which lambda is drawn for method.prior.changepoints = "hierarchical".

gamma.scale

Scale parameter of the gamma distribution from which lambda is drawn for method.prior.changepoints = "hierarchical".

method.prior.heights

Determines the prior for the heights of the change points. If custom is chosen then two functions describing the mean and variance of the heights as a function of time have to be specified (via the prior.height.mean and prior.height.var options). Alternatively, two built-in priors are available: constant assumes constant population size and variance determined by Felsenstein (1992), and skyline assumes a skyline plot (see Opgen-Rhein et al. 2005 for more details).

prior.height.mean

Function describing the mean of the prior distribution for the heights (only used if method.prior.heights = "custom").

prior.height.var

Function describing the variance of the prior distribution for the heights (only used if method.prior.heights = "custom").

mcmc.out

Output from mcmc.popsize; this is needed as input for extract.popsize.

credible.interval

Probability mass of the confidence band (default: 0.95).

time.points

Number of discrete time points in the table output by extract.popsize.

x

Table with population size versus time, as computed by extract.popsize.

show.median

Plot median rather than mean as point estimate for demographic function (default: TRUE).

show.years

Option that determines whether the time is plotted in units of substitutions (default) or in years (requires specification of substitution rate and year of present).

subst.rate

Substitution rate (see option show.years).

present.year

Present year (see option show.years).

xlab

label on the x-axis (depends on the value of show.years).

ylab

label on the y-axis.

log

log-transformation of axes; by default, the y-axis is log-transformed.

...

Further arguments to be passed on to plot or lines.

Details

Please refer to Opgen-Rhein et al. (2005) for methodological details, and the help page of skyline for information on a related approach.

Author(s)

Rainer Opgen-Rhein and Korbinian Strimmer. Parts of the rjMCMC sampling procedure are adapted from R code by Karl Broman.

References

Opgen-Rhein, R., Fahrmeir, L. and Strimmer, K. 2005. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evolutionary Biology, 5, 6.

Examples

# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree

# run mcmc chain
mcmc.out <- mcmc.popsize(tree.hiv, nstep=100, thinning=1, burn.in=0,progress.bar=FALSE) # toy run
#mcmc.out <- mcmc.popsize(tree.hiv, nstep=10000, thinning=5, burn.in=500) # remove comments!!

# make list of population size versus time
popsize  <- extract.popsize(mcmc.out)

# plot and compare with skyline plot
sk <- skyline(tree.hiv)
plot(sk, lwd=1, lty=3, show.years=TRUE, subst.rate=0.0023, present.year = 1997)
lines(popsize, show.years=TRUE, subst.rate=0.0023, present.year = 1997)

Mixed Font Labels for Plotting

Description

This function helps to format labels with bits of text in different font shapes (italics, bold, or bolditalics) and different separators. The output is intended to be used for plotting.

Usage

mixedFontLabel(..., sep = " ", italic = NULL, bold = NULL,
               parenthesis = NULL,
               always.upright = c("sp.", "spp.", "ssp."))

Arguments

...

vectors of mode character to be formatted. They may be of different lengths in which case the shortest ones are recycled.

sep

a vector of mode character giving the separators to be printed between the elements in ....

italic

a vector of integers specifying the elements in ... to be printed in italics.

bold

id. in boldface.

parenthesis

id. within parentheses.

always.upright

a vector of mode character giving the strings not to be printed in italics. Use always.upright = "" to cancel this option.

Details

The idea is to have different bits of text in different vectors that are put together to make a vector of R expressions. This vector is interpreted by graphical functions to format the text. A simple use may be mixedFontLabel(genus, species, italic = 1:2), but it is more interesting when mixing fonts (see examples).

To have an element in bolditalics, its number must be given in both italic and bold.

The vector returned by this function may be assigned as the tip.label element of a tree of class "phylo", or even as its node.label element.
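
As a small sketch of the bolditalics rule, assuming ape is installed, the call below lists both elements in italic and in bold. The genus and species vectors are arbitrary illustrative values.

```r
library(ape)

genus <- c("Pan", "Homo")
species <- c("troglodytes", "sapiens")

# Elements 1 and 2 appear in both 'italic' and 'bold',
# which requests bold italics for them.
lab <- mixedFontLabel(genus, species, italic = 1:2, bold = 1:2)
print(lab)
```

The result can then be assigned to the tip.label element of a tree with two tips before plotting.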

Value

A vector of mode expression.

Author(s)

Emmanuel Paradis

Examples

tr <- read.tree(text = "((a,(b,c)),d);")
genus <- c("Gorilla", "Pan", "Homo", "Pongo")
species <- c("gorilla", "spp.", "sapiens", "pygmaeus")
geo <- c("Africa", "Africa", "World", "Asia")
tr$tip.label <- mixedFontLabel(genus, species, geo, italic = 1:2,
  parenthesis = 3)
layout(matrix(c(1, 2), 2))
plot(tr)
tr$tip.label <- mixedFontLabel(genus, species, geo, sep = c(" ", " | "),
  italic = 1:2, bold = 3)
plot(tr)
layout(1)

Find Most Recent Common Ancestors Between Pairs

Description

mrca returns for each pair of tips (and nodes) its most recent common ancestor (MRCA).

getMRCA returns the MRCA of two or more tips.

Usage

mrca(phy, full = FALSE)
getMRCA(phy, tip)

Arguments

phy

an object of class "phylo".

full

a logical indicating whether to return the MRCAs among all tips and nodes (if TRUE); the default is to return only the MRCAs among tips.

tip

a vector of mode numeric or character specifying the tips; can also be node numbers.

Details

For mrca, the diagonal is set to the number of the tips (and nodes if full = TRUE). If full = FALSE, the colnames and rownames are set with the tip labels of the tree; otherwise the numbers are given as names.

For getMRCA, if tip is of length one or zero then NULL is returned.
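
The behaviour described above can be illustrated on a four-tip tree, assuming ape is installed. The Newick string is a made-up toy example.

```r
library(ape)

tr <- read.tree(text = "((a,b),(c,d));")

m <- mrca(tr)                    # tips only, named by tip labels
m_full <- mrca(tr, full = TRUE)  # tips and nodes, named by numbers
m

getMRCA(tr, c("a", "b"))  # node number of the MRCA of a and b
getMRCA(tr, c("a", "c"))  # a and c only meet at the root
getMRCA(tr, "a")          # a single tip: NULL
```

For a pair of tips, getMRCA and the corresponding entry of the mrca matrix give the same node number; the matrix is more convenient when many pairs are needed at once.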

Value

a matrix of mode numeric (mrca) or a single numeric value (getMRCA).

Author(s)

Emmanuel Paradis, Klaus Schliep, Joseph W. Brown

Minimum Spanning Tree

Description

The function mst finds the minimum spanning tree between a set of observations using a matrix of pairwise distances.

The plot method plots the minimum spanning tree showing the links where the observations are identified by their numbers.

Usage

mst(X)

## S3 method for class 'mst'
plot(x, graph = "circle", x1 = NULL, x2 = NULL, ...)

Arguments

X

either a matrix that can be interpreted as a distance matrix, or an object of class "dist".

x

an object of class "mst" (e.g. returned by mst()).

graph

a character string indicating the type of graph to plot the minimum spanning tree; two choices are possible: "circle" where the observations are plotted regularly spaced on a circle, and "nsca" where the two first axes of a non-symmetric correspondence analysis are used to plot the observations (see Details below). If both arguments x1 and x2 are given, the argument graph is ignored.

x1

a numeric vector giving the coordinates of the observations on the x-axis. Both x1 and x2 must be specified to be used.

x2

a numeric vector giving the coordinates of the observations on the y-axis. Both x1 and x2 must be specified to be used.

...

further arguments to be passed to plot().

Details

These functions provide two ways to plot the minimum spanning tree which try to space the observations as much as possible in order to show the links as clearly as possible. The option graph = "circle" simply plots the observations regularly spaced on a circle, whereas graph = "nsca" uses a non-symmetric correspondence analysis where each observation is represented at the centroid of its neighbours.

Alternatively, the user may use any system of coordinates for the observations, for instance a principal components analysis (PCA) if the distances were computed from an original matrix of continuous variables.

Value

an object of class "mst" which is a square numeric matrix of size equal to the number of observations with either 1 if a link between the corresponding observations was found, or 0 otherwise. The names of the rows and columns of the distance matrix, if available, are given as rownames and colnames to the returned object.
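
The structure of the returned object can be inspected with a short sketch, assuming ape is installed. The data are random and the row names "obs1" to "obs10" are arbitrary labels chosen for illustration.

```r
library(ape)

set.seed(1)
X <- matrix(runif(50), 10, 5)
rownames(X) <- paste0("obs", 1:10)
M <- mst(dist(X))

m <- unclass(M)                  # plain 0/1 matrix
n_links <- sum(m[upper.tri(m)])  # each link counted once
n_links                          # a spanning tree on n points has n - 1 links
rownames(m)                      # names carried over from the distances
```

Counting the ones above the diagonal gives the number of links, which is a quick way to confirm that all observations are connected by a tree.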

Author(s)

Yvonnick Noel <noel@univ-lille3.fr>, Julien Claude <julien.claude@umontpellier.fr> and Emmanuel Paradis

Examples

require(stats)
X <- matrix(runif(200), 20, 10)
d <- dist(X)
PC <- prcomp(X)
M <- mst(d)
opar <- par(mfcol = c(2, 2))
plot(M)
plot(M, graph = "nsca")
plot(M, x1 = PC$x[, 1], x2 = PC$x[, 2])
par(opar)

Collapse and Resolve Multichotomies

Description

These two functions collapse or resolve multichotomies in phylogenetic trees.

Usage

multi2di(phy, ...)

## S3 method for class 'phylo'
multi2di(phy, random = TRUE, equiprob = TRUE, ...)

## S3 method for class 'multiPhylo'
multi2di(phy, random = TRUE, equiprob = TRUE, ...)

di2multi(phy, ...)

## S3 method for class 'phylo'
di2multi(phy, tol = 1e-08, ...)

## S3 method for class 'multiPhylo'
di2multi(phy, tol = 1e-08, ...)

Arguments

phy

an object of class "phylo" or "multiPhylo".

random

a logical value specifying whether to resolve the multichotomies randomly (the default) or in the order they appear in the tree (if random = FALSE).

equiprob

a logical value: should topologies be generated with equal probabilities; see details in rtree (ignored if random = FALSE).

tol

a numeric value giving the tolerance to consider a branch length significantly greater than zero.

...

arguments passed among methods.

Details

multi2di transforms all multichotomies into a series of dichotomies with one (or several) branch(es) of length zero.

di2multi deletes all branches smaller than tol and collapses the corresponding dichotomies into a multichotomy.
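
The round trip between the two functions can be sketched on a toy tree with a single four-way multichotomy, assuming ape is installed.

```r
library(ape)

# One node with four descendants
tr <- read.tree(text = "(a:1,b:1,c:1,d:1);")

bin <- multi2di(tr)
is.binary(bin)
bin$edge.length        # the added internal branches have length zero

back <- di2multi(bin)  # branches shorter than 'tol' are collapsed again
Nnode(back)
```

Since the branches added by multi2di have length zero, di2multi with its default tolerance removes exactly those branches and restores the original multichotomy.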

Value

an object of the same class as the input.

Author(s)

Emmanuel Paradis

Examples

data(bird.families)
is.binary(bird.families)
is.binary(multi2di(bird.families))
all.equal(di2multi(multi2di(bird.families)), bird.families)
### To see the results of randomly resolving a trichotomy:
tr <- read.tree(text = "(a:1,b:1,c:1);")
layout(matrix(1:4, 2, 2))
for (i in 1:4)
  plot(multi2di(tr), use.edge.length = FALSE, cex = 1.5)
layout(1)

Manipulating Lists of Trees

Description

These are extraction and replacement operators for lists of trees stored in the class "multiPhylo".

Usage

## S3 method for class 'multiPhylo'
x[i]
## S3 method for class 'multiPhylo'
x[[i]]
## S3 method for class 'multiPhylo'
x$name
## S3 replacement method for class 'multiPhylo'
x[i] <- value
## S3 replacement method for class 'multiPhylo'
x[[i]] <- value
## S3 replacement method for class 'multiPhylo'
x$i <- value

Arguments

x,value

an object of class "phylo" or "multiPhylo".

i

index(ices) of the tree(s) to select from a list; this may be a vector of integers, logicals, or names.

name

a character string specifying the tree to be extracted.

Details

The subsetting operator [ keeps the class correctly ("multiPhylo").

The replacement operators check the labels of value if x has a single vector of tip labels for all trees (see examples).

Value

An object of class "phylo" ([[, $) or of class "multiPhylo" ([ and the replacement operators).

Author(s)

Emmanuel Paradis

Examples

x <- rmtree(10, 20)
names(x) <- paste("tree", 1:10, sep = "")
x[1:5]
x[1] # subsetting
x[[1]] # extraction
x$tree1 # same as above
x[[1]] <- rtree(20)

y <- .compressTipLabel(x)
## up to here 'x' and 'y' have exactly the same information
## but 'y' has a unique vector of tip labels for all the trees
x[[1]] <- rtree(10) # no error
try(y[[1]] <- rtree(10)) # error

try(x[1] <- rtree(20)) # error
## use instead one of the two:
x[1] <- list(rtree(20))
x[1] <- c(rtree(20))

x[1:5] <- rmtree(5, 20) # replacement
x[11:20] <- rmtree(10, 20) # elongation
x # 20 trees

Minimum Variance Reduction

Description

Phylogenetic tree construction based on the minimum variance reduction.

Usage

mvr(X, V)
mvrs(X, V, fs = 15)

Arguments

X

a distance matrix.

V

a variance matrix.

fs

agglomeration criterion parameter: it is coerced as an integer and must be at least equal to one.

Details

The MVR method can be seen as a version of BIONJ which is not restricted to the Poisson model of variance (Gascuel 2000).

Value

an object of class "phylo".

Author(s)

Andrei Popescu

References

Criscuolo, A. and Gascuel, O. (2008). Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinformatics, 9, 166.

Gascuel, O. (2000). Data model and classification by trees: the minimum variance reduction (MVR) method. Journal of Classification, 17, 67–99.

Examples

data(woodmouse)
rt <- dist.dna(woodmouse, variance = TRUE)
v <- attr(rt, "variance")
tr <- mvr(rt, v)
plot(tr, "u")

Neighbor-Joining Tree Estimation

Description

This function performs the neighbor-joining tree estimation of Saitou and Nei (1987).

Usage

nj(X)

Arguments

X

a distance matrix; may be an object of class “dist”.

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis

References

Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.

Studier, J. A. and Keppler, K. J. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Molecular Biology and Evolution, 5, 729–731.

Examples

### From Saitou and Nei (1987, Table 1):
x <- c(7, 8, 11, 13, 16, 13, 17, 5, 8, 10, 13,
       10, 14, 5, 7, 10, 7, 11, 8, 11, 8, 12,
       5, 6, 10, 9, 13, 8)
M <- matrix(0, 8, 8)
M[lower.tri(M)] <- x
M <- t(M)
M[lower.tri(M)] <- x
dimnames(M) <- list(1:8, 1:8)
tr <- nj(M)
plot(tr, "u")
### a less theoretical example
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw)

Tree Reconstruction from Incomplete Distances With NJ* or bio-NJ*

Description

Reconstructs a phylogenetic tree from a distance matrix with possibly missing values.

Usage

njs(X, fs = 15)
bionjs(X, fs = 15)

Arguments

X

a distance matrix.

fs

argument of the agglomerative criterion: it is coerced as an integer and must be at least equal to one.

Details

Missing values are represented by either NA or any negative number.

Basically, the Q* criterion is applied to all the pairs of leaves, and the s highest-scoring ones are chosen for further analysis by the agglomeration criteria that better handle missing distances (see references for details).

Value

an object of class "phylo".

Author(s)

Andrei Popescu

References

Criscuolo, A., Gascuel, O. (2008) Fast NJ-like algorithms to deal with incomplete distance matrices. BMC Bioinformatics, 9, 166.

Examples

data(woodmouse)
d <- dist.dna(woodmouse)
dm <- d
dm[sample(length(dm), size = 3)] <- NA
dist.topo(njs(dm), nj(d)) # often 0
dm[sample(length(dm), size = 10)] <- NA
dist.topo(njs(dm), nj(d)) # sometimes 0

node.dating

Description

Estimate the dates of a rooted phylogenetic tree from the tip dates.

Usage

estimate.mu(t, node.dates, p.tol = 0.05)
estimate.dates(t, node.dates, mu = estimate.mu(t, node.dates),
               min.date = -.Machine$double.xmax, show.steps = 0,
               opt.tol = 1e-8, nsteps = 1000,
               lik.tol = 0, is.binary = is.binary.phylo(t))

Arguments

t

an object of class "phylo"

node.dates

a numeric vector of dates for the tips, in the same order as 't$tip.label' or a vector of dates for all of the nodes.

p.tol

p-value cutoff for failed regression.

mu

mutation rate.

min.date

the minimum bound on the dates of nodes

show.steps

print the log likelihood every show.steps. If 0 will suppress output.

opt.tol

tolerance for optimization precision.

lik.tol

tolerance for likelihood comparison.

nsteps

the maximum number of steps to run.

is.binary

if TRUE, will run a faster optimization method that only works if the tree is binary; otherwise will use optimize() as the optimization method.

Details

This code duplicates the functionality of the program Tip.Dates (see references). The dates of the internal nodes of 't' are estimated using a maximum likelihood approach.

't' must be rooted and have branch lengths in units of expected substitutions per site.

'node.dates' can be either a numeric vector of dates for the tips or a numeric vector for all of the nodes of 't'. 'estimate.mu' will use all of the values given in 'node.dates' to estimate the mutation rate. Dates can be censored with NA. 'node.dates' must contain all of the tip dates when it is a parameter of 'estimate.dates'. If only tip dates are given, then 'estimate.dates' will run an initial step to estimate the dates of the internal nodes. If 'node.dates' contains dates for some of the nodes, 'estimate.dates' will use those dates as priors in the initial step. If all of the dates for nodes are given, then 'estimate.dates' will not run the initial step.

If 'is.binary' is set to FALSE, 'estimate.dates' uses the "optimize" function as the optimization method. By default, R's "optimize" function uses a precision of ".Machine$double.eps^0.25", which is about 0.0001 on a 64-bit system. This should be set to a smaller value if the branch lengths of 't' are very short. If 'is.binary' is set to TRUE, 'estimate.dates' uses calculus to determine the maximum likelihood at each step, which is faster. The bounds of permissible values are reduced by 'opt.tol'.
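
The default precision quoted above can be checked directly in base R. The sketch below also shows how a tolerance is passed to optimize; the objective f is a hypothetical toy function and not the likelihood used by 'estimate.dates'.

```r
# Default tolerance used by optimize()
tol_default <- .Machine$double.eps^0.25
tol_default

# Toy objective with its maximum at 1e-5, a value smaller than the
# default tolerance (hypothetical stand-in for a likelihood)
f <- function(x) -(x - 1e-5)^2

loose <- optimize(f, c(0, 1), maximum = TRUE)$maximum
tight <- optimize(f, c(0, 1), maximum = TRUE, tol = 1e-10)$maximum
c(loose = loose, tight = tight)
```

Comparing the two results gives a feel for how much the tolerance matters when the quantities being estimated are very small.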

'estimate.dates' has several criteria to decide how many steps it will run. If 'lik.tol' and 'nsteps' are both 0, then 'estimate.dates' will only run the initial step. If 'lik.tol' is greater than 0 and 'nsteps' is 0, then 'estimate.dates' will run until the difference between successive steps is less than 'lik.tol'. If 'lik.tol' is 0 and 'nsteps' is greater than 0, then 'estimate.dates' will run the initial step and then 'nsteps' steps. If 'lik.tol' and 'nsteps' are both greater than 0, then 'estimate.dates' will run the initial step and then either 'nsteps' steps or until the difference between successive steps is less than 'lik.tol'.

Value

The estimated mutation rate as a numeric vector of length one for estimate.mu.

The estimated dates of all of the nodes of the tree as a numeric vector with length equal to the number of nodes in the tree.

Note

This model assumes that the tree follows a molecular clock. It only performs a rudimentary statistical test of the molecular clock hypothesis.

Author(s)

Bradley R. Jones <email: brj1@sfu.ca>

References

Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Rambaut, A. (2000) Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics, 16, 395–399.

Jones, Bradley R., and Poon, Art F. Y. (2016) node.dating: dating ancestors in phylogenetic trees in R. Bioinformatics, 33, 932–934.

Examples

t <- rtree(100)
tip.date <- rnorm(t$tip.label, mean = node.depth.edgelength(t)[1:Ntip(t)])^2
t <- rtt(t, tip.date)
mu <- estimate.mu(t, tip.date)

## Run for 100 steps
node.date <- estimate.dates(t, tip.date, mu, nsteps = 100)

## Run until the difference between successive log likelihoods is
## less than $10^{-4}$ starting with the 100th step's results
node.date <- estimate.dates(t, node.date, mu, nsteps = 0, lik.tol = 1e-4)

## To rescale the tree over time
t$edge.length <- node.date[t$edge[, 2]] - node.date[t$edge[, 1]]

Depth and Heights of Nodes and Tips

Description

These functions return the depths or heights of nodes and tips.

Usage

node.depth(phy, method = 1)
node.depth.edgelength(phy)
node.height(phy, clado.style = FALSE)

Arguments

phy

an object of class "phylo".

method

an integer value (1 or 2); 1: the node depths are proportional to the number of tips descending from each node, 2: they are evenly spaced.

clado.style

a logical value; if TRUE, the node heights are calculated for a cladogram.

Details

node.depth computes the depth of a node depending on the value of method (see the option node.depth in plot.phylo). The value of 1 is given to the tips.

node.depth.edgelength does the same but using branch lengths.

node.height computes the heights of nodes and tips as plotted by a phylogram or a cladogram.
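
The three functions can be compared on a three-tip tree, assuming ape is installed. The Newick string is a made-up toy example in which the tips a, b, c are numbered 1 to 3, the root is node 4, and the ancestor of a and b is node 5.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")

node.depth(tr)             # number of tips descending from each node (tips get 1)
node.depth.edgelength(tr)  # distance from the root along the branches
node.height(tr)            # vertical positions used when plotting
```

Each result is indexed by node number, so element 4 always refers to the root of this tree and elements 1 to 3 to its tips.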

Value

A numeric vector indexed with the node numbers of the matrix ‘edge’ of phy.

Author(s)

Emmanuel Paradis

Labelling the Nodes, Tips, and Edges of a Tree

Description

These functions add labels to or near the nodes, the tips, or the edges of a tree using text or plotting symbols. The text can be framed.

Usage

nodelabels(text, node, adj = c(0.5, 0.5), frame = "rect",
           pch = NULL, thermo = NULL, pie = NULL, piecol = NULL,
           col = "black", bg = "lightblue", horiz = FALSE,
           width = NULL, height = NULL, ...)
tiplabels(text, tip, adj = c(0.5, 0.5), frame = "rect",
          pch = NULL, thermo = NULL, pie = NULL, piecol = NULL,
          col = "black", bg = "yellow", horiz = FALSE,
          width = NULL, height = NULL, offset = 0, ...)
edgelabels(text, edge, adj = c(0.5, 0.5), frame = "rect",
           pch = NULL, thermo = NULL, pie = NULL, piecol = NULL,
           col = "black", bg = "lightgreen", horiz = FALSE,
           width = NULL, height = NULL, date = NULL, ...)

Arguments

text

a vector of mode character giving the text to be printed. Can be left empty.

node

a vector of mode numeric giving the numbers of the nodes where the text or the symbols are to be printed. Can be left empty.

tip

a vector of mode numeric giving the numbers of the tips where the text or the symbols are to be printed. Can be left empty.

edge

a vector of mode numeric giving the numbers of the edges where the text or the symbols are to be printed. Can be left empty.

adj

one or two numeric values specifying the horizontal and vertical, respectively, justification of the text or symbols. By default, the text is centered horizontally and vertically. If a single value is given, this alters only the horizontal position of the text.

frame

a character string specifying the kind of frame to be printed around the text. This must be one of "rect" (the default), "circle", "none", or any unambiguous abbreviation of these.

pch

a numeric giving the type of plotting symbol to be used; this is recycled if necessary. See par for R's plotting symbols. If pch is used, then text is ignored.

thermo

a numeric vector giving some proportions (values between 0 and 1) for each node, or a numeric matrix giving some proportions (the rows must sum to one). It can be a data frame which is then converted into a matrix.

pie

same as thermo.

piecol

a list of colours (given as a character vector) to be used by thermo or pie; if left NULL, a series of colours given by the function rainbow is used.

col

a character string giving the color to be used for the text or the plotting symbols; this is recycled if necessary.

bg

a character string giving the color to be used for the background of the text frames or of the plotting symbols if it applies; this is recycled if necessary.

...

further arguments passed to the text or points functions (e.g. cex to alter the size of the text or the symbols, or font for the text; see the examples below).

horiz,width,height

parameters controlling the aspect of thermometers; by default, their width and height are determined automatically.

offset

offset of the tip labels (can be negative).

date

specifies the positions of labels on edges of chronograms with respect to the time scale.

Details

These three functions have the same optional arguments and work in the same way.

If the argument text is missing and pch and thermo are left as NULL, then the numbers of the nodes (or of the tips) are printed.

If node, tip, or edge is missing, then the text or the symbols are printed on all nodes, tips, or edges.

The option cex can be used to change the size of all types of labels.

A simple call of these functions with no arguments (e.g., nodelabels()) prints the numbers of all nodes (or tips).

In the case of tiplabels, it is useful in most cases to adjust the options x.lim and label.offset (and possibly show.tip.label) of plot.phylo (see the examples).

Author(s)

Emmanuel Paradis, Ben Bolker, and Jim Lemon

Examples

tr <- read.tree(text = "((Homo,Pan),Gorilla);")
plot(tr)
nodelabels("7.3 Ma", 4, frame = "r", bg = "yellow", adj = 0)
nodelabels("5.4 Ma", 5, frame = "c", bg = "tomato", font = 3)

## A trick by Liam Revell when there are many categories:
plot(tr, x.lim = c(-1, 4))
nodelabels(node = 4, pie = matrix(rep(1, 100), 1), cex = 5)
op <- par(fg = "transparent")
nodelabels(node = 5, pie = matrix(rep(1, 100), 1), cex = 5)
par(op)

data(bird.orders)
plot(bird.orders, use.edge.length = FALSE, font = 1)
bs <- round(runif(22, 90, 100), 0) # some imaginary bootstrap values
bs2 <- round(runif(22, 90, 100), 0)
bs3 <- round(runif(22, 90, 100), 0)
nodelabels(bs, adj = 1.2)
nodelabels(bs2, adj = -0.2, bg = "yellow")

### something more classical
plot(bird.orders, use.edge.length = FALSE, font = 1)
nodelabels(bs, adj = -0.2, frame = "n", cex = 0.8)
nodelabels(bs2, adj = c(1.2, 1), frame = "n", cex = 0.8)
nodelabels(bs3, adj = c(1.2, -0.2), frame = "n", cex = 0.8)

### the same but we play with the font
plot(bird.orders, use.edge.length = FALSE, font = 1)
nodelabels(bs, adj = -0.2, frame = "n", cex = 0.8, font = 2)
nodelabels(bs2, adj = c(1.2, 1), frame = "n", cex = 0.8, font = 3)
nodelabels(bs3, adj = c(1.2, -0.2), frame = "n", cex = 0.8)

plot(bird.orders, "c", use.edge.length = FALSE, font = 1)
nodelabels(thermo = runif(22), cex = .8)

plot(bird.orders, "u", FALSE, font = 1, lab4ut = "a")
nodelabels(cex = .75, bg = "yellow")

### representing two characters at the tips (you could have as many
### as you want)
plot(bird.orders, "c", FALSE, font = 1, label.offset = 3,
     x.lim = 31, no.margin = TRUE)
tiplabels(pch = 21, bg = gray(1:23/23), cex = 2, adj = 1.4)
tiplabels(pch = 19, col = c("yellow", "red", "blue"), adj = 2.5, cex = 2)

### This can be used to highlight tip labels:
plot(bird.orders, font = 1)
i <- c(1, 7, 18)
tiplabels(bird.orders$tip.label[i], i, adj = 0)

### Some random data to compare piecharts and thermometers:
tr <- rtree(15)
x <- runif(14, 0, 0.33)
y <- runif(14, 0, 0.33)
z <- runif(14, 0, 0.33)
x <- cbind(x, y, z, 1 - x - y - z)
layout(matrix(1:2, 1, 2))
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(pie = x, cex = 1.3)
text(4.5, 15, "Are you \"pie\"...", font = 4, cex = 1.5)
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), cex = 1.3)
text(4.5, 15, "... or \"thermo\"?", font = 4, cex = 1.5)

plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), cex = 1.3)
plot(tr, "c", FALSE, no.margin = TRUE)
nodelabels(thermo = x, col = rainbow(4), width = 3, horiz = TRUE)
layout(1)

plot(tr, main = "Showing Edge Lengths")
edgelabels(round(tr$edge.length, 3), srt = 90)
plot(tr, "p", FALSE)
edgelabels("above", adj = c(0.5, -0.25), bg = "yellow")
edgelabels("below", adj = c(0.5, 1.25), bg = "lightblue")

Find Paths of Nodes

Description

This function finds paths of nodes in a tree. The nodes can be internal and/or terminal (i.e., tips).

Usage

nodepath(phy, from = NULL, to = NULL)

Arguments

phy

an object of class "phylo".

from, to

integers giving node or tip numbers.

Details

By default, this function returns all the paths from the root to each tip of the tree. If both arguments from and to are specified, the shortest path of nodes linking them is returned.

Value

a list of vectors of integers (by default), or a single vector of integers.
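The two kinds of return value can be seen on a small balanced tree. This is only a sketch: the four-tip tree is made up, and it assumes the usual numbering from read.tree (tips 1 to 4, root 5, internal nodes 6 and 7).

```r
library(ape)

# Four tips: A = 1, B = 2, C = 3, D = 4; root = 5; (A, B) = 6; (C, D) = 7
tr <- read.tree(text = "((A,B),(C,D));")

# Default call: a list with one root-to-tip path per tip
paths <- nodepath(tr)

# Both 'from' and 'to' given: a single vector with the nodes linking
# tip 1 (A) and tip 3 (C), passing through their common ancestor (the root)
p <- nodepath(tr, 1, 3)

paths
p
```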

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(2)
nodepath(tr)
nodepath(tr, 1, 2)

Test of host-parasite coevolution

Description

Function parafit tests the hypothesis of coevolution between a clade of hosts and a clade of parasites. The null hypothesis (H0) of the global test is that the evolution of the two groups, as revealed by the two phylogenetic trees and the set of host-parasite association links, has been independent. Tests of individual host-parasite links are also available as an option.

The method, which is described in detail in Legendre et al. (2002), requires some estimates of the phylogenetic trees or phylogenetic distances, and also a description of the host-parasite associations (H-P links) observed in nature.

Usage

parafit(host.D, para.D, HP, nperm = 999, test.links = FALSE,
        seed = NULL, correction = "none", silent = FALSE)

Arguments

host.D

A matrix of phylogenetic or patristic distances among the hosts (object class: matrix, data.frame or dist). A matrix of patristic distances exactly represents the information in a phylogenetic tree.

para.D

A matrix of phylogenetic or patristic distances among the parasites (object class: matrix, data.frame or dist). A matrix of patristic distances exactly represents the information in a phylogenetic tree.

HP

A rectangular matrix with hosts as rows and parasites as columns. The matrix contains 1's when a host-parasite link has been observed in nature between the host in the row and the parasite in the column, and 0's otherwise.

nperm

Number of permutations for the tests. Default: nperm = 999.

test.links

test.links = TRUE will test the significance of individual host-parasite links. Default: test.links = FALSE.

seed

seed = NULL (default): a seed is chosen at random by the function. That seed is used as the starting point for all tests of significance, i.e. the global H-P test and the tests of individual H-P links if they are requested. Users can select a seed of their choice by giving any integer value to seed, for example seed = -123456. Running the function again with the same seed value will produce the exact same test results.

correction

Correction methods for negative eigenvalues (details below): correction="lingoes" and correction="cailliez". Default value: "none".

silent

Informative messages and the time to compute the tests will not be written to the R console if silent=TRUE. Useful when the function is called by a numerical simulation function.

Details

Two types of test are produced by the program: a global test of coevolution and, optionally, tests of the individual host-parasite (H-P) links.

The function computes principal coordinates for the host and the parasite distance matrices. The principal coordinates (all of them) act as a complete representation of either the phylogenetic distance matrix or the phylogenetic tree.

Phylogenetic distance matrices are normally Euclidean. Patristic distance matrices are additive, thus they are metric and Euclidean. Euclidean matrices are fully represented by real-valued principal coordinate axes. For non-Euclidean matrices, negative eigenvalues are produced; complex principal coordinate axes are associated with the negative eigenvalues. So, the program rejects matrices that are not Euclidean and stops.

Negative eigenvalues can be corrected for by one of two methods: the Lingoes or the Cailliez correction. It is up to the user to decide which correction method should be applied. This is done by selecting the option correction="lingoes" or correction="cailliez". Details on these correction methods are given in the help file of the pcoa function.

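The principle of the global test is the following (H0: independent evolution of the hosts and parasites): (1) Compute matrix D = C t(A) B. Note: D is a fourth-corner matrix (sensu Legendre et al. 1997), where A is the H-P link matrix, B is the matrix of principal coordinates computed from the host.D matrix, and C is the matrix of principal coordinates computed from the para.D matrix. (2) Compute the statistic ParaFitGlobal, the sum of squares of all values in matrix D. (3) Permute at random, separately, each row of matrix A, obtaining matrix A.perm. Compute D.perm = C t(A.perm) B and, from it, the permuted statistic ParaFitGlobal.perm; save this value in a vector trace.perm. (4) Repeat step 3 a large number of times. (5) Add the reference value of ParaFitGlobal to the distribution of ParaFitGlobal.perm values; calculate the permutational probability associated with ParaFitGlobal.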
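The global statistic can be sketched in base R on made-up data. This is not the code used by parafit: the helper pco(), the random distance matrices, and the link matrix are all hypothetical. The principal coordinates are stored here as objects × axes, so C is transposed to make the product conformable, and the p-value counts the reference value among the permuted ones, which is an assumption of this sketch.

```r
# Principal coordinates from a distance matrix (Gower centring + eigen);
# only axes with positive eigenvalues are kept. Hypothetical helper.
pco <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  J <- diag(n) - matrix(1 / n, n, n)
  e <- eigen(-0.5 * J %*% d^2 %*% J, symmetric = TRUE)
  keep <- e$values > 1e-8 * max(e$values)
  sweep(e$vectors[, keep, drop = FALSE], 2, sqrt(e$values[keep]), "*")
}

set.seed(1)
host.D <- dist(matrix(rnorm(10), 5))   # 5 made-up hosts
para.D <- dist(matrix(rnorm(8), 4))    # 4 made-up parasites
A <- matrix(rbinom(20, 1, 0.4), 5, 4)  # hosts x parasites link matrix (0/1)

B <- pco(host.D)   # hosts x axes
C <- pco(para.D)   # parasites x axes

# Sum of squares of all values in D = C t(A) B
stat <- function(A) sum((t(C) %*% t(A) %*% B)^2)
ref <- stat(A)

# Permute each row of A separately and recompute the statistic
nperm <- 99
perm <- replicate(nperm, stat(t(apply(A, 1, sample))))

# Permutational probability, with the reference value added to the distribution
p.global <- (sum(perm >= ref) + 1) / (nperm + 1)
```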

The test of each individual H-P link is carried out as follows (H0: this particular link is random): (1) Remove one link (k) from matrix A. (2) Compute matrix D = C t(A) B. (3a) Compute trace(k), the sum of squares of all values in matrix D. (3b) Compute the statistic ParaFitLink1 = (trace - trace(k)) where trace is the ParaFitGlobal statistic. (3c) Compute the statistic ParaFitLink2 = (trace - trace(k)) / (tracemax - trace) where tracemax is the maximum value that can be taken by trace. (4) Permute at random, separately, each row of matrix A, obtaining A.perm. Use the same sequences of permutations as were used in the test of ParaFitGlobal. Using the values of trace and trace.perm saved during the global test, compute the permuted values of the two statistics, ParaFit1.perm and ParaFit2.perm. (5) Repeat step 4 a large number of times. (6) Add the reference value of ParaFit1 to the distribution of ParaFit1.perm values; add the reference value of ParaFit2 to the distribution of ParaFit2.perm values. Calculate the permutational probabilities associated to ParaFit1 and ParaFit2.

The print.parafit function prints out the results of the global test and, optionally, the results of the tests of the individual host-parasite links.

Value

ParaFitGlobal

The statistic of the global H-P test.

p.global

The permutational p-value associated with the ParaFitGlobal statistic.

link.table

The results of the tests of individual H-P links, including the ParaFitLink1 and ParaFitLink2 statistics and the p-values obtained from their respective permutational tests.

para.per.host

Number of parasites per host.

host.per.para

Number of hosts per parasite.

nperm

Number of permutations for the tests.

Author(s)

Pierre Legendre, Universite de Montreal

References

Hafner, M. S., P. D. Sudman, F. X. Villablanca, T. A. Spradling, J. W. Demastes and S. A. Nadler. 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science, 265, 1087–1090.

Legendre, P., Y. Desdevises and E. Bazin. 2002. A statistical test for host-parasite coevolution. Systematic Biology, 51(2), 217–234.

Examples

## Gopher and lice data from Hafner et al. (1994)
data(gopher.D)
data(lice.D)
data(HP.links)
res <- parafit(gopher.D, lice.D, HP.links, nperm=99, test.links=TRUE)
# res     # or else: print(res)

Principal Coordinate Analysis

Description

Function pcoa computes principal coordinate decomposition (also called classical scaling) of a distance matrix D (Gower 1966). It implements two correction methods for negative eigenvalues.

Usage

pcoa(D, correction="none", rn=NULL)

## S3 method for class 'pcoa'
biplot(x, Y=NULL, plot.axes = c(1,2), dir.axis1=1,
       dir.axis2=1, rn=NULL, main=NULL, ...)

Arguments

D

A distance matrix of class dist or matrix.

correction

Correction methods for negative eigenvalues (details below): "lingoes" and "cailliez". Default value: "none".

rn

An optional vector of row names, of length n, for the n objects.

x

Output object from pcoa.

Y

Any rectangular data table containing explanatory variables to be projected onto the ordination plot. That table may contain, for example, the community composition data used to compute D, or any transformation of these data; see examples.

plot.axes

The two PCoA axes to plot.

dir.axis1

Set to -1 to reverse axis 1 for the projection of points and variables. Default value: +1.

dir.axis2

Set to -1 to reverse axis 2 for the projection of points and variables. Default value: +1.

main

An optional title.

...

Other graphical arguments passed to function.

Details

This function implements two methods for correcting for negative eigenvalues in principal coordinate analysis (PCoA). Negative eigenvalues can be produced in PCoA when decomposing distance matrices produced by coefficients that are not Euclidean (Gower and Legendre 1986, Legendre and Legendre 1998).

In pcoa, when negative eigenvalues are present in the decomposition results, the distance matrix D can be modified using either the Lingoes or the Cailliez procedure to produce results without negative eigenvalues.

In the Lingoes (1971) procedure, a constant c1, equal to twice the absolute value of the largest negative eigenvalue of the original principal coordinate analysis, is added to each original squared distance in the distance matrix, except the diagonal values. A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
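The effect of the Lingoes constant can be checked by hand in base R. The sketch below does not call pcoa; the helper gower_eig() and the 3 × 3 dissimilarity matrix are made up for the illustration (the matrix violates the triangle inequality, so it cannot be Euclidean).

```r
# Eigenvalues of the Gower-centred matrix of a distance matrix (hypothetical helper)
gower_eig <- function(d) {
  n <- nrow(d)
  J <- diag(n) - matrix(1 / n, n, n)
  eigen(-0.5 * J %*% d^2 %*% J, symmetric = TRUE, only.values = TRUE)$values
}

# Made-up dissimilarities: d[1, 3] = 5 exceeds d[1, 2] + d[2, 3] = 2
d <- matrix(c(0, 1, 5,
              1, 0, 1,
              5, 1, 0), 3, 3)

ev <- gower_eig(d)         # contains one negative eigenvalue

# c1: twice the absolute value of the most negative eigenvalue
c1 <- 2 * abs(min(ev))

# Add c1 to every squared distance except the diagonal
d.cor <- sqrt(d^2 + c1)
diag(d.cor) <- 0

ev.cor <- gower_eig(d.cor) # no negative eigenvalue remains
```

With n = 3 objects, the corrected analysis keeps a single positive eigenvalue and two null ones, in line with the bounds stated above.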

In the Cailliez (1983) procedure, a constant c2 is added to the original distances in the distance matrix, except the diagonal values. The calculation of c2 is described in Legendre and Legendre (1998). A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.

In all cases, only the eigenvectors corresponding to positive eigenvalues are shown in the output list. The eigenvectors are scaled to the square root of the corresponding eigenvalues. Gower (1966) has shown that eigenvectors scaled in that way preserve the original distance (in the D matrix) among the objects. These eigenvectors can be used to plot ordination graphs of the objects.

We recommend not using PCoA to produce ordinations from the chord, chi-square, abundance profile, or Hellinger distances. It is easier to first transform the community composition data using the following transformations, available in the decostand function of the vegan package, and then carry out a principal component analysis (PCA) on the transformed data:

Chord transformation: decostand(spiders,"normalize")
Transformation to relative abundance profiles: decostand(spiders,"total")
Hellinger transformation: decostand(spiders,"hellinger")
Chi-square transformation: decostand(spiders,"chi.square")

The ordination results will be identical and the calculations shorter. This two-step ordination method, called transformation-based PCA (tb-PCA), was described by Legendre and Gallagher (2001).
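The equivalence can be checked in base R on a made-up community table. In this sketch the Hellinger transformation is written out by hand rather than with decostand, cmdscale stands in for a PCoA, and the count data are random; all object names are illustrative.

```r
set.seed(42)

# Hypothetical community table: 8 sites x 5 species (counts >= 1)
X <- matrix(rpois(40, 5) + 1, 8, 5)

# Hellinger transformation by hand: square root of relative abundances
Y <- sqrt(X / rowSums(X))

# Route 1: PCA on the transformed data (tb-PCA)
pca <- prcomp(Y)

# Route 2: classical scaling (PCoA) of the Hellinger distances,
# i.e. Euclidean distances among the transformed rows
pco <- cmdscale(dist(Y), k = 2)

# Site scores on the first two axes agree up to the arbitrary sign of each axis
agree <- all.equal(unname(abs(pca$x[, 1:2])), unname(abs(pco)))
agree
```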

The biplot.pcoa function produces plots for any pair of principal coordinates. The original variables can be projected onto the ordination plot.

Value

correction

The values of parameter correction and variable 'correct' in the function.

note

A note describing the type of correction done, if any.

values

The eigenvalues and related information:

Eigenvalues

All eigenvalues (positive, null, negative).

Relative_eig

Relative eigenvalues.

Corr_eig

Corrected eigenvalues (Lingoes correction); Legendre and Legendre (1998, p. 438, eq. 9.27).

Rel_corr_eig

Relative eigenvalues after Lingoes or Cailliez correction.

Broken_stick

Expected fractions of variance under the broken stick model.

Cumul_eig

Cumulative relative eigenvalues.

Cum_corr_eig

Cumulative corrected relative eigenvalues.

Cumul_br_stick

Cumulative broken stick fractions.

vectors

The principal coordinates with positive eigenvalues.

trace

The trace of the distance matrix. This is also the sum of all eigenvalues, positive and negative.

vectors.cor

The principal coordinates with positive eigenvalues from the distance matrix corrected using the method specified by parameter correction.

trace.cor

The trace of the corrected distance matrix. This is also the sum of its eigenvalues.
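The broken stick fractions listed under values follow the standard broken stick formula, in which the expected share of the k-th of p axes is the mean of 1/i for i from k to p, divided by... more precisely (1/p) times the sum of 1/i for i = k, ..., p. The sketch below computes these fractions in base R; the helper name broken_stick() and the choice p = 5 are illustrative, and how pcoa chooses p internally is not specified here.

```r
# Expected fraction of variance of axis k among p axes under the
# broken stick model: (1 / p) * sum_{i = k}^{p} 1 / i
broken_stick <- function(p) {
  sapply(seq_len(p), function(k) sum(1 / (k:p)) / p)
}

bs <- broken_stick(5)
bs          # decreasing fractions that sum to 1
cumsum(bs)  # cumulative fractions, comparable to a cumulative column
```

An axis whose relative eigenvalue exceeds the corresponding broken stick fraction explains more variation than expected under this null model.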

Author(s)

Pierre Legendre, Universite de Montreal

References

Cailliez, F. (1983) The analytical solution of the additive constant problem. Psychometrika, 48, 305–308.

Gower, J. C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–338.

Gower, J. C. and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48.

Legendre, P. and Gallagher, E. D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia, 129, 271–280.

Legendre, P. and Legendre, L. (1998) Numerical Ecology, 2nd English edition. Amsterdam: Elsevier Science BV.

Lingoes, J. C. (1971) Some boundary conditions for a monotone analysis of symmetric matrices. Psychometrika, 36, 195–203.

Examples

## Oribatid mite data from Borcard and Legendre (1994)

## Not run: 
if (require(vegan)) {
data(mite)  # Community composition data, 70 peat cores, 35 species

## Select rows 1:30. Species 35 is absent from these rows. Transform to log
mite.log <- log(mite[1:30, -35] + 1)  # Equivalent: log1p(mite[1:30, -35])

## Principal coordinate analysis and simple ordination plot
mite.D <- vegdist(mite.log, "bray")
res <- pcoa(mite.D)
res$values
biplot(res)

## Project unstandardized and standardized species on the PCoA ordination plot
mite.log.st <- apply(mite.log, 2, scale, center=TRUE, scale=TRUE)

par(mfrow=c(1,2))
biplot(res, mite.log)
biplot(res, mite.log.st)

# Reverse the ordination axes in the plot
par(mfrow=c(1,2))
biplot(res, mite.log, dir.axis1=-1, dir.axis2=-1)
biplot(res, mite.log.st, dir.axis1=-1, dir.axis2=-1)
}
## End(Not run)

Tree Annotation

Description

phydataplot plots data on a tree in a way that adapts to the type of tree. ring does the same for circular trees.

Both functions match the data with the labels of the tree.

Usage

phydataplot(x, phy, style = "bars", offset = 1, scaling = 1,
            continuous = FALSE, width = NULL, legend = "below",
            funcol = rainbow, ...)
ring(x, phy, style = "ring", offset = 1, ...)

Arguments

x

a vector, a factor, a matrix, or a data frame.

phy

the tree (which must be already plotted).

style

a character string specifying the type of graphics; can be abbreviated (see details).

offset

the space between the tips of the tree and the plot.

scaling

the scaling factor to apply to the data.

continuous

(used if style = "mosaic") a logical specifying whether to treat the values in x as continuous or not; can be an integer value giving the number of categories.

width

(used if style = "mosaic") the width of the cells; by default, all the available space is used.

legend

(used if style = "mosaic") the place where to draw the legend; one of "below" (the default), "side", or "none", or an unambiguous abbreviation of these.

funcol

(used if style = "mosaic") the function used to generate the colours (see details and examples).

...

further arguments passed to the graphical functions.

Details

The possible values for style are “bars”, “segments”, “image”, “arrows”, “boxplot”, “dotchart”, or “mosaic” for phydataplot, and “ring”, “segments”, or “arrows” for ring.

style = "image" works only with square matrices (e.g., similarities). If you want to plot a DNA alignment in the same way as image.DNAbin, try style = "mosaic".

style = "mosaic" can plot any kind of matrix, possibly after discretizing its values (using continuous). The default colour palette is taken from the function rainbow. If you want to use specified colours, a function simply returning the vector of colours must be used, possibly with names if you want to assign a specific colour to each value (see examples).

Note

For the moment, only rightwards trees are supported (does not apply to circular trees).

Author(s)

Emmanuel Paradis

Examples

## demonstrates matching with names:
tr <- rcoal(n <- 10)
x <- 1:n
names(x) <- tr$tip.label
plot(tr, x.lim = 11)
phydataplot(x, tr)
## shuffle x but matching names with tip labels reorders them:
phydataplot(sample(x), tr, "s", lwd = 3, lty = 3)

## adapts to the tree:
plot(tr, "f", x.l = c(-11, 11), y.l = c(-11, 11))
phydataplot(x, tr, "s")

## leave more space with x.lim to show a barplot and a dotchart:
plot(tr, x.lim = 22)
phydataplot(x, tr, col = "yellow")
phydataplot(x, tr, "d", offset = 13)

ts <- rcoal(N <- 100)
X <- rTraitCont(ts) # names are set
dd <- dist(X)
op <- par(mar = rep(0, 4))
plot(ts, x.lim = 10, cex = 0.4, font = 1)
phydataplot(as.matrix(dd), ts, "i", offset = 0.2)

par(xpd = TRUE, mar = op$mar)
co <- c("blue", "red"); l <- c(-2, 2)
X <- X + abs(min(X)) # move scale so X >= 0
plot(ts, "f", show.tip.label = FALSE, x.lim = l, y.lim = l, open.angle = 30)
phydataplot(X, ts, "s", col = co, offset = 0.05)
ring(X, ts, "ring", col = co, offset = max(X) + 0.1) # the same info as a ring

## as many rings as you want...
co <- c("blue", "yellow")
plot(ts, "r", show.tip.label = FALSE, x.l = c(-1, 1), y.l = c(-1, 1))
for (o in seq(0, 0.4, 0.2)) {
    co <- rev(co)
    ring(0.2, ts, "r", col = rep(co, each = 5), offset = o)
}

lim <- c(-5, 5)
co <- rgb(0, 0.4, 1, alpha = 0.1)
y <- seq(0.01, 1, 0.01)
plot(ts, "f", x.lim = lim, y.lim = lim, show.tip.label = FALSE)
ring(y, ts, offset = 0, col = co, lwd = 0.1)
for (i in 1:3) {
    y <- y + 1
    ring(y, ts, offset = 0, col = co, lwd = 0.1)
}

## rings can be in the background
plot(ts, "r", plot = FALSE)
ring(1, ts, "r", col = rainbow(100), offset = -1)
par(new = TRUE)
plot(ts, "r", font = 1, edge.color = "white")

## might be more useful:
co <- c("lightblue", "yellow")
plot(ts, "r", plot = FALSE)
ring(0.1, ts, "r", col = sample(co, size = N, rep = TRUE), offset = -.1)
par(new = TRUE)
plot(ts, "r", font = 1)

## if x is matrix:
tx <- rcoal(m <- 20)
X <- runif(m, 0, 0.5); Y <- runif(m, 0, 0.5)
X <- cbind(X, Y, 1 - X - Y)
rownames(X) <- tx$tip.label
plot(tx, x.lim = 6)
co <- rgb(diag(3))
phydataplot(X, tx, col = co)
## a variation:
plot(tx, show.tip.label = FALSE, x.lim = 5)
phydataplot(X, tx, col = co, offset = 0.05, border = NA)

plot(tx, "f", show.tip.label = FALSE, open.angle = 180)
ring(X, tx, col = co, offset = 0.05)

Z <- matrix(rnorm(m * 5), m)
rownames(Z) <- rownames(X)
plot(tx, x.lim = 5)
phydataplot(Z, tx, "bo", scaling = .5, offset = 0.5,
            boxfill = c("gold", "skyblue"))

## plot an alignment with a NJ tree:
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw, x.lim = 0.1, align.tip = TRUE, font = 1)
phydataplot(woodmouse[, 1:50], trw, "m", 0.02, border = NA)

## use style = "mosaic" on a 30x5 matrix:
tr <- rtree(n <- 30)
p <- 5
x <- matrix(sample(3, size = n*p, replace = TRUE), n, p)
dimnames(x) <- list(paste0("t", 1:n), LETTERS[1:p])
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2)
## change the aspect:
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2, width = 2, border = "white", lwd = 3, legend = "side")
## user-defined colour:
f <- function(n) c("yellow", "blue", "red")
phydataplot(x, tr, "m", 18, width = 2, border = "white", lwd = 3,
            legend = "side", funcol = f)

## alternative colour function...:
## fb <- function(n) c("3" = "red", "2" = "blue", "1" = "yellow")
## ... but since the values are sorted alphabetically,
## both f and fb will produce the same plot.

## use continuous = TRUE with two different scales:
x[] <- 1:(n*p)
plot(tr, x.lim = 35, align.tip = TRUE, adj = 1)
phydataplot(x, tr, "m", 2, width = 1.5, continuous = TRUE, legend = "side",
            funcol = colorRampPalette(c("white", "darkgreen")))
phydataplot(x, tr, "m", 18, width = 1.5, continuous = 5, legend = "side",
            funcol = topo.colors)

Fits a Bunch of Models with PhyML

Description

This function calls PhyML and successively fits 28 models of DNA evolution. The results are saved on disk, as PhyML usually does, and returned in R as a vector with the log-likelihood value of each model.

Usage

phymltest(seqfile, format = "interleaved", itree = NULL,
          exclude = NULL, execname = NULL, append = TRUE)

## S3 method for class 'phymltest'
print(x, ...)

## S3 method for class 'phymltest'
summary(object, ...)

## S3 method for class 'phymltest'
plot(x, main = NULL, col = "blue", ...)

Arguments

seqfile

a character string giving the name of the file that contains the DNA sequences to be analysed by PhyML.

format

a character string specifying the format of the DNA sequences: either "interleaved" (the default), or "sequential".

itree

a character string giving the name of a file with a tree in Newick format to be used as an initial tree by PhyML. If NULL (the default), PhyML uses a “BIONJ” tree.

exclude

a vector of mode character giving the models to be excluded from the analysis. These must be among those below, and follow the same syntax.

execname

a character string specifying the name of the PhyML executable. This argument can be left as NULL if PhyML's default names are used: "phyml_3.0_linux32", "phyml_3.0_macintel", or "phyml_3.0_win32.exe", under Linux, MacOS, or Windows respectively.

append

a logical indicating whether to erase previous PhyML output files if present; the default is to not erase.

x

an object of class "phymltest".

object

an object of class "phymltest".

main

a title for the plot; if left NULL, a title is made with the name of the object (use main = "" to have no title).

col

a colour used for the segments showing the AIC values (blue by default).

...

further arguments passed to or from other methods.

Details

The present function requires version 3.0.1 of PhyML; it won't work with older versions.

The user must take care to set correctly the three different paths involved here: the path to PhyML's binary, the path to the sequence file, and the path to R's working directory. The function should work if all three paths are different. Obviously, there should be no problem if they are all the same.

The following syntax is used for the models:

"X[Y][Z]00[+I][+G]"

where "X" is the first letter of the author of the model, "Y" and "Z" are possibly other co-authors of the model, "00" is the year of the publication of the model, and "+I" and "+G" indicate whether the presence of invariant sites and/or a gamma distribution of substitution rates have been specified. Thus, Kimura's model is denoted "K80" and not "K2P". The exception to this rule is the general time-reversible model, which is simply denoted "GTR".

The seven substitution models used are: "JC69", "K80", "F81", "F84", "HKY85", "TN93", and "GTR". These models are then altered by adding the "+I" and/or "+G", resulting thus in four variants for each of them (e.g., "JC69", "JC69+I", "JC69+G", "JC69+I+G"). Some of these models are described in the help page of dist.dna.
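The 28 model names implied by this syntax can be generated in base R, which may help when building the exclude argument. This is only a sketch: the object names are illustrative, and the ordering produced here is an assumption that need not match the order used internally by phymltest.

```r
# The seven substitution models and the four rate-heterogeneity variants
base   <- c("JC69", "K80", "F81", "F84", "HKY85", "TN93", "GTR")
suffix <- c("", "+I", "+G", "+I+G")

# All combinations, grouped by substitution model
models <- as.vector(t(outer(base, suffix, paste0)))

length(models)   # 7 models x 4 variants
head(models, 4)

# For instance, the names of all models with invariant sites
with.I <- grep("+I", models, fixed = TRUE, value = TRUE)
```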

When a gamma distribution of substitution rates is specified, four categories are used (which is PhyML's default behaviour), and the “alpha” parameter is estimated from the data.

For the models with a different substitution rate for transitions and transversions, these rates are left free and estimated from the data (and not constrained with a ratio of 4 as in PhyML's default).

The option path2exec has been removed in the present version: the path to PhyML's executable can be specified with the option execname.

Value

phymltest returns an object of class "phymltest": a numeric vector with the models as names.

The print method prints an object of class "phymltest" as a matrix with the name of the models, the number of free parameters, the log-likelihood value, and the value of the Akaike information criterion (AIC = -2 * loglik + 2 * number of free parameters).

The summary method prints all the possible likelihood ratio tests for an object of class "phymltest".

The plot method plots the values of AIC of an object of class "phymltest" on a vertical scale.
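The AIC formula and a likelihood ratio test can be worked through by hand in base R. The log-likelihoods and parameter counts below are invented for three hypothetical nested models (M0, M1, M2); they are not PhyML output, and the code does not reproduce the internals of the print or summary methods.

```r
# Hypothetical log-likelihoods and numbers of free parameters for three
# nested models (illustrative values only)
loglik <- c(M0 = -1850.2, M1 = -1790.6, M2 = -1771.3)
npar   <- c(M0 = 1, M1 = 2, M2 = 5)

# AIC = -2 * loglik + 2 * number of free parameters
aic  <- -2 * loglik + 2 * npar
best <- names(which.min(aic))   # model with the smallest AIC

# Likelihood ratio test of M0 against M1 (M0 is nested in M1):
# twice the difference in log-likelihood, compared to a chi-squared
# distribution with df = difference in number of free parameters
lrt     <- 2 * (loglik[["M1"]] - loglik[["M0"]])
p.value <- pchisq(lrt, df = npar[["M1"]] - npar[["M0"]], lower.tail = FALSE)
```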

Note

It is important to note that the models fitted by this function are only a small fraction of the models possible with PhyML. For instance, it is possible to vary the number of categories in the (discretized) gamma distribution of substitution rates, and many parameters can be fixed by the user. The results from the present function should rather be taken as indicative of a best model.

Author(s)

Emmanuel Paradis

References

Posada, D. and Crandall, K. A. (2001) Selecting the best-fit model of nucleotide substitution. Systematic Biology, 50, 580–601.

Guindon, S. and Gascuel, O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696–704. http://www.atgc-montpellier.fr/phyml/

Examples

### A `fake' example with random likelihood values: it does not
### make sense, but does not need PhyML and gives you a flavour
### of what the output looks like:
x <- runif(28, -100, -50)
names(x) <- ape:::.phymltest.model
class(x) <- "phymltest"
x
summary(x)
plot(x)
plot(x, main = "", col = "red")

### This example needs PhyML, copy/paste or type the
### following commands if you want to try them, possibly
### changing setwd() and the options of phymltest()
## Not run: 
setwd("D:/phyml_v2.4/exe") # under Windows
data(woodmouse)
write.dna(woodmouse, "woodmouse.txt")
X <- phymltest("woodmouse.txt")
X
summary(X)
plot(X)
## End(Not run)

Phylogenetically Independent Contrasts

Description

Compute the phylogenetically independent contrasts using the method described by Felsenstein (1985).

Usage

pic(x, phy, scaled = TRUE, var.contrasts = FALSE,
    rescaled.tree = FALSE)

Arguments

x

a numeric vector.

phy

an object of class"phylo".

scaled

logical, indicates whether the contrasts should be scaled with their expected variances (defaults to TRUE).

var.contrasts

logical, indicates whether the expected variances of the contrasts should be returned (defaults to FALSE).

rescaled.tree

logical, if TRUE the rescaled tree is returned together with the main results.

Details

If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.

The user must be careful here since the function requires that both series of names perfectly match. If both series of names do not match, the values in x are taken to be in the same order as the tip labels of phy, and a warning message is issued.
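The following sketch works through the scaled contrasts by hand on a three-tip tree and compares them with pic, then shows that named data can be supplied in any order. The tree, the trait values, and the object names are made up; the hand calculation follows the standard Felsenstein (1985) recursion, and the sign of each contrast depends on the order in which the two descendants are taken, so only absolute values are compared.

```r
library(ape)

tr <- read.tree(text = "((A:1,B:1):1,C:2);")
x <- c(A = 1, B = 3, C = 6)

# Contrast within the (A, B) pair, scaled by the square root of the
# sum of the two branch lengths
c.AB <- (x[["A"]] - x[["B"]]) / sqrt(1 + 1)

# Value assigned to the ancestor of A and B (weighted mean), and its
# branch length increased by v1 * v2 / (v1 + v2)
x.AB <- (x[["A"]] / 1 + x[["B"]] / 1) / (1 / 1 + 1 / 1)
v.AB <- 1 + (1 * 1) / (1 + 1)

# Contrast at the root, between that ancestor and C
c.root <- (x.AB - x[["C"]]) / sqrt(v.AB + 2)

by.hand  <- c(c.root, c.AB)
from.pic <- pic(x, tr)

# Names are used for matching, so shuffling the data changes nothing
shuffled <- pic(x[c("C", "A", "B")], tr)
```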

Value

either a vector of phylogenetically independent contrasts (if var.contrasts = FALSE), or a two-column matrix with the phylogenetically independent contrasts in the first column and their expected variance in the second column (if var.contrasts = TRUE). If the tree has node labels, these are used as labels of the returned object.

If rescaled.tree = TRUE, a list is returned with two elements named “contr” with the above results and “rescaled.tree” with the tree and its rescaled branch lengths (see Felsenstein 1985).

Author(s)

Emmanuel Paradis

References

Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Examples

### The example in Phylip 3.5c (originally from Lynch 1991)
x <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
tree.primates <- read.tree(text = x)
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
names(X) <- names(Y) <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")
pic.X <- pic(X, tree.primates)
pic.Y <- pic(Y, tree.primates)
cor.test(pic.X, pic.Y)
lm(pic.Y ~ pic.X - 1) # both regressions
lm(pic.X ~ pic.Y - 1) # through the origin

Phylogenetically Independent Orthonormal Contrasts

Description

This function computes the orthonormal contrasts using the method described by Felsenstein (2008). Only a single trait can be analyzed; there can be several observations per species.

Usage

pic.ortho(x, phy, var.contrasts = FALSE, intra = FALSE)

Arguments

x

a numeric vector or a list of numeric vectors.

phy

an object of class "phylo".

var.contrasts

logical, indicates whether the expected variances of the contrasts should be returned (defaults to FALSE).

intra

logical, whether to return the intraspecific contrasts.

Details

The data x can be in two forms: a vector if there is a single observation for each species, or a list whose elements are vectors containing the individual observations for each species. These vectors may be of different lengths.

If x has names, its values are matched to the tip labels of phy; otherwise its values are taken to be in the same order as the tip labels of phy.

Value

either a vector of contrasts, or a two-column matrix with the contrasts in the first column and their expected variances in the second column (if var.contrasts = TRUE). If the tree has node labels, these are used as labels of the returned object.

If intra = TRUE, the attribute "intra", a list of vectors with the intraspecific contrasts or NULL for the species with a single observation, is attached to the returned object.

Author(s)

Emmanuel Paradis

References

Felsenstein, J. (2008) Comparative methods with sampling error and within-species variation: Contrasts revisited and revised. American Naturalist, 171, 713–725.

Examples

tr <- rcoal(30)
### a single observation per species:
x <- rTraitCont(tr)
pic.ortho(x, tr)
pic.ortho(x, tr, TRUE)
### different number of observations per species:
x <- lapply(sample(1:5, 30, TRUE), rnorm)
pic.ortho(x, tr, intra = TRUE)

Plot a Correlogram

Description

These functions plot correlograms previously computed with correlogram.formula.

Usage

## S3 method for class 'correlogram'
plot(x, legend = TRUE, test.level = 0.05,
     col = c("grey", "red"), type = "b", xlab = "",
     ylab = "Moran's I", pch = 21, cex = 2, ...)
## S3 method for class 'correlogramList'
plot(x, lattice = TRUE, legend = TRUE,
     test.level = 0.05, col = c("grey", "red"),
     xlab = "", ylab = "Moran's I",
     type = "b", pch = 21, cex = 2, ...)

Arguments

x

an object of class "correlogram" or of class "correlogramList" (both produced by correlogram.formula).

legend

should a legend be added on the plot?

test.level

the level used to discriminate the plotting symbols with colours considering the P-values.

col

two colours for the plotting symbols: the first one is used if the P-value is greater than or equal to test.level, the second one otherwise.

type

the type of plot to produce (see plot for possible choices).

xlab

an optional character string for the label on the x-axis (none by default).

ylab

the default label on the y-axis.

pch

the type of plotting symbol.

cex

the default size for the plotting symbols.

lattice

when plotting several correlograms, should they be plotted in trellis-style with lattice (the default), or together on the same plot?

...

other parameters passed to the plot or lines function.

Details

When plotting several correlograms with lattice, some options have no effect: legend, type, and pch (pch = 19 is always used in this situation).

When using pch between 1 and 20 (i.e., non-filled symbols), the colours specified in col are also used for the lines joining the points. To keep black lines, it is better to leave pch between 21 and 25.
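The rule used to pick a colour from col according to test.level, as described for these two arguments, can be sketched in base R. The P-values below are made up for the illustration, and the variable names are not part of the package.

```r
## Choosing the symbol colour from the P-values (illustrative values)
p.values   <- c(0.20, 0.01, 0.05)
test.level <- 0.05
col        <- c("grey", "red")

## first colour if P >= test.level, second colour otherwise
symbol.col <- ifelse(p.values >= test.level, col[1], col[2])
symbol.col
```

A P-value exactly equal to test.level therefore gets the first colour.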

Author(s)

Emmanuel Paradis

Plot Phylogenies

Description

These functions plot phylogenetic trees.

Usage

## S3 method for class 'phylo'
plot(x, type = "phylogram", use.edge.length = TRUE,
     node.pos = NULL, show.tip.label = TRUE,
     show.node.label = FALSE, edge.color = NULL,
     edge.width = NULL, edge.lty = NULL, node.color = NULL,
     node.width = NULL, node.lty = NULL, font = 3,
     cex = par("cex"), adj = NULL, srt = 0, no.margin = FALSE,
     root.edge = FALSE, label.offset = 0, underscore = FALSE,
     x.lim = NULL, y.lim = NULL, direction = "rightwards",
     lab4ut = NULL, tip.color = par("col"), plot = TRUE,
     rotate.tree = 0, open.angle = 0, node.depth = 1,
     align.tip.label = FALSE, ...)
## S3 method for class 'multiPhylo'
plot(x, layout = 1, ...)

Arguments

x

an object of class "phylo" or of class "multiPhylo".

type

a character string specifying the type of phylogeny to be drawn; it must be one of "phylogram" (the default), "cladogram", "fan", "unrooted", "radial", "tidy", or any unambiguous abbreviation of these.

use.edge.length

a logical indicating whether to use the edge lengths of the phylogeny to draw the branches (the default) or not (if FALSE). This option has no effect if the object of class "phylo" has no ‘edge.length’ element.

node.pos

a numeric taking the value 1 or 2 which specifies the vertical position of the nodes with respect to their descendants. If NULL (the default), then the value is determined in relation to ‘type’ and ‘use.edge.length’ (see details).

show.tip.label

a logical indicating whether to show the tip labels on the phylogeny (defaults to TRUE, i.e. the labels are shown).

show.node.label

a logical indicating whether to show the node labels on the phylogeny (defaults to FALSE, i.e. the labels are not shown).

edge.color

a vector of mode character giving the colours used to draw the branches of the plotted phylogeny. These are taken to be in the same order as the component edge of phy. If fewer colours are given than the length of edge, then the colours are recycled.

edge.width

a numeric vector giving the width of the branches of the plotted phylogeny. These are taken to be in the same order as the component edge of phy. If fewer widths are given than the length of edge, then these are recycled.

edge.lty

same as the previous argument but for line types; 1: plain, 2: dashed, 3: dotted, 4: dotdash, 5: longdash, 6: twodash.

node.color

a vector of mode character giving the colours used to draw the perpendicular lines associated with each node of the plotted phylogeny. These are taken to be in the same order as the component node of phy. If fewer colours are given than the length of node, then the colours are recycled.

node.width

as the previous argument, but for line widths.

node.lty

as the previous argument, but for line types; 1: plain, 2: dashed, 3: dotted, 4: dotdash, 5: longdash, 6: twodash.

font

an integer specifying the type of font for the labels: 1 (plain text), 2 (bold), 3 (italic, the default), or 4 (bold italic).

cex

a numeric value giving the factor scaling of the tip and node labels (Character EXpansion). The default is to take the current value from the graphical parameters.

adj

a numeric specifying the justification of the text strings of the labels: 0 (left-justification), 0.5 (centering), or 1 (right-justification). This option has no effect if type = "unrooted". If NULL (the default) the value is set with respect to direction (see details).

srt

a numeric giving how much the labels are rotated in degrees (negative values are allowed and result in a clockwise rotation); the effect of this value depends on the value of direction (see Examples). This option has no effect if type = "unrooted".

no.margin

a logical. If TRUE, the margins are set to zero and the plot uses all the space of the device (note that this was the behaviour of plot.phylo up to version 0.2-1 of ‘ape’ with no way to modify it by the user, at least easily).

root.edge

a logical indicating whether to draw the root edge (defaults to FALSE); this has no effect if ‘use.edge.length = FALSE’ or if ‘type = "unrooted"’.

label.offset

a numeric giving the space between the nodes and the tips of the phylogeny and their corresponding labels. This option has no effect if type = "unrooted".

underscore

a logical specifying whether the underscores in tip labels should be written as spaces (the default) or left as they are (if TRUE).

x.lim

a numeric vector of length one or two giving the limit(s) of the x-axis. If NULL, this is computed with respect to various parameters such as the string lengths of the labels and the branch lengths. If a single value is given, this is taken as the upper limit.

y.lim

same as above for the y-axis.

direction

a character string specifying the direction of the tree. Four values are possible: "rightwards" (the default), "leftwards", "upwards", and "downwards".

lab4ut

(= labels for unrooted trees) a character string specifying the display of tip labels for unrooted trees (can be abbreviated): either "horizontal" where all labels are horizontal (the default if type = "u"), or "axial" where the labels are displayed in the axis of the corresponding terminal branches. This option has an effect if type = "u", "f", or "r".

tip.color

the colours used for the tip labels, recycled if necessary (see examples).

plot

a logical controlling whether to draw the tree. If FALSE, the graphical device is set as if the tree was plotted, and the coordinates are saved as well.

rotate.tree

for "fan", "unrooted", or "radial" trees: the rotation of the whole tree in degrees (negative values are accepted).

open.angle

if type = "f" or "r", the angle in degrees left blank. Use a non-zero value if you want to call axisPhylo after the tree is plotted.

node.depth

an integer value (1 or 2) used if branch lengths are not used to plot the tree; 1: the node depths are proportional to the number of tips descending from each node (the default, and previously the only possibility), 2: they are evenly spaced.

align.tip.label

a logical value or an integer. If TRUE, the tips are aligned and dotted lines are drawn between the tips of the tree and the labels. If an integer, the tips are aligned and this gives the type of the lines (lty).

layout

the number of trees to be plotted simultaneously.

...

further arguments to be passed to plot or to plot.phylo.

Details

If x is a list of trees (i.e., an object of class "multiPhylo"), then any further argument may be passed with ... and could be any one of those listed above for a single tree.

The font format of the labels of the nodes and the tips is the same.

If no.margin = TRUE, the margins are set to zero and are not restored after plotting the tree, so that the user can access the coordinates system of the plot.

The option ‘node.pos’ allows the user to alter the vertical position (i.e., ordinates) of the nodes. If node.pos = 1, then the ordinate of a node is the mean of the ordinates of its direct descendants (nodes and/or tips). If node.pos = 2, then the ordinate of a node is the mean of the ordinates of all the tips of which it is the ancestor. If node.pos = NULL (the default), then its value is determined with respect to other options: if type = "phylogram" then ‘node.pos = 1’; if type = "cladogram" and use.edge.length = FALSE then ‘node.pos = 2’; if type = "cladogram" and use.edge.length = TRUE then ‘node.pos = 1’. Remember that in this last situation, the branch lengths make sense when projected on the x-axis.
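The difference between the two values of ‘node.pos’ can be checked by hand on a small case. The sketch below uses base R only and assumes a ladder-shaped tree (((t1,t2),t3),t4) whose four tips are drawn at ordinates 1 to 4; the object names are made up for the illustration.

```r
## Ordinates of the tips of the tree (((t1,t2),t3),t4)
y.tip <- c(t1 = 1, t2 = 2, t3 = 3, t4 = 4)

## node.pos = 1: mean of the ordinates of the direct descendants
n1.a   <- mean(c(y.tip["t1"], y.tip["t2"]))
n2.a   <- mean(c(n1.a, y.tip["t3"]))
root.a <- mean(c(n2.a, y.tip["t4"]))

## node.pos = 2: mean of the ordinates of all descendant tips
n1.b   <- mean(y.tip[c("t1", "t2")])
n2.b   <- mean(y.tip[c("t1", "t2", "t3")])
root.b <- mean(y.tip)

rbind(node.pos.1 = c(n1.a, n2.a, root.a),
      node.pos.2 = c(n1.b, n2.b, root.b))
```

The two rules agree for the node that has only tips as descendants and diverge towards the root of an unbalanced tree.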

If adj is not specified, then the value is determined with respect to direction: if direction = "leftwards" then adj = 1 (0 otherwise).

If the arguments x.lim and y.lim are not specified by the user, they are determined roughly by the function. This may not always give a nice result: the user may check these values with the (invisibly) returned list (see “Value:”).

If you use align.tip.label = TRUE with type = "fan", you will almost certainly have to set x.lim and y.lim manually.

If you manually resize the graphical device (windows or X11) you may need to replot the tree.

Value

plot.phylo invisibly returns a list with the following components, whose values are those used for the current plot:

type

use.edge.length

node.pos

node.depth

show.tip.label

show.node.label

font

cex

adj

srt

no.margin

label.offset

x.lim

y.lim

direction

tip.color

Ntip

Nnode

root.time

align.tip.label

Note

The argument asp cannot be passed through ‘...’.

Author(s)

Emmanuel Paradis, Martin Smith, Damien de Vienne

References

van der Ploeg, A. (2014) Drawing non-layered tidy trees in linear time. Journal of Software: Practice and Experience, 44, 1467–1484.

Examples

### An extract from Sibley and Ahlquist (1990)
x <- "(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);"
tree.owls <- read.tree(text= x)
plot(tree.owls)
### Show the types of trees.
layout(matrix(1:6, 3, 2))
plot(tree.owls, main = "With branch lengths")
plot(tree.owls, type = "c")
plot(tree.owls, type = "u")
plot(tree.owls, use.edge.length = FALSE, main = "Without branch lengths")
plot(tree.owls, type = "c", use.edge.length = FALSE)
plot(tree.owls, type = "u", use.edge.length = FALSE)
layout(1)
data(bird.orders)
### using random colours and thickness
plot(bird.orders,
     edge.color = sample(colors(), length(bird.orders$edge)/2),
     edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
title("Random colours and branch thickness")
### rainbow colouring...
X <- c("red", "orange", "yellow", "green", "blue", "purple")
plot(bird.orders,
     edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE),
     edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
title("Rainbow colouring")
plot(bird.orders, type = "c", use.edge.length = FALSE,
     edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE),
     edge.width = rep(5, length(bird.orders$edge)/2))
segments(rep(0, 6), 6.5:1.5, rep(2, 6), 6.5:1.5, lwd = 5, col = X)
text(rep(2.5, 6), 6.5:1.5, paste(X, "..."), adj = 0)
title("Character mapping...")
plot(bird.orders, "u", font = 1, cex = 0.75)
data(bird.families)
plot(bird.families, "u", lab4ut = "axial", font = 1, cex = 0.5)
plot(bird.families, "r", font = 1, cex = 0.5)
### cladogram with oblique tip labels
plot(bird.orders, "c", FALSE, direction = "u", srt = -40, x.lim = 25.5)
### facing trees with different informations...
tr <- bird.orders
tr$tip.label <- rep("", 23)
layout(matrix(1:2, 1, 2), c(5, 4))
plot(bird.orders, "c", FALSE, adj = 0.5, no.margin = TRUE, label.offset = 0.8,
     edge.color = sample(X, length(bird.orders$edge)/2, replace = TRUE),
     edge.width = rep(5, length(bird.orders$edge)/2))
text(7.5, 23, "Facing trees with\ndifferent informations", font = 2)
plot(tr, "p", direction = "l", no.margin = TRUE,
     edge.width = sample(1:10, length(bird.orders$edge)/2, replace = TRUE))
### Recycling of arguments gives a lot of possibilities
### for tip labels:
plot(bird.orders, tip.col = c(rep("red", 5), rep("blue", 18)),
     font = c(rep(3, 5), rep(2, 17), 1))
plot(bird.orders, tip.col = c("blue", "green"),
     cex = 23:1/23 + .3, font = 1:3)
co <- c(rep("blue", 9), rep("green", 35))
plot(bird.orders, "f", edge.col = co)
plot(bird.orders, edge.col = co)
layout(1)
## tidy trees
tr <- rtree(100)
layout(matrix(1:2, 2))
plot(tr)
axis(2)
plot(tr, "t")
axis(2)
## around 20 percent gain on the y-axis

Extra Functions to Plot and Annotate Phylogenies

Description

These are extra functions to plot and annotate phylogenies, mostly calling basic graphical functions in ape.

Usage

plotBreakLongEdges(phy, n = 1, ...)
drawSupportOnEdges(value, ...)

Arguments

phy

an object of class "phylo".

n

the number of long branches to be broken.

value

the values to be printed on the internal branches of the tree.

...

further arguments to be passed to plot.phylo or to edgelabels.

Details

drawSupportOnEdges assumes the tree is unrooted, so the vector value should have as many values as the number of internal branches (= number of nodes - 1). If there is one additional value, it is assumed that it relates to the root node and is dropped (see examples).

Value

NULL

Author(s)

Emmanuel Paradis

Examples

tr <- rtree(10)
tr$edge.length[c(1, 18)] <- 100
op <- par(mfcol = 1:2)
plot(tr); axisPhylo()
plotBreakLongEdges(tr, 2); axisPhylo()
## from ?boot.phylo:
f <- function(x) nj(dist.dna(x))
data(woodmouse)
tw <- f(woodmouse) # NJ tree with K80 distance
set.seed(1)
## bootstrap with 100 replications:
(bp <- boot.phylo(tw, woodmouse, f, quiet = TRUE))
## the first value relates to the root node and is always 100
## it is ignored below:
plot(tw, "u")
drawSupportOnEdges(bp)
## more readable but the tree is really unrooted:
plot(tw)
drawSupportOnEdges(bp)
par(op)

Plot Variance Components

Description

Plot previously estimated variance components.

Usage

## S3 method for class 'varcomp'
plot(x, xlab = "Levels", ylab = "Variance", type = "b", ...)

Arguments

x

A varcomp object

xlab

x axis label

ylab

y axis label

type

plot type ("l", "p" or "b", see plot)

...

Further arguments sent to the xyplot function.

Value

The same as xyplot.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

Plot Tree With Time Axis

Description

This function plots a non-ultrametric tree where the tips are not contemporary together with their dates on the x-axis.

Usage

plotTreeTime(phy, tip.dates, show.tip.label = FALSE, y.lim = NULL,
             color = TRUE, ...)

Arguments

phy

an object of class "phylo".

tip.dates

a vector of the same length as the number of tips in phy (see details).

show.tip.label

a logical value; see plot.phylo.

y.lim

by default, one fifth of the plot is left below the tree; use this option to change this behaviour.

color

a logical value specifying whether to use colors for the lines linking the tips to the time axis. If FALSE, a grey scale is used.

...

other arguments to be passed to plot.phylo.

Details

The vector tip.dates may be numeric or of class “Date”. In either case, the time axis is set accordingly. The length of this vector must be equal to the number of tips of the tree: the dates are matched to the tip numbers. Missing values are allowed.

Value

NULL

Author(s)

Emmanuel Paradis

Examples

dates <- as.Date(.leap.seconds)
tr <- rtree(length(dates))
plotTreeTime(tr, dates)
## handling NA's:
dates[11:26] <- NA
plotTreeTime(tr, dates)
## dates can be on an arbitrary scale, e.g., [-1, 1]:
plotTreeTime(tr, runif(Ntip(tr), -1, 1))

Compact Display of a Phylogeny

Description

These functions print a compact summary of a phylogeny, or a list of phylogenies, on the console.

Usage

## S3 method for class 'phylo'
print(x, printlen = 6, ...)
## S3 method for class 'multiPhylo'
print(x, details = FALSE, ...)
## S3 method for class 'multiPhylo'
str(object, ...)

Arguments

x

an object of class "phylo" or "multiPhylo".

object

an object of class "multiPhylo".

printlen

the number of labels to print (6 by default).

details

a logical indicating whether to print information on all trees.

...

further arguments passed to or from other methods.

Value

NULL.

Author(s)

Ben Bolker and Emmanuel Paradis

Examples

x <- rtree(10)
print(x)
print(x, printlen = 10)
x <- rmtree(2, 10)
print(x)
print(x, TRUE)
str(x)

Random DNA Sequences

Description

This function generates random sets of DNA sequences.

Usage

rDNAbin(n, nrow, ncol, base.freq = rep(0.25, 4), prefix = "Ind_")

Arguments

n

a vector of integers giving the lengths of the sequences. Can be missing, in which case nrow and ncol must be given.

nrow, ncol

two single integer values giving the number of sequences and the number of sites, respectively (ignored if n is given).

base.freq

the base frequencies.

prefix

the prefix used to give labels to the sequences; by default these are Ind_1, ... Ind_n (or Ind_nrow).

Details

If n is used, this function generates a list with sequence lengths given by the values in n. If n is missing, a matrix is generated.

The purpose of this function is to generate a set of sequences of a specific size. To simulate sequences on a phylogenetic tree, see simSeq in phangorn (very efficient), and the package phylosim (more for pedagogy).
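The two output shapes described above (a list when n is given, a matrix when nrow and ncol are given) can be mimicked in base R. This is only an illustration of the idea, not the function's implementation: random.bases is a hypothetical helper, it returns plain characters rather than an object of class "DNAbin", and the order a, c, g, t assumed for base.freq is an assumption.

```r
## Hypothetical helper: draws 'len' bases as lower-case characters
random.bases <- function(len, base.freq = rep(0.25, 4))
    sample(c("a", "c", "g", "t"), len, replace = TRUE, prob = base.freq)

set.seed(1)
## lengths given: a list of sequences of lengths 3, 5, and 8
seq.list <- lapply(c(3, 5, 8), random.bases)
names(seq.list) <- paste0("Ind_", seq_along(seq.list))

## numbers of sequences and sites given: a 4 x 10 matrix
seq.mat <- matrix(random.bases(4 * 10), nrow = 4, ncol = 10,
                  dimnames = list(paste0("Ind_", 1:4), NULL))

lengths(seq.list)
dim(seq.mat)
```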

Value

an object of class "DNAbin".

Note

It is not recommended to use this function to generate objects larger than two billion bases (2 Gb).

Author(s)

Emmanuel Paradis

Examples

rDNAbin(1:10)
rDNAbin(rep(10, 10))
rDNAbin(nrow = 10, ncol = 10)

Continuous Character Simulation

Description

This function simulates the evolution of a continuous character along a phylogeny. The calculation is done recursively from the root. See Paradis (2012, pp. 232 and 324) for an introduction.

Usage

rTraitCont(phy, model = "BM", sigma = 0.1, alpha = 1, theta = 0,
           ancestor = FALSE, root.value = 0, ...)

Arguments

phy

an object of class "phylo".

model

a character (either "BM" or "OU") or a function specifying the model (see details).

sigma

a numeric vector giving the standard deviation of the random component for each branch (can be a single value).

alpha

if model = "OU", a numeric vector giving the strength of the selective constraint for each branch (can be a single value).

theta

if model = "OU", a numeric vector giving the optimum for each branch (can be a single value).

ancestor

a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned).

root.value

a numeric giving the value at the root.

...

further arguments passed to model if it is a function.

Details

There are three possibilities to specify model:

"BM": a Brownian motion model is used. If the argument sigma has more than one value, its length must be equal to the number of branches of the tree. This makes it possible to specify a model with variable rates of evolution. Be careful that branch numbering is done with the tree in “postorder” order: to see the order of the branches you can use: tr <- reorder(tr, "po"); plot(tr); edgelabels(). The arguments alpha and theta are ignored.
"OU": an Ornstein-Uhlenbeck model is used. The above indexing rule is used for the three parameters sigma, alpha, and theta. This may be interesting for the last one to model varying phenotypic optima. The exact updating formulae from Gillespie (1996) are used; they reduce to the BM formulae if alpha = 0.
A function: it must be of the form foo(x, l) where x is the trait of the ancestor and l is the branch length. It must return the value of the descendant. The arguments sigma, alpha, and theta are ignored.
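As an illustration of the "OU" case, the standard exact transition of an Ornstein-Uhlenbeck process along a single branch can be written in a few lines of base R. This is a sketch of the textbook update, assumed here to correspond to the formulae cited from Gillespie (1996); ou.step is a hypothetical helper, not a function of the package, and the internal code may be organised differently.

```r
## Hypothetical helper: one exact OU update along a branch of length l,
## starting from the ancestral value x
ou.step <- function(x, l, sigma = 0.1, alpha = 1, theta = 0) {
    if (alpha == 0)   # limiting case: Brownian motion
        return(x + sigma * sqrt(l) * rnorm(1))
    m <- theta + (x - theta) * exp(-alpha * l)                  # conditional mean
    s <- sigma * sqrt((1 - exp(-2 * alpha * l)) / (2 * alpha))  # conditional SD
    m + s * rnorm(1)
}

set.seed(42)
ou.step(x = 0, l = 1, sigma = 0.1, alpha = 1, theta = 2)
```

A wrapper with fixed parameters, such as function(x, l) ou.step(x, l, alpha = 0.5, theta = 2), has the form foo(x, l) required when model is a function.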

Value

A numeric vector with names taken from the tip labels of phy. If ancestor = TRUE, the node labels are used if present, otherwise, “Node1”, “Node2”, etc.

Author(s)

Emmanuel Paradis

References

Gillespie, D. T. (1996) Exact numerical simulation of the Ornstein-Uhlenbeck process and its integral. Physical Review E, 54, 2084–2091.

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Examples

data(bird.orders)
rTraitCont(bird.orders) # BM with sigma = 0.1
### OU model with two optima:
tr <- reorder(bird.orders, "postorder")
plot(tr)
edgelabels()
theta <- rep(0, Nedge(tr))
theta[c(1:4, 15:16, 23:24)] <- 2
## sensitive to 'alpha' and 'sigma':
rTraitCont(tr, "OU", theta = theta, alpha=.1, sigma=.01)
### an imaginary model with stasis 0.5 time unit after a node, then
### BM evolution with sigma = 0.1:
foo <- function(x, l) {
    if (l <= 0.5) return(x)
    x + (l - 0.5)*rnorm(1, 0, 0.1)
}
tr <- rcoal(20, br = runif)
rTraitCont(tr, foo, ancestor = TRUE)
### a cumulative Poisson process:
bar <- function(x, l) x + rpois(1, l)
(x <- rTraitCont(tr, bar, ancestor = TRUE))
plot(tr, show.tip.label = FALSE)
Y <- x[1:20]
A <- x[-(1:20)]
nodelabels(A)
tiplabels(Y)

Discrete Character Simulation

Description

This function simulates the evolution of a discrete character along a phylogeny. If model is a character or a matrix, evolution is simulated with a Markovian model; the transition probabilities are calculated for each branch with P = e^{Qt} where Q is the rate matrix given by model and t is the branch length. The calculation is done recursively from the root. See Paradis (2006, p. 101) for a general introduction applied to evolution.
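The relation P = e^{Qt} can be checked numerically in base R for the simplest case, a two-state equal-rates model. This is a sketch under stated assumptions: the diagonal of Q is set so that each row sums to zero, the matrix exponential is obtained from the eigendecomposition (which is valid here because Q is symmetric), and the branch length bl is arbitrary. It is not the routine used internally.

```r
k    <- 2      # number of states
rate <- 0.1    # rate of change
bl   <- 1.5    # an arbitrary branch length (t in the formula above)

## rate matrix: equal off-diagonal rates, rows summing to zero
Q <- matrix(rate, k, k)
diag(Q) <- 0
diag(Q) <- -rowSums(Q)

## P = exp(Q * bl) through the eigendecomposition of the symmetric Q
e <- eigen(Q, symmetric = TRUE)
P <- e$vectors %*% diag(exp(e$values * bl)) %*% t(e$vectors)

P           # transition probabilities along the branch
rowSums(P)  # each row sums to one
```

For this model the probability of ending in the starting state has the closed form 0.5 + 0.5 * exp(-2 * rate * bl), which gives a quick check of the result.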

Usage

rTraitDisc(phy, model = "ER", k = if (is.matrix(model)) ncol(model) else 2,
           rate = 0.1, states = LETTERS[1:k], freq = rep(1/k, k),
           ancestor = FALSE, root.value = 1, ...)

Arguments

phy

an object of class "phylo".

model

a character, a square numeric matrix, or a function specifying the model (see details).

k

the number of states of the character.

rate

the rate of change used if model is a character; it is not recycled if model = "ARD" or model = "SYM".

states

the labels used for the states; by default “A”, “B”, ...

freq

a numeric vector giving the equilibrium relative frequencies of each state; by default the frequencies are equal.

ancestor

a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned).

root.value

an integer giving the value at the root (by default, it's the first state). To have a random value, use root.value = sample(k, size = 1).

...

further arguments passed to model if it is a function.

Details

There are three possibilities to specify model:

A matrix: it must be a numeric square matrix; the diagonal is always ignored. The arguments k and rate are ignored.
A character: these are the same short-cuts as in the function ace: "ER" is an equal-rates model, "ARD" is an all-rates-different model, and "SYM" is a symmetrical model. Note that the argument rate must be of the appropriate length, i.e., 1, k(k - 1), or k(k - 1)/2 for the three models, respectively. The rate matrix Q is then filled column-wise.
A function: it must be of the form foo(x, l) where x is the trait of the ancestor and l is the branch length. It must return the value of the descendant as an integer.
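To see what “filled column-wise” means for an all-rates-different model, the off-diagonal cells of a rate matrix can be filled in base R as below. This is a sketch that assumes R's usual column-major order for the assignment; the rates are arbitrary values chosen so that the order is visible.

```r
k    <- 3
rate <- 1:6 / 10   # k(k - 1) = 6 rates for an all-rates-different model

Q <- matrix(0, k, k)
Q[row(Q) != col(Q)] <- rate   # off-diagonal cells, column by column
Q
```

The first two rates go to the first column (rows 2 and 3), the next two to the second column (rows 1 and 3), and the last two to the third column (rows 1 and 2).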

Value

A factor with names taken from the tip labels of phy. If ancestor = TRUE, the node labels are used if present, otherwise, “Node1”, “Node2”, etc.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2006) Analyses of Phylogenetics and Evolution with R. New York: Springer.

Examples

data(bird.orders)
### the following two calls are equivalent:
rTraitDisc(bird.orders)
rTraitDisc(bird.orders, model = matrix(c(0, 0.1, 0.1, 0), 2))
### two-state model with irreversibility:
rTraitDisc(bird.orders, model = matrix(c(0, 0, 0.1, 0), 2))
### simple two-state model:
tr <- rcoal(n <- 40, br = runif)
x <- rTraitDisc(tr, ancestor = TRUE)
plot(tr, show.tip.label = FALSE)
nodelabels(pch = 19, col = x[-(1:n)])
tiplabels(pch = 19, col = x[1:n])
### an imaginary model with stasis 0.5 time unit after a node, then
### random evolution:
foo <- function(x, l) {
    if (l < 0.5) return(x)
    sample(2, size = 1)
}
tr <- rcoal(20, br = runif)
x <- rTraitDisc(tr, foo, ancestor = TRUE)
plot(tr, show.tip.label = FALSE)
co <- c("blue", "yellow")
cot <- c("white", "black")
Y <- x[1:20]
A <- x[-(1:20)]
nodelabels(A, bg = co[A], col = cot[A])
tiplabels(Y, bg = co[Y], col = cot[Y])

Multivariate Character Simulation

Description

This function simulates the evolution of a multivariate set of traits along a phylogeny. The calculation is done recursively from the root.

Usage

rTraitMult(phy, model, p = 1, root.value = rep(0, p), ancestor = FALSE,
           asFactor = NULL, trait.labels = paste("x", 1:p, sep = ""), ...)

Arguments

phy

an object of class "phylo".

model

a function specifying the model (see details).

p

an integer giving the number of traits.

root.value

a numeric vector giving the values at the root.

ancestor

a logical value specifying whether to return the values at the nodes as well (by default, only the values at the tips are returned).

asFactor

the indices of the traits that are returned as factors (discrete traits).

trait.labels

a vector of mode character giving the names of the traits.

...

further arguments passed to model if it is a function.

Details

The model is specified with an R function of the form foo(x, l) where x is a vector of the traits of the ancestor and l is the branch length. Other arguments may be added. The function must return a vector of length p.

Value

A data frame with p columns whose names are given by trait.labels and row names taken from the labels of the tree.
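A model function that follows this contract, with one additional argument, might look like the sketch below. It is written in base R and applied to a single branch; bm.mult and its argument sd are hypothetical names used only for the illustration.

```r
## Hypothetical model: p independent Brownian traits sharing the rate 'sd'
bm.mult <- function(x, l, sd = 0.1)
    x + rnorm(length(x), mean = 0, sd = sd * sqrt(l))

set.seed(3)
anc  <- c(0, 0, 0)              # values of p = 3 traits at the ancestor
desc <- bm.mult(anc, l = 0.8)   # values at the descendant
desc
```

The returned vector has the same length as x, that is, one value per trait.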

Author(s)

Emmanuel Paradis

Examples

## correlated evolution of 2 continuous traits:
mod <- function(x, l) {
    y1 <- rnorm(1, x[1] + 0.5*x[2], 0.1)
    y2 <- rnorm(1, 0.5*x[1] + x[2], 0.1)
    c(y1, y2)
}
set.seed(11)
tr <- makeNodeLabel(rcoal(20))
x <- rTraitMult(tr, mod, 2, ancestor = TRUE)
op <- par(mfcol = c(2, 1))
plot(x, type = "n")
text(x, labels = rownames(x), cex = 0.7)
oq <- par(mar = c(0, 1, 0, 1), xpd = TRUE)
plot(tr, font = 1, cex = 0.7)
nodelabels(tr$node.label, cex = 0.7, adj = 1)
par(c(op, oq))

Read DNA Sequences from GenBank via Internet

Description

This function connects to the GenBank database, and reads nucleotide sequences using accession numbers given as arguments.

Usage

read.GenBank(access.nb, seq.names = access.nb, species.names = TRUE,
             as.character = FALSE, chunk.size = 400, quiet = TRUE,
             type = "DNA")

Arguments

access.nb

a vector of mode character giving the accession numbers.

seq.names

the names to give to each sequence; by default the accession numbers are used.

species.names

a logical indicating whether to attribute the species names to the returned object.

as.character

a logical controlling whether to return the sequences as an object of class "DNAbin" (the default).

chunk.size

the number of sequences downloaded together (see details).

quiet

a logical value indicating whether to show the progress of the downloads. If TRUE, will also print the (full) name of the FASTA file containing the downloaded sequences.

type

a character specifying to download "DNA" (nucleotide) or "AA" (amino acid) sequences.

Details

The function uses the site https://www.ncbi.nlm.nih.gov/ from where the sequences are retrieved.

If species.names = TRUE, the returned list has an attribute "species" containing the names of the species taken from the field “ORGANISM” in GenBank.

Since ape 3.6, this function retrieves the sequences in FASTA format: this is more efficient and more flexible (scaffolds and contigs can be read) than what was done in previous versions. The option gene.names has been removed in ape 5.4; this information is also present in the description.

Setting species.names = FALSE is much faster (could be useful if you read a series of scaffolds or contigs, or if you already have the species names).

The argument chunk.size is set by default to 400, which is likely to work in many cases. If an error occurs such as “Cannot open file ...” showing the list of the accession numbers, then you may try decreasing chunk.size to 200 or 300.

If quiet = FALSE, the display is done chunk by chunk, so the message “Downloading sequences: 400 / 400 ...” means that the download from sequence 1 to sequence 400 is in progress (it is not possible to display a more accurate message because the download method depends on the platform).
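The effect of chunk.size can be pictured by splitting a vector of accession numbers into consecutive groups. The sketch below uses base R only and does not contact GenBank; the accession numbers are made up for the illustration, and the way the function forms its chunks internally may differ.

```r
## Hypothetical accession numbers, for illustration only
access.nb  <- sprintf("XX%06d", 1:1000)
chunk.size <- 400

## consecutive groups of at most 'chunk.size' accession numbers
chunks <- split(access.nb, ceiling(seq_along(access.nb) / chunk.size))
lengths(chunks)
```

With 1000 accession numbers and the default chunk size, this gives two full chunks and a shorter last one.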

Value

A list of DNA sequences made of vectors of class "DNAbin", or of single characters (if as.character = TRUE) with two attributes (species and description).

Author(s)

Emmanuel Paradis and Klaus Schliep

Examples

## This won't work if your computer is not connected
## to the Internet
## Get the 8 sequences of tanagers (Ramphocelus)
## as used in Paradis (1997)
ref <- c("U15717", "U15718", "U15719", "U15720",
         "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
## Not run: 
Rampho <- read.GenBank(ref)
## get the species names:
attr(Rampho, "species")
## build a matrix with the species names and the accession numbers:
cbind(attr(Rampho, "species"), names(Rampho))
## print the first sequence
## (can be done with `Rampho$U15717' as well)
Rampho[[1]]
## the description from each FASTA sequence:
attr(Rampho, "description")
## End(Not run)

Read Tree File in CAIC Format

Description

This function reads one tree from a CAIC file. A second file containing branch length values may also be passed (experimental).

Usage

read.caic(file, brlen = NULL, skip = 0, comment.char = "#", ...)

Arguments

file

a file name specified by either a variable of mode character, or a double-quoted string.

brlen

a file name for the branch lengths file.

skip

the number of lines of the input file to skip before beginning to read data (this is passed directly to scan()).

comment.char

a single character; the remainder of the line after this character is ignored (this is passed directly to scan()).

...

Further arguments to be passed to scan().

Details

Reads a tree from a file in the format used by the CAIC and MacroCAIC programs.

Value

an object of class "phylo".

Warning

The branch length support is still experimental and was not fully tested.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Purvis, A. and Rambaut, A. (1995) Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. CABIOS, 11, 241–251.

Examples

## The same example as in read.tree, without branch lengths.
## An extract from Sibley and Ahlquist (1990)
fl <- tempfile("tree", fileext = ".tre")
cat("AAA","Strix_aluco","AAB","Asio_otus",
   "AB","Athene_noctua","B","Tyto_alba",
   file = fl, sep = "\n")
tree.owls <- read.caic(fl)
plot(tree.owls)
tree.owls
unlink(fl) # delete the temporary file

Read DNA Sequences in a File

Description

These functions read DNA sequences in a file, and return a matrix or a list of DNA sequences with the names of the taxa read in the file as rownames or names, respectively. By default, the sequences are returned in binary format, otherwise (if as.character = TRUE) in lowercase.

Usage

read.dna(file, format = "interleaved", skip = 0,
         nlines = 0, comment.char = "#",
         as.character = FALSE, as.matrix = NULL)
read.FASTA(file, type = "DNA")
read.fastq(file, offset = -33)

Arguments

file

a file name specified by either a variable of mode character, or a double-quoted string. Can also be a connection (which will be opened for reading if necessary, and if so closed (and hence destroyed) at the end of the function call). Files compressed with GZIP can be read (the name must end with .gz), as well as remote files.

format

a character string specifying the format of the DNA sequences. Four choices are possible: "interleaved", "sequential", "clustal", or "fasta", or any unambiguous abbreviation of these.

skip

the number of lines of the input file to skip before beginning to read data (ignored for FASTA files; see below).

nlines

the number of lines to be read (by default the file is read until its end; ignored for FASTA files).

comment.char

a single character; the remainder of the line after this character is ignored (ignored for FASTA files).

as.character

a logical controlling whether to return the sequences as character strings (TRUE) or as an object of class "DNAbin" (FALSE, the default).

as.matrix

(used if format = "fasta") one of the following three: (i) NULL: returns the sequences in a matrix if they are of the same length, otherwise in a list; (ii) TRUE: returns the sequences in a matrix, or stops with an error if they are of different lengths; (iii) FALSE: always returns the sequences in a list.

type

a character string giving the type of the sequences: one of "DNA" or "AA" (case-independent, can be abbreviated).

offset

the value to be added to the quality scores (the default applies to the Sanger format and should work for most recent FASTQ files).

Details

read.dna follows the interleaved and sequential formats defined in PHYLIP (Felsenstein, 1993) but with the original feature that there is no restriction on the lengths of the taxa names. For these two formats, the first line of the file must contain the dimensions of the data (the number of taxa and the number of nucleotides); the sequences are considered as aligned and thus must be of the same length for all taxa. For the FASTA and FASTQ formats, the conventions defined in the references are followed; the sequences are taken as non-aligned. For all formats, the nucleotides can be arranged in any way with blanks and line-breaks inside (with the restriction that the first ten nucleotides must be contiguous for the interleaved and sequential formats, see below). The names of the sequences are read in the file. Particularities for each format are detailed below.

Interleaved: the function starts to read the sequences after it finds one or more spaces (or tabulations). All characters before the sequences are taken as the taxa names after removing the leading and trailing spaces (so spaces in taxa names are not allowed). It is assumed that the taxa names are not repeated in the subsequent blocks of nucleotides.
Sequential: the same criterion as for the interleaved format is used to start reading the sequences and the taxa names; the sequences are then read until the number of nucleotides specified in the first line of the file is reached. This is repeated for each taxon.
Clustal: this is the format output by the Clustal programs (.aln). It is close to the interleaved format: the differences are that the dimensions of the data are not indicated in the file, and the names of the sequences are repeated in each block.
FASTA: this looks like the sequential format but the taxa names (or a description of the sequence) are on separate lines beginning with a ‘greater than’ character ‘>’ (there may be leading spaces before this character). These lines are taken as taxa names after removing the ‘>’ and the possible leading and trailing spaces. All the data in the file before the first sequence are ignored.

The FASTQ format is explained in the references.

Compressed files must be read through connections (see examples). read.fastq can read compressed files directly (see examples).

Value

a matrix or a list (if format = "fasta") of DNA sequences stored in binary format, or of mode character (if as.character = TRUE).

read.FASTA always returns a list of class "DNAbin" or "AAbin".

read.fastq returns a list of class "DNAbin" with an attribute "QUAL" (see examples).
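As a rough illustration of these return types, the sketch below writes two small FASTA files to temporary locations and checks what read.dna and read.FASTA give back. The file contents and object names are made up for this illustration, and the ape package is assumed to be installed.

```r
library(ape)

## Hypothetical toy data: three sequences of equal length ...
fas_equal <- tempfile("equal", fileext = ".fas")
cat(">s1", "ACGTACGT", ">s2", "ACGTACGA", ">s3", "ACGAACGT",
    file = fas_equal, sep = "\n")

## ... and three sequences of different lengths
fas_ragged <- tempfile("ragged", fileext = ".fas")
cat(">s1", "ACGTACGT", ">s2", "ACGT", ">s3", "ACGAAC",
    file = fas_ragged, sep = "\n")

## as.matrix = NULL (the default): a matrix when the lengths are equal ...
m <- read.dna(fas_equal, format = "fasta")
## ... otherwise a list
l <- read.dna(fas_ragged, format = "fasta")

## as.matrix = FALSE always gives a list, even for equal lengths
l2 <- read.dna(fas_equal, format = "fasta", as.matrix = FALSE)

## read.FASTA always returns a list
f <- read.FASTA(fas_equal)

is.matrix(m)
is.list(l)
is.list(l2)
class(f)

unlink(c(fas_equal, fas_ragged)) # clean-up
```

Leaving as.matrix at its default is convenient for exploratory work, whereas setting it explicitly makes the returned type predictable in scripts.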

Author(s)

Emmanuel Paradis and RJ Ewing

References

Anonymous. FASTA format. https://en.wikipedia.org/wiki/FASTA_format

Anonymous. FASTQ format. https://en.wikipedia.org/wiki/FASTQ_format

Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington. http://evolution.genetics.washington.edu/phylip/phylip.html

Examples

## 1. Simple text files

TEXTfile <- tempfile("exdna", fileext = ".txt")

## 1a. Extract from data(woodmouse) in sequential format:
cat("3 40",
"No305     NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
"No304     ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
"No306     ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = TEXTfile, sep = "\n")
ex.dna <- read.dna(TEXTfile, format = "sequential")
str(ex.dna)
ex.dna

## 1b. The same data in interleaved format, ...
cat("3 40",
"No305     NTTCGAAAAA CACACCCACT",
"No304     ATTCGAAAAA CACACCCACT",
"No306     ATTCGAAAAA CACACCCACT",
"          ACTAAAANTT ATCAGTCACT",
"          ACTAAAAATT ATCAACCACT",
"          ACTAAAAATT ATCAATCACT",
file = TEXTfile, sep = "\n")
ex.dna2 <- read.dna(TEXTfile)

## 1c. ... in clustal format, ...
cat("CLUSTAL (ape) multiple sequence alignment", "",
"No305     NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
"No304     ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
"No306     ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
"           ************************** ******  ****",
file = TEXTfile, sep = "\n")
ex.dna3 <- read.dna(TEXTfile, format = "clustal")

## 1d. ... and in FASTA format
FASTAfile <- tempfile("exdna", fileext = ".fas")
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = FASTAfile, sep = "\n")
ex.dna4 <- read.dna(FASTAfile, format = "fasta")

## The 4 data objects are the same:
identical(ex.dna, ex.dna2)
identical(ex.dna, ex.dna3)
identical(ex.dna, ex.dna4)

## 2. How to read GZ compressed files

## create a GZ file and open a connection:
GZfile <- tempfile("exdna", fileext = ".fas.gz")
con <- gzfile(GZfile, "wt")
## write the data using the connection:
cat(">No305", "NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
    ">No304", "ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
    ">No306", "ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
    file = con, sep = "\n")
close(con) # close the connection

## read the GZ'ed file:
ex.dna5 <- read.dna(gzfile(GZfile), "fasta")

## This example is with a FASTA file but this works as well
## with the other formats described above.

## All 5 data objects are identical:
identical(ex.dna, ex.dna5)

unlink(c(TEXTfile, FASTAfile, GZfile)) # clean-up

## Not run: 
## 3. How to read files from a ZIP archive

## NOTE: since ape 5.7-1, all files in these examples are written
## in the temporary directory, thus the following commands work
## best when run in the user's working directory.

## write the woodmouse data in a FASTA file:
data(woodmouse)
write.dna(woodmouse, "woodmouse.fas", "fasta")
## archive a FASTA file in a ZIP file:
zip("myarchive.zip", "woodmouse.fas")
## Note: the file myarchive.zip is created if necessary

## Read the FASTA file from the ZIP archive without extraction:
wood2 <- read.dna(unz("myarchive.zip", "woodmouse.fas"), "fasta")

## Alternatively, unzip the archive:
fl <- unzip("myarchive.zip")
## the previous command eventually creates locally
## the fullpath archived with 'woodmouse.fas'
wood3 <- read.dna(fl, "fasta")

identical(woodmouse, wood2)
identical(woodmouse, wood3)

## End(Not run)

## read a FASTQ file from 1000 Genomes:
## Not run: 
a <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/"
file <- "SRR062641.filt.fastq.gz"
URL <- paste0(a, file)
download.file(URL, file)
## If the above command doesn't work, you may copy/paste URL in
## a Web browser instead.
X <- read.fastq(file)
X # 109,811 sequences
## get the qualities of the first sequence:
(qual1 <- attr(X, "QUAL")[[1]])
## the corresponding probabilities:
10^(-qual1/10)
## get the mean quality for each sequence:
mean.qual <- sapply(attr(X, "QUAL"), mean)
## can do the same for var, sd, ...

## End(Not run)
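The FASTQ example above needs a large download. A minimal offline sketch of the same steps is given here, assuming the ape package is installed and using a made-up two-read file in the Sanger (Phred+33) encoding; the read names and quality strings are hypothetical. In that encoding the character 'I' (ASCII 73) stands for a score of 40 and '5' (ASCII 53) for a score of 20.

```r
library(ape)

## Hypothetical two-read FASTQ file (Sanger / Phred+33 encoding)
fq <- tempfile("reads", fileext = ".fastq")
cat("@read1", "ACGTACGT", "+", "IIII5555",
    "@read2", "TTGGCCAA", "+", "55555555",
    file = fq, sep = "\n")

X <- read.fastq(fq)               # default offset = -33
qual1 <- attr(X, "QUAL")[[1]]     # quality scores of the first read
err1 <- 10^(-qual1 / 10)          # corresponding error probabilities
qual1
err1

unlink(fq) # clean-up
```

The last step is the usual Phred relation between a quality score Q and an error probability, P = 10^(-Q/10).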

Read GFF Files

Description

This function reads a file in general feature format version 3 (GFF3) and returns a data frame.

Usage

read.gff(file, na.strings = c(".", "?"), GFF3 = TRUE)

Arguments

file

a file name specified by a character string.

na.strings

the strings in the GFF file that will be converted to NA (missing values).

GFF3

a logical value specifying whether the file is formatted according to version 3 of GFF.

Details

The returned data frame has its (column) names correctly set (see References) and the categorical variables (seqid, source, type, strand, and phase) set as factors.

This function should be more efficient than using read.delim.

GFF2 (aka GTF) files can also be read: use GFF3 = FALSE to have the correct field names. Note that GFF2 files and GFF3 files have the same structure, although some fields are slightly different (see reference).

The file can be gz-compressed (see examples), but not zipped.
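A small offline sketch of these points, assuming the ape package is installed. The two features written to the temporary file are invented for illustration; the nine tab-separated fields follow the GFF3 layout.

```r
library(ape)

## Hypothetical two-feature GFF3 file (tab-separated, nine fields)
gff_file <- tempfile("toy", fileext = ".gff3")
cat("##gff-version 3",
    paste("chr1", "toy", "gene", 100, 500, ".", "+", ".", "ID=gene1",
          sep = "\t"),
    paste("chr1", "toy", "exon", 100, 250, ".", "+", ".", "Parent=gene1",
          sep = "\t"),
    file = gff_file, sep = "\n")

g <- read.gff(gff_file)
names(g)
g$end - (g$start - 1)   # feature lengths: 401 and 151
is.factor(g$type)       # categorical fields are stored as factors
is.na(g$score)          # "." is one of the default na.strings

unlink(gff_file) # clean-up
```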

Value

a data frame.

Author(s)

Emmanuel Paradis

References

https://en.wikipedia.org/wiki/General_feature_format

Examples

## Not run: 
## requires an Internet connection
d <- "https://ftp.ensembl.org/pub/release-86/gff3/homo_sapiens/"
f <- "Homo_sapiens.GRCh38.86.chromosome.MT.gff3.gz"
download.file(paste0(d, f), "mt_gff3.gz")
## If the above command doesn't work, you may copy/paste the full URL in
## a Web browser instead.
gff.mito <- read.gff("mt_gff3.gz")
## the lengths of the sequence features:
gff.mito$end - (gff.mito$start - 1)
table(gff.mito$type)
## where the exons start:
gff.mito$start[gff.mito$type == "exon"]

## End(Not run)

Read Tree File in Nexus Format

Description

This function reads one or several trees in a NEXUS file.

Usage

read.nexus(file, tree.names = NULL, force.multi = FALSE)

Arguments

file

a file name specified by either a variable of mode character, or a double-quoted string.

tree.names

if there are several trees to be read, a vector of mode character giving names to the individual trees (by default, this uses the labels in the NEXUS file if these are present).

force.multi

a logical value; if TRUE, an object of class "multiPhylo" is always returned even if the file contains a single tree (see details).

Details

The present implementation tries to follow as much as possible the NEXUS standard (but see the restriction below on TRANSLATION tables). Only the block “TREES” is read; the other data can be read with other functions (e.g., read.dna, read.table, ...).

If a TRANSLATION table is present it is assumed that only the tip labels are translated and they are all translated with integers without gap. Consequently, if nodes have labels in the tree(s) they are read as they are and not looked for in the translation table. The logic behind this is that in the vast majority of cases, node labels will be support values rather than proper taxa names. This is consistent with write.nexus which translates only the tip labels.

Using force.multi = TRUE when the file contains a single tree makes it possible to keep the tree name (as names of the list).

‘read.nexus’ tries to represent correctly trees with a badly represented root edge (i.e. with an extra pair of parentheses). For instance, the tree "((A:1,B:1):10);" will be read like "(A:1,B:1):10;" but a warning message will be issued in the former case as this is apparently not a valid Newick format. If there are two root edges (e.g., "(((A:1,B:1):10):10);"), then the tree is not read and an error message is issued.
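The effect of force.multi can be sketched as follows, assuming the ape package is installed. The tree is a made-up example, written to a temporary NEXUS file with write.nexus (which adds a TRANSLATE table for the tip labels by default).

```r
library(ape)

## Hypothetical four-tip tree written to a temporary NEXUS file
tr <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
nex <- tempfile("toy", fileext = ".nex")
write.nexus(tr, file = nex)

t1 <- read.nexus(nex)                      # single tree: class "phylo"
t2 <- read.nexus(nex, force.multi = TRUE)  # always class "multiPhylo"

class(t1)
class(t2)
names(t2)        # the tree name stored in the file
t1$tip.label     # tip labels recovered through the translation table

unlink(nex) # clean-up
```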

Value

an object of class "phylo" or "multiPhylo".

Author(s)

Emmanuel Paradis

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.

Read Character Data In NEXUS Format

Description

read.nexus.data reads a file with sequences in the NEXUS format. nexus2DNAbin is a helper function to convert the output from the previous function into the class "DNAbin".

For the moment, only sequence data (DNA or protein) are supported.

Usage

read.nexus.data(file)
nexus2DNAbin(x)

Arguments

file

a file name specified by either a variable of mode character, or a double-quoted string.

x

an object output by read.nexus.data.

Details

This parser tries to read data from a file written in a restricted NEXUS format (see examples below).

Please see files ‘data.nex’ and ‘taxacharacters.nex’ for examples of formats that will work.

Some noticeable exceptions from the NEXUS standard (non-exhaustive list):

I: Comments must be either on separate lines or at the end of lines. Examples:
[Comment]— OK
Taxon ACGTACG [Comment]— OK
[Comment line 1
Comment line 2]— NOT OK!
Tax[Comment]on ACG[Comment]T— NOT OK!
II: No spaces (or comments) are allowed in the sequences. Examples:
name ACGT— OK
name AC GT— NOT OK!
III: No spaces are allowed in taxon names, not even if names are in single quotes. That is, single-quoted names are not treated as such by the parser. Examples:
Genus_species— OK
'Genus_species'— OK
'Genus species'— NOT OK!
IV: The trailing end that closes the matrix must be on a separate line. Examples:
taxon AACCGGT
end;— OK
taxon AACCGGT;
end;— OK
taxon AACCCGT; end;— NOT OK!
V: Multistate characters are not allowed. That is, NEXUS allows you to specify multiple character states at a character position either as an uncertainty, (XY), or as an actual appearance of multiple states, {XY}. This information is not handled by the parser. Examples:
taxon 0011?110— OK
taxon 0011{01}110— NOT OK!
taxon 0011(01)110— NOT OK!
VI: The number of taxa must be on the same line as ntax. The same applies to nchar. Examples:
ntax = 12— OK
ntax =
12— NOT OK!
VII: The word “matrix” cannot occur anywhere in the file before the actual matrix command, unless it is in a comment. Examples:
BEGIN CHARACTERS;
TITLE 'Data in file "03a-cytochromeB.nex"';
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
["This is The Matrix"]— OK
MATRIX
BEGIN CHARACTERS;
TITLE 'Matrix in file "03a-cytochromeB.nex"';— NOT OK!
DIMENSIONS NCHAR=382;
FORMAT DATATYPE=Protein GAP=- MISSING=?;
MATRIX
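The restrictions listed above can be tried out on a small file. The sketch below assumes the ape package is installed; the data block is a made-up example written to a temporary file, with no spaces in names or sequences, ntax and nchar on a single line, and the closing semicolon and END on their own lines.

```r
library(ape)

## Hypothetical minimal NEXUS data block respecting the restrictions
nex <- tempfile("toy", fileext = ".nex")
cat("#NEXUS",
    "BEGIN DATA;",
    "DIMENSIONS NTAX=3 NCHAR=8;",
    "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
    "MATRIX",
    "t1 ACGTACGT",
    "t2 ACGTACGA",
    "t3 ACGAACGT",
    ";",
    "END;",
    file = nex, sep = "\n")

x <- read.nexus.data(nex)
names(x)              # taxon names
lengths(x)            # one element per character state
d <- nexus2DNAbin(x)  # convert to class "DNAbin"
d

unlink(nex) # clean-up
```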

Value

A list of sequences each made of a single vector of mode character where each element is a (phylogenetic) character state.

Author(s)

Johan Nylander, Thomas Guillerme, and Klaus Schliep

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.

Examples

## Use read.nexus.data to read a file in NEXUS format into object x
## Not run: x <- read.nexus.data("file.nex")

Read Tree File in Parenthetic Format

Description

This function reads a file which contains one or several trees in parenthetic format known as the Newick or New Hampshire format.

Usage

read.tree(file = "", text = NULL, tree.names = NULL, skip = 0,
    comment.char = "", keep.multi = FALSE, ...)

Arguments

file

a file name specified by either a variable of mode character, or a double-quoted string; if file = "" (the default) then the tree is input on the keyboard, the entry being terminated with a blank line.

text

alternatively, the name of a variable of mode character which contains the tree(s) in parenthetic format. By default, this is ignored (set to NULL, meaning that the tree is read in a file); if text is not NULL, then the argument file is ignored.

tree.names

if there are several trees to be read, a vector of mode character that gives names to the individual trees; if NULL (the default), the trees are named "tree1", "tree2", ...

skip

the number of lines of the input file to skip before beginning to read data (this is passed directly to scan()).

comment.char

a single character; the remainder of the line after this character is ignored (this is passed directly to scan()).

keep.multi

if TRUE and tree.names = NULL then single trees are returned in "multiPhylo" format, with any name that is present (see details). Default is FALSE.

...

further arguments to be passed to scan().

Details

The default option for file allows typing the tree directly on the keyboard (or possibly copying it from an editor and pasting it in R's console) with, e.g., mytree <- read.tree().

‘read.tree’ tries to represent correctly trees with a badly represented root edge (i.e. with an extra pair of parentheses). For instance, the tree "((A:1,B:1):10);" will be read like "(A:1,B:1):10;" but a warning message will be issued in the former case as this is apparently not a valid Newick format. If there are two root edges (e.g., "(((A:1,B:1):10):10);"), then the tree is not read and an error message is issued.

If there are any characters preceding the first "(" in a line then these are assigned to the name. This is returned when a "multiPhylo" object is returned and tree.names = NULL.

Until ape 4.1, the default of comment.char was "#" (as in scan). This has been changed so that extended Newick files can be read.
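Two of these points, the root edge and the naming of several trees, can be checked with the text argument. This is only a sketch assuming the ape package is installed; the Newick strings and the names "first" and "second" are arbitrary.

```r
library(ape)

## A well-formed root edge is stored in the element 'root.edge'
tr <- read.tree(text = "(A:1,B:1):10;")
tr$root.edge

## Several trees read at once and named through 'tree.names'
trs <- read.tree(text = c("((a,b),c);", "((a,c),b);"),
                 tree.names = c("first", "second"))
class(trs)
names(trs)
```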

Value

an object of class "phylo" with the following components:

edge

a two-column matrix of mode numeric where each row represents an edge of the tree; the nodes and the tips are symbolized with numbers; the tips are numbered 1, 2, ..., and the nodes are numbered after the tips. For each row, the first column gives the ancestor.

edge.length

(optional) a numeric vector giving the lengths of the branches given by edge.

tip.label

a vector of mode character giving the names of the tips; the order of the names in this vector corresponds to the (positive) number in edge.

Nnode

the number of (internal) nodes.

node.label

(optional) a vector of mode character giving the names of the nodes.

root.edge

(optional) a numeric value giving the length of the branch at the root if it exists.

If several trees are read in the file, the returned object is of class "multiPhylo", and is a list of objects of class "phylo". The name of each tree can be specified by tree.names, or can be read from the file (see details).
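To make the components listed above concrete, here is a short sketch on a made-up three-tip tree, assuming the ape package is installed.

```r
library(ape)

tr <- read.tree(text = "((a:1,b:1):1,c:2);")

tr$edge         # one row per branch: ancestor, then descendant
tr$edge.length  # branch lengths, in the same row order as 'edge'
tr$tip.label    # tips are numbered 1, 2, 3 in this order
tr$Nnode        # number of internal nodes (numbered 4 and 5 here)
```

With three tips, the internal nodes receive the numbers 4 (the root) and 5, so every value in the first column of edge is larger than the number of tips.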

Author(s)

Emmanuel Paradis and Daniel Lawson dan.lawson@bristol.ac.uk

References

Felsenstein, J. The Newick tree format. http://evolution.genetics.washington.edu/phylip/newicktree.html

Olsen, G. Interpretation of the "Newick's 8:45" tree format standard. http://evolution.genetics.washington.edu/phylip/newick_doc.html

Paradis, E. (2020) Definition of Formats for Coding Phylogenetic Trees in R. https://emmanuelparadis.github.io/misc/FormatTreeR.pdf

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Examples

### An extract from Sibley and Ahlquist (1990)
s <- "owls(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);"
treefile <- tempfile("tree", fileext = ".tre")
cat(s, file = treefile, sep = "\n")
tree.owls <- read.tree(treefile)
str(tree.owls)
tree.owls
tree.owls <- read.tree(treefile, keep.multi = TRUE)
tree.owls
names(tree.owls)
unlink(treefile) # clean-up

### Only the first three species using the option `text'
TREE <- "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"
TREE
tree.owls.bis <- read.tree(text = TREE)
str(tree.owls.bis)
tree.owls.bis

## tree with singleton nodes:
ts <- read.tree(text = "((((a))),d);")
plot(ts, node.depth = 2) # the default will overlap the singleton node with the tip
nodelabels()

## 'skeleton' tree with a singleton node:
tx <- read.tree(text = "(((,)),);")
plot(tx, node.depth = 2)
nodelabels()

## a tree with single quoted labels (the 2nd label is not quoted
## because it has no white spaces):
z <- "(('a: France, Spain (Europe)',b),'c: Australia [Outgroup]');"
tz <- read.tree(text = z)
plot(tz, font = 1)

Continuous Ancestral Character Estimation

Description

This function estimates ancestral character states, and the associated uncertainty, for continuous characters. It mainly works as the ace function, from which it differs, first, in the fact that computations are not performed by numerical optimisation but through matrix calculus. Second, besides classical Brownian-based reconstruction methods, it reconstructs ancestral states under Arithmetic Brownian Motion (ABM, i.e. Brownian with linear trend) and Ornstein-Uhlenbeck process (OU, i.e. Brownian with an attractive optimum).

Usage

reconstruct(x, phyInit, method = "ML", alpha = NULL,
            low_alpha = 0.0001, up_alpha = 1, CI = TRUE)

Arguments

x

a numerical vector.

phyInit

an object of class "phylo".

method

a character specifying the method used for estimation. Six choices are possible: "ML", "REML", "GLS", "GLS_ABM", "GLS_OU" or "GLS_OUS".

alpha

a numerical value which accounts for the attractive strength parameter of "GLS_OU" or "GLS_OUS" (used only in these cases). If alpha = NULL (the default), then it is estimated by maximum likelihood using optim, with low_alpha (resp. up_alpha) as lower value (resp. upper value), which may lead to convergence issues.

low_alpha

a lower bound for alpha, used only with methods "GLS_OU" or "GLS_OUS". It has to be positive.

up_alpha

an upper bound for alpha, used only with methods "GLS_OU" or "GLS_OUS". It has to be positive.

CI

a logical specifying whether to return the 95% confidence intervals of the ancestral state estimates.

Details

For "ML", "REML" and "GLS", the default model is Brownian motion. This model can be fitted by maximum likelihood (method = "ML", Felsenstein 1973, Schluter et al. 1997), which is the default, by residual maximum likelihood (method = "REML"), or by generalized least squares (method = "GLS", Martins and Hansen 1997, Garland and Ives 2000). "GLS_ABM" is based on the Brownian motion with trend model. Both "GLS_OU" and "GLS_OUS" are based on the Ornstein-Uhlenbeck model. "GLS_OU" and "GLS_OUS" differ in the fact that "GLS_OUS" assumes that the process starts from the optimum, while the root state has to be estimated for "GLS_OU", which may raise some issues (see Royer-Carenzi and Didier, 2016). Users may provide the attractive strength parameter alpha for these two models. "GLS_ABM", "GLS_OU" and "GLS_OUS" are all fitted by generalized least squares (Royer-Carenzi and Didier, 2016).
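As a sketch of how two of the Brownian-based methods described above might be compared, the code below fits "ML" and "GLS" to the same data. It assumes the ape package is installed; the trait values are random numbers generated purely for illustration, so the estimates carry no biological meaning.

```r
library(ape)

data(bird.orders)
set.seed(1)
## Hypothetical trait: one random value per tip
x <- rnorm(Ntip(bird.orders), mean = 100)

fit_ml  <- reconstruct(x, bird.orders, method = "ML")
fit_gls <- reconstruct(x, bird.orders, method = "GLS")

head(fit_ml$ace)    # estimated ancestral values
head(fit_ml$CI95)   # their 95% confidence intervals
head(fit_gls$ace)
```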

Value

an object of class "ace" with the following elements:

ace

the estimates of the ancestral character values.

CI95

the estimated 95% confidence intervals.

sigma2

if method = "ML", the maximum likelihood estimate of the Brownian parameter.

loglik

if method = "ML", the maximum log-likelihood.

Note

GLS_ABM should not be used on ultrametric trees.

GLS_OU may lead to aberrant reconstructions.

Author(s)

Manuela Royer-Carenzi, Gilles Didier

References

Felsenstein, J. (1973) Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25, 471–492.

Garland T. and Ives A.R. (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155, 346–364.

Royer-Carenzi, M. and Didier, G. (2016) A comparison of ancestral state reconstruction methods for quantitative characters. Journal of Theoretical Biology, 404, 126–142.

Schluter, D., Price, T., Mooers, A. O. and Ludwig, D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699–1711.

Yang, Z. (2006) Computational Molecular Evolution. Oxford: Oxford University Press.

Examples

### Some random data...
data(bird.orders)
x <- rnorm(23, m=100)
### Reconstruct ancestral quantitative characters:
reconstruct(x, bird.orders)
reconstruct(x, bird.orders, method = "GLS_OUS", alpha=NULL)

Internal Reordering of Trees

Description

reorder changes the internal structure of a phylogeny stored as an object of class "phylo". The tree returned is the same as the one input, but the ordering of the edges could be different.

cladewise and postorder are convenience functions to return only the indices of the reordered edge matrices (see examples).

Usage

## S3 method for class 'phylo'
reorder(x, order = "cladewise", index.only = FALSE, ...)
## S3 method for class 'multiPhylo'
reorder(x, order = "cladewise", ...)

cladewise(x)
postorder(x)

Arguments

x

an object of class "phylo" or "multiPhylo".

order

a character string: either "cladewise" (the default), "postorder", "pruningwise", or any unambiguous abbreviation of these.

index.only

should the function return only the ordered indices of the rows of the edge matrix?

...

further arguments passed to or from other methods.

Details

Because in a tree coded as an object of class "phylo" each branch is represented by a row in the element ‘edge’, there is an arbitrary choice for the ordering of these rows. reorder allows these rows to be reordered according to three rules. In the "cladewise" order, each clade is formed by a series of contiguous rows. In the "postorder" order, the rows are arranged so that computations following a pruning-like algorithm on the tree (or a postorder tree traversal) can be done by descending along these rows (conversely, a preorder tree traversal can be performed by moving from the last to the first row). The "pruningwise" order is an alternative “pruning” order which is actually a bottom-up traversal order (Valiente 2002). (This third choice might be removed in the future as it merely duplicates the second one, which is more efficient.) The possible multichotomies and branch lengths are preserved.

Note that for a given order, there are several possible orderings of the rows of ‘edge’.
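The relation between reorder, index.only and postorder, as well as the defining property of the postorder ordering, can be checked on a small random tree. This is a sketch assuming the ape package is installed; the tree is random and only serves as an illustration.

```r
library(ape)

set.seed(1)
tr <- rtree(6)                        # random tree, for illustration only
po <- reorder(tr, "postorder")
idx <- reorder(tr, "postorder", index.only = TRUE)

attr(po, "order")                     # the new ordering is recorded
all(po$edge == tr$edge[idx, ])        # same rows, rearranged
all(idx == postorder(tr))             # postorder() returns the same indices

## In postorder, once a node has appeared as a descendant (column 2),
## it never appears again as an ancestor (column 1) in a later row
n <- nrow(po$edge)
ok <- sapply(seq_len(n), function(i) {
    node <- po$edge[i, 2]
    !any(po$edge[seq_len(n) > i, 1] == node)
})
all(ok)
```

This property is what allows values to be accumulated from the tips towards the root in a single pass over the rows.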

Value

an object of class "phylo" (with the attribute "order" set accordingly), or a numeric vector if index.only = TRUE; if x is of class "multiPhylo", then an object of the same class.

Author(s)

Emmanuel Paradis

References

Valiente, G. (2002) Algorithms on Trees and Graphs. New York: Springer.

Examples

data(bird.families)
tr <- reorder(bird.families, "postorder")
all.equal(bird.families, tr) # uses all.equal.phylo actually
all.equal.list(bird.families, tr) # bypasses the generic

## get the number of descendants for each tip or node:
nr_desc <- function(x) {
    res <- numeric(max(x$edge))
    res[1:Ntip(x)] <- 1L
    for (i in postorder(x)) {
        tmp <- x$edge[i, 1]
        res[tmp] <- res[tmp] + res[x$edge[i, 2]]
    }
    res
}
## apply it to a random tree:
tree <- rtree(10)
plot(tree, show.tip.label = FALSE)
tiplabels()
nodelabels()
nr_desc(tree)

Test of Diversification-Shift With the Yule Process

Description

This function performs a test of shift in diversification rate using probabilities from the Yule process.

Usage

richness.yule.test(x, t)

Arguments

x

a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase or decrease diversification rate, and the second one the number of species in the sister-clades without the trait. Each row represents a pair of sister-clades.

t

a numeric vector giving the divergence times of each pair of clades in x.

Value

a data frame with the χ² statistic, the number of degrees of freedom (= 1), and the P-value.
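A minimal sketch of a call, assuming the ape package is installed. The species counts and divergence times below are invented for illustration and do not come from any real data set.

```r
library(ape)

## Hypothetical species counts for four pairs of sister-clades:
## first column = clades with the trait, second = clades without it
x <- data.frame(with.trait = c(120, 45, 300, 80),
                without.trait = c(10, 12, 25, 9))
## Hypothetical divergence times, one per pair
t <- c(20, 15, 30, 12)

res <- richness.yule.test(x, t)
res   # test statistic, degrees of freedom, and P-value
```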

Author(s)

Emmanuel Paradis

References

Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.

Examples

### see example(mcconwaysims.test)

Tree Simulation Under the Time-Dependent Birth–Death Models

Description

These three functions simulate phylogenies under any time-dependent birth–death model: rlineage generates a complete tree including the species going extinct before present; rbdtree generates a tree with only the species living at present (thus the tree is ultrametric); rphylo generates a tree with a fixed number of species at present time. drop.fossil is a utility function to remove the extinct species.

Usage

rlineage(birth, death, Tmax = 50, BIRTH = NULL,
         DEATH = NULL, eps = 1e-6)
rbdtree(birth, death, Tmax = 50, BIRTH = NULL,
        DEATH = NULL, eps = 1e-6)
rphylo(n, birth, death, BIRTH = NULL, DEATH = NULL,
       T0 = 50, fossils = FALSE, eps = 1e-06)
drop.fossil(phy, tol = 1e-8)

Arguments

birth, death

a numeric value or a (vectorized) function specifying how speciation and extinction rates vary through time.

Tmax

a numeric value giving the length of the simulation.

BIRTH, DEATH

a (vectorized) function which is the primitive of birth or death. This can be used to speed up the computation. By default, a numerical integration is done.

eps

a numeric value giving the time resolution of the simulation; this may be increased (e.g., 0.001) to shorten computation times.

n

the number of species living at present time.

T0

the time at present (for the backward-in-time algorithm).

fossils

a logical value specifying whether to output the lineages going extinct.

phy

an object of class "phylo".

tol

a numeric value giving the tolerance to consider a species as extinct.

Details

These three functions use continuous-time algorithms: rlineage and rbdtree use the forward-in-time algorithms described in Paradis (2011), whereas rphylo uses a backward-in-time algorithm from Stadler (2011). The models are time-dependent birth–death models as described in Kendall (1948). Speciation (birth) and extinction (death) rates may be constant or vary through time according to an R function specified by the user. In the latter case, BIRTH and/or DEATH may be used if the primitives of birth and death are known. In these functions time is the formal argument and must be named t.

Note that rphylo simulates trees in a way similar to what the package TreeSim does; the difference is in the parameterization of the time-dependent models, which is here the same as that used in the two other functions. In this parameterization scheme, time is measured from past to present (see details in Paradis 2015 which includes a comparison of these algorithms).

The difference between rphylo and rphylo(... fossils = TRUE) is the same as between rbdtree and rlineage.
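The effect of the fossils argument of rphylo, and the use of drop.fossil, can be sketched as follows. The ape package is assumed to be installed, and the constant rates (0.1 and 0.02) are arbitrary values chosen for illustration.

```r
library(ape)

set.seed(1)
## Constant speciation and extinction rates (arbitrary values)
tr <- rphylo(10, birth = 0.1, death = 0.02)
Ntip(tr)                       # number of species at present

## Same model, keeping the lineages that went extinct
full <- rphylo(10, birth = 0.1, death = 0.02, fossils = TRUE)
pruned <- drop.fossil(full)    # remove the extinct lineages again
c(Ntip(full), Ntip(pruned))
```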

Value

An object of class "phylo".

Author(s)

Emmanuel Paradis

References

Kendall, D. G. (1948) On the generalized “birth-and-death” process. Annals of Mathematical Statistics, 19, 1–15.

Paradis, E. (2011) Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution, 65, 661–672.

Paradis, E. (2015) Random phylogenies and the distribution of branching times. Journal of Theoretical Biology, 387, 39–45.

Stadler, T. (2011) Simulating trees with a fixed number of extant species. Systematic Biology, 60, 676–684.

Examples

set.seed(10)
plot(rlineage(0.1, 0)) # Yule process with lambda = 0.1
plot(rlineage(0.1, 0.05)) # simple birth-death process
b <- function(t) 1/(1 + exp(0.2*t - 1)) # logistic
layout(matrix(0:3, 2, byrow = TRUE))
curve(b, 0, 50, xlab = "Time", ylab = "")
mu <- 0.07
segments(0, mu, 50, mu, lty = 2)
legend("topright", c(expression(lambda), expression(mu)),
       lty = 1:2, bty = "n")
plot(rlineage(b, mu), show.tip.label = FALSE)
title("Simulated with 'rlineage'")
plot(rbdtree(b, mu), show.tip.label = FALSE)
title("Simulated with 'rbdtree'")

Roots Phylogenetic Trees

Description

root reroots a phylogenetic tree with respect to the specified outgroup or at the node specified in node.

unroot unroots a phylogenetic tree, or returns it unchanged if it is already unrooted.

is.rooted tests whether a tree is rooted.

Usage

root(phy, ...)
## S3 method for class 'phylo'
root(phy, outgroup, node = NULL, resolve.root = FALSE,
     interactive = FALSE, edgelabel = FALSE, ...)
## S3 method for class 'multiPhylo'
root(phy, outgroup, ...)

unroot(phy, ...)
## S3 method for class 'phylo'
unroot(phy, collapse.singles = FALSE,
        keep.root.edge = FALSE, ...)
## S3 method for class 'multiPhylo'
unroot(phy, collapse.singles = FALSE,
        keep.root.edge = FALSE, ...)

is.rooted(phy)
## S3 method for class 'phylo'
is.rooted(phy)
## S3 method for class 'multiPhylo'
is.rooted(phy)

Arguments

phy

an object of class "phylo" or "multiPhylo".

outgroup

a vector of mode numeric or character specifying the new outgroup.

node

alternatively, a node number where to root the tree.

resolve.root

a logical specifying whether to resolve the new root as a bifurcating node.

interactive

if TRUE the user is asked to select the node by clicking on the tree which must be plotted.

edgelabel

a logical value specifying whether to treat node labels as edge labels, possibly switching them so that they are associated with the correct edges when using drawSupportOnEdges (see Czech et al. 2017).

collapse.singles

a logical value specifying whether to call collapse.singles before unrooting the tree.

keep.root.edge

a logical value. If TRUE, the root.edge element of the tree is added in the edge matrix as a terminal edge. The default is to delete this element.

...

arguments passed among methods (e.g., when rooting lists of trees).

Details

The argument outgroup can be either character or numeric. In the first case, it gives the labels of the tips of the new outgroup; in the second case, it gives the positions of these labels in the vector phy$tip.label.

If outgroup is of length one (i.e., a single value), then the tree is rerooted using the node below this tip as the new root.

If outgroup is of length two or more, the most recent common ancestor (MRCA) of the ingroup is used as the new root. Note that the tree is unrooted before being rerooted, so that if outgroup is already the outgroup, then the returned tree is not the same as the original one (see examples). If outgroup is not monophyletic, the operation fails and an error message is issued.

If resolve.root = TRUE, root adds a zero-length branch below the MRCA of the ingroup.

A tree is considered rooted if either only two branches connect to the root, or if there is a root.edge element. In all other cases, is.rooted returns FALSE.
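The rules above can be checked on a small tree. This is a minimal sketch, assuming the ape package is installed; the five-tip Newick string and the object names are made up for illustration.

```r
library(ape)

# A five-tip tree with a basal trichotomy, i.e. an unrooted tree.
tr <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1,E:1);")
is.rooted(tr)                      # FALSE: three branches meet at the root

# The outgroup can be given by label or by its position in tr$tip.label.
r1 <- root(tr, outgroup = "A")
r2 <- root(tr, outgroup = which(tr$tip.label == "A"))

# Rerooting alone leaves a basal trichotomy ...
is.rooted(r1)                      # FALSE
# ... unless the new root is resolved with a zero-length branch.
r3 <- root(tr, outgroup = "A", resolve.root = TRUE)
is.rooted(r3)                      # TRUE

# A root.edge element is enough for a tree to count as rooted.
r4 <- r1
r4$root.edge <- 0
is.rooted(r4)                      # TRUE

# unroot() removes the bifurcating root again.
is.rooted(unroot(r3))              # FALSE
```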

Value

an object of class "phylo" or "multiPhylo" for root and unroot; a logical vector for is.rooted.

Note

The use of resolve.root = TRUE together with node = gives an error if the specified node is the current root of the tree. This is because there is an ambiguity when resolving a node in an unrooted tree with no explicit outgroup. If the node is not the current root, the ambiguity is solved arbitrarily by considering the clade on the right of node (when the tree is plotted by default) as the ingroup. A detailed explanation is available at:

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg03805.html.

Author(s)

Emmanuel Paradis

References

Czech, L., Huerta-Cepas, J. and Stamatakis, A. (2017) A critical review on the use of support values in tree viewers and bioinformatics toolkits. Molecular Biology and Evolution, 34, 1535–1542. doi:10.1093/molbev/msx055

Examples

data(bird.orders)
plot(root(bird.orders, 1))
plot(root(bird.orders, 1:5))

tr <- root(bird.orders, 1)
is.rooted(bird.orders) # yes
is.rooted(tr)          # no
### This is because the tree has been unrooted first before rerooting.
### You can delete the outgroup...
is.rooted(drop.tip(tr, "Struthioniformes"))
### ... or resolve the basal trichotomy in two ways:
is.rooted(multi2di(tr))
is.rooted(root(bird.orders, 1, r = TRUE))
### To keep the basal trichotomy but forcing the tree as rooted:
tr$root.edge <- 0
is.rooted(tr)

x <- setNames(rmtree(10, 10), LETTERS[1:10])
is.rooted(x)

Swapping Sister Clades

Description

For a given node, rotate exchanges the position of two clades descending from this node. It can handle dichotomies as well as polytomies. In the latter case, two clades from the polytomy are selected for swapping.

rotateConstr rotates internal branches giving a constraint on the order of the tips.

Usage

rotate(phy, node, polytom = c(1, 2))
rotateConstr(phy, constraint)

Arguments

phy

an object of class "phylo".

node

a vector of mode numeric or character specifying the number of the node.

polytom

a vector of mode numeric and length two specifying the two clades that should be exchanged in a polytomy.

constraint

a vector of mode character specifying the order of the tips as they should appear when plotting the tree (from bottom to top).

Details

phy can be either rooted or unrooted, contain polytomies and lack branch lengths. In the presence of very short branch lengths it is convenient to plot the phylogenetic tree without branch lengths in order to identify the number of the node in question.

node can be any of the interior nodes of a phylogenetic tree including the root node. The numbers of the nodes can be identified with the nodelabels function. Alternatively, you can specify a vector of length two that contains either the numbers or the names of two tips that coalesce in the node of interest.

If the node subtends a polytomy, any two clades of the polytomy can be chosen with polytom. On a plotted phylogeny, the clades are numbered from bottom to top and polytom is used to index the two clades to be swapped.
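The three ways of designating what to rotate can be sketched on small trees. This assumes the ape package is installed; the Newick strings and object names are illustrative only.

```r
library(ape)

tr <- read.tree(text = "((A,B),(C,(D,E)));")
# Internal nodes are numbered from Ntip(tr) + 1, which is the root.
root_node <- Ntip(tr) + 1

# Swap the two clades descending from the root.
rot1 <- rotate(tr, root_node)

# Designate the node by two tips that coalesce in it:
# here the ancestor of C and E, i.e. the clade (C,(D,E)).
rot2 <- rotate(tr, c("C", "E"))

# With a polytomy, polytom selects which two clades are exchanged.
tr3 <- read.tree(text = "((A,B,C),D);")
rot3 <- rotate(tr3, c("A", "C"), polytom = c(1, 3))

# Print the Newick strings to compare the order of the clades.
write.tree(tr)
write.tree(rot1)
write.tree(rot2)
write.tree(rot3)
```

Rotation changes only the order in which clades are drawn; the set of tips and the relationships among them are unchanged.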

Value

an object of class "phylo".

Author(s)

Christoph Heibl heibl@lmu.de, Emmanuel Paradis

Examples

# create a random tree:
tre <- rtree(25)
# visualize labels of internal nodes:
plot(tre, use.edge.length=FALSE)
nodelabels()
# rotate clades around node 30:
tre.new <- rotate(tre, 30)
# compare the results:
par(mfrow=c(1,2)) # split graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# visualize labels of terminal nodes:
plot(tre)
tiplabels()
# rotate clades around the node where tips 12 and 21 coalesce:
tre.new <- rotate(tre, c(12, 21))
# compare the results:
par(mfrow=c(1,2)) # split graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# or you might just specify tip label names:
tre.new <- rotate(tre, c("t3", "t14"))
# compare the results:
par(mfrow=c(1,2)) # divide graphical device
plot(tre) # plot old tree
plot(tre.new) # plot new tree
# a simple example for rotateConstr:
A <- read.tree(text = "((A,B),(C,D));")
B <- read.tree(text = "(((D,C),B),A);")
B <- rotateConstr(B, A$tip.label)
plot(A); plot(B, d = "l")
# something more interesting (from ?cophyloplot):
tr1 <- rtree(40)
## drop 20 randomly chosen tips:
tr2 <- drop.tip(tr1, sample(tr1$tip.label, size = 20))
## rotate the root and reorder the whole:
tr2 <- rotate(tr2, 21)
tr2 <- read.tree(text = write.tree(tr2))
X <- cbind(tr2$tip.label, tr2$tip.label) # association matrix
cophyloplot(tr1, tr2, assoc = X, space = 28)
## before reordering tr2 we have to find the constraint:
co <- tr2$tip.label[order(match(tr2$tip.label, tr1$tip.label))]
newtr2 <- rotateConstr(tr2, co)
cophyloplot(tr1, newtr2, assoc = X, space = 28)

Generate Random Trees

Description

These functions generate trees by splitting randomly the edges (rtree and rtopology) or randomly clustering the tips (rcoal). rtree and rtopology generate general trees, and rcoal generates coalescent trees. The algorithms are described in Paradis (2012) and in a vignette in this package.

Usage

rtree(n, rooted = TRUE, tip.label = NULL, br = runif, equiprob = FALSE, ...)
rtopology(n, rooted = FALSE, tip.label = NULL, br = runif, ...)
rcoal(n, tip.label = NULL, br = "coalescent", ...)
rmtree(N, n, rooted = TRUE, tip.label = NULL, br = runif,
       equiprob = FALSE, ...)
rmtopology(N, n, rooted = FALSE, tip.label = NULL, br = runif, ...)

Arguments

n

an integer giving the number of tips in the tree.

rooted

a logical indicating whether the tree should be rooted (the default).

tip.label

a character vector giving the tip labels; if not specified, the tips "t1", "t2", ..., are given.

br

one of the following: (i) an R function used to generate the branch lengths (rtree; use NULL to simulate only a topology), or the coalescence times (rcoal); (ii) a character to simulate a genuine coalescent tree for rcoal (the default); or (iii) a numeric vector for the branch lengths or the coalescence times.

equiprob

(new since ape 5.4-1) a logical specifying whether topologies are generated in equal frequencies. If FALSE, the unbalanced topologies are generated in higher proportions than the balanced ones.

...

further argument(s) to be passed to br.

N

an integer giving the number of trees to generate.

Details

The trees generated are bifurcating. If rooted = FALSE in rtree, the tree is trifurcating at its root.

The option equiprob = TRUE generates unlabelled topologies in equal frequencies. This is more complicated for the labelled topologies (see the vignette “RandomTopologies”).

The default function to generate branch lengths in rtree is runif. If further arguments are passed to br, they need to be tagged (e.g., min = 0, max = 10).

rmtree calls rtree successively and sets the class of the returned object appropriately.
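A short sketch of these options, assuming the ape package is installed; the tree sizes, the seed and the runif bounds are arbitrary.

```r
library(ape)

set.seed(42)
# Extra arguments for `br` must be named: here they are passed to runif().
tr <- rtree(10, br = runif, min = 5, max = 10)
range(tr$edge.length)              # all lengths fall between 5 and 10

# br = NULL simulates a topology only.
topo <- rtree(10, br = NULL)
is.null(topo$edge.length)          # TRUE

# rooted = FALSE gives a tree that is trifurcating at its root.
un <- rtree(10, rooted = FALSE)
is.rooted(un)                      # FALSE

# rmtree() repeats rtree() and returns a "multiPhylo" object.
trs <- rmtree(5, 10)
class(trs)
length(trs)                        # 5
```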

Value

An object of class "phylo" or of class "multiPhylo" in the case of rmtree or rmtopology.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Examples

layout(matrix(1:9, 3, 3))
### Nine random trees:
for (i in 1:9) plot(rtree(20))
### Nine random cladograms:
for (i in 1:9) plot(rtree(20, FALSE), type = "c")
### generate 4 random trees of bird orders:
data(bird.orders)
layout(matrix(1:4, 2, 2))
for (i in 1:4)
  plot(rcoal(23, tip.label = bird.orders$tip.label), no.margin = TRUE)
layout(1)
par(mar = c(5, 4, 4, 2))

Root a Tree by Root-to-Tip Regression

Description

This function roots a phylogenetic tree with dated tips in the location most compatible with the assumption of a strict molecular clock.

Usage

rtt(t, tip.dates, ncpu = 1, objective = correlation,
    opt.tol = .Machine$double.eps^0.25)

Arguments

t

an object of class "phylo".

tip.dates

a vector of sampling times associated to the tips of t, in the same order as t$tip.label.

ncpu

number of cores to use.

objective

one of "correlation", "rms", or "rsquared".

opt.tol

tolerance for optimization precision.

Details

This function duplicates part of the functionality of the program Path-O-Gen (see references). The root position is chosen to produce the best linear regression of root-to-tip distances against sampling times.

t must have branch lengths in units of expected substitutions per site.

tip.dates should be a vector of sampling times, in any time unit, with time increasing toward the present. For example, this may be in units of “days since study start” or “years since 10,000 BCE”, but not “millions of years ago”.

Setting ncpu to a value larger than 1 requires the parallel library.

objective is the measure which will be used to define the “goodness” of a regression fit. It may be one of "correlation" (strongest correlation between tip date and distance from root), "rms" (lowest root-mean-squared error), or "rsquared" (highest R-squared value).

opt.tol is used to optimize the location of the root along the best branch. By default, R's optimize function uses a precision of .Machine$double.eps^0.25, which is about 0.0001 on a 64-bit system. This should be set to a smaller value if the branch lengths of t are very short.
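The requirement on tip.dates can be illustrated with simulated data. This is only a sketch, assuming the ape package is installed; the rate of 0.01, the reference years 2000 and 2400, the noise level and all object names are hypothetical choices, not values taken from this manual.

```r
library(ape)

set.seed(7)
tr <- rtree(20)

# Hypothetical sampling years: tips further from the root are assumed to
# have been sampled later (arbitrary rate of 0.01 per year), plus noise.
d <- node.depth.edgelength(tr)[seq_len(Ntip(tr))]
years <- 2000 + d / 0.01 + rnorm(Ntip(tr), sd = 5)

# Dates recorded as "years before 2400" run the wrong way for rtt():
years_ago <- 2400 - years
# Flip them so that larger values are closer to the present.
tip_dates <- max(years_ago) - years_ago

# Root the tree using the default objective.
rooted <- rtt(tr, tip_dates)
rooted
```

The returned tree keeps its branch lengths in substitutions per site; only the position of the root is chosen by the regression.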

Value

an object of class "phylo".

Note

This function only chooses the best root. It does not rescale the branch lengths to time, or perform a statistical test of the molecular clock hypothesis.

Author(s)

Rosemary McCloskey rmccloskey@cfenet.ubc.ca, Emmanuel Paradis

References

Rambaut, A. (2009). Path-O-Gen: temporal signal investigation tool.

Rambaut, A. (2000). Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics, 16, 395–399.

Examples

t <- rtree(100)
tip.date <- rnorm(t$tip.label)^2
rtt(t, tip.date)

Find Segregating Sites in DNA Sequences

Description

This function gives the indices of segregating (polymorphic) sites in a sample of DNA sequences.

Usage

seg.sites(x, strict = FALSE, trailingGapsAsN = TRUE)

Arguments

x

a matrix or a list which contains the DNA sequences.

strict

a logical value; if TRUE, ambiguities and gaps in the sequences are not interpreted in the usual way.

trailingGapsAsN

a logical value; if TRUE (the default), the leading and trailing alignment gaps are considered as unknown bases (i.e., N).

Details

If the sequences are in a list, they must all be of the same length.

If strict = FALSE (the default), the following rule is used to determine if a site is polymorphic or not in the presence of ambiguous bases: ‘A’ and ‘R’ are not interpreted as different, ‘A’ and ‘Y’ are interpreted as different, and ‘N’ and any other base (ambiguous or not) are interpreted as not different. If strict = TRUE, all letters are considered different.

Alignment gaps are considered different from all letters except for the leading and trailing gaps if trailingGapsAsN = TRUE (which is the default).
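The effect of strict can be seen on a toy alignment. This is a minimal sketch, assuming the ape package is installed; the three sequences are invented for illustration.

```r
library(ape)

# Three aligned sequences, four sites (one row per sequence).
X <- as.DNAbin(rbind(s1 = c("a", "c", "g", "t"),
                     s2 = c("a", "c", "g", "t"),
                     s3 = c("r", "t", "g", "n")))

# Default rules: A vs R and T vs N are not counted as differences,
# so only the C/T site (site 2) is reported.
seg.sites(X)

# strict = TRUE treats every distinct letter as a difference,
# so the sites with R and N are reported as well.
seg.sites(X, strict = TRUE)
```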

Value

A numeric (integer) vector giving the indices of the segregating sites.

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
y <- seg.sites(woodmouse)
y
length(y)

Skyline Plot Estimate of Effective Population Size

Description

skyline computes the generalized skyline plot estimate of effective population size from an estimated phylogeny. The demographic history is approximated by a step-function. The number of parameters of the skyline plot (i.e. its smoothness) is controlled by a parameter epsilon.

find.skyline.epsilon searches for an optimal value of the epsilon parameter, i.e. the value that maximizes the AICc-corrected log-likelihood (logL.AICc).

Usage

skyline(x, ...)
## S3 method for class 'phylo'
skyline(x, ...)
## S3 method for class 'coalescentIntervals'
skyline(x, epsilon=0, ...)
## S3 method for class 'collapsedIntervals'
skyline(x, old.style=FALSE, ...)

find.skyline.epsilon(ci, GRID=1000, MINEPS=1e-6, ...)

Arguments

x

Either an ultrametric tree (i.e. an object of class "phylo"), or coalescent intervals (i.e. an object of class "coalescentIntervals"), or collapsed coalescent intervals (i.e. an object of class "collapsedIntervals").

epsilon

collapsing parameter that controls the amount of smoothing (allowed range: from 0 to ci$total.depth, default value: 0). This is the same parameter as in collapsed.intervals.

old.style

Parameter to choose between two slightly different variants of the generalized skyline plot (Strimmer and Pybus, pers. comm.). The default value FALSE is recommended.

ci

coalescent intervals (i.e. an object of class "coalescentIntervals")

GRID

Parameter for the grid search for epsilon in find.skyline.epsilon.

MINEPS

Parameter for the grid search for epsilon in find.skyline.epsilon.

...

Any of the above parameters.

Details

skyline implements the generalized skyline plot introduced in Strimmer and Pybus (2001). For epsilon = 0 the generalized skyline plot degenerates to the classic skyline plot described in Pybus et al. (2000). The latter is in turn directly related to lineage-through-time plots (Nee et al., 1995).
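The relation between the classic and the generalized plot can be inspected through the returned components. This sketch assumes the ape package is installed and reuses the hivtree.newick data set shipped with it; the object names are arbitrary.

```r
library(ape)

data("hivtree.newick")
tree.hiv <- read.tree(text = hivtree.newick)
ci <- coalescent.intervals(tree.hiv)

sk_classic <- skyline(ci)                 # epsilon = 0: classic skyline plot
eps <- find.skyline.epsilon(ci)           # grid search maximizing logL.AICc
sk_general <- skyline(ci, epsilon = eps)  # generalized skyline plot

# Fewer free parameters means a smoother step function.
c(classic = sk_classic$parameter.count,
  generalized = sk_general$parameter.count)

# AICc-corrected log-likelihoods of the two fits.
c(classic = sk_classic$logL.AICc,
  generalized = sk_general$logL.AICc)
```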

Value

skyline returns an object of class "skyline" with the following entries:

time

A vector with the time at the end of each coalescent interval (i.e. the accumulated interval lengths from the beginning of the first interval to the end of an interval)

interval.length

A vector with the length of each interval.

population.size

A vector with the effective population size of each interval.

parameter.count

Number of free parameters in the skyline plot.

epsilon

The value of the underlying smoothing parameter.

logL

Log-likelihood of skyline plot (see Strimmer and Pybus, 2001).

logL.AICc

AICc corrected log-likelihood (see Strimmer and Pybus, 2001).

find.skyline.epsilon returns the value of the epsilon parameter that maximizes logL.AICc.

Author(s)

Korbinian Strimmer

References

Strimmer, K. and Pybus, O. G. (2001) Exploring the demographic history of DNA sequences using the generalized skyline plot. Molecular Biology and Evolution, 18, 2298–2305.

Pybus, O. G., Rambaut, A. and Harvey, P. H. (2000) An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics, 155, 1429–1437.

Examples

# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree

# corresponding coalescent intervals
ci <- coalescent.intervals(tree.hiv) # from tree

# collapsed intervals
cl1 <- collapsed.intervals(ci,0)
cl2 <- collapsed.intervals(ci,0.0119)

#### classic skyline plot ####
sk1 <- skyline(cl1)        # from collapsed intervals
sk1 <- skyline(ci)         # from coalescent intervals
sk1 <- skyline(tree.hiv)   # from tree
sk1

plot(skyline(tree.hiv))
skylineplot(tree.hiv) # shortcut

plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997)

#### generalized skyline plot ####
sk2 <- skyline(cl2)              # from collapsed intervals
sk2 <- skyline(ci, 0.0119)       # from coalescent intervals
sk2 <- skyline(tree.hiv, 0.0119) # from tree
sk2

plot(sk2)

# classic and generalized skyline plot together in one plot
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2,  show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)

# find optimal epsilon parameter using AICc criterion
find.skyline.epsilon(ci)

sk3 <- skyline(ci, -1) # negative epsilon also triggers estimation of epsilon
sk3$epsilon

Drawing Skyline Plot Graphs

Description

These functions provide various ways to draw skyline plot graphs on the current graphical device. Note that skylineplot(z, ...) is simply a shortcut for plot(skyline(z, ...)). The skyline plot itself is an estimate of effective population size through time, and is computed using the function skyline.

Usage

## S3 method for class 'skyline'
plot(x, show.years=FALSE, subst.rate, present.year, ...)
## S3 method for class 'skyline'
lines(x, show.years=FALSE, subst.rate, present.year, ...)
skylineplot(z, ...)
skylineplot.deluxe(tree, ...)

Arguments

x

skyline plot data (i.e. an object of class "skyline").

z

Either an ultrametric tree (i.e. an object of class "phylo"), or coalescent intervals (i.e. an object of class "coalescentIntervals"), or collapsed coalescent intervals (i.e. an object of class "collapsedIntervals").

tree

ultrametric tree (i.e. an object of class "phylo").

show.years

option that determines whether the time is plotted in units of substitutions (default) or in years (requires specification of substitution rate and year of present).

subst.rate

substitution rate (see option show.years).

present.year

present year (see option show.years).

...

further arguments to be passed on to skyline() and plot().

Details

See skyline for more details (incl. references) about the skyline plot method.

Author(s)

Korbinian Strimmer

Examples

# get tree
data("hivtree.newick") # example tree in NH format
tree.hiv <- read.tree(text = hivtree.newick) # load tree

#### classic skyline plot
skylineplot(tree.hiv) # shortcut

#### plot classic and generalized skyline plots and estimate epsilon
sk.opt <- skylineplot.deluxe(tree.hiv)
sk.opt$epsilon

#### classic and generalized skyline plot ####
sk1 <- skyline(tree.hiv)
sk2 <- skyline(tree.hiv, 0.0119)

# use years rather than substitutions as unit for the time axis
plot(sk1, show.years=TRUE, subst.rate=0.0023, present.year = 1997, col=c(grey(.8),1))
lines(sk2,  show.years=TRUE, subst.rate=0.0023, present.year = 1997)
legend(.15,500, c("classic", "generalized"), col=c(grey(.8),1),lty=1)

#### various skyline plots for different epsilons
layout(mat= matrix(1:6,2,3,byrow=TRUE))
ci <- coalescent.intervals(tree.hiv)
plot(skyline(ci, 0.0));title(main="0.0")
plot(skyline(ci, 0.007));title(main="0.007")
plot(skyline(ci, 0.0119),col=4);title(main="0.0119")
plot(skyline(ci, 0.02));title(main="0.02")
plot(skyline(ci, 0.05));title(main="0.05")
plot(skyline(ci, 0.1));title(main="0.1")
layout(mat= matrix(1:1,1,1,byrow=TRUE))

Slowinski-Guyer Test of Homogeneous Diversification

Description

This function performs the Slowinski–Guyer test that a trait or variable does not increase diversification rate.

Usage

slowinskiguyer.test(x, detail = FALSE)

Arguments

x

a matrix or a data frame with at least two columns: the first one gives the number of species in clades with a trait supposed to increase diversification rate, and the second one the number of species in the corresponding sister-clade without the trait. Each row represents a pair of sister-clades.

detail

if TRUE, the individual P-values are appended.

Details

The Slowinski–Guyer test compares a series of sister-clades where one of the two is characterized by a trait supposed to increase diversification rate. The null hypothesis is that the trait does not affect diversification. If the trait decreased diversification rate, then the null hypothesis cannot be rejected.

The present function has mainly a historical interest. The Slowinski–Guyer test generally performs poorly: see Paradis (2012) for alternatives and the functions cited below.

Value

a data frame with the χ² statistic, the number of degrees of freedom, and the P-value. If detail = TRUE, a list is returned with the data frame and a vector of individual P-values for each pair of sister-clades.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2012) Shift in diversification in sister-clade comparisons: a more powerful test. Evolution, 66, 288–295.

Slowinski, J. B. and Guyer, C. (1993) Testing whether certain traits have caused amplified diversification: an improved method based on a model of random speciation and extinction. American Naturalist, 142, 1019–1024.

Examples

### from Table 1 in Slowinski and Guyer (1993):
viviparous <- c(98, 8, 193, 36, 7, 128, 2, 3, 23, 70)
oviparous <- c(234, 17, 100, 4, 1, 12, 6, 1, 481, 11)
x <- data.frame(viviparous, oviparous)
slowinskiguyer.test(x, TRUE) # 'P ~ 0.32' in the paper
xalt <- x
xalt[3, 2] <- 1
slowinskiguyer.test(xalt)

Solve Ambiguous Bases in DNA Sequences

Description

Replaces ambiguous bases in DNA sequences (R, Y, W, ...) by A, G, C, or T.

Usage

solveAmbiguousBases(x, method = "columnwise", random = TRUE)

Arguments

x

a matrix of class "DNAbin"; a list is accepted and is converted into a matrix.

method

the method used (no other choice than the default for the moment; see details).

random

a logical value (see details).

Details

The replacements of ambiguous bases are done columnwise. First, the base frequencies are counted: if no ambiguous base is found in the column, nothing is done. By default (i.e., if random = TRUE), the replacements are done by random sampling using the frequencies of the observed compatible, non-ambiguous bases. For instance, if the ambiguous base is Y, it is replaced by either C or T using their observed frequencies as probabilities. If random = FALSE, the base with the greatest of these frequencies is used. If there are no compatible bases in the column, equal probabilities are used. For instance, if the ambiguous base is R, and only C and T are observed, then it is replaced by either A or G with equal probabilities.

Alignment gaps are not changed; see the function latag2n to change the leading and trailing gaps.
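The Y case described above can be sketched with a single alignment column. This assumes the ape package is installed; the column content and the seed are made up for illustration.

```r
library(ape)

# One alignment column: two C, one T and one ambiguous Y (= C or T).
X <- as.DNAbin(matrix(c("c", "c", "t", "y"), ncol = 1))

# random = FALSE: Y takes the most frequent compatible base, here C.
fixed <- solveAmbiguousBases(X, random = FALSE)
as.character(fixed)[, 1]

# random = TRUE (the default): C or T is drawn using the observed
# frequencies (2/3 and 1/3) as probabilities.
set.seed(3)
drawn <- solveAmbiguousBases(X)
as.character(drawn)[, 1]
```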

Value

a matrix of class "DNAbin".

Author(s)

Emmanuel Paradis

Examples

X <- as.DNAbin(matrix(c("A", "G", "G", "R"), ncol = 1))
alview(solveAmbiguousBases(X)) # R replaced by either A or G
alview(solveAmbiguousBases(X, random = FALSE)) # R always replaced by G

Species Tree Estimation

Description

This function calculates the species tree from a set of gene trees.

Usage

speciesTree(x, FUN = min)

Arguments

x

a list of trees, e.g., an object of class "multiPhylo".

FUN

a function used to compute the divergence times of each pair of tips.

Details

For all trees in x, the divergence time of each pair of tips is calculated: these are then ‘summarized’ with FUN to build a new distance matrix used to calculate the species tree with a single-linkage hierarchical clustering. The default for FUN computes the maximum tree (maxtree) of Liu et al. (2010). Using FUN = mean gives the shallowest divergence tree of Maddison and Knowles (2006).
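The summarizing step can be reproduced by hand on the two gene trees of Liu et al. (2010). This is only a sketch of the idea, assuming the ape package is installed; it takes the divergence time of two tips in an ultrametric tree to be half their patristic distance, and the intermediate objects (D1, D2, Dmin) are illustrative and not part of speciesTree itself.

```r
library(ape)

tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr2 <- read.tree(text = "(((A:0.07,C:0.07):0.02,D:0.09):0.03,B:0.12);")

# For ultrametric trees, the divergence time of two tips is half their
# patristic distance; the matrices are put in the same tip order.
tips <- sort(tr1$tip.label)
D1 <- cophenetic(tr1)[tips, tips] / 2
D2 <- cophenetic(tr2)[tips, tips] / 2

# Summarize each pair across the gene trees with the default FUN = min.
Dmin <- pmin(D1, D2)
Dmin["A", "C"]   # 0.07: A and C diverge at 0.1 in tr1 but at 0.07 in tr2

# speciesTree() performs the summary and the clustering in one call.
ts <- speciesTree(c(tr1, tr2))
branching.times(ts)
```

In this example B and C have the smallest summarized divergence time (0.05), so single-linkage clustering joins them first.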

Value

an object of class "phylo".

Author(s)

Emmanuel Paradis

References

Liu, L., Yu, L. and Pearl, D. K. (2010) Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology, 60, 95–106.

Maddison, W. P. and Knowles, L. L. (2006) Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21–30.

Examples

### example in Liu et al. (2010):
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr2 <- read.tree(text = "(((A:0.07,C:0.07):0.02,D:0.09):0.03,B:0.12);")
TR <- c(tr1, tr2)
TSmax <- speciesTree(TR) # MAXTREE
TSsha <- speciesTree(TR, mean) # shallowest divergence

kronoviz(c(tr1, tr2, TSmax, TSsha), horiz = FALSE,
         type = "c", cex = 1.5, font = 1)
mtext(c("Gene tree 1", "Gene tree 2", "Species tree - MAXTREE"),
      at = -c(7.5, 4, 1))
mtext("Species tree - Shallowest Divergence")
layout(1)

Generates Systematic Regular Trees

Description

This function generates trees with regular shapes.

Usage

stree(n, type = "star", tip.label = NULL)

Arguments

n

an integer giving the number of tips in the tree.

type

a character string specifying the type of tree to generate; four choices are possible: "star", "balanced", "left", "right", or any unambiguous abbreviation of these.

tip.label

a character vector giving the tip labels; if not specified, the tips "t1", "t2", ..., are given.

Details

The types of trees generated are:

“star”: a star (or comb) tree with a single internal node.
“balanced”: a fully balanced dichotomous rooted tree; n must be a power of 2 (2, 4, 8, ...).
“left”: a fully unbalanced rooted tree where the largest clade is on the left-hand side when the tree is plotted upwards.
“right”: same as above but in the other direction.
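The four shapes can be compared through their number of internal nodes. A minimal sketch, assuming the ape package is installed; n = 8 is an arbitrary choice that satisfies the power-of-2 condition.

```r
library(ape)

star  <- stree(8)               # default type = "star"
bal   <- stree(8, "balanced")   # 8 is a power of 2, as required
left  <- stree(8, "left")
right <- stree(8, "r")          # unambiguous abbreviation of "right"

# A star tree has a single internal node; the three other shapes are
# fully dichotomous and rooted, hence n - 1 internal nodes.
sapply(list(star = star, balanced = bal, left = left, right = right), Nnode)
```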

Value

An object of class "phylo".

Author(s)

Emmanuel Paradis

Examples

layout(matrix(1:4, 2, 2))
plot(stree(100))
plot(stree(128, "balanced"))
plot(stree(100, "left"))
plot(stree(100, "right"))

Zoom on a Portion of a Phylogeny by Successive Clicks

Description

This function plots simultaneously a whole phylogenetic tree (supposedly large) and a portion of it determined by clicking on the nodes of the phylogeny. On exit, returns the last subtree visualized.

Usage

subtreeplot(x, wait=FALSE, ...)

Arguments

x

an object of class "phylo".

wait

a logical indicating whether the node being processed should be printed (useful for big phylogenies).

...

further arguments passed to plot.phylo.

Details

This function aims at easily exploring very large trees. The main argument is a phylogenetic tree, and the second one is a logical indicating whether a waiting message should be printed while the calculation is being processed.

The whole tree is plotted on the left-hand side in half of the device. The subtree is plotted on the right-hand side in the other half. The user clicks on the nodes in the complete tree and the subtree corresponding to this node is plotted on the right-hand side. There is no limit to the number of clicks. On exit, the subtree on the right-hand side is returned.

To use a subtree as the new tree in which to zoom, the user has to use the function many times. This can however be done in a single command line (see example 2).

Author(s)

Damien de Vienne damien.de-vienne@u-psud.fr

Examples

## Not run: 
#example 1: simple
tree1 <- rtree(50)
tree2 <- subtreeplot(tree1, wait = TRUE) # on exit, tree2 will be a subtree of tree1
#example 2: more than one zoom
tree1 <- rtree(60)
tree2 <- subtreeplot(subtreeplot(subtreeplot(tree1))) # allow three successive zooms
## End(Not run)

All subtrees of a Phylogenetic Tree

Description

This function returns a list of all the subtrees of a phylogenetic tree.

Usage

subtrees(tree, wait=FALSE)

Arguments

tree

an object of class "phylo".

wait

a logical indicating whether the node being processed should be printed (useful for big phylogenies).

Value

subtrees returns a list of trees of class "phylo" and returns invisibly for each subtree a list with the following components:

tip.label

node.label

Ntip

Nnode
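A quick way to see what the returned list contains, as a sketch assuming the ape package is installed; the tree size and the seed are arbitrary.

```r
library(ape)

set.seed(5)
phy <- rtree(12)
sub <- subtrees(phy)

# One subtree per internal node of the original tree.
length(sub) == Nnode(phy)

# Sizes of the subtrees; the largest one is the whole tree.
sizes <- sapply(sub, Ntip)
sizes
max(sizes) == Ntip(phy)
```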

Author(s)

Damien de Vienne damien.de-vienne@u-psud.fr

Examples

### Random tree with 12 leaves
phy<-rtree(12)
par(mfrow=c(4,3))
plot(phy, sub="Complete tree")

### Extract the subtrees
l<-subtrees(phy)

### plot all the subtrees
for (i in 1:11) plot(l[[i]], sub=paste("Node", l[[i]]$node.label[1]))
par(mfrow=c(1,1))

Print Summary of a Phylogeny

Description

The first function prints a compact summary of a phylogenetic tree (an object of class "phylo"). The three other functions return the number of tips, nodes, or edges, respectively.

Usage

## S3 method for class 'phylo'
summary(object, ...)

Ntip(phy)
## S3 method for class 'phylo'
Ntip(phy)
## S3 method for class 'multiPhylo'
Ntip(phy)

Nnode(phy, ...)
## S3 method for class 'phylo'
Nnode(phy, internal.only = TRUE, ...)
## S3 method for class 'multiPhylo'
Nnode(phy, internal.only = TRUE, ...)

Nedge(phy)
## S3 method for class 'phylo'
Nedge(phy)
## S3 method for class 'multiPhylo'
Nedge(phy)

Arguments

object, phy

an object of class "phylo" or "multiPhylo".

...

further arguments passed to or from other methods.

internal.only

a logical indicating whether to return the number of internal nodes only (the default), or of internal and terminal (tips) nodes (if FALSE).

Details

The summary includes the numbers of tips and of nodes, summary statistics of the branch lengths (if they are available) with mean, variance, minimum, first quartile, median, third quartile, and maximum, listing of the first ten tip labels, and (if available) of the first ten node labels. It is also printed whether some of these optional elements (branch lengths, node labels, and root edge) are not found in the tree.

summary simply prints its results on the standard output and is not meant for programming.
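For programming, the three counting functions are the ones to use. A minimal sketch, assuming the ape package is installed; it relies on the fact that a rooted, fully bifurcating tree with n tips has n - 1 internal nodes and 2n - 2 edges.

```r
library(ape)

tr <- rtree(10)                  # rooted and bifurcating by default
n <- Ntip(tr)                    # 10 tips
Nnode(tr)                        # n - 1 = 9 internal nodes
Nnode(tr, internal.only = FALSE) # 2n - 1 = 19 nodes, tips included
Nedge(tr)                        # 2n - 2 = 18 branches
```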

Value

A NULL value in the case of summary, a single numeric value for the three other functions.

Author(s)

Emmanuel Paradis

Examples

data(bird.families)
summary(bird.families)
Ntip(bird.families)
Nnode(bird.families)
Nedge(bird.families)

Translation from DNA to Amino Acid Sequences

Description

trans translates DNA sequences into amino acids. complement returns the (reverse) complement sequences.

Usage

trans(x, code = 1, codonstart = 1)
complement(x)

Arguments

x

an object of class "DNAbin" (vector, matrix or list).

code

an integer value giving the genetic code to be used. Currently only the genetic codes 1 to 6 are supported.

codonstart

an integer giving where to start the translation. This should be 1, 2, or 3, but larger values are accepted and have the effect of starting the translation further towards the 3'-end of the sequence.

Details

With trans, if the sequence length is not a multiple of three, a warning message is printed. Alignment gaps are simply ignored (i.e., AG- returns X with no special warning or message). Base ambiguities are taken into account where relevant: for instance, GGN, GGA, GGR, etc., all return G.

See the link given in the References for details about the taxonomic coverage and alternative codons of each code.
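The handling of ambiguities and the effect of code can be sketched on short sequences. This assumes the ape package is installed; the two sequences are invented, and code 2 is taken here to be the vertebrate mitochondrial code of the NCBI numbering.

```r
library(ape)

# Nine bases: ATG GGN TAA
x <- as.DNAbin(c("a", "t", "g", "g", "g", "n", "t", "a", "a"))

aa <- trans(x)                              # standard code (code = 1)
aa_string <- paste(as.character(aa), collapse = "")
aa_string   # "MG*": GGN is read as G despite the N, TAA is a stop codon

# Reverse complement of the same sequence.
rc_string <- paste(as.character(complement(x)), collapse = "")
rc_string   # "ttancccat"

# The genetic code matters: ATA and TGA are read differently
# under code 1 and code 2.
y <- as.DNAbin(c("a", "t", "a", "t", "g", "a"))
aa1 <- paste(as.character(trans(y, code = 1)), collapse = "")
aa2 <- paste(as.character(trans(y, code = 2)), collapse = "")
c(code1 = aa1, code2 = aa2)
```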

Value

an object of class "AAbin" or "DNAbin", respectively.

Note

These functions are equivalent to translate and comp in the package seqinr with the difference that there is no need to convert the sequences into character strings.

Author(s)

Emmanuel Paradis

References

https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes

Examples

data(woodmouse)
X <- trans(woodmouse) # not correct
X2 <- trans(woodmouse, 2) # using the correct code
identical(X, X2)
alview(X[1:2, 1:60]) # some 'Stop' codons (*)
alview(X2[, 1:60])
X2

Tree Popping

Description

Method for reconstructing phylogenetic trees from an object of class splits using tree popping.

Usage

treePop(obj)

Arguments

obj

an object of class "bitsplit".

Value

an object of class "phylo" which displays all the splits in the input object.

Author(s)

Andrei Popescu

Tree Explorer With Multiple Devices

Description

This function requires a plotted tree: the user is invited to click close to a node and the corresponding subtree (or clade) is plotted on a new window.

Usage

trex(phy, title = TRUE, subbg = "lightyellow3",
     return.tree = FALSE, ...)

Arguments

phy

an object of class "phylo".

title

a logical or a character string (see details).

subbg

a character string giving the background colour for the subtree.

return.tree

a logical: if TRUE, the subtree is returned after being plotted and the operation is stopped.

...

further arguments to pass to plot.phylo.

Details

This function works with a tree (freshly) plotted on an interactive graphical device (i.e., not a file). After calling trex, the user clicks close to a node of the tree, then the clade from this node is plotted on a new window. The user can click as many times as desired on the main tree: the clades are plotted successively on the same new window. The process is stopped by a right-click. If the user clicks too close to the tips, a message “Try again!” is printed.

Each time trex is called, the subtree is plotted on a new window without closing or deleting those possibly already plotted. They may be distinguished with the options title and/or subbg.

In all cases, the device where phy is plotted is the active window after the operation. It should not be closed during the whole process.

If title = TRUE, a default title is printed on the new window using the node label, or the node number if there are no node labels in the tree. If title = FALSE, no title is printed. If title is a character string, it is used for the title.

Value

an object of class "phylo" if return.tree = TRUE.

Author(s)

Emmanuel Paradis

Examples

## Not run: 
tr <- rcoal(1000)
plot(tr, show.tip.label = FALSE)
trex(tr) # left-click as many times as you want, then right-click
tr <- makeNodeLabel(tr)
trex(tr, subbg = "lightgreen") # id.
## generate a random colour with control on the darkness:
rRGB <- function(a, b)
    rgb(runif(1, a, b), runif(1, a, b), runif(1, a, b))
### with a random pale background:
trex(tr, subbg = rRGB(0.8, 1))
## the above can be called many times...
graphics.off() # close all graphical devices
## End(Not run)

Tree Reconstruction Based on the Triangles Method

Description

Fast distance-based construction method. Should only be used when distance measures are fairly reliable.

Usage

triangMtd(X)
triangMtds(X)

Arguments

X

a distance matrix

Value

an object of class "phylo".

Author(s)

Andrei Popescu

References

http://archive.numdam.org/ARCHIVE/RO/RO_2001__35_2/RO_2001__35_2_283_0/RO_2001__35_2_283_0.pdf

Examples

data(woodmouse)
tr <- triangMtd(dist.dna(woodmouse))
plot(tr)

Removes Duplicate Trees

Description

This function scans a list of trees, and returns a list with the duplicate trees removed. By default the labelled topologies are compared.

Usage

## S3 method for class 'multiPhylo'
unique(x, incomparables = FALSE,
       use.edge.length = FALSE,
       use.tip.label = TRUE, ...)

Arguments

x

an object of class "multiPhylo".

incomparables

unused (for compatibility with the generic).

use.edge.length

a logical specifying whether to consider the edge lengths in the comparisons; the default is FALSE.

use.tip.label

a logical specifying whether to consider the tip labels in the comparisons; the default is TRUE.

...

further arguments passed to or from other methods.

Value

an object of class "multiPhylo" with an attribute "old.index" indicating which trees of the original list are similar (the tree of smaller index is taken as reference).
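To make the "old.index" attribute concrete, here is a minimal sketch with three hand-written trees, the third being a copy of the first. The Newick strings are made up for the illustration.

```r
library(ape)

# Three toy trees (hypothetical): t1 appears twice, t2 has another topology
t1 <- read.tree(text = "((a,b),(c,d));")
t2 <- read.tree(text = "((a,c),(b,d));")
trees <- c(t1, t2, t1)

u <- unique(trees)
n_unique <- length(u)
# For each tree of the original list, the index of its reference tree
idx <- attr(u, "old.index")
```

Here n_unique should be 2, and idx should map the third tree back to the first one.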

Author(s)

Emmanuel Paradis

Examples

TR <- rmtree(50, 4)
length(unique(TR)) # not always 15...
howmanytrees(4)

Update Labels

Description

This function changes labels (names or rownames) given two vectors (old and new). It is a generic function with several methods as described below.

Usage

updateLabel(x, old, new, ...)
## S3 method for class 'character'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'DNAbin'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'AAbin'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'phylo'
updateLabel(x, old, new, exact = TRUE, nodes = FALSE, ...)
## S3 method for class 'evonet'
updateLabel(x, old, new, exact = TRUE, nodes = FALSE, ...)
## S3 method for class 'data.frame'
updateLabel(x, old, new, exact = TRUE, ...)
## S3 method for class 'matrix'
updateLabel(x, old, new, exact = TRUE, ...)

Arguments

x

an object where to change the labels.

old,new

two vectors of mode character (must be of the same length).

exact

a logical value (see details).

nodes

a logical value specifying whether to also update the node labels of the tree or network.

...

further arguments passed to and from methods.

Details

This function can be used to change some of the labels (see examples), or all of them when their ordering is uncertain.

If exact = TRUE (the default), the values in old are matched exactly with the labels; otherwise (exact = FALSE), the values in old are considered as regular expressions and searched in the labels with grep.
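The difference between the two matching modes can be sketched with base R alone. This is only an illustration of exact matching versus grep on made-up labels; it is not the internal code of updateLabel.

```r
# Hypothetical labels, for illustration only
labels <- c("Felis_catus", "Felis_manul", "Panthera_leo")

# Exact matching compares whole labels: "Felis" alone matches nothing
exact_hits <- which(labels == "Felis")

# Regular-expression matching with grep() finds every label containing "Felis"
regex_hits <- grep("Felis", labels)

# Anchors restrict a regular expression to a single whole label
anchored_hits <- grep("^Felis_manul$", labels)
```

With exact = FALSE a short pattern may therefore hit more labels than intended, so anchoring the patterns with ^ and $ may be worth considering.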

Value

an object of the same class as x.

Author(s)

Emmanuel Paradis

Examples

## Not run: 
## the tree by Nyakatura & Bininda-Emonds (2012, BMC Biology)
x <- "https://static-content.springer.com/esm/art"
y <- "3A10.1186"
z <- "2F1741-7007-10-12/MediaObjects/12915_2011_534_MOESM5_ESM.NEX"
## The command below may not print correctly in HTML because of the
## percentage symbol; see the text or PDF help page.
url <- paste(x, y, z, sep = "%")
TC <- read.nexus(url)
tr <- TC$carnivoreST_bestEstimate
old <- c("Uncia_uncia", "Felis_manul", "Leopardus_jacobitus")
new <- c("Panthera_uncia", "Otocolobus_manul", "Leopardus_jacobita")
tr.updated <- updateLabel(tr, old, new)
## End(Not run)

tr <- rtree(6)
## the order of the labels is randomized by this function
old <- paste0("t", 1:6)
new <- paste0("x", 1:6)
updateLabel(tr, old, new)
tr

Variance Components with Orthonormal Contrasts

Description

This function calls Phylip's contrast program and returns the phylogenetic and phenotypic variance-covariance components for one or several traits. There can be several observations per species.

Usage

varCompPhylip(x, phy, exec = NULL)

Arguments

x

a numeric vector, a matrix (or data frame), or a list.

phy

an object of class "phylo".

exec

a character string giving the name of the executable contrast program (see details).

Details

The data x can be in several forms: (i) a numeric vector if there is a single trait and one observation per species; (ii) a matrix or data frame if there are several traits (as columns) and a single observation of each trait for each species; (iii) a list of vectors if there is a single trait and several observations per species; (iv) a list of matrices or data frames: same as (ii) but with several traits and the rows are individuals.

If x has names, its values are matched to the tip labels of phy, otherwise its values are taken to be in the same order as the tip labels of phy.

Phylip (version 3.68 or higher) must be accessible on your computer. If you have a Unix-like operating system, the executable name is assumed to be "phylip contrast" (as in Debian); otherwise it is set to "contrast". If this doesn't suit your system, use the option exec accordingly. If the executable is not in the path, you may need to specify it, e.g., exec = "C:/Program Files/Phylip/contrast".

Value

a list with elements varA and varE with the phylogenetic (additive) and phenotypic (environmental) variance-covariance matrices. If a single trait is analyzed, these contain its variances.

Author(s)

Emmanuel Paradis

References

Felsenstein, J. (2004) Phylip (Phylogeny Inference Package) version 3.68. Department of Genetics, University of Washington, Seattle, USA. http://evolution.genetics.washington.edu/phylip/phylip.html.

Felsenstein, J. (2008) Comparative methods with sampling error and within-species variation: Contrasts revisited and revised. American Naturalist, 171, 713–725.

Examples

## Not run: 
tr <- rcoal(30)
### Five traits, one observation per species:
x <- replicate(5, rTraitCont(tr, sigma = 1))
varCompPhylip(x, tr) # varE is small
x <- replicate(5, rnorm(30))
varCompPhylip(x, tr) # varE is large
### Five traits, ten observations per species:
x <- replicate(30, replicate(5, rnorm(10)), simplify = FALSE)
varCompPhylip(x, tr)
## End(Not run)

Compute Variance Component Estimates

Description

Get variance component estimates from a fitted lme object.

Usage

varcomp(x, scale = FALSE, cum = FALSE)

Arguments

x

A fitted lme object.

scale

Scale all variances so that they sum to 1.

cum

Return cumulative variance components.

Details

Variance computation is done as in Venables and Ripley (2002).

Value

A named vector of class varcomp with estimated variance components.

Author(s)

Julien Dutheil dutheil@evolbio.mpg.de

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S (Fourth Edition). New York: Springer-Verlag.

Examples

data(carnivora)
library(nlme)
m <- lme(log10(SW) ~ 1, random = ~ 1|Order/SuperFamily/Family/Genus, data=carnivora)
v <- varcomp(m, TRUE, TRUE)
plot(v)

Phylogenetic Variance-covariance or Correlation Matrix

Description

This function computes the expected variances and covariances of a continuous trait assuming it evolves under a given model.

This is a generic function with methods for objects of class "phylo" and "corPhyl".

Usage

vcv(phy, ...)
## S3 method for class 'phylo'
vcv(phy, model = "Brownian", corr = FALSE, ...)
## S3 method for class 'corPhyl'
vcv(phy, corr = FALSE, ...)

Arguments

phy

an object of the correct class (see above).

model

a character giving the model used to compute the variances and covariances; only "Brownian" is available (for other models, a correlation structure may be used).

corr

a logical indicating whether the correlation matrix should be returned (TRUE); by default the variance-covariance matrix is returned (FALSE).

...

further arguments to be passed to or from other methods.

Value

a numeric matrix with the names of the tips as colnames and rownames.
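Under the Brownian model, the expected covariance between two tips is the length of the path they share from the root, and the variance of a tip is its distance to the root. The sketch below checks this on a small hand-written tree (the Newick string is an assumption made for the example).

```r
library(ape)

# Toy tree (hypothetical): a and b share a branch of length 2 from the root
tr <- read.tree(text = "((a:1,b:1):2,c:3);")

V <- vcv(tr)               # variance-covariance matrix
C <- vcv(tr, corr = TRUE)  # correlation matrix

shared_ab <- V["a", "b"]   # shared path length between a and b
root_to_tip <- diag(V)     # root-to-tip distances
```

In this tree every tip lies 3 units from the root, a and b share 2 units, and c shares nothing with the other two, so the correlation between a and b is 2/3.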

Note

Do not confuse this function with vcov which computes the variance-covariance matrix among parameters of a fitted model object.

Author(s)

Emmanuel Paradis

References

Garland, T. Jr. and Ives, A. R. (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155, 346–364.

Examples

tr <- rtree(5)
## all are the same:
vcv(tr)
vcv(corBrownian(1, tr))
vcv(corPagel(1, tr))

Variance-Covariance Matrix to Tree

Description

This function transforms a variance-covariance matrix into a phylogenetic tree.

Usage

vcv2phylo(mat, tolerance = 1e-7)

Arguments

mat

a square symmetric (positive-definite) matrix.

tolerance

the numeric tolerance used to compare the branch lengths.

Details

The function tests if the matrix is symmetric and positive-definite (i.e., all its eigenvalues positive within the specified tolerance).

Value

an object of class "phylo".

Author(s)

Simon Blomberg

Examples

tr <- rtree(10)
V <- vcv(tr) # VCV matrix assuming Brownian motion
z <- vcv2phylo(V)
identical(tr, z) # FALSE
all.equal(tr, z) # TRUE

Define Similarity Matrix

Description

weight.taxo computes a matrix whose entries [i, j] are set to 1 if x[i] == x[j], 0 otherwise.

weight.taxo2 computes a matrix whose entries [i, j] are set to 1 if x[i] == x[j] AND y[i] != y[j], 0 otherwise.

The diagonal [i, i] is always set to 0.

The returned matrix can be used as a weight matrix in Moran.I. x and y may be vectors of factors.

See further details in vignette("MoranI").

Usage

weight.taxo(x)
weight.taxo2(x, y)

Arguments

x,y

a vector or a factor.

Value

a square numeric matrix.
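The two definitions given in the Description can be checked on a tiny example. The vectors genus and site below are made up for the illustration.

```r
library(ape)

# Hypothetical data: three observations of one genus, at two sites
genus <- c("Felis", "Felis", "Felis")
site  <- c("s1", "s1", "s2")

# 1 wherever the genus is the same, 0 on the diagonal
w1 <- weight.taxo(genus)

# 1 wherever the genus is the same AND the sites differ
w2 <- weight.taxo2(genus, site)
```

In w2 the pair (1, 2) gets weight 0 because both observations come from the same site, whereas the pairs involving the third observation keep weight 1.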

Author(s)

Emmanuel Paradis

Find Patterns in DNA Sequences

Description

This function finds patterns in a single or a set of DNA or AA sequences.

Usage

where(x, pattern)

Arguments

x

an object inheriting from class "DNAbin" or "AAbin".

pattern

a character string to be searched in x.

Details

If x is a vector, the function returns a single vector giving the position(s) where the pattern was found. If x is a matrix or a list, it returns a list with the positions of the pattern for each sequence.

Patterns may be overlapping. For instance, if pattern = "tata" and the sequence starts with ‘tatata’, then the output will be c(1, 3).
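The overlapping case described above can be reproduced with a short hand-made sequence (an assumption made for this sketch) stored as a "DNAbin" vector.

```r
library(ape)

# Toy sequence (hypothetical) starting with 'tatata'
s <- as.DNAbin(strsplit("tatatagc", "")[[1]])

# Both overlapping occurrences of "tata" are reported
pos <- where(s, "tata")
```

Since s is a vector, pos is a single vector of positions rather than a list.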

Value

a vector of integers or a list of such vectors.

Author(s)

Emmanuel Paradis

Examples

data(woodmouse)
where(woodmouse, "tata")
## with AA sequences:
x <- trans(woodmouse, 2)
where(x, "irk")

Identifies Edges of a Tree

Description

This function identifies the edges that belong to a group (possibly non-monophyletic) specified as a set of tips.

Usage

which.edge(phy, group)

Arguments

phy

an object of class "phylo".

group

a vector of mode numeric or character specifying the tips for which the edges are to be identified.

Details

The group of tips specified in ‘group’ may be non-monophyletic (paraphyletic or polyphyletic), in which case all edges from the tips to their most recent common ancestor are identified.

The identification is made with the indices of the rows of the matrix ‘edge’ of the tree.

Value

a numeric vector.
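A minimal sketch on a four-tip tree (the Newick string is made up for the example) shows the difference between a monophyletic and a non-monophyletic group.

```r
library(ape)

# Toy tree (hypothetical)
tr <- read.tree(text = "((a,b),(c,d));")

# Monophyletic group: the edges leading to a and b
e_mono <- which.edge(tr, c("a", "b"))

# Non-monophyletic group: all edges from a and c up to their common ancestor
e_poly <- which.edge(tr, c("a", "c"))

# The returned values index the rows of tr$edge
tips_mono <- tr$tip.label[tr$edge[e_mono, 2]]
```

The returned indices can be used, for instance, to build a vector of edge colours for plotting.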

Author(s)

Emmanuel Paradis

Cytochrome b Gene Sequences of Woodmice

Description

This is a set of 15 sequences of the mitochondrial gene cytochrome b of the woodmouse (Apodemus sylvaticus) which is a subset of the data analysed by Michaux et al. (2003). The full data set is available through GenBank (accession numbers AJ511877 to AJ511987).

Usage

data(woodmouse)

Format

An object of class "DNAbin".

Source

Michaux, J. R., Magnanou, E., Paradis, E., Nieberding, C. and Libois, R. (2003) Mitochondrial phylogeography of the Woodmouse (Apodemus sylvaticus) in the Western Palearctic region. Molecular Ecology, 12, 685–697.

Examples

data(woodmouse)
str(woodmouse)

Write DNA Sequences in a File

Description

These functions write in a file a list of DNA sequences in sequential, interleaved, or FASTA format. write.FASTA can write either DNA or AA sequences.

Usage

write.dna(x, file, format = "interleaved", append = FALSE,
          nbcol = 6, colsep = " ", colw = 10, indent = NULL,
          blocksep = 1)
write.FASTA(x, file, header = NULL, append = FALSE)

Arguments

x

a list or a matrix of DNA sequences, or of AA sequences for write.FASTA.

file

a file name specified by either a variable of mode character, or a double-quoted string.

format

a character string specifying the format of the DNA sequences. Three choices are possible: "interleaved", "sequential", or "fasta", or any unambiguous abbreviation of these.

append

a logical, if TRUE the data are appended to the file without erasing the data possibly existing in the file, otherwise the file (if it exists) is overwritten (FALSE the default).

nbcol

a numeric specifying the number of columns per row (6 by default); may be negative implying that the nucleotides are printed on a single line.

colsep

a character used to separate the columns (a single space by default).

colw

a numeric specifying the number of nucleotides per column (10 by default).

indent

a numeric or a character specifying how the blocks of nucleotides are indented (see details).

blocksep

a numeric specifying the number of lines between the blocks of nucleotides (this has an effect only if 'format = "interleaved"').

header

a vector of mode character giving the header to be written in the FASTA file before the sequences. By default, there is no header.

Details

Three formats are supported in the present function: see the help page of read.dna and the references below for a description.

If the sequences have no names, then they are given "1", "2", ... as labels in the file.

With the interleaved and sequential formats, the sequences must be all of the same length. The names of the sequences are not truncated.

The argument indent specifies how the rows of nucleotides are indented. In the interleaved and sequential formats, the rows with the taxon names are never indented; the subsequent rows are indented with 10 spaces by default (i.e., if indent = NULL). In the FASTA format, the rows are not indented by default. This default behaviour can be modified by specifying a value to indent: the rows are then indented with “indent” (if it is a character) or ‘indent’ spaces (if it is a numeric). For example, specifying indent = "   " or indent = 3 will have the same effect (use indent = "\t" for a tabulation).

The different options are intended to give flexibility in formatting the sequences. For instance, if the sequences are very long it may be judicious to remove all the spaces between columns (colsep = ""), in the margins (indent = 0), and between the blocks (blocksep = 0) to produce a smaller file.

write.dna(, format = "fasta") can be very slow if the sequences are long (> 10 kb). write.FASTA is much faster in this situation but the formatting is not flexible: each sequence is printed on a single line, which is OK for big files that are not intended to be opened with a text editor.
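The sketch below writes two short made-up sequences to temporary files, once with write.FASTA and once with write.dna using the compact options mentioned above. The sequences and file names are assumptions made for the example.

```r
library(ape)

# Two toy sequences (hypothetical) of equal length
x <- as.DNAbin(list(s1 = c("a", "c", "g", "t"),
                    s2 = c("a", "c", "g", "a")))

# write.FASTA: one header line and one sequence line per sequence
f1 <- tempfile(fileext = ".fas")
write.FASTA(x, f1)
fasta_lines <- readLines(f1)

# write.dna without column separators, margins or blank lines between blocks
f2 <- tempfile(fileext = ".phy")
write.dna(x, f2, format = "interleaved",
          colsep = "", indent = 0, blocksep = 0)
```

With write.FASTA the file should hold four lines here, the first being the header ">s1".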

Value

None (invisible ‘NULL’).

Note

Specifying a negative value for ‘nbcol’ (meaning that the nucleotides are printed on a single line) gives the same output for the interleaved and sequential formats.

The names of the sequences can be truncated with the function makeLabel. In particular, Clustal is limited to 30 characters, and PHYML seems limited to 99 characters.

Author(s)

Emmanuel Paradis

References

Anonymous. FASTA format. https://en.wikipedia.org/wiki/FASTA_format

Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington. http://evolution.genetics.washington.edu/phylip/phylip.html

Write Tree File in Nexus Format

Description

This function writes trees in a file with the NEXUS format.

Usage

write.nexus(..., file = "", translate = TRUE, digits = 10)

Arguments

...

either (i) a single object of class "phylo", (ii) a series of such objects separated by commas, or (iii) a list containing such objects.

file

a file name specified by either a variable of mode character, or a double-quoted string; if file = "" (the default) then the tree is written on the standard output connection.

translate

a logical, if TRUE (the default) the tip labels are translated: they are replaced in the parenthetic representation with tokens.

digits

a numeric giving the number of digits used for printing branch lengths. For negative numbers no branch lengths are printed.

Details

If several trees are given, they must all have the same tip labels.

If among the objects given some are not trees of class "phylo", they are simply skipped and not written in the file.

See write.tree for details on how tip (and node) labels are checked before being printed.
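As a brief sketch, two made-up trees sharing the same tip labels are written to a temporary NEXUS file and read back with read.nexus. The Newick strings and the file name are assumptions made for the example.

```r
library(ape)

# Two toy trees (hypothetical) with the same tip labels
t1 <- read.tree(text = "((a:1,b:1):2,c:3);")
t2 <- read.tree(text = "((a:1,c:1):2,b:3);")

f <- tempfile(fileext = ".nex")
write.nexus(t1, t2, file = f)

nex <- readLines(f)    # raw content of the file
back <- read.nexus(f)  # the two trees read back into R
```

The first line of the file should be the "#NEXUS" token, and back should contain both trees.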

Value

None (invisible ‘NULL’).

Author(s)

Emmanuel Paradis

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.

Write Character Data in NEXUS Format

Description

This function writes in a file a list of data in the NEXUS format. The names of the vectors of the list are used as taxon names.

For the moment, only sequence data (DNA or protein) are supported.

Usage

write.nexus.data(x, file, format = "dna", datablock = TRUE,
                 interleaved = TRUE, charsperline = NULL,
                 gap = NULL, missing = NULL)

Arguments

x

a matrix or a list of data each made of a single vector of mode character where each element is a character state (e.g., “A”, “C”, ...). Objects of class “DNAbin” are accepted.

file

a file name specified by either a variable of mode character, or a double-quoted string.

format

a character string specifying the format of the sequences. Four choices are possible: "dna" (the default), "protein", "standard" or "continuous", or any unambiguous abbreviation of these (case insensitive).

datablock

a logical, if TRUE the data are written in a single DATA block. If FALSE, the data are written in TAXA and CHARACTER blocks. Default is TRUE.

interleaved

a logical, if TRUE the data are written in interleaved format with the number of characters per line as specified with charsperline = numerical_value. If FALSE, the data are written in sequential format. Default is TRUE.

charsperline

a numeric value specifying the number of characters per line when used with interleaved = TRUE. Default is 80.

gap

a character specifying the symbol for gap. Default is “-”.

missing

a character specifying the symbol for missing data. Default is “?”.

Details

If the sequences have no names, then they are given “1”, “2”, ..., as names in the file.

Sequences must be all of the same length.

Value

None (invisible ‘NULL’).

Author(s)

Johan Nylander nylander@scs.fsu.edu and Thomas Guillerme

References

Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.

Examples

## Not run: 
## Write interleaved DNA data with 100 characters per line in a DATA block
data(woodmouse)
write.nexus.data(woodmouse, file= "wood.ex.nex", interleaved = TRUE, charsperline = 100)
## Write sequential protein data in TAXA and CHARACTERS blocks
data(cynipids)
write.nexus.data(cynipids, file = "cyn.ex.nex", format = "protein",
                 datablock = FALSE, interleaved = FALSE)
unlink(c("wood.ex.nex", "cyn.ex.nex"))
## End(Not run)

Write Tree File in phyloXML Format

Description

This function writes trees to a file of phyloXML format.

Usage

write.phyloXML(phy, file = "", tree.names = FALSE)

Arguments

phy

an object of class "phylo" or "multiPhylo".

file

a file name specified by either a variable of mode character, or a double-quoted string; if file = "" (the default) then the tree is written on the standard output connection (i.e. the console).

tree.names

either a logical or a vector of mode character specifying whether or which tree names should be written to the file.

Details

If several trees are given, they will be represented as multiple <phylogeny> elements. Contrary to write.nexus, the trees need not have the same tip labels.

When tree.names is TRUE, the tree names will always be added as <name> tags to each phylogeny element. If the phy object is unnamed, then the names will be automatically generated from the tree indices as "tree<index>" (e.g. tree1, tree2, ...). If tree.names is a character vector, the specified names will be used instead.

Branch lengths, labels, and rootedness are preserved in the phyloXML file.

Value

None (invisible NULL).

Author(s)

Federico Marotta

References

Han, M. V. and Zmasek, C. M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.

Write Tree File in Parenthetic Format

Description

This function writes in a file a tree in parenthetic format using the Newick (also known as New Hampshire) format.

Usage

write.tree(phy, file = "", append = FALSE,
           digits = 10, tree.names = FALSE)

Arguments

phy

an object of class "phylo" or "multiPhylo".

file

a file name specified by either a variable of mode character, or a double-quoted string; if file = "" (the default) then the tree is written on the standard output connection (i.e. the console).

append

a logical, if TRUE the tree is appended to the file without erasing the data possibly existing in the file, otherwise the file (if it exists) is overwritten (FALSE the default).

digits

a numeric giving the number of (significant) digits used for printing branch lengths (see details). For negative numbers no branch lengths are printed.

tree.names

either a logical or a vector of mode character. If TRUE then any tree names will be written prior to the tree on each line. If character, specifies the name of "phylo" objects which can be written to the file.

Details

The node labels and the root edge length, if available, are written in the file.

If tree.names == TRUE then a variant of the Newick format is written for which the name of a tree precedes the Newick format tree (parentheses, if any, are deleted beforehand). The tree names are taken from the names attribute if present (they are ignored if tree.names is a character vector).

The tip labels (and the node labels if present) are checked before being printed: the leading and trailing spaces, and the leading left and trailing right parentheses are deleted; the other spaces are replaced by underscores; the commas, colons, semicolons, and the other parentheses are replaced with dashes.

The argument digits gives the number of significant digits (not rounding). For instance, if digits = 2 the branch length 1.234e-7 is printed as 1.2e-07 (not 0).
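The label checking and the meaning of digits can be seen on a small made-up tree. The tip label with a space and the tiny branch length are assumptions introduced for this sketch.

```r
library(ape)

# Toy tree (hypothetical)
tr <- read.tree(text = "((a:1,b:1):2,c:3);")

# A label containing a space, and a very short branch
tr$tip.label[1] <- "Homo sapiens"
tr$edge.length[1] <- 1.234e-7

# With file = "" (the default) the Newick string is returned
nwk <- write.tree(tr, digits = 2)
```

In nwk the space should have become an underscore, and the short branch should appear with two significant digits instead of being rounded to zero.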

Value

a vector of mode character if file = "", none (invisible NULL) otherwise.

Author(s)

Emmanuel Paradis, Daniel Lawson dan.lawson@bristol.ac.uk, and Klaus Schliep klaus.schliep@gmail.com

References

Felsenstein, J. The Newick tree format. http://evolution.genetics.washington.edu/phylip/newicktree.html

Olsen, G. Interpretation of the "Newick's 8:45" tree format standard. http://evolution.genetics.washington.edu/phylip/newick_doc.html

Fits the Yule Model to a Phylogenetic Tree

Description

This function fits by maximum likelihood a Yule model, i.e., a birth-only model to the branching times computed from a phylogenetic tree.

Usage

yule(phy, use.root.edge = FALSE)

Arguments

phy

an object of class "phylo".

use.root.edge

a logical specifying whether to consider the root edge in the calculations.

Details

The tree must be fully dichotomous.

The maximum likelihood estimate of the speciation rate is obtained as the ratio of the number of speciation events to the cumulative number of species through time; these two quantities are obtained with the number of nodes in the tree, and the sum of the branch lengths, respectively.

If there is a ‘root.edge’ element in the phylogenetic tree, and use.root.edge = TRUE, then it is assumed that it has a biological meaning and is counted as a branch length, and the root is counted as a speciation event; otherwise the number of speciation events is the number of nodes - 1.

The standard-error of lambda is computed with the second derivative of the log-likelihood function.
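The estimate described above can be reproduced by hand on a simulated tree. In this sketch the tree has no root edge, so the number of speciation events is the number of nodes minus one. The closed form used for the standard-error, lambda divided by the square root of the number of events, is what the second derivative gives if the log-likelihood has the form n log(lambda) - lambda X plus a constant; that form is an assumption of this sketch.

```r
library(ape)

set.seed(1)
tr <- rcoal(20)   # simulated dichotomous tree, no root edge
fit <- yule(tr)

n_events <- Nnode(tr) - 1          # speciation events (root not counted)
total_len <- sum(tr$edge.length)   # cumulative branch length

lambda_manual <- n_events / total_len
se_manual <- lambda_manual / sqrt(n_events)
```

Comparing fit$lambda with lambda_manual and fit$se with se_manual is a quick way to check one's reading of the Details above.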

Value

An object of class "yule" which is a list with the following components:

lambda

the maximum likelihood estimate of the speciation (birth) rate.

se

the standard-error of lambda.

loglik

the log-likelihood at its maximum.

Author(s)

Emmanuel Paradis

Fits the Yule Model With Covariates

Description

This function fits by maximum likelihood the Yule model with covariates, that is a birth-only model where speciation rate is determined by a generalized linear model.

Usage

yule.cov(phy, formula, data = NULL)

Arguments

phy

an object of class "phylo".

formula

a formula specifying the model to be fitted.

data

the name of the data frame where the variables in formula are to be found; by default, the variables are looked for in the global environment.

Details

The model fitted is a generalization of the Yule model where the speciation rate is determined by:

\ln\frac{\lambda_i}{1 - \lambda_i} = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \alpha

where \lambda_i is the speciation rate for species i, x_{i1}, x_{i2}, \dots are species-specific variables, and \beta_1, \beta_2, \dots, \alpha are parameters to be estimated. The term on the left-hand side above is a logit function often used in generalized linear models for binomial data (see family). The above model can be written in matrix form:

\mathrm{logit} \lambda_i = x_i' \beta

The standard-errors of the parameters are computed with the second derivatives of the log-likelihood function. (See References for other details on the estimation procedure.)
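The link function above can be sketched numerically in base R. The coefficient values and the predictor matrix are invented for the illustration; plogis is the standard inverse logit.

```r
# Hypothetical parameter values, chosen only for illustration
alpha <- -2
beta <- c(0.5, -1)

# Hypothetical species-specific variables (one row per species)
X <- cbind(x1 = c(0, 1, 2), x2 = c(1, 0, 0.5))

# Linear predictor: logit(lambda_i) = beta_1 x_i1 + beta_2 x_i2 + alpha
eta <- drop(X %*% beta) + alpha

# Inverse logit gives the speciation rates, always between 0 and 1
lambda <- plogis(eta)
```

Applying log(lambda / (1 - lambda)) to the result recovers eta, which is the left-hand side of the first equation.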

The function needs three things:

a phylogenetic tree which may contain multichotomies;
a formula which specifies the predictors of the model described above: this is given as a standard R formula and has no response (no left-hand side term), for instance: ~ x + y, it can include interactions (~ x + a * b) (see formula for details);
the predictors specified in the formula must be accessible to the function (either in the global space, or through the data option); they can be numeric vectors or factors. The length and the order of these data are important: the number of values (length) must be equal to the number of tips of the tree + the number of nodes. The order is the following: first the values for the tips in the same order as for the labels, then the values for the nodes sequentially from the root to the most terminal nodes (i.e., in the order given by phy$edge).

The user must obtain the values for the nodes separately.

Note that the method in its present implementation assumes that the change in a species trait is more or less continuous between two nodes or between a node and a tip. Thus reconstructing the ancestral values with a Brownian motion model may be consistent with the present method. This can be done with the function ace.
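One possible way to assemble such a predictor is sketched below on simulated data. The choice of method = "pic" in ace is an assumption of this example (one of several Brownian-motion-based options), and so is the idea that the node estimates, which ace returns in increasing node number, are already in the order expected here; this should be checked against phy$edge for real data.

```r
library(ape)

set.seed(1)
tr <- rcoal(10)   # simulated dichotomous tree

# Hypothetical trait values for the tips, in the order of the tip labels
x_tips <- rnorm(Ntip(tr))
names(x_tips) <- tr$tip.label

# Ancestral values for the nodes (assumption: contrasts-based estimates)
x_nodes <- ace(x_tips, tr, method = "pic")$ace

# Tips first, then nodes: one value per tip and per node
x <- c(x_tips, x_nodes)
```

The resulting vector has Ntip(tr) + Nnode(tr) values and could then be used in a formula such as ~ x.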

Value

A NULL value is returned, the results are simply printed. The output includes the deviance of the null (intercept-only) model and a likelihood-ratio test of the fitted model against the null model. Note that the deviance of the null model is different from the one returned by yule because of the different parametrizations.

Author(s)

Emmanuel Paradis

References

Paradis, E. (2005) Statistical analysis of diversification with species traits. Evolution, 59, 1–12.

Examples

### a simple example with some random data
data(bird.orders)
x <- rnorm(45) # the tree has 23 tips and 22 nodes
### the standard-error for x should be as large as
### the estimated parameter
yule.cov(bird.orders, ~ x)
### another example with a tree that has a multichotomy
data(bird.families)
y <- rnorm(272) # 137 tips + 135 nodes
yule.cov(bird.families, ~ y)

Fits the Time-Dependent Yule Model

Description

This function fits by maximum likelihood the time-dependent Yule model. The time is measured from the past (root.time) to the present.

Usage

yule.time(phy, birth, BIRTH = NULL, root.time = 0, opti = "nlm", start = 0.01)

Arguments

phy

an object of class "phylo".

birth

a (vectorized) function specifying how the birth (speciation) probability changes through time (see details).

BIRTH

a (vectorized) function giving the primitive of birth.

root.time

a numeric value giving the time of the root node (time is measured from the past towards the present).

opti

a character string giving the function used for optimisation of the likelihood function. Three choices are possible: "nlm", "nlminb", or "optim", or any unambiguous abbreviation of these.

start

the initial values used in the optimisation.

Details

The model fitted is a straightforward extension of the Yule model with covariates (see yule.cov). Rather than having heterogeneity among lineages, the speciation probability is the same for all lineages at a given time, but can change through time.

The function birth must meet these two requirements: (i) the parameters to be estimated are the formal arguments; (ii) time is named t in the body of the function. However, this is the opposite for the primitive BIRTH: t is the formal argument, and the parameters are used in its body. See the examples.

It is recommended to use BIRTH if possible, and required if speciation probability is constant on some time interval. If this primitive cannot be provided, a numerical integration is done with integrate.

The standard-errors of the parameters are computed with the Hessian of the log-likelihood function.

Value

An object of class "yule" (see yule).

Author(s)

Emmanuel Paradis

References

Hubert, N., Paradis, E., Bruggemann, H. and Planes, S. (2011) Community assembly and diversification in Indo-Pacific coral reef fishes. Ecology and Evolution, 1, 229–277.

Examples

### define two models...
birth.logis <- function(a, b) 1/(1 + exp(-a*t - b)) # logistic
birth.step <- function(l1, l2, Tcl) { # 2 rates with one break-point
    ans <- rep(l1, length(t))
    ans[t > Tcl] <- l2
    ans
}
### ... and their primitives:
BIRTH.logis <- function(t) log(exp(-a*t) + exp(b))/a + t
BIRTH.step <- function(t)
{
    out <- numeric(length(t))
    sel <- t <= Tcl
    if (any(sel)) out[sel] <- t[sel] * l1
    if (any(!sel)) out[!sel] <- Tcl * l1 + (t[!sel] - Tcl) * l2
    out
}
data(bird.families)
### fit both models:
yule.time(bird.families, birth.logis)
yule.time(bird.families, birth.logis, BIRTH.logis) # same but faster
## Not run: yule.time(bird.families, birth.step)  # fails
yule.time(bird.families, birth.step, BIRTH.step,
          opti = "nlminb", start = c(.01, .01, 100))

Zoom on a Portion of a Phylogeny

Description

This function plots simultaneously a whole phylogenetic tree (supposedly large) and a portion of it.

Usage

zoom(phy, focus, subtree = FALSE, col = rainbow, ...)

Arguments

phy

an object of class "phylo".

focus

a vector, either numeric or character, or a list of vectors specifying the tips to be focused on.

subtree

a logical indicating whether to show the context of the extracted subtrees.

col

a vector of colours used to show where the subtrees are in the main tree, or a function.

...

further arguments passed to plot.phylo.

Details

This function aims at exploring very large trees. The main argument is a phylogenetic tree, and the second one is a vector or a list of vectors specifying the tips to be focused on. The vector(s) can be either numeric and thus taken as the indices of the tip labels, or character in which case it is taken as the corresponding tip labels.
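The two ways of specifying tips can be illustrated with a base-R sketch. The helper focus_to_index is hypothetical and not part of ape, and the tip labels are made up for the example; it shows one plausible way of reducing a numeric or character vector to tip indices, not necessarily what zoom does internally.

```r
# Hypothetical helper (not part of ape): turn one 'focus' vector
# into indices of the tip labels.
focus_to_index <- function(focus, tip.label) {
    if (is.character(focus)) {
        idx <- match(focus, tip.label)  # labels -> positions
        if (anyNA(idx)) stop("some labels in 'focus' are not tip labels")
        idx
    } else {
        as.integer(focus)               # already indices
    }
}

# Made-up tip labels, for illustration only
labs <- c("Plecotus_auritus", "Plecotus_austriacus",
          "Pteropus_alecto", "Myotis_myotis")

# Character and numeric specifications
focus_to_index(c("Pteropus_alecto", "Plecotus_auritus"), labs)
focus_to_index(2:3, labs)

# A list of vectors is handled element-wise, one subtree per element
lapply(list(1:2, "Myotis_myotis"), focus_to_index, tip.label = labs)
```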

The whole tree is plotted on the left-hand side in a narrower sub-window (about a quarter of the device) without tip labels. The subtrees consisting of the tips in ‘focus’ are extracted and plotted on the right-hand side starting from the top left corner and successively column-wise.

If the argument ‘col’ is a vector of colours, as many colours as the number of subtrees must be given. The alternative is to give a function that will create colours or grey levels from the number of subtrees: see rainbow for some possibilities with colours.
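The two forms accepted for ‘col’ can be sketched in base R. The helper resolve_colours is hypothetical and not part of ape; it only illustrates the rule stated above, with a colour-generating function called on the number of subtrees and a vector required to have one colour per subtree.

```r
# Hypothetical helper (not part of ape): obtain one colour per subtree
# from either a vector of colours or a colour-generating function.
resolve_colours <- function(col, n) {
    if (is.function(col)) return(col(n))
    if (length(col) != n) stop("need one colour per subtree")
    col
}

# A function creating colours from the number of subtrees
resolve_colours(rainbow, 3)

# A function creating grey levels instead
resolve_colours(function(n) grey(seq(0.2, 0.8, length.out = n)), 3)

# An explicit vector, one colour per subtree
resolve_colours(c("red", "blue"), 2)
```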

Author(s)

Emmanuel Paradis

Examples

## Not run:
data(chiroptera)
zoom(chiroptera, 1:20, subtree = TRUE)
zoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
zoom(chiroptera, list(grep("Plecotus", chiroptera$tip.label),
                      grep("Pteropus", chiroptera$tip.label)))
## End(Not run)
