| Type: | Package |
| Title: | Model Based Phylogenetic Analysis |
| Version: | 1.1.3 |
| Description: | A collection of functions to do model-based phylogenetic analysis. It includes functions to calculate community phylogenetic diversity, to estimate correlations among functional traits while accounting for phylogenetic relationships, and to fit phylogenetic generalized linear mixed models. The Bayesian phylogenetic generalized linear mixed models are fitted with the 'INLA' package (https://www.r-inla.org). |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| Depends: | R (≥ 3.1) |
| Imports: | stats, ape, Rcpp, Matrix, methods, graphics, dplyr, lme4,nloptr, gridExtra, mvtnorm, latticeExtra, tidyr |
| RoxygenNote: | 7.3.3 |
| LinkingTo: | Rcpp, RcppArmadillo |
| Suggests: | testthat, pez, knitr, rmarkdown, covr, picante, rbenchmark,INLA, MCMCglmm, logistf, phylolm, ggplot2, ggridges, DHARMa,rr2, future.apply |
| VignetteBuilder: | knitr |
| URL: | https://daijiang.github.io/phyr/,https://github.com/daijiang/phyr/ |
| BugReports: | https://github.com/daijiang/phyr/issues |
| Additional_repositories: | https://inla.r-inla-download.org/R/stable/ |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2025-11-10 22:46:40 UTC; dli |
| Author: | Anthony Ives [aut], Russell Dinnage |
| Maintainer: | Daijiang Li <daijianglee@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-11 08:30:02 UTC |
Not in
Description
This function will return elements of x not in y
Usage
x %nin% yArguments
x | A vector. |
y | A vector. |
Value
A vector.
Create phylogenetic var-cov matrix based on phylogeny and community data
Description
This function will remove species from community data that are not in the phylogeny.It will also remove tips from the phylogeny that are not in the community data. Andthen convert the phylogeny to a Var-cov matrix.
Usage
align_comm_V(comm, tree, prune.tree = FALSE, scale.vcv = TRUE)Arguments
comm | A site by species data frame, with site names as row names. |
tree | A phylogeny with "phylo" as class; or a phylogenetic var-covar matrix. |
prune.tree | Whether to prune the tree first then use vcv.phylo function. Defaultis FALSE: use vcv.phylo first then subsetting the matrix. |
scale.vcv | Whether to scale vcv to a correlation matrix. |
Value
A list of the community data and the phylogenetic var-cov matrix.
Generic method to output bootstrap confidence intervals from an object.
Description
Implemented only forcor_phylo objects thus far.
Usage
boot_ci(mod, ...)Arguments
mod | A |
... | Additional arguments. |
Value
A list of confidence intervals.
Example community data
Description
A data frame with site names as row names, species names as column names,cells are the abundance of each species at each site.
Usage
comm_aFormat
A data frame with 15 sites and 15 species.
Example community data
Description
A data frame with site names as row names, species names as column names,cells are the abundance of each species at each site.
Usage
comm_bFormat
A data frame with 15 sites and 9 species.
Correlations among multiple variates with phylogenetic signal
Description
This function calculates Pearson correlation coefficients for multiple continuousvariates that may have phylogenetic signal, allowing users to specify measurementerror as the standard error of variate values at the tips of the phylogenetic tree.Phylogenetic signal for each variate is estimated from the data assuming that variateevolution is given by a Ornstein-Uhlenbeck process. Thus, the function allows theestimation of phylogenetic signal in multiple variates while incorporatingcorrelations among variates. It is also possible to include independent variables(covariates) for each variate to remove possible confounding effects.cor_phylo returns the correlation matrix for variate values, estimatesof phylogenetic signal for each variate, and regression coefficients forindependent variables affecting each variate.
Usage
cor_phylo(variates, species, phy, covariates = NULL, meas_errors = NULL, data = sys.frame(sys.parent()), REML = TRUE, method = c("nelder-mead-r", "bobyqa", "subplex", "nelder-mead-nlopt", "sann"), no_corr = FALSE, constrain_d = FALSE, lower_d = 1e-7, rel_tol = 1e-6, max_iter = 1000, sann_options = NULL, verbose = FALSE, rcond_threshold = 1e-10, boot = 0, keep_boots = c("fail", "none", "all"))## S3 method for class 'cor_phylo'boot_ci(mod, refits = NULL, alpha = 0.05, ...)## S3 method for class 'cor_phylo'print(x, digits = max(3, getOption("digits") - 3), ...)Arguments
variates | A formula or a matrix specifying variates between which correlationsare being calculated.The formula should be one-sided of the form |
species | A one-sided formula implicating the variable inside |
phy | Either a phylogeny of class |
covariates | A list specifying covariate(s) for each variate.The list can contain only two-sided formulas or matrices.Formulas should be of the typical form: |
meas_errors | A list or matrix containing standard errors for each variate.If a list, it must contain only two-sided formulas like those for |
data | An optional data frame, list, or environment that contains thevariables in the model. By default, variables are taken from the environmentfrom which |
REML | Whether REML (versus ML) should be used for model fitting.Defaults to |
method | Method of optimization using |
no_corr | A single logical for whether to make all correlations zero.Running |
constrain_d | If |
lower_d | Lower bound on the phylogenetic signal parameter.Defaults to |
rel_tol | A control parameter dictating the relative tolerance for convergencein the optimization. Defaults to |
max_iter | A control parameter dictating the maximum number of iterationsin the optimization. Defaults to |
sann_options | A named list containing the control parameters for SANNminimization.This is only relevant if |
verbose | If |
rcond_threshold | Threshold for the reciprocal condition number of twomatrices inside the log likelihood function.Increasing this threshold makes the optimization process more strongly"bounce away" from badly conditioned matrices and can help with convergenceand with estimates that are nonsensical.Defaults to |
boot | Number of parametric bootstrap replicates.Bootstrapping can be run in parallel if |
keep_boots | Character specifying when to output data (convergence codesand simulated variate data) from bootstrap replicates.This is useful for troubleshooting when one or more bootstrap replicatesfails to converge or outputs ridiculous results.Setting this to |
mod |
|
refits | One or more |
alpha | Alpha used for the confidence intervals. Defaults to |
... | arguments passed to and from other methods. |
x | an object of class |
digits | the number of digits to be printed. |
Value
cor_phylo returns an object of classcor_phylo:
call | The matched call. |
corrs | The |
d | Values of |
B | A matrix of regression-coefficient estimates, SE, Z-scores, and P-values,respectively. Rownames indicate which coefficient it refers to. |
B_cov | Covariance matrix for regression coefficients. |
logLik | The log likelihood for either the restricted likelihood( |
AIC | AIC for either the restricted likelihood ( |
BIC | BIC for either the restricted likelihood ( |
niter | Number of iterations the optimizer used. |
convcode | Conversion code for the optimizer.This number is
For more information on the nlopt return codes, seehttps://nlopt.readthedocs.io/en/latest/NLopt_Reference/#return-values. |
rcond_vals | Reciprocal condition numbers for two matrices insidethe log likelihood function. These are provided to potentially help guidethe changing of the |
bootstrap | A list of bootstrap output, which is simply |
boot_ci returns a list of confidence intervals with the following fields:
corrsEstimates of correlations.This is a matrix the values above the diagonal being theupper limits and values below being the lower limits.
dPhylogenetic signals.
B0Coefficient estimates.
B_covCoefficient covariances.
Methods (by generic)
boot_ci(cor_phylo): returns bootstrapped confidence intervals from acor_phyloobjectprint(cor_phylo): printscor_phyloobjects
Walkthrough
For the case of two variables, the function estimates parameters for the model ofthe form, for example,
X[1] = B[1,0] + B[1,1] * u[1,1] + \epsilon[1]
X[2] = B[2,0] + B[2,1] * u[2,1] + \epsilon[2]
\epsilon ~ Gaussian(0, V)
whereB[1,0],B[1,1],B[2,0], andB[2,1] are regressioncoefficients, andV is a variance-covariance matrix containing the correlationcoefficient r, parameters of the OU processd1 andd2, and diagonalmatricesM1 andM2 of measurement standard errors forX[1] andX[2]. The matrixV is2n x 2n, withn x n blocks given by
V[1,1] = C[1,1](d1) + M1
V[1,2] = C[1,2](d1,d2)
V[2,1] = C[2,1](d1,d2)
V[2,2] = C[2,2](d2) + M2
whereC[i,j](d1,d2) are derived fromphy under the assumption of jointOU evolutionary processes for each variate (see Zheng et al. 2009). This formulationextends in the obvious way to more than two variates.
Author(s)
Anthony R. Ives, Lucas A. Nell
References
Zheng, L., A. R. Ives, T. Garland, B. R. Larget, Y. Yu, and K. F. Cao.2009. New multivariate tests for phylogenetic signal and trait correlationsapplied to ecophysiological phenotypes of nineManglietia species.Functional Ecology23:1059–1069.
Examples
## Simple example using data without correlations or phylogenetic## signal. This illustrates the structure of the input data.set.seed(10)phy <- ape::rcoal(10, tip.label = 1:10)data_df <- data.frame( species = phy$tip.label, # variates: par1 = rnorm(10), par2 = rnorm(10), par3 = rnorm(10), # covariate for par2: cov2 = rnorm(10, mean = 10, sd = 4), # measurement error for par1 and par2, respectively: se1 = 0.2, se2 = 0.4)data_df$par2 <- data_df$par2 + 0.5 * data_df$cov2cp <- cor_phylo(variates = ~ par1 + par2 + par3, covariates = list(par2 ~ cov2), meas_errors = list(par1 ~ se1, par2 ~ se2), species = ~ species, phy = phy, data = data_df)# If you've already created matrices/lists...X <- as.matrix(data_df[,c("par1", "par2", "par3")])U <- list(par2 = cbind(cov2 = data_df$cov2))M <- cbind(par1 = data_df$se1, par2 = data_df$se2)# ... you can also use those directly# (notice that I'm inputting an object for `species`# bc I ommitted `data`):cp2 <- cor_phylo(variates = X, species = data_df$species, phy = phy, covariates = U, meas_errors = M)# # # ## Simulation example for the correlation between two variables. The example# ## compares the estimates of the correlation coefficients from cor_phylo when# ## measurement error is incorporated into the analyses with three other cases:# ## (i) when measurement error is excluded, (ii) when phylogenetic signal is# ## ignored (assuming a "star" phylogeny), and (iii) neither measurement error# ## nor phylogenetic signal are included.# # # In the simulations, variable 2 is associated with a single independent variable.# # library(ape)# # set.seed(1)# # Set up parameter values for simulating data# n <- 50# phy <- rcoal(n, tip.label = 1:n)# trt_names <- paste0("par", 1:2)# # R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, ncol = 2)# d <- c(0.3, 0.95)# B2 <- 1# # Se <- c(0.2, 1)# M <- matrix(Se, nrow = n, ncol = 2, byrow = TRUE)# colnames(M) <- trt_names# # # Set up needed matrices for the simulations# p <- length(d)# # star <- stree(n)# star$edge.length <- array(1, dim = c(n, 1))# star$tip.label <- phy$tip.label# # Vphy <- vcv(phy)# Vphy <- Vphy/max(Vphy)# Vphy <- Vphy/exp(determinant(Vphy)$modulus[1]/n)# # tau <- matrix(1, nrow = n, ncol = 1) %*% diag(Vphy) - Vphy# C <- matrix(0, nrow = p * n, ncol = p * n)# for (i in 1:p) for (j in 1:p) {# Cd <- (d[i]^tau * (d[j]^t(tau)) * (1 - (d[i] * d[j])^Vphy))/(1 - d[i] * d[j])# C[(n * (i - 1) + 1):(i * n), (n * (j - 1) + 1):(j * n)] <- R[i, j] * Cd# }# MM <- matrix(M^2, ncol = 1)# V <- C + diag(as.numeric(MM))# # # Perform a Cholesky decomposition of Vphy. This is used to generate phylogenetic# # signal: a vector of independent normal random variables, when multiplied by the# # transpose of the Cholesky deposition of Vphy will have covariance matrix# # equal to Vphy.# iD <- t(chol(V))# # # Perform Nrep simulations and collect the results# Nrep <- 100# cor.list <- matrix(0, nrow = Nrep, ncol = 1)# cor.noM.list <- matrix(0, nrow = Nrep, ncol = 1)# cor.noP.list <- matrix(0, nrow = Nrep, ncol = 1)# cor.noMP.list <- matrix(0, nrow = Nrep, ncol = 1)# d.list <- matrix(0, nrow = Nrep, ncol = 2)# d.noM.list <- matrix(0, nrow = Nrep, ncol = 2)# B.list <- matrix(0, nrow = Nrep, ncol = 3)# B.noM.list <- matrix(0, nrow = Nrep, ncol = 3)# B.noP.list <- matrix(0, nrow = Nrep, ncol = 3)# # # set.seed(2)# for (rep in 1:Nrep) {# # XX <- iD %*% rnorm(2 * n)# X <- matrix(XX, n, p)# colnames(X) <- trt_names# # U <- list(cbind(rnorm(n, mean = 2, sd = 10)))# names(U) <- trt_names[2]# # X[,2] <- X[,2] + B2[1] * U[[1]][,1] - B2[1] * mean(U[[1]][,1])# # # Call cor_phylo with (i) phylogeny and measurement error,# # (ii) just phylogeny,# # and (iii) just measurement error# z <- cor_phylo(variates = X,# covariates = U,# meas_errors = M,# phy = phy,# species = phy$tip.label)# z.noM <- cor_phylo(variates = X,# covariates = U,# phy = phy,# species = phy$tip.label)# z.noP <- cor_phylo(variates = X,# covariates = U,# meas_errors = M,# phy = star,# species = phy$tip.label)# # cor.list[rep] <- z$corrs[1, 2]# cor.noM.list[rep] <- z.noM$corrs[1, 2]# cor.noP.list[rep] <- z.noP$corrs[1, 2]# cor.noMP.list[rep] <- cor(cbind(# lm(X[,1] ~ 1)$residuals,# lm(X[,2] ~ U[[1]])$residuals))[1,2]# # d.list[rep, ] <- z$d# d.noM.list[rep, ] <- z.noM$d# # B.list[rep, ] <- z$B[,1]# B.noM.list[rep, ] <- z.noM$B[,1]# B.noP.list[rep, ] <- z.noP$B[,1]# }# # correlation <- rbind(R[1, 2], mean(cor.list), mean(cor.noM.list),# mean(cor.noP.list), mean(cor.noMP.list))# rownames(correlation) <- c("True", "With M and Phy", "Without M",# "Without Phy", "Without Phy or M")# # signal.d <- rbind(d, colMeans(d.list), colMeans(d.noM.list))# rownames(signal.d) <- c("True", "With M and Phy", "Without M")# # est.B <- rbind(c(0, 0, B2), colMeans(B.list),# colMeans(B.noM.list[-39,]), # 39th rep didn't converge# colMeans(B.noP.list))# rownames(est.B) <- c("True", "With M and Phy", "Without M", "Without Phy")# colnames(est.B) <- rownames(z$B)# # # Example simulation output:# # correlation# # [,1]# # True 0.7000000# # With M and Phy 0.6943712# # Without M 0.2974162# # Without Phy 0.3715406# # Without Phy or M 0.3291473# # signal.d# # [,1] [,2]# # True 0.3000000 0.9500000# # With M and Phy 0.3025853 0.9422067# # Without M 0.2304527 0.4180208# # est.B# # par1_0 par2_0 par2_cov_1# # True 0.000000000 0.0000000 1.0000000# # With M and Phy -0.008838245 0.1093819 0.9995058# # Without M -0.008240453 0.1142330 0.9995625# # Without Phy 0.002933341 0.1096578 1.0028474Example environmental data
Description
A data frame of site environmental variables.
Usage
enviFormat
A data frame with 15 sites and 4 variables: sand proportion,canopy shade proportion, precipitation, and minimum temperature.
Family Objects for communityPGLMM objects
Description
Family Objects for communityPGLMM objects
Usage
## S3 method for class 'communityPGLMM'family(object, ...)Arguments
object | the function |
... | further arguments passed to methods. |
Fitted values for communityPGLMM
Description
Fitted values for communityPGLMM
Usage
## S3 method for class 'communityPGLMM'fitted(object, ...)Arguments
object | A fitted model with class communityPGLMM. |
... | Additional arguments, ignored for method compatibility. |
Value
Fitted values. For binomial and poisson PGLMMs, this is equal to mu.
Extract fixed-effects estimates
Description
Extract the fixed-effects estimates
Usage
## S3 method for class 'communityPGLMM'fixef(object, ...)Arguments
object | A fitted model with class communityPGLMM. |
... | Ignored. |
Details
Extract the estimates of the fixed-effects parameters from a fitted model.For bayesian models, the p-values are simply to indicate whether thecredible intervals include 0 (p = 0.04) or not (p = 0.6).
Value
A dataframe of fixed-effects estimates.
get_design_matrix gets design matrix for gaussian, binomial, and poisson models
Description
get_design_matrix gets design matrix for gaussian, binomial, and poisson models
Usage
get_design_matrix(formula, data, random.effects, na.action = NULL)Arguments
formula | A two-sided linear formula object describing themixed effects of the model. To specify that a random term should have phylogenetic covariance matrix alongwith non-phylogenetic one, add Note that correlated random terms are not allowed. For example, |
data | A |
random.effects | Optional pre-build list of random effects. If |
na.action | What to do with NAs? |
Value
A list of design matrices.
Match phylogeny with community data
Description
This function will remove species from community data that are not in the phylogeny.It will also remove tips from the phylogeny that are not in the community data.
Usage
match_comm_tree(comm, tree, comm_2 = NULL)Arguments
comm | A site by species data frame, with site names as row names. |
tree | A phylogeny with "phylo" as class. |
comm_2 | Another optional site by species data frame, if presented, both community data and the phylogenywill have the same set of species. This can be useful for PCD with custom species pool. |
Value
A list of the community data and the phylogeny.
Extracting the Model Frame from a communityPGLMM Modelobject
Description
Extracting the Model Frame from a communityPGLMM Modelobject
Usage
## S3 method for class 'communityPGLMM'model.frame(formula, ...)Arguments
formula | |
... | for For |
Number of Observation in a communityPGLMM Model
Description
Number of Observation in a communityPGLMM Model
Usage
## S3 method for class 'communityPGLMM'nobs(object, use.fallback = FALSE, ...)Arguments
object | a fitted model object. |
use.fallback | logical: should fallback methods be used to try toguess the value? |
... | further arguments to be passed to methods. |
Phylogeny and community data from an Oldfield ecosystem in Southern Ontario, Canada
Description
A list containing a phylogeny for XX species of Oldfield forbs, as well as apresence / absence dataset for their occurrence across several locations inSouthern Ontario see Dinnage (2009) for details. Sites each had two plots whichexperienced a different treatment each; either they has been disturbed (ploughed1 or 2 years previously), or they were a control plot (undisturbed in recent records).
Usage
oldfieldFormat
A list with two elements:
phyA phylogeny in
ape'sphyformatdataA data.frame containing data on the occurrence of the species in
phy
oldfield$data is a data.frame with 1786 rows, and the following 7 columns:
site_originteger. Site ID number.
habitat_typecharacter. Plot treatment: disturbed or undisturbed.
spcharacter. Species name using underscore to separate binomial names (to match phylogeny).
abundanceinteger. Recorded abundance of species in plot.
disturbanceinteger. Whether the plot was disturbed or not. 0 or 1. 0 for undisturbed, 1 for disturbed
site_origcharacter. A unique site descriptor concatenating the site number with the disturbance treatment.
presinteger. Species presence or absence in plot. 0 or 1. 0 for absent, 1 for present
pairwise site phylogenetic community dissimilarity (PCD) within a community
Description
Calculate pairwise site PCD, users can specify expected values frompcd_pred().
Usage
pcd(comm, tree, expectation = NULL, cpp = TRUE, verbose = TRUE, ...)Arguments
comm | A site by species data frame or matrix, sites as rows. |
tree | A phylogeny for species. |
expectation | nsp_pool, psv_bar, psv_pool, and nsr calculated from |
cpp | Whether to use loops written with c++, default is TRUE. |
verbose | Do you want to see the progress? |
... | Other arguments. |
Value
A list of a variety of pairwise dissimilarities.
References
Ives, A. R., & Helmus, M. R. 2010. Phylogenetic metrics of community similarity.The American Naturalist, 176(5), E128-E142.
Examples
x1 = pcd_pred(comm_1 = comm_a, comm_2 = comm_b, tree = phylotree, reps = 100)pcd(comm = comm_a, tree = phylotree, expectation = x1)Predicted PCD with species pool
Description
This function will calculate expected PCD from one or two sets of communities (depends on the species pool)
Usage
pcd_pred(comm_1, comm_2 = NULL, tree, reps = 10^3, cpp = TRUE)Arguments
comm_1 | A site by species dataframe or matrix, with sites as rows and species as columns. |
comm_2 | An optional second site by species data frame. It should have the same number of rows as comm_1.This can be useful if we want to calculate temporal beta diversity, i.e. changes of the same site over time.Because data of the same site are not independent, setting comm_2 will use both communities as species poolto calculate expected PCD. |
tree | The phylogeny for all species, with "phylo" as class; or a var-cov matrix. |
reps | Number of random draws, default is 1000 times. |
cpp | Whether to use loops written with c++, default is TRUE. If you came across with errors, try toset cpp = FALSE. This normally will run without errors, but slower. |
Value
A list with species richness of the pool, expected PSV, PSV of the pool,and unique number of species richness across sites.
Phylogenetic Generalized Linear Mixed Model for Community Data
Description
This function performs Generalized Linear Mixed Models for binary, count,and continuous data, estimating regression coefficients withapproximate standard errors. It is specifically designed for community datain which species occur within multiple sites (locations).A Bayesian version of PGLMM uses the packageINLA,which is not available on CRAN yet. If you wish to use this option,you must first installINLA fromhttps://www.r-inla.org/ by runninginstall.packages('INLA', repos='https://www.math.ntnu.no/inla/R/stable') in R.
Usage
pglmm( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE)communityPGLMM( formula, data = NULL, family = "gaussian", cov_ranef = NULL, random.effects = NULL, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL, bayes_nested_matrix_as_list = FALSE)Arguments
formula | A two-sided linear formula object describing themixed effects of the model. To specify that a random term should have phylogenetic covariance matrix alongwith non-phylogenetic one, add Note that correlated random terms are not allowed. For example, |
data | A |
family | Either "gaussian" for a Linear Mixed Model, or"binomial" or "poisson" for Generalized Linear Mixed Models."family" should be specified as a character string (i.e., quoted). For binomial andPoisson data, we use the canonical logit and log link functions, respectively.Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.For both binomial and Poisson data, we add an observation-levelrandom term by default via |
cov_ranef | A named list of covariance matrices of random terms. The names should be thegroup variables that are used as random terms with specified covariance matrices(without the two underscores, e.g. |
random.effects | Optional pre-build list of random effects. If |
REML | Whether REML or ML is used for model fitting the random effects. Ignored if |
optimizer | nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex.Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.Ignored if |
repulsion | When there are nested random terms specified, |
add.obs.re | Whether to add an observation-level random term for binomial or Poissondistributions. Normally it would be a good idea to add this to account for overdispersion,so |
verbose | If |
cpp | Whether to use C++ function for optim. Default is TRUE. Ignored if |
bayes | Whether to fit a Bayesian version of the PGLMM using |
s2.init | An array of initial estimates of s2 for each randomeffect that scales the variance. If s2.init is not provided for |
B.init | Initial estimates of |
reltol | A control parameter dictating the relative tolerancefor convergence in the optimization; see |
maxit | A control parameter dictating the maximum number ofiterations in the optimization; see |
tol.pql | A control parameter dictating the tolerance forconvergence in the PQL estimates of the mean components of theGLMM. Ignored if |
maxit.pql | A control parameter dictating the maximum numberof iterations in the PQL estimates of the mean components of theGLMM. Ignored if |
marginal.summ | Summary statistic to use for the estimate of coefficients whendoing a Bayesian PGLMM (when |
calc.DIC | Should the Deviance Information Criterion be calculated and returnedwhen doing a Bayesian PGLMM? Ignored if |
calc.WAIC | Should the WAIC be calculated and returnedwhen doing a Bayesian PGLMM? Ignored if |
prior | Which type of default prior should be used by |
prior_alpha | Only used if |
prior_mu | Only used if |
ML.init | Only relevant if |
tree | A phylogeny for column sp, with "phylo" class, or a covariance matrix for sp.Make sure to have all species in the matrix; if the matrix is not standardized,(i.e., det(tree) != 1), |
tree_site | A second phylogeny for "site". This is required only if thesite column contains species instead of sites. This can be used for bipartitiequestions; tree_site can also be a covariance matrix. Make sure to have all sitesin the matrix; if the matrix is not standardized (i.e., det(tree_site) != 1),pglmm' will try to standardize it for you. No longer used: keep here for compatibility. |
sp | No longer used: keep here for compatibility. |
site | No longer used: keep here for compatibility. |
bayes_options | Additional options to pass to INLA for if |
bayes_nested_matrix_as_list | For |
Details
For Gaussian data,pglmm analyzes the phylogenetic linear mixed model
Y = \beta_0 + \beta_1x + b_0 + b_1x
b_0 ~ Gaussian(0, \sigma_0^2I_{sp})
b_1 ~ Gaussian(0, \sigma_0^2V_{sp})
\eta ~ Gaussian(0,\sigma^2)
where\beta_0 and\beta_1 are fixedeffects, andV_{sp} is a variance-covariance matrixderived from a phylogeny (typically under the assumption ofBrownian motion evolution). Here, the variation in the mean(intercept) for each species is given by the random effectb_0 that is assumed to be independent amongspecies. Variation in species' responses to predictor variablex is given by a random effectb_0 that isassumed to depend on the phylogenetic relatedness among speciesgiven byV_{sp}; if species are closely related,their specific responses tox will be similar. Thisparticular model would be specified as
z <- pglmm(Y ~ X + (1|sp__), data = data, family = "gaussian", cov_ranef = list(sp = phy))
Or you can prepare the random terms manually (not recommended for simple models but may be necessary for complex models):
re.1 <- list(1, sp = dat$sp, covar = diag(nspp))
re.2 <- list(dat$X, sp = dat$sp, covar = Vsp)
z <- pglmm(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))
The covariance matrix covar is standardized to have its determinantequal to 1. This in effect standardizes the interpretation of thescalar\sigma^2. Although mathematically this isnot required, it is a very good idea to standardize the predictor(independent) variables to have mean 0 and variance 1. This willmake the function more robust and improve the interpretation of theregression coefficients. For categorical (factor) predictorvariables, you will need to construct 0-1 dummy variables, andthese should not be standardized (for obvious reasons).
For binary generalized linear mixed models (family ='binomial'), the function estimates parameters for the model ofthe form, for example,
y = \beta_0 + \beta_1x + b_0 + b_1x
Y = logit^{-1}(y)
b_0 ~ Gaussian(0, \sigma_0^2I_{sp})
b_1 ~ Gaussian(0, \sigma_0^2V_{sp})
where\beta_0 and\beta_1 are fixedeffects, andV_{sp} is a variance-covariance matrixderived from a phylogeny (typically under the assumption ofBrownian motion evolution).
z <- pglmm(Y ~ X + (1|sp__), data = data, family = "binomial", cov_ranef = list(sp = phy))
As with the linear mixed model, it is a very good idea tostandardize the predictor (independent) variables to have mean 0and variance 1. This will make the function more robust and improvethe interpretation of the regression coefficients.
Value
An object (list) of classcommunityPGLMM with the following elements:
formula | the formula for fixed effects |
formula_original | the formula for both fixed effects and random effects |
data | the dataset |
family |
|
random.effects | the list of random effects |
B | estimates of the regression coefficients |
B.se | approximate standard errors of the fixed effects regression coefficients.This is set to NULL if |
B.ci | approximate Bayesian credible interval of the fixed effects regression coefficients.This is set to NULL if |
B.cov | approximate covariance matrix for the fixed effects regression coefficients |
B.zscore | approximate Z scores for the fixed effects regression coefficients. This is set to NULL if |
B.pvalue | approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if |
ss | standard deviations of the random effects for the covariance matrix |
s2r | random effects variances for non-nested random effects |
s2n | random effects variances for nested random effects |
s2resid | for linear mixed models, the residual variance |
s2r.ci | Bayesian credible interval for random effects variances for non-nested random effects.This is set to NULL if |
s2n.ci | Bayesian credible interval for random effects variances for nested random effects.This is set to NULL if |
s2resid.ci | Bayesian credible interval for linear mixed models, the residual variance.This is set to NULL if |
logLik | for linear mixed models, the log-likelihood for either the restricted likelihood ( |
AIC | for linear mixed models, the AIC for either the restricted likelihood ( |
BIC | for linear mixed models, the BIC for either the restricted likelihood ( |
DIC | for Bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if |
REML | whether or not REML is used ( |
bayes | whether or not a Bayesian model was fit. |
marginal.summ | The specified summary statistic used to summarize the Bayesian marginal distributions.Only present if |
s2.init | the user-provided initial estimates of |
B.init | the user-provided initial estimates of |
Y | the response (dependent) variable returned in matrix form |
X | the predictor (independent) variables returned in matrix form (including 1s in the first column) |
H | the residuals. For linear mixed models, this does not account for random terms,To get residuals after accounting for both fixed and random terms, use |
iV | the inverse of the covariance matrix for the entire system (of dimension ( |
mu | predicted mean values for the generalized linear mixed model (i.e., similar to |
nested | matrices used to construct the nested design matrix. This is set to NULL if |
Zt | the design matrix for random effects. This is set to NULL if |
St | diagonal matrix that maps the random effects variances onto the design matrix |
convcode | the convergence code provided by |
niter | number of iterations performed by |
inla.model | Model object fit by underlying |
Author(s)
Anthony R. Ives, Daijiang Li, Russell Dinnage
References
Ives, A. R. and M. R. Helmus. 2011. Generalized linearmixed models for phylogenetic analyses of communitystructure. Ecological Monographs 81:511-525.
Ives A. R. 2018. Mixed and phylogenetic models: a conceptual introduction to correlated data.https://leanpub.com/correlateddata.
Rafferty, N. E., and A. R. Ives. 2013. Phylogenetictrait-based analyses of ecological networks. Ecology 94:2321-2333.
Simpson, Daniel, et al. 2017. Penalising model component complexity:A principled, practical approach to constructing priors.Statistical science 32(1): 1-28.
Li, D., Ives, A. R., & Waller, D. M. 2017.Can functional traits account for phylogenetic signal in community composition?New Phytologist, 214(2), 607-618.
Examples
## Structure of examples:# First, a (brief) description of model types, and how they are specified# - these are *not* to be run 'as-is'; they show how models should be organised# Second, a run-through of how to simulate, and then analyse, data# - these *are* to be run 'as-is'; they show how to format and work with data################################################ Brief summary of models and their use ################################################## Model structures from Ives & Helmus (2011)if(FALSE){ # dat = data set for regression (note: must have a column "sp" and a column "site") # phy = phylogeny of class "phylo" # repulsion = to test phylogenetic repulsion or not # Model 1 (Eq. 1) z <- pglmm(freq ~ sp + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 2 (Eq. 2) z <- pglmm(freq ~ sp + X + (1|site) + (X|sp__), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) # Model 3 (Eq. 3) z <- pglmm(freq ~ sp*X + (1|site) + (1|sp__@site), data = dat, family = "binomial", cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1) ## Model structure from Rafferty & Ives (2013) (Eq. 3) # dat = data set # phyPol = phylogeny for pollinators (pol) # phyPlt = phylogeny for plants (plt) z <- pglmm(freq ~ pol * X + (1|pol__) + (1|plt__) + (1|pol__@plt) + (1|pol@plt__) + (1|pol__@plt__), data = dat, family = "binomial", cov_ranef = list(pol = phyPol, plt = phyPlt), REML = TRUE, verbose = TRUE, s2.init = .1)}######################################################## Detailed analysis showing covariance matrices ######################################################### This is the example from section 4.3 in Ives, A. R. (2018) Mixed # and phylogenetic models: a conceptual introduction to correlated data.library(ape)library(mvtnorm)# Investigating covariance matrices for different types of model structurenspp <- 6nsite <- 4# Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3)phy <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3)# Simulate species meanssd.sp <- 1mean.sp <- rTraitCont(phy, model = "BM", sigma=sd.sp^2)# Replicate values of mean.sp over sitesY.sp <- rep(mean.sp, times=nsite)# Simulate site meanssd.site <- 1mean.site <- rnorm(nsite, sd=sd.site)# Replicate values of mean.site over spY.site <- rep(mean.site, each=nspp)# Compute a covariance matrix for phylogenetic attractionsd.attract <- 1Vphy <- vcv(phy)# Standardize the phylogenetic covariance matrix to have determinant = 1. # (For an explanation of this standardization, see subsection 4.3.1 in Ives (2018))Vphy <- Vphy/(det(Vphy)^(1/nspp))# Construct the overall covariance matrix for phylogenetic attraction. # (For an explanation of Kronecker products, see subsection 4.3.1 in the book)V <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy)Y.attract <- array(t(rmvnorm(n = 1, sigma = sd.attract^2*V)))# Simulate residual errorssd.e <- 1Y.e <- rnorm(nspp*nsite, sd = sd.e)# Construct the datasetd <- data.frame(sp = rep(phy$tip.label, times = nsite), site = rep(1:nsite, each = nspp))# Simulate abundance datad$Y <- Y.sp + Y.site + Y.attract + Y.e# Analyze the modelpglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy))# Display random effects: the function `pglmm_plot_ranef()` does what # the name implies. You can set `show.image = TRUE` and `show.sim.image = TRUE` # to see the matrices and simulations.re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy), show.image = FALSE, show.sim.image = FALSE)#################################################### Example of a bipartite phylogenetic model ##################################################### Investigating covariance matrices for different types of model structurenspp <- 20nsite <- 15# Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3)phy.sp <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3)phy.site <- compute.brlen(rtree(n = nsite), method = "Grafen", power = 1.3)# Simulate species meansmean.sp <- rTraitCont(phy.sp, model = "BM", sigma = 1)# Replicate values of mean.sp over sitesY.sp <- rep(mean.sp, times = nsite)# Simulate site meansmean.site <- rTraitCont(phy.site, model = "BM", sigma = 1)# Replicate values of mean.site over spY.site <- rep(mean.site, each = nspp)# Generate covariance matrix for phylogenetic attraction among speciessd.sp.attract <- 1Vphy.sp <- vcv(phy.sp)Vphy.sp <- Vphy.sp/(det(Vphy.sp)^(1/nspp))V.sp <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy.sp)Y.sp.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.attract^2*V.sp)))# Generate covariance matrix for phylogenetic attraction among sitessd.site.attract <- 1Vphy.site <- vcv(phy.site)Vphy.site <- Vphy.site/(det(Vphy.site)^(1/nsite))V.site <- kronecker(Vphy.site, diag(nrow = nspp, ncol = nspp))Y.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.site.attract^2*V.site)))# Generate covariance matrix for phylogenetic attraction of species:site interactionsd.sp.site.attract <- 1V.sp.site <- kronecker(Vphy.site, Vphy.sp)Y.sp.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.site.attract^2*V.sp.site)))# Simulate residual errorsd.e <- 0.5Y.e <- rnorm(nspp*nsite, sd = sd.e)# Construct the datasetd <- data.frame(sp = rep(phy.sp$tip.label, times = nsite), site = rep(phy.site$tip.label, each = nspp))# Simulate abundance datad$Y <- Y.sp + Y.site + Y.sp.attract + Y.site.attract + Y.sp.site.attract + Y.e# Plot random effects covariance matrices and then add phylogenies# Note that, if show.image and show.sim are not specified, pglmm_plot_ranef() shows# the covariance matrices if nspp * nsite < 200 and shows simulations # if nspp * nsite > 100re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data=d, cov_ranef = list(sp = phy.sp, site = phy.site))# This flips the phylogeny to match to covariance matricesrot.phy.site <- phy.sitefor(i in (nsite+1):(nsite+Nnode(phy.site))) rot.phy.site <- rotate(rot.phy.site, node = i)plot(phy.sp, main = "Species", direction = "upward")plot(rot.phy.site, main = "Site")# Analyze the simulated data and compute a P-value for the (1|sp__@site__) # random effect using a LRT. It is often better to fit the reduced model before# the full model, because it s numerically easier to fit the reduced model, # and then the parameter estimates from the reduced model can be given to the# full model. In this case, I have used the estimates of the random effects # from the reduce model, mod.r$ss, as the initial estimates for the same # parameters in the full model in the statement s2.init=c(mod.r$ss, 0.01)^2. # The final 0.01 is for the last random effect in the full model, (1|sp__@site__). # Note also that the output of the random effects from communityPGLMM(), mod.r$ss, # are the standard deviations, so they have to be squared for use as initial # values of variances in mod.f.mod.r <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site))mod.f <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + (1|sp__@site__), data = d, cov_ranef = list(sp = phy.sp, site = phy.site), s2.init = c(mod.r$ss, 0.01)^2)mod.fpvalue <- pchisq(2*(mod.f$logLik - mod.r$logLik), df = 1, lower.tail = FALSE)pvaluePhylogenetic Generalized Linear Mixed Model for Comparative Data
Description
pglmm_compare performs linear regression for Gaussian, binomial and Poissonphylogenetic data, estimating regression coefficients with approximate standarderrors. It simultaneously estimates the strength of phylogenetic signal in theresiduals and gives an approximate conditional likelihood ratio test for thehypothesis that there is no signal. Therefore, when applied withoutpredictor (independent) variables, it gives a test for phylogenetic signal.pglmm_compare is a wrapper forpglmm tailored for comparative data inwhich each value of the response (dependent) variable corresponds to a single tipon a phylogenetic tree. If there are multiple measures for each species,pglmmwill be helpful.
Usage
pglmm_compare( formula, family = "gaussian", data = list(), phy, REML = TRUE, optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"), add.obs.re = TRUE, verbose = FALSE, cpp = TRUE, bayes = FALSE, reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200, marginal.summ = "mean", calc.DIC = FALSE, prior = "inla.default", prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, s2.init = 1, B.init = NULL)Arguments
formula | A two-sided linear formula object describing thefixed-effects of the model; for example, Y ~ X. Binomial data can be eitherpresence/absence, or a two-column array of 'successes' and 'failures'.For both binomial and Poisson data, we add an observation-levelrandom term by default via |
family | Either "gaussian" for a Linear Mixed Model, or"binomial" or "poisson" for Generalized Linear Mixed Models. |
data | A data frame containing the variables named in formula. It must hasthe tip labels of the phylogeny as row names; if they are not in the same order,the data frame will be arranged so that row names match the order of tip labels. |
phy | A phylogenetic tree as an object of class "phylo". |
REML | Whether REML or ML is used for model fitting the random effects. Ignored if |
optimizer | nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex.Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.Ignored if |
add.obs.re | Whether to add observation-level random term for binomial and Poissonfamilies. Normally it would be a good idea to add this to account for overdispersion,so |
verbose | If |
cpp | Whether to use C++ function for optim. Default is TRUE. Ignored if |
bayes | Whether to fit a Bayesian version of the PGLMM using |
reltol | A control parameter dictating the relative tolerancefor convergence in the optimization; see |
maxit | A control parameter dictating the maximum number ofiterations in the optimization; see |
tol.pql | A control parameter dictating the tolerance forconvergence in the PQL estimates of the mean components of theGLMM. Ignored if |
maxit.pql | A control parameter dictating the maximum numberof iterations in the PQL estimates of the mean components of theGLMM. Ignored if |
marginal.summ | Summary statistic to use for the estimate of coefficients whendoing a Bayesian PGLMM (when |
calc.DIC | Should the Deviance Information Criterion be calculated and returned,when doing a Bayesian PGLMM? Ignored if |
prior | Which type of default prior should be used by |
prior_alpha | Only used if |
prior_mu | Only used if |
ML.init | Only relevant if |
s2.init | An array of initial estimates of s2. If s2.init is not provided for |
B.init | Initial estimates of |
Details
pglmm_compare in the packagephyr is similar tobinaryPGLMM inthe packageape, although it has much broader functionality, includingaccepting more than just binary data, implementing Bayesian analyses, etc.
For non-Gaussian data, the function estimates parameters for the model
Pr(Y = 1) = \theta
\theta = inverse.link(b0 + b1 * x1 + b2 * x2 + \dots+ \epsilon)
\epsilon ~ Gaussian(0, s2 * V)
whereV is a covariance matrix derived from a phylogeny(typically under the assumption of Brownian motion evolution). Althoughmathematically there is no requirement forV to be ultrametric,forcingV into ultrametric form can aide in the interpretation of themodel. This is especially true for binary data, because in regression forbinary dependent variables, only the off-diagonal elements (i.e., covariances)of matrixV are biologically meaningful (see Ives & Garland 2014).The function converts a phylo tree object into a covariance matrix,and further standardizes this matrix to have determinant = 1. This in effectstandardizes the interpretation of the scalars2. Although mathematicallynot required, it is a very good idea to standardize the predictor(independent) variables to have mean 0 and variance 1. This will make thefunction more robust and improve the interpretation of the regressioncoefficients.
For Gaussian data, the function estimates parameters for the model
Y = b0 + b1 * x1 + b2 * x2 + \dots + \epsilon)
\epsilon ~ Gaussian(0, s2 * V + s2resid * I)
wheres2resid * I gives the non-phylogenetic residual variance. Note that thisis equivalent to a model with Pagel's lambda transformation.
Value
An object (list) of classpglmm_compare with the following elements:
formula | the formula for fixed effects |
formula_original | the formula for both fixed effects and random effects |
data | the dataset |
family | either |
B | estimates of the regression coefficients |
B.se | approximate standard errors of the fixed effects regression coefficients.This is set to NULL if |
B.ci | approximate bayesian credible interval of the fixed effects regression coefficients.This is set to NULL if |
B.cov | approximate covariance matrix for the fixed effects regression coefficients |
B.zscore | approximate Z scores for the fixed effects regression coefficients. This is set to NULL if |
B.pvalue | approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if |
ss | random effects' standard deviations for the covariance matrix |
s2r | random effects variances for non-nested random effects |
s2n | random effects variances for nested random effects |
s2resid | for linear mixed models, the residual variance |
s2r.ci | Bayesian credible interval for random effects variances for non-nested random effects.This is set to NULL if |
s2n.ci | Bayesian credible interval for random effects variances for nested random effects.This is set to NULL if |
s2resid.ci | Bayesian credible interval for linear mixed models, the residual variance.This is set to NULL if |
logLik | for linear mixed models, the log-likelihood for either the restricted likelihood ( |
AIC | for linear mixed models, the AIC for either the restricted likelihood ( |
BIC | for linear mixed models, the BIC for either the restricted likelihood ( |
DIC | for bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if |
REML | whether or not REML is used ( |
bayes | whether or not a Bayesian model was fit. |
marginal.summ | The specified summary statistic used to summarise the Bayesian marginal distributions.Only present if |
s2.init | the user-provided initial estimates of |
B.init | the user-provided initial estimates of |
Y | the response (dependent) variable returned in matrix form |
X | the predictor (independent) variables returned in matrix form (including 1s in the first column) |
H | the residuals. For linear mixed models, this does not account for random terms,To get residuals after accounting for both fixed and random terms, use |
iV | the inverse of the covariance matrix. This is NULL if |
mu | predicted mean values for the generalized linear mixed model (i.e. similar to |
Zt | the design matrix for random effects. This is set to NULL if |
St | diagonal matrix that maps the random effects variances onto the design matrix |
convcode | the convergence code provided by |
niter | number of iterations performed by |
inla.model | Model object fit by underlying |
Author(s)
Anthony R. Ives
References
Ives, A. R. and Helmus, M. R. (2011) Generalized linear mixedmodels for phylogenetic analyses of community structure.EcologicalMonographs,81, 511–525.
Ives, A. R. and Garland, T., Jr. (2014) Phylogenetic regression for binarydependent variables. Pages 231–261in L. Z. Garamszegi, editor.Modern Phylogenetic Comparative Methods and Their Application inEvolutionary Biology. Springer-Verlag, Berlin Heidelberg.
See Also
pglmm; packageape and its functionbinaryPGLMM;packagephylolm and its functionphyloglm; packageMCMCglmm
Examples
## Illustration of `pglmm_compare` with simulated data# Generate random phylogenylibrary(ape)n <- 100phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)# Generate random data and standardize to have mean 0 and variance 1X1 <- rTraitCont(phy, model = "BM", sigma = 1)X1 <- (X1 - mean(X1))/var(X1)# Simulate binary Ysim.dat <- data.frame(Y = array(0, dim = n), X1 = X1, row.names = phy$tip.label)sim.dat$Y <- ape::binaryPGLMM.sim(Y ~ X1, phy = phy, data=sim.dat, s2 = 1, B = matrix(c(0, .25), nrow = 2, ncol = 1), nrep = 1)$Y# Fit modelpglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat)# Compare with `binaryPGLMM`ape::binaryPGLMM(Y ~ X1, phy = phy, data = sim.dat)# Compare with `phyloglm`summary(phylolm::phyloglm(Y ~ X1, phy = phy, data = sim.dat))# Compare with `glm` that does not account for phylogenysummary(glm(Y ~ X1, data = sim.dat, family = "binomial"))# Compare with logistf() that does not account# for phylogeny but is less biased than glm()logistf::logistf(Y ~ X1, data = sim.dat)## Fit model with bayes = TRUE# pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat, # bayes = TRUE, calc.DIC = TRUE)# Compare with `MCMCglmm`V <- vcv(phy)V <- V/max(V)detV <- exp(determinant(V)$modulus[1])V <- V/detV^(1/n)invV <- Matrix::Matrix(solve(V),sparse = TRUE)sim.dat$species <- phy$tip.labelrownames(invV) <- sim.dat$speciesnitt <- 43000thin <- 10burnin <- 3000prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)))# commented out to save time# summary(MCMCglmm::MCMCglmm(Y ~ X1, random = ~species, ginvers = list(species = invV),# data = sim.dat, slice = TRUE, nitt = nitt, thin = thin, burnin = burnin,# family = "categorical", prior = prior, verbose = FALSE))pglmm_matrix_structure produces the entirecovariance matrix structure (V) when you specify random effects.
Description
pglmm_matrix_structure produces the entirecovariance matrix structure (V) when you specify random effects.
Usage
pglmm_matrix_structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE)communityPGLMM.matrix.structure( formula, data = list(), family = "binomial", cov_ranef, repulsion = FALSE, ss = 1, cpp = TRUE)Arguments
formula | A two-sided linear formula object describing themixed effects of the model. To specify that a random term should have phylogenetic covariance matrix alongwith non-phylogenetic one, add Note that correlated random terms are not allowed. For example, |
data | A |
family | Either "gaussian" for a Linear Mixed Model, or"binomial" or "poisson" for Generalized Linear Mixed Models."family" should be specified as a character string (i.e., quoted). For binomial andPoisson data, we use the canonical logit and log link functions, respectively.Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.For both binomial and Poisson data, we add an observation-levelrandom term by default via |
cov_ranef | A named list of covariance matrices of random terms. The names should be thegroup variables that are used as random terms with specified covariance matrices(without the two underscores, e.g. |
repulsion | When there are nested random terms specified, |
ss | Which of the |
cpp | Whether to use C++ function for optim. Default is TRUE. Ignored if |
Value
A design matrix.
Visualize random terms of communityPGLMMs
Description
Plot variance-cov matrix of random terms; also it is optional to simulate andvisualize data based on these var-cov matrices. The input can be a communityPGLMMmodel (by setting argumentx). If no model has been fitted, you can also specifydata, formula, and family, etc. without actually fitting the model, which willsave time.
Usage
pglmm_plot_ranef( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ...)communityPGLMM.show.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ...)pglmm_plot_re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ...)communityPGLMM.plot.re( formula = NULL, data = NULL, family = "gaussian", sp.var = "sp", site.var = "site", tree = NULL, tree_site = NULL, repulsion = FALSE, x = NULL, show.image = TRUE, show.sim.image = FALSE, random.effects = NULL, add.tree.sp = TRUE, add.tree.site = FALSE, cov_ranef = NULL, tree.panel.space = 0.5, title.space = 5, tree.size = 3, ...)Arguments
formula | A two-sided linear formula object describing themixed effects of the model. To specify that a random term should have phylogenetic covariance matrix alongwith non-phylogenetic one, add Note that correlated random terms are not allowed. For example, |
data | A |
family | Either "gaussian" for a Linear Mixed Model, or"binomial" or "poisson" for Generalized Linear Mixed Models."family" should be specified as a character string (i.e., quoted). For binomial andPoisson data, we use the canonical logit and log link functions, respectively.Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.For both binomial and Poisson data, we add an observation-levelrandom term by default via |
sp.var | The variable name of "species"; y-axis of the image. |
site.var | The variable name of "site"; x-axis of the image. |
tree | A phylogeny for column sp, with "phylo" class, or a covariance matrix for sp.Make sure to have all species in the matrix; if the matrix is not standardized,(i.e., det(tree) != 1), |
tree_site | A second phylogeny for "site". This is required only if thesite column contains species instead of sites. This can be used for bipartitiequestions; tree_site can also be a covariance matrix. Make sure to have all sitesin the matrix; if the matrix is not standardized (i.e., det(tree_site) != 1),pglmm' will try to standardize it for you. No longer used: keep here for compatibility. |
repulsion | When there are nested random terms specified, |
x | A fitted model with class communityPGLMM. |
show.image | Whether to show the images of random effects. |
show.sim.image | Whether to show the images of simulated site by sp matrix.This can be useful to see how the phylogenetic information were included. |
random.effects | Optional pre-build list of random effects. If |
add.tree.sp | Whether to add a phylogeny of species at the top of thesimulated site by sp matrix plot, default is TRUE. |
add.tree.site | Whether to add a phylogeny of sites at the right of thesimulated site by sp matrix plot, default is FALSE. |
cov_ranef | A named list of covariance matrices of random terms. The names should be thegroup variables that are used as random terms with specified covariance matrices(without the two underscores, e.g. |
tree.panel.space | The number of lines between the phylogeny andthe matrix plot, if add.tree is TRUE. |
title.space | The number of lines between the title and the matrix plot, if add.tree is TRUE. |
tree.size | The height of the phylogeny to be plotted (number of lines), if add.tree is TRUE. |
... | Additional arguments for
|
Value
A hidden list, including the covariance matrices and simulated site by species matrices.Individual plots are saved asplt_re_list andplt_sim_list. Ifshow.image orshow.sim.image is TRUE, the corresponding final plot (plt_re_all_in_one orplt_sim_all_in_one) can be saved as external file usingggplot2::ggsave asit is a grid object.
Predicted values of PGLMM
Description
pglmm_predicted_values calculates the predictedvalues of Y; for the generalized linear mixed model (family %in%c("binomial","poisson"), these values are in the transformed space.
Usage
pglmm_predicted_values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm"), re.form = NULL, type = c("link", "response"), ...)communityPGLMM.predicted.values( x, cpp = TRUE, gaussian.pred = c("nearest_node", "tip_rm"))Arguments
x | A fitted model with class communityPGLMM. |
cpp | Whether to use c++ code. Default is TRUE. |
gaussian.pred | When family is gaussian, which type of prediction to calculate?Option nearest_node will predict values to the nearest node, which is same as lme4::predict orfitted. Option tip_rm will remove the point then predict the value of this point with remaining ones. |
re.form | (formula, |
type | character string - either |
... | Optional additional parameters. None are used at present. |
Value
A data frame with column Y_hat (predicted values accounting forboth fixed and random terms).
Testing statistical significance of random effect
Description
pglmm_profile_LRT tests statistical significance of thephylogenetic random effect using profile likelihoods when bayes = F.The resulting p-values are conditional on the fixedestimates of the other parameters, in contrast to a standardlikelihood ratio test.
Usage
pglmm_profile_LRT(x, re.number = 0, cpp = TRUE)communityPGLMM.profile.LRT(x, re.number = 0, cpp = TRUE)Arguments
x | A fitted model with class communityPGLMM with |
re.number | Which random term to test? Can be a vector with length >1. |
cpp | Whether to use C++ function for optim. Default is TRUE. Ignored if |
Value
A list of likelihood, df, and p-value.
Example phylogeny
Description
A phylogeny with more species than the community data.
Usage
phylotreeFormat
Newick format.
plot_bayes generic
Description
plot_bayes generic
Usage
plot_bayes(x, ...)Arguments
x | A communityPGLMM object fit with |
... | Further arguments to pass to or from other methods. |
Value
A ggplot object
Plot the original dataset and predicted values (optional)
Description
Plots a representation of the marginal posterior distribution of model parameters. Note thisfunction requires the packagesggplot2 andggridges to be installed.
Usage
plot_data( x, sp.var = "sp", site.var = "site", show.sp.names = FALSE, show.site.names = FALSE, digits = max(3, getOption("digits") - 3), predicted = FALSE, ...)## S3 method for class 'communityPGLMM'plot_bayes(x, n_samp = 1000, sort = TRUE, ...)Arguments
x | A communityPGLMM object fit with |
sp.var | The variable name of "species"; y-axis of the image. |
site.var | The variable name of "site"; x-axis of the image. |
show.sp.names | Whether to print species names as y-axis labels. |
show.site.names | Whether to print site names as x-axis labels. |
digits | Not used. |
predicted | Whether to plot predicted values side by side with observed ones. |
... | Further arguments to pass to or from other methods. |
n_samp | Number of sample from the marginal posterior to take in order to estimate the posterior density. |
sort | Whether to plot different terms in the order of their estimations. Default is 'TRUE'. |
Value
A ggplot object
Note
The underlying plot grid object is returned but invisible. It can be saved for later uses.
Predict Function for communityPGLMM Model Objects
Description
Predict Function for communityPGLMM Model Objects
Usage
## S3 method for class 'communityPGLMM'predict(object, newdata = NULL, ...)Arguments
object | Object of class inheriting from |
newdata | An optional data frame in which to look for variables withwhich to predict. If omitted, the fitted values are used. |
... | further arguments passed to or from other methods. |
Value
The form of the value returned bypredict depends on theclass of its argument. See the documentation of theparticular methods for details of what is produced by that method.
Prepare data forpglmm
Description
This function is mainly used withinpglmm but can also be used independently toprepare a list of random effects, which then can be updated by users for more complex models.
Usage
prep_dat_pglmm( formula, data, cov_ranef = NULL, repulsion = FALSE, prep.re.effects = TRUE, family = "gaussian", add.obs.re = TRUE, bayes = FALSE, bayes_nested_matrix_as_list = FALSE)Arguments
formula | A two-sided linear formula object describing themixed effects of the model. To specify that a random term should have phylogenetic covariance matrix alongwith non-phylogenetic one, add Note that correlated random terms are not allowed. For example, |
data | A |
cov_ranef | A named list of covariance matrices of random terms. The names should be thegroup variables that are used as random terms with specified covariance matrices(without the two underscores, e.g. |
repulsion | When there are nested random terms specified, |
prep.re.effects | Whether to prepare random effects for users. |
family | Either "gaussian" for a Linear Mixed Model, or"binomial" or "poisson" for Generalized Linear Mixed Models."family" should be specified as a character string (i.e., quoted). For binomial andPoisson data, we use the canonical logit and log link functions, respectively.Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'.For both binomial and Poisson data, we add an observation-levelrandom term by default via |
add.obs.re | Whether to add an observation-level random term for binomial or Poissondistributions. Normally it would be a good idea to add this to account for overdispersion,so |
bayes | Whether to fit a Bayesian version of the PGLMM using |
bayes_nested_matrix_as_list | For |
Value
A list with updated formula, random.effects, and updated cov_ranef.
Print summary information of fitted model
Description
Print summary information of fitted model
Usage
## S3 method for class 'communityPGLMM'print(x, digits = max(3, getOption("digits") - 3), ...)Arguments
x | A fitted communityPGLMM model. |
digits | Minimal number of significant digits for printing, as in |
... | Additional arguments, currently ignored. |
Print summary information of fitted model
Description
Print summary information of fitted model
Usage
## S3 method for class 'pglmm_compare'print(x, digits = max(3, getOption("digits") - 3), ...)Arguments
x | A fitted pglmm_compare. |
digits | Minimal number of significant digits for printing, as in |
... | Additional arguments, currently ignored. |
Phylogenetic Species Diversity Metrics
Description
Calculate the bounded phylogenetic biodiversity metrics:phylogenetic species variability, richness, evenness and clustering for one or multiple communities.
Usage
psv( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE)psr( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE)pse(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE)psc(comm, tree, scale.vcv = TRUE, prune.tree = FALSE)psv.spp(comm, tree, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE)psd( comm, tree, compute.var = TRUE, scale.vcv = TRUE, prune.tree = FALSE, cpp = TRUE)Arguments
comm | Community data matrix, site as rows and species as columns, site names as row names. |
tree | A phylo tree object with class "phylo" or a phylogenetic covariance matrix. |
compute.var | Logical, default is TRUE, computes the expected variancesfor PSV and PSR for each community. |
scale.vcv | Logical, default is TRUE, scale the phylogenetic covariancematrix to bound the metric between 0 and 1 (i.e. correlations). |
prune.tree | Logical, default is FALSE, prune the phylogeny before convertingto var-cov matrix? Pruning and then converting VS converting then subsetting mayhave different var-cov matrix resulted. |
cpp | Logical, default is TRUE, whether to use cpp for internal calculations. |
Details
Phylogenetic species variability (PSV) quantifies howphylogenetic relatedness decreases the variance of a hypotheticalunselected/neutral trait shared by all species in a community.The expected value of PSV is statistically independent of species richness,is one when all species in a community are unrelated (i.e., a star phylogeny)and approaches zero as species become more related. PSV is directly relatedto mean phylogenetic distance, except except calculated on a scaled phylogeneticcovariance matrix. The expected variance around PSV for any community of a particularspecies richness can be approximated. To address how individual speciescontribute to the mean PSV of a data set, the functionpsv.spp givessigned proportions of the total deviation from the mean PSV that occurs whenall species are removed from the data set one at a time. The absolute valuesof these “species effects” tend to positively correlate with species prevalence.
Value
Returns a dataframe of the respective phylogenetic species diversity metric values
Note
These metrics are bounded either between zero and one (PSV, PSE, PSC)or zero and species richness (PSR); but the metrics asymptotically approachzero as relatedness increases. Zero can be assigned to communities with lessthan two species, but conclusions drawn from assigning communities zero valuesneed be carefully explored for any data set. The data sets need not bespecies-community data sets but may be any community data set with an associated phylogeny.
Author(s)
Matthew Helmusmrhelmus@gmail.com
References
Helmus M.R., Bland T.J., Williams C.K. & Ives A.R. 2007.Phylogenetic measures of biodiversity. American Naturalist, 169, E68-E83
Examples
psv(comm = comm_a, tree = phylotree)Extract random-effects estimates
Description
Extract the random-effects estimates
Usage
## S3 method for class 'communityPGLMM'ranef(object, ...)Arguments
object | A fitted model with class communityPGLMM. |
... | Ignored. |
Details
Extract the estimates of the random-effects parameters from a fitted model.
Value
A dataframe of random-effects estimates.
Refit bootstrap replicates that failed to converge in a call tocor_phylo
Description
This function is to be called on acor_phylo object if one or more bootstrapreplicates fail to converge.It allows the user to change parameters for the optimizer to get it to converge.One or more of the resultingcp_refits object(s) can be supplied toboot_ci along with the originalcor_phylo object to calculate confidenceintervals from only bootstrap replicates that converged.
Usage
refit_boots(cp_obj, inds = NULL, ...)## S3 method for class 'cp_refits'print(x, digits = max(3, getOption("digits") - 3), ...)Arguments
cp_obj | The original |
inds | Vector of indices indicating the bootstraps you want to refit.This is useful if you want to try refitting only a portion of bootstrapreplicates.By passing |
... | Arguments that should be changed from the original call to |
x | an object of class |
digits | the number of digits to be printed. |
Value
Acp_refits object, which is a list ofcor_phylo objectscorresponding to each matrix in<original cor_phylo object>$bootstrap$mats.
Functions
print(cp_refits): printscp_refitsobjects
Residuals of communityPGLMM objects
Description
Getting different types of residuals for communityPGLMM objects.
Usage
## S3 method for class 'communityPGLMM'residuals( object, type = if (object$family %in% c("binomial", "poisson")) "deviance" else "response", scaled = FALSE, ...)Arguments
object | A fitted model with class communityPGLMM. |
type | Type of residuals, currently only "response" for gaussian pglmm;"deviance" (default) and "response" for binomial and poisson pglmm. |
scaled | Scale residuals by residual standard deviation for gaussian pglmm. |
... | Additional arguments, ignored for method compatibility. |
Value
A vector of residuals.
Remove site that has no observations of any species
Description
This function will remove site that has no observations in a site by species data frame.
Usage
rm_site_noobs(df, warn = FALSE)Arguments
df | A data frame in wide form, i.e. site by species data frame, with site names as row name. |
warn | Whether to warn when any site has no species? Default is |
Value
A site by species data frame.
Author(s)
Daijiang Li
Remove species that not observed in any site
Description
Remove species that not observed in any site
Usage
rm_sp_noobs(df, warn = FALSE)Arguments
df | A data frame in wide form, i.e. site by species data frame, with site names as row name. |
warn | Whether to warn when any species does not occur in at least one site? Default is |
Value
A site by species data frame.
Author(s)
Daijiang Li
This function will remove species that has no observations in any site.
Simulate from a communityPGLMM object
Description
Simulate from a communityPGLMM object
Usage
## S3 method for class 'communityPGLMM'simulate( object, nsim = 1, seed = NULL, re.form = NULL, ntry = 5, full = FALSE, ...)Arguments
object | A fitted model object with class 'communityPGLMM'. |
nsim | positive integer scalar - the number of responses to simulate. |
seed | an optional seed to be used in |
re.form | (formula, |
ntry | Number of times to retry simulation in the case of |
full | If |
... | optional additional arguments (none are used in |
Summary information of fitted model
Description
Summary information of fitted model
Usage
## S3 method for class 'communityPGLMM'summary(object, digits = max(3, getOption("digits") - 3), ...)Arguments
object | A fitted model with class communityPGLMM. |
digits | Minimal number of significant digits for printing, as in |
... | Additional arguments, currently ignored. |
Summary information of fitted pglmm_compare model
Description
Summary information of fitted pglmm_compare model
Usage
## S3 method for class 'pglmm_compare'summary(object, digits = max(3, getOption("digits") - 3), ...)Arguments
object | A fitted model with class pglmm_compare. |
digits | Minimal number of significant digits for printing, as in |
... | Additional arguments, currently ignored. |
Example species traits data
Description
A data frame of species functional traits.
Usage
traitsFormat
A data frame with 18 species and 3 variables: sla,height, and seed dispersal mode.
Create phylogenetic var-cov matrix
Description
This function will convert a phylogeny to a Var-cov matrix.
Usage
vcv2(phy, corr = FALSE)Arguments
phy | A phylogeny with "phylo" as class. |
corr | Whether to return a correlation matrix instead of Var-cov matrix. Default is FALSE. |
Value
A phylogenetic var-cov matrix.