Title: Generalized Dissimilarity Modeling
Version: 1.6.0-7
Date: 2025-04-16
Description: A toolkit with functions to fit, plot, summarize, and apply Generalized Dissimilarity Models. Mokany K, Ware C, Woolley SNC, Ferrier S, Fitzpatrick MC (2022) <doi:10.1111/geb.13459> Ferrier S, Manion G, Elith J, Richardson K (2007) <doi:10.1111/j.1472-4642.2007.00341.x>.
License: GPL (≥ 3)
Depends: R (≥ 3.5.0)
Encoding: UTF-8
RoxygenNote: 7.3.2
Config/testthat/edition: 3
LazyData: true
URL: https://mfitzpatrick.al.umces.edu/gdm/, https://github.com/fitzLab-AL/gdm/
BugReports: https://github.com/fitzLab-AL/gdm/issues/
Imports: parallel, methods, Rcpp, reshape2, vegan, doParallel, foreach, pbapply
Suggests: tinytest, scales, terra
LinkingTo: Rcpp
NeedsCompilation: yes
Packaged: 2025-04-16 15:16:30 UTC; mfitzpatrick
Author: Matt Fitzpatrick [aut, cre], Karel Mokany [aut], Glenn Manion [aut], Diego Nieto-Lugilde [aut], Simon Ferrier [aut], Roozbeh Valavi [ctb], Matthew Lisk [ctb], Chris Ware [ctb], Skip Woolley [ctb], Tom Harwood [ctb]
Maintainer: Matt Fitzpatrick <mfitzpatrick@umces.edu>
Repository: CRAN
Date/Publication: 2025-05-08 11:20:05 UTC

Overview of the functions in the gdm package

Description

Generalized Dissimilarity Modeling is a statistical technique for modelling variation in biodiversity between pairs of geographical locations or through time. The gdm package provides functions to fit, evaluate, summarize, and plot Generalized Dissimilarity Models and to make predictions (across space and/or through time) and map biological patterns by transforming environmental predictor variables.

Details

The functions in the gdm package provide the tools necessary for fitting GDMs, including functions to prepare biodiversity and environmental data. Major functionality includes:

To see the preferred citation for the package, type citation("gdm").

I. Formatting input data

GDM fits biological distances to pairwise site geographical and environmental distances. Most users will need to first format their data to gdm's site-pair table format:

formatsitepair To convert biodiversity and environmental data to site-pair format

II. Model fitting, evaluation, and summary

gdm To fit a GDM
gdm.crossvalidation To evaluate a GDM
gdm.partition.deviance To assess predictor contributions to deviance explained
gdm.varImp To assess model significance and predictor importance
summary To summarize a GDM

III. Model prediction and transformation of environmental data

predict To predict biological dissimilarities between sites in space or between time periods
gdm.transform To transform each environmental predictor to biological importance

IV. Plotting model output and fitted functions

plot To plot model fit and I-splines
isplineExtract To extract I-spline values to allow for custom plotting
plotUncertainty To estimate and plot model sensitivity using bootstrapping

Author(s)

The gdm development team is Matt Fitzpatrick and Karel Mokany. The R package is based on code originally developed by Glenn Manion under the direction of Simon Ferrier. Where others have contributed to individual functions, credits are provided in function help pages.

The maintainer of the R version of gdm is Matt Fitzpatrick <mfitzpatrick@umces.edu>.


Calculate GDM Deviance for Observed & Predicted Dissimilarities

Description

Calculate GDM deviance for observed & predicted dissimilarities. Can be used for assessing cross-validation data. Translated from the C++ function CalcGDMDevianceDouble() in the file NNLS_Double.cpp from the GDM R package.

Usage

calculate.gdm.deviance(predDiss, obsDiss)

Arguments

predDiss

(float) A vector of predicted dissimilarity values, of the same length as obsDiss.

obsDiss

(float) A vector of observed dissimilarity values, of the same length as predDiss.

Value

A single value (float) being the deviance.


Create Site-Pair Table

Description

Creates a site-pair table from the lower half of a site-by-site distance (dissimilarity) matrix. This function is called from the formatsitepair function and is not needed by the user.

Usage

createsitepair(dist, spdata, envInfo, dXCol, dYCol, siteCol, weightsType, custWeights)

Arguments

dist

The lower half of a site-by-site distance (dissimilarity) matrix, provided by the formatsitepair function.

spdata

Input species data, the same as the bioData input to the formatsitepair function.

envInfo

Input environmental data. Only accepts data tables as input. If the environmental data for formatsitepair are rasters, the data will already have been extracted into table format within formatsitepair.

dXCol

Input x coordinate, the same as the XColumn input to the formatsitepair function.

dYCol

Input y coordinate, the same as the YColumn input to the formatsitepair function.

siteCol

Site column, taken from either the species or environmental tables.

weightsType

The method of determining the site-pair weights used in model fitting.

custWeights

Custom weights, as a vector, if given by the user.

Value

A site-pair table with appropriate distance (dissimilarity) and weight columns used for fitting GDM.
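
As a rough illustration of what "lower half of a distance matrix to site-pair rows" means, the following base-R sketch unrolls a toy matrix into one row per unordered site pair. It is not the package's implementation: the site names, dissimilarity values, and the s1/s2 column names are assumptions made for this example, and the real table also carries coordinate and predictor columns.

```r
# Toy 4-site dissimilarity matrix (values are illustrative only).
sites <- c("A", "B", "C", "D")
d <- matrix(c(0.0, 0.2, 0.5, 0.9,
              0.2, 0.0, 0.4, 0.7,
              0.5, 0.4, 0.0, 0.3,
              0.9, 0.7, 0.3, 0.0),
            nrow = 4, dimnames = list(sites, sites))

# Row/column indices of the lower triangle: each unordered pair appears once.
idx <- which(lower.tri(d), arr.ind = TRUE)

# One row per site pair; s1/s2 are hypothetical column names for this sketch.
pairs <- data.frame(
  s1       = colnames(d)[idx[, "col"]],
  s2       = rownames(d)[idx[, "row"]],
  distance = d[idx],
  weights  = 1  # equal weighting for every pair
)
print(pairs)
```

With n sites the lower triangle holds n * (n - 1) / 2 pairs, which is why site-pair tables grow quickly as sites are added.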

Note

This function is called from the formatsitepair function and is not needed by the user.

See Also

formatsitepair


Combines Biological and Environmental Data to Produce a GDM-formatted Site-Pair Table

Description

This function takes input biological data and environmental, geographic, and other predictor data and builds a site-pair table required for fitting a Generalized Dissimilarity Model using the gdm function. NOTE: x-y coordinates of sites MUST be present in either the biological or the environmental data. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances.

The input biological data can be in one of the following four formats. Note that the general term "species" is used, but any classification of biological entities (e.g., functional types, haplotypes, etc.) can be used as long as an appropriate distance metric is also supplied (see "dist" argument):

  1. site-by-species matrix

  2. x, y, species list

  3. site-by-site biological distance (dissimilarity) matrix

  4. an existing site-pair table (see Details)

Predictor data can be provided in three formats:

  1. a site-by-predictor matrix with a column for each predictor variable and a row for each site

  2. a terra object SpatRaster, with one raster for each predictor variable

  3. one or more site-by-site distance matrices using the "distPreds" argument (see below).

Usage

formatsitepair(bioData, bioFormat, dist="bray", abundance=FALSE, siteColumn=NULL,
  XColumn, YColumn, sppColumn=NULL, abundColumn=NULL, sppFilter=0, predData,
  distPreds=NULL, weightType="equal", custWeights=NULL, sampleSites=1, verbose=FALSE)

Arguments

bioData

The input biological (the response variable) data table, in one of the four formats defined above (see Details).

bioFormat

An integer code specifying the format of bioData. Acceptable values are 1, 2, 3, or 4 (see Details).

dist

Default = "bray". A character code indicating the metric to quantify pairwise site distances / dissimilarities. Calls the vegdist function from the vegan package to calculate dissimilarity and therefore accepts any method available from vegdist.

abundance

Default = FALSE. Indicates whether the biological data are abundance data (TRUE) or presence-absence (0, 1) data (FALSE).

siteColumn

The name of the column in either the biological or environmental data table containing a unique site identifier. If a site column is provided in both the biological and environmental data, the site column name must be the same in both tables.

XColumn

The name of the column containing x-coordinates of sites. X-coordinates can be provided in either the biological or environmental data tables, but MUST be in at least one of them. If an x-coordinate column is provided in both the biological and environmental data, the column name must be identical. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances. Note that if you are using rasters, they must be in the same coordinate system as the site coordinates.

YColumn

The name of the column containing y-coordinates of sample sites. Y-coordinates can be provided in either the biological or environmental data tables, but MUST be in at least one of them. If a y-coordinate column is provided in both the biological and environmental data, the column name must be identical. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances. Note that if you are using rasters, they must be in the same coordinate system as the site coordinates.

sppColumn

Only used if bioFormat = 2 (x, y, species list). The name of the column containing the unique name / identifier for each species.

abundColumn

If abundance = TRUE, this parameter identifies the column containing the measure of abundance at each site. Only used if bioFormat = 2 (i.e., x, y, species list), though in the case of abundance data, the table would have four columns: x, y, species, abundance.

sppFilter

Default = 0. To account for limited sampling effort at some sites, sppFilter removes all sites at which the number of recorded species (i.e., observed species richness) is less than the specified value. For example, if sppFilter = 5, all sites with fewer than 5 recorded species will be removed.

predData

The environmental predictor data. Accepts either a site-by-predictor table or a terra object SpatRaster.

distPreds

An optional list of distance matrices to be used as predictors in combination with predData. For example, a site-by-site dissimilarity matrix for one biological group (e.g., trees) can be used as a predictor for another group (e.g., ferns). Each distance matrix must have as the first column the names of the sites (therefore the matrix will not be square). The name of the column containing the site names should have the same name as that provided for the siteColumn argument. Site IDs are required here to ensure correct ordering of sites in the construction of the site-pair table. Note that the formatsitepair function will not accept distance matrices as the only predictors (i.e., at least one additional, non-distPreds predictor variable is required). If you wish to fit GDM using only distance matrices provided using distPreds, provide one fake predictor (e.g., with all sites having the same value), plus site and coordinate columns. The s1 and s2 columns for this fake variable can then be removed by hand before fitting the GDM.

weightType

Default = "equal". Defines the weighting for sites. Can be either: (1) "equal" (weights for all sites set = 1), (2) "richness" (each site weighted according to number of species recorded), or (3) "custom" (user defined). If weightType="custom", the user must provide a vector of site weights equal to the number of rows in the full site-pair table (i.e., before species filtering (sppFilter argument) or sub-sampling (sampleSites argument) is taken into account).

custWeights

A two-column matrix or data frame of user-defined site weights. The first column should be the site name and should be named the same as that provided for the siteColumn argument. The second column should be numeric weight values and should be named "weights". The weight values represent the importance of each site in model fitting, and each value in the output site-pair table is the average of the weights of the two sites in that site pair. Required when weightType = "custom". Ignored otherwise.

sampleSites

Default = 1. A number between 0 and 1 indicating the fraction of sites to be used to construct the site-pair table. This argument can be used to reduce the number of sites to overcome possible memory limitations when fitting models with very large numbers of sites.

verbose

Default = FALSE. If TRUE, a summary of the dimensions of the site-pair table will be printed, which can be useful for diagnostics.
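
Two of the arguments above, sppFilter and custWeights, can be pictured with a small base-R sketch. This only mimics the documented behaviour on toy data; it does not call formatsitepair, and the table contents and the variable names (bio, kept, pair_weights) are assumptions made for the example.

```r
# Toy site-by-species presence-absence table (illustrative values).
bio <- data.frame(site = 1:4,
                  spA = c(1, 1, 0, 1),
                  spB = c(1, 0, 0, 1),
                  spC = c(1, 0, 0, 0))

# Mimic sppFilter = 2: drop sites with fewer than 2 recorded species.
richness <- rowSums(bio[, -1] > 0)
kept <- bio[richness >= 2, ]

# custWeights layout: site ID column (named as siteColumn) plus "weights".
custWeights <- data.frame(site = 1:4, weights = c(1.0, 0.5, 0.2, 0.8))

# Each site-pair weight is the average of the two site weights.
w <- setNames(custWeights$weights, custWeights$site)
pair_ids <- t(combn(kept$site, 2))
pair_weights <- (w[as.character(pair_ids[, 1])] +
                 w[as.character(pair_ids[, 2])]) / 2
print(data.frame(s1 = pair_ids[, 1], s2 = pair_ids[, 2],
                 weights = unname(pair_weights)))
```

In this toy case only sites 1 and 4 pass the filter, so a single site pair remains.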

Details

bioData and bioFormat: The function accepts biological data in the following formats:

bioData = site-by-species matrix; bioFormat = 1: assumes that the response data are provided with a site ID column (specified by siteColumn) and, optionally, two columns for the x & y coordinates of the sites. All remaining columns contain the biological data, with a column for each biological entity (most commonly species). In the case that a raster stack (a terra object SpatRaster) is provided for the environmental data (predData), x-y coordinates MUST be provided in bioData to allow extraction of the environmental data at site locations. The x-y coordinates will be intersected with the raster stack and, if the number of unique cells intersected by the points is less than the number of unique site IDs (i.e., multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Therefore, model fitting will be sensitive to raster cell size. If the environmental data are in tabular format, they should have the same number of sites (i.e., same number of rows) as bioData. The x-y coordinate and site ID columns must have the same names in bioData and predData.

bioData = x, y, species list (optionally a fourth column with abundance can be provided); bioFormat = 2: assumes a table of 3 or 4 columns, the first two being the x & y coordinates of species records, the third (sppColumn) being the name / identifier of the species observed at that location, and optionally a fourth column indicating a measure of abundance. If an abundance column is not provided, presence-only data are assumed. In the case that a raster stack (a terra object SpatRaster) is provided for the environmental data (predData), the x-y coordinates will be intersected with the raster stack and, if the number of unique cells intersected by the points is less than the number of unique site IDs (i.e., multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Therefore, model fitting will be sensitive to raster cell size.

bioData = site-site distance (dissimilarity) matrix; bioFormat = 3: this option allows the use of an existing site-site distance (dissimilarity) matrix, such as a genetic distance matrix calculated outside of the gdm package. Only the lower triangle of the matrix is required to create the site-pair table, and the function automatically removes the upper triangle if present. The code checks and aligns the order of sites in the distance matrix and the predictor data to ensure they match. To do so, (1) a site column is required in both the distance matrix and the predictor data and (2) site IDs are required to be numeric. This is the only bioFormat in which the environmental data MAY NOT be a raster stack.

bioData = site-pair table; bioFormat = 4: with an already created site-pair table, this option allows the user to add one or more distance matrices (see distPreds above) to the existing site-pair table and/or sub-sample the site-pair table (see sampleSites above). If the site-pair table was not created using the formatsitepair function, the user will need to ensure the order of the sites matches that in other tables being provided to the function.

NOTES: (1) The function assumes that the x-y coordinates and the raster stack (if used) are in the same coordinate system. No checking is performed to confirm this is the case. (2) The function assumes that the association between the provided site and x-y coordinate columns is singular and unique. Therefore, the function will fail if a given site has more than one set of coordinates associated with it, or if multiple sites are given exactly the same coordinates.

Value

A formatted site-pair table containing the response (biological distance or dissimilarity), predictors, and weights as required for fitting Generalized Dissimilarity Models.

Examples

## tabular data
# start with the southwest data table
head(southwest)
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]

#########table type 1
## site-species table without coordinates
testData1a <- reshape2::dcast(sppData, site~species)
##site-species table with coordinates
coords <- unique(sppData[, 2:ncol(sppData)])
testData1b <- merge(testData1a, coords, by="site")
## site-species, table-table
exFormat1a <- formatsitepair(testData1a, 1, siteColumn="site", XColumn="Long",
                             YColumn="Lat", predData=envTab)

# next, let's try environmental raster data
## not run
# rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
# envRast <- terra::rast(rastFile)

## site-species, table-raster
## not run
# exFormat1b <- formatsitepair(testData1b, 1, siteColumn="site", XColumn="Long",
#                              YColumn="Lat", predData=envRast)

#########table type 2
## site xy spp list, table-table
exFormat2a <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                             sppColumn="species", siteColumn="site", predData=envTab)

## site xy spp list, table-raster
## not run
# exFormat2b <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
#                              sppColumn="species", siteColumn="site", predData=envRast)

#########table type 3
## It is possible to format a site-pair table by starting
# with a pre-calculated matrix of biological distances
dim(gdmDissim) # pairwise distance matrix + 1 column for site IDs
gdmDissim[1:5, 1:5]
# now we can format the table:
exFormat3 <- formatsitepair(gdmDissim, 3, XColumn="Long", YColumn="Lat",
                            predData=envTab, siteColumn="site")

#########table type 4
## adds a predictor matrix to an existing site-pair table, in this case,
## predData needs to be provided, but is not actually used
exFormat4 <- formatsitepair(exFormat2a, 4, predData=envTab, siteColumn="site",
                            distPreds=list(as.matrix(gdmDissim)))

Fit a Generalized Dissimilarity Model to Tabular Site-Pair Data

Description

The gdm function is used to fit a generalized dissimilarity model to tabular site-pair data formatted as follows using the formatsitepair function: distance, weights, s1.xCoord, s1.yCoord, s2.xCoord, s2.yCoord, s1.Pred1, s1.Pred2, ..., s1.PredN, s2.Pred1, s2.Pred2, ..., s2.PredN. The distance column contains the response variable, which must be a ratio-based dissimilarity (distance) measure between Site 1 and Site 2. The weights column defines any weighting to be applied during fitting of the model. If equal weighting is required, then all entries in this column should be set to 1.0 (default). The third and fourth columns, s1.xCoord and s1.yCoord, represent the spatial coordinates of the first site in the site pair (s1). The fifth and sixth columns, s2.xCoord and s2.yCoord, represent the coordinates of the second site (s2). Note that the first six columns are REQUIRED, even if you do not intend to use geographic distance as a predictor (in which case these columns can be loaded with dummy data if the actual coordinates are unknown - though that would be weird, no?). The next N*2 columns contain values for N predictors for Site 1, followed by values for the same N predictors for Site 2.

The following is an example of a GDM input table header with three environmental predictors (Temp, Rain, Bedrock):

distance, weights, s1.xCoord, s1.yCoord, s2.xCoord, s2.yCoord, s1.Temp, s1.Rain, s1.Bedrock, s2.Temp, s2.Rain, s2.Bedrock
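
To make the column layout concrete, here is a minimal sketch that builds a two-row table with exactly this header and checks its structure in base R. All values are invented for illustration, and the object name sitePairToy is hypothetical; in practice the table would come from formatsitepair.

```r
# Hand-built toy site-pair table with the header shown above (made-up values).
sitePairToy <- data.frame(
  distance  = c(0.35, 0.80),
  weights   = c(1, 1),
  s1.xCoord = c(100, 100), s1.yCoord = c(200, 200),
  s2.xCoord = c(150, 300), s2.yCoord = c(250, 400),
  s1.Temp = c(15.2, 15.2), s1.Rain = c(800, 800), s1.Bedrock = c(1, 1),
  s2.Temp = c(16.0, 21.5), s2.Rain = c(760, 310), s2.Bedrock = c(1, 3)
)

# Six fixed columns first, then N predictors for site 1 and the same N for site 2.
fixed <- c("distance", "weights", "s1.xCoord", "s1.yCoord", "s2.xCoord", "s2.yCoord")
preds <- names(sitePairToy)[-(1:6)]
n <- length(preds) / 2
s1_preds <- sub("^s1\\.", "", preds[1:n])
s2_preds <- sub("^s2\\.", "", preds[(n + 1):(2 * n)])

layout_ok <- identical(names(sitePairToy)[1:6], fixed) &&
  identical(s1_preds, s2_preds)
print(layout_ok)
```

A check like this can be handy when a site-pair table was assembled by hand rather than by formatsitepair.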

Usage

gdm(data, geo=FALSE, splines=NULL, knots=NULL)

Arguments

data

A data frame containing the site pairs to be used to fit the GDM (obtained using the formatsitepair function). The observed response data must be located in the first column. The weights to be applied to each site pair must be located in the second column. If geo is TRUE, then the s1.xCoord, s1.yCoord and s2.xCoord, s2.yCoord columns will be used to calculate the geographic distance between site pairs for inclusion as the geographic predictor term in the model. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances. If geo is FALSE (default), then the s1.xCoord, s1.yCoord, s2.xCoord and s2.yCoord data columns must still be included, but are ignored in fitting the model. Columns containing the predictor data for Site 1, and the predictor data for Site 2, follow.

geo

Set to TRUE if geographic distance between sites is to be included as a model term. Set to FALSE if geographic distance is to be omitted from the model. Default is FALSE.

splines

An optional vector of the number of I-spline basis functions to be used for each predictor in fitting the model. If supplied, it must have the same length as the number of predictors (including geographic distance if geo is TRUE). If this vector is not provided (splines=NULL), then a default of 3 basis functions is used for all predictors.

knots

An optional vector of knots in units of the predictor variables to be used in the fitting process. If knots are supplied and splines=NULL, then the knots argument must have the same length as the number of predictors * n, where n is the number of knots (default=3). If both knots and the number of splines are supplied, then the length of the knots argument must be the same as the sum of the values in the splines vector. Note that when the default three I-spline basis functions are used, the default knots are placed at the 0% (minimum), 50% (median), and 100% (maximum) quantiles.
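
The length rule for knots can be sketched in base R. The example below computes minimum, median, and maximum knots for two toy predictors and confirms that the resulting vector has one entry per spline. Pooling the Site 1 and Site 2 values before taking quantiles is an assumption of this sketch, as are the predictor names and simulated values.

```r
set.seed(1)

# Toy predictor values for both sites of each pair (simulated, illustrative).
s1.Temp <- runif(20, 5, 25);     s2.Temp <- runif(20, 5, 25)
s1.Rain <- runif(20, 200, 1200); s2.Rain <- runif(20, 200, 1200)

# Assumption: quantiles are taken over the pooled site 1 + site 2 values.
pooled <- list(Temp = c(s1.Temp, s2.Temp), Rain = c(s1.Rain, s2.Rain))

# Three knots per predictor at the 0%, 50%, and 100% quantiles.
knots <- unlist(lapply(pooled, quantile, probs = c(0, 0.5, 1), names = FALSE),
                use.names = FALSE)
splines <- rep(3, length(pooled))

# The knots vector must be as long as the sum of the splines vector.
print(length(knots) == sum(splines))
```

A vector built this way could be passed as the knots argument together with a matching splines vector, with the knots for each predictor listed in the same order as the predictors in the site-pair table.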

Value

gdm returns a gdm model object. The function summary.gdm can be used to obtain or print a synopsis of the results. A gdm model object is a list containing at least the following components:

dataname

The name of the table used as the data argument to the model.

geo

Whether geographic distance was used as a predictor in the model.

gdmdeviance

The deviance of the fitted GDM model.

nulldeviance

The deviance of the null model.

explained

The percentage of null deviance explained by the fitted GDM model.

intercept

The fitted value for the intercept term in the model.

predictors

A list of the names of the predictors that were used to fit the model, in order of the amount of turnover associated with each predictor (based on the sum of the I-spline coefficients).

coefficients

A list of the coefficients for each spline for each of the predictors considered in model fitting.

knots

A vector of the knots derived from the x data (or user defined), for each predictor.

splines

A vector of the number of I-spline basis functions used for each predictor.

creationdate

The date and time of model creation.

observed

The observed response for each site pair (from data column 1).

predicted

The predicted response for each site pair, from the fitted model (after applying the link function).

ecological

The linear predictor (ecological distance) for each site pair, from the fitted model (before applying the link function).
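
How some of these components relate to one another can be sketched with plain arithmetic. The numbers below are invented, and the link function used, 1 - exp(-ecological), is the form described in Ferrier et al. (2007); treating it as the exact transformation applied by this version of the package is an assumption of the sketch.

```r
# Invented deviances for illustration.
gdmdeviance  <- 40
nulldeviance <- 100

# Percentage of null deviance explained by the fitted model.
explained <- 100 * (nulldeviance - gdmdeviance) / nulldeviance

# Linear predictor (ecological distance) for four hypothetical site pairs.
ecological <- c(0, 0.5, 1, 3)

# Assumed GDM link: maps [0, Inf) onto dissimilarities in [0, 1).
predicted <- 1 - exp(-ecological)

print(explained)
print(round(predicted, 3))
```

The curvilinear link is what lets predicted dissimilarity level off toward 1 as ecological distance keeps increasing.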

References

Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.

See Also

formatsitepair, summary.gdm, plot.gdm, predict.gdm, gdm.transform

Examples

##fit table environmental data
# format site-pair table using the southwest data table
head(southwest)
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]

sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat", sppColumn="species",
                              siteColumn="site", predData=envTab)

##fit table GDM
gdmTabMod <- gdm(sitePairTab, geo=TRUE)
summary(gdmTabMod)

##fit raster environmental data
##sets up site-pair table
rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
envRast <- terra::rast(rastFile)

##environmental raster data
sitePairRast <- formatsitepair(sppData, 2, XColumn="Long",
                               YColumn="Lat", sppColumn="species",
                               siteColumn="site", predData=envRast)
##sometimes raster data returns NA in the site-pair table, these rows will
##have to be removed before fitting gdm
sitePairRast <- na.omit(sitePairRast)

##fit raster GDM
gdmRastMod <- gdm(sitePairRast, geo=TRUE)
summary(gdmRastMod)

Cross-Validation Assessment of a Fitted GDM

Description

Undertake a cross-validation assessment of a GDM fit using all the predictors included in the formatted GDM input site-pair table (spTable). The cross-validation is run using a specified proportion (train.proportion) of the randomly selected sites included in spTable to train the model, with the remaining sites being used to test the performance of the model predictions. The test is repeated a specified number of times (n.crossvalid.tests), with a unique random sample taken each time. Outputs are a number of cross-validation test metrics.

Usage

gdm.crossvalidation(spTable, train.proportion=0.9, n.crossvalid.tests=1,
  geo=FALSE, splines=NULL, knots=NULL)

Arguments

spTable

(dataframe) A dataframe holding the GDM input table for model fitting.

train.proportion

(float) The proportion of sites in 'spTable' to use in training the GDM, with the remaining proportion used to test the model. (default = 0.9)

n.crossvalid.tests

(integer) The number of cross-validation sets to use in testing the GDM. (default = 1)

geo

(boolean) Geographic distance to be used in model fitting (default = FALSE).

splines

(vector) An optional vector of the number of I-spline basis functions to be used for each predictor in fitting the model.

knots

(vector) An optional vector of knots in units of the predictor variables to be used in the fitting process.

Value

List, providing cross-validation statistics. These are metrics that describe how well the model fit using the site-pair training table predicts the dissimilarities in the site-pair testing table. Metrics provided include: 'Train.Deviance.Explained' (the deviance explained for the training data); 'Test.Deviance.Explained' (the deviance explained for the test data); 'Mean.Error'; 'Mean.Absolute.Error'; 'Root.Mean.Square.Error'; 'Obs.Pred.Correlation' (Pearson's correlation coefficient between observed and predicted values); 'Equalized.RMSE' (the average root mean square error across bands of observed dissimilarities (0.05 dissimilarity units)); 'Error.by.Observed.Value' (the average root mean square error and number of observations within bands of observed dissimilarities (0.05 dissimilarity units)).
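
For readers who want to see how metrics of this kind are computed, here is a hedged base-R sketch on invented observed and predicted dissimilarities. It is not the package's code: the sign convention for the mean error (predicted minus observed) and the exact banding used for the equalized RMSE are assumptions made for illustration.

```r
# Invented test-set dissimilarities.
obs  <- c(0.10, 0.22, 0.41, 0.58, 0.77, 0.93)
pred <- c(0.15, 0.20, 0.35, 0.65, 0.70, 0.90)

err <- pred - obs  # assumption: error defined as predicted minus observed

metrics <- list(
  Mean.Error             = mean(err),
  Mean.Absolute.Error    = mean(abs(err)),
  Root.Mean.Square.Error = sqrt(mean(err^2)),
  Obs.Pred.Correlation   = cor(obs, pred)  # Pearson by default
)

# RMSE within bands of observed dissimilarity 0.05 units wide, then the
# average over non-empty bands, so each band counts equally.
band <- cut(obs, breaks = seq(0, 1, by = 0.05), include.lowest = TRUE)
rmse_by_band <- tapply(err, band, function(e) sqrt(mean(e^2)))
equalized_rmse <- mean(rmse_by_band, na.rm = TRUE)

print(metrics)
print(equalized_rmse)
```

Averaging by band keeps the densely populated parts of the dissimilarity range from dominating the error summary.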


Perform Deviance Partitioning of a Fitted GDM

Description

Partitions deviance explained from GDM into different user-specified components - most typically environment versus space.

Usage

gdm.partition.deviance(sitePairTable, varSets=list(), partSpace=TRUE)

Arguments

sitePairTable

A correctly formatted site-pair table from formatsitepair.

varSets

A list in which each element is a vector of variable names across which deviance partitioning is to be performed, excluding geographic distance (which is set by the partSpace argument). Variable names must match those used to build the site-pair table. See example.

partSpace

Whether or not to perform the partitioning using geographic space. Default=TRUE.

Value

A dataframe summarizing deviance partitioning results.

Author(s)

Matt Fitzpatrick and Karel Mokany

Examples

# set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site", predData=envTab)

# EXAMPLE - Partition two groups of variables
# Make list of variable sets for partitioning
varSet <- vector("list", 2)

# now, name the variable groups for partitioning
# note you do not need to add "space" as this is only needed
# for environmental variables
# two groups (soils & climate)
names(varSet) <- c("soil", "climate")

# lastly, add variable names for
varSet$soil <- c("awcA", "phTotal", "sandA", "shcA", "solumDepth")
varSet$climate <- c("bio5", "bio6", "bio15", "bio18", "bio19")
varSet

# run the function to partition soils, climate, and space (partSpace=TRUE)
scgPart <- gdm.partition.deviance(sitePairTab, varSet, partSpace=TRUE)

# EXAMPLE - Partition three groups of variables
# Make list of variable sets for partitioning
varSet <- vector("list", 3)
names(varSet) <- c("soil", "temp", "precip")
varSet$soil <- c("awcA", "phTotal", "sandA", "shcA", "solumDepth")
varSet$temp <- c("bio5", "bio6")
varSet$precip <- c("bio15", "bio18", "bio19")

# partition soils, temperature, and precip
# note we can't also partition space given the function's limit to a
# maximum of three variable sets, so we set partSpace=FALSE
scPart <- gdm.partition.deviance(sitePairTab, varSet, partSpace=FALSE)

Single GDM Cross-Validation Test, Internal Function

Description

Undertake a cross-validation assessment of a GDM, using a single training and testing dataset.

Usage

gdm.single.crossvalidation(spTable_train, spTable_test, geo=FALSE,
  splines=NULL, knots=NULL)

Arguments

spTable_train

(dataframe) A dataframe holding the GDM input table for model fitting.

spTable_test

(dataframe) A dataframe holding the GDM input table for model testing, having identical column names to 'spTable_train' but using different site-pairs.

geo

(boolean) Geographic distance to be used in model fitting (default = FALSE).

splines

(vector) An optional vector of the number of I-spline basis functions to be used for each predictor in fitting the model.

knots

(vector) An optional vector of knots in units of the predictor variables to be used in the fitting process.

Value

List, providing cross-validation statistics. These are metrics that describe how well the model fit using the site-pair training table predicts the dissimilarities in the site-pair testing table. Metrics provided include: 'Deviance.Explained' (the deviance explained for the training data); 'Test.Deviance.Explained' (the deviance explained for the test data); 'Mean.Error'; 'Mean.Absolute.Error'; 'Root.Mean.Squre.Error'; 'Obs.Pred.Correlation' (Pearson's correlation coefficient between observed and predicted values); 'Equalised.RMSE' (the average root mean square error across bands of observed dissimilarities (0.05 dissimilarity units)); 'Error.by.Observed.Value' (the average root mean square error and number of observations within bands of observed dissimilarities (0.05 dissimilarity units)).


Transform Environmental Data Using a Fitted Generalized Dissimilarity Model

Description

This function transforms geographic and environmental predictors using (1) the fitted functions from a model object returned from gdm and (2) a data frame or raster object containing predictor data for a set of sites.

Usage

gdm.transform(model, data, filename = "", ...)

Arguments

model

A gdm model object resulting from a call to gdm.

data

Either (i) a data frame containing values for each predictor variable in the model, formatted as follows: X, Y, var1, var2, var3, ..., varN or (ii) a terra object SpatRaster with one layer per predictor variable used in the model, excluding X and Y (rasters for x- and y-coordinates are built automatically from the input rasters if the model was fit with geo=TRUE). The order of the columns (data frame) or raster layers (SpatRaster) MUST be the same as the order of the predictors in the site-pair table used in model fitting. There is currently no checking to ensure that the order of the variables to be transformed is the same as that in the site-pair table used in model fitting. If geographic distance was not used as a predictor in model fitting, the x- and y-columns need to be removed from the data to be transformed. Output is provided in the same format as the input data.

filename

character. Output filename for rasters. When provided, the raster layers are written to file directly.

...

additional arguments to pass to the terra predict function.
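
Because the function itself does not check predictor order, a small guard written by the user can help. The base-R sketch below compares the predictor order implied by a site-pair table header with the column order of a data frame to be transformed, and reorders the latter if needed. The column names and values are invented, and the helper logic is an illustration rather than part of the gdm package.

```r
# Column names of a hypothetical site-pair table used to fit the model.
sp_names <- c("distance", "weights",
              "s1.xCoord", "s1.yCoord", "s2.xCoord", "s2.yCoord",
              "s1.Temp", "s1.Rain", "s2.Temp", "s2.Rain")

# Data to transform, with the predictors deliberately in the wrong order.
new_data <- data.frame(X = c(1, 2), Y = c(3, 4),
                       Rain = c(800, 300), Temp = c(15, 21))

# Predictor order in the model: strip the "s1." prefix, drop the coordinates.
model_preds <- sub("^s1\\.", "", grep("^s1\\.", sp_names, value = TRUE))
model_preds <- setdiff(model_preds, c("xCoord", "yCoord"))

# Predictor order in the data (everything after the X and Y columns).
data_preds <- names(new_data)[-(1:2)]
same_order <- identical(model_preds, data_preds)

# Reorder the columns to match the model when the orders differ.
if (!same_order) {
  new_data <- new_data[, c("X", "Y", model_preds)]
}
print(names(new_data))
```

For raster input, the same comparison could be made against the layer names of the SpatRaster before calling gdm.transform.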

Value

gdm.transform returns either a data frame with the same number of rows as the input data frame or a SpatRaster,depending on the format of the input data. If the model uses geographic distance as a predictor the output objectwill contain columns or layers for the transformed X and Y values for each site.The transformed environmental data will be in the remaining columns or layers.

References

Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.

Fitzpatrick MC, Keller SR (2015) Ecological genomics meets community-level modeling of biodiversity: Mapping the genomic landscape of current and future environmental adaptation. Ecology Letters 18, 1-16.

Examples

# start with the southwest data set
# grab the columns with xy, site ID, and species data
sppTab <- southwest[, c("species", "site", "Lat", "Long")]

##fit gdm using rasters
rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
envRast <- terra::rast(rastFile)

sitePairRast <- formatsitepair(sppTab, 2, XColumn="Long", YColumn="Lat",
                               sppColumn="species", siteColumn="site",
                               predData=envRast)
##remove NA values
sitePairRast <- na.omit(sitePairRast)

##fit raster GDM
gdmRastMod <- gdm(sitePairRast, geo=TRUE)

##raster input, raster output
transRasts <- gdm.transform(gdmRastMod, envRast)

# map biological patterns; increase maxcell if using large rasters
pcaSamp <- terra::prcomp(transRasts, maxcell = 1e4)

# note the use of the 'index' argument
pcaRast <- terra::predict(transRasts, pcaSamp, index=1:3)

# stretch the PCA rasters to make full use of the colour spectrum
pcaRast <- terra::stretch(pcaRast)

terra::plotRGB(pcaRast, r=1, g=2, b=3)

Assess Predictor Importance and Quantify Model Significance in a Fitted Generalized Dissimilarity Model.

Description

This function uses matrix permutation to perform model and predictor significance testing and to estimate predictor importance in a generalized dissimilarity model. The function can be run in parallel on multicore machines to reduce computation time.

Usage

gdm.varImp(spTable, geo, splines = NULL, knots = NULL,
  predSelect = FALSE, nPerm = 50, pValue = 0.05, parallel = FALSE, cores = 2,
  sampleSites = 1, sampleSitePairs = 1, outFile = NULL)

Arguments

spTable

A site-pair table, same as used to fit a gdm.

geo

Similar to the gdm geo argument. The only difference is that the geo argument does not have a default in this function.

splines

Same as the gdm splines argument. Note that the current implementation requires that all predictors have the same number of splines.

knots

Same as the gdm knots argument.

predSelect

Set to TRUE to perform predictor selection using matrix permutation and backward elimination. Default is FALSE. When predSelect = FALSE, results will be returned only for a model fit with all predictors.

nPerm

Number of permutations to use to estimate p-values. Default is 50.

pValue

The p-value to use for predictor selection / elimination. Default is 0.05.

parallel

Whether or not to run the matrix permutations and model fitting in parallel. Parallel processing is highly recommended when either (i) the nPerm argument is large (>100) or (ii) a large number of site-pairs (and / or variables) are being used in model fitting (note that computational demand can be reduced using subsampling; see the sampleSites and sampleSitePairs arguments). The default is FALSE.

cores

When the parallel argument is set to TRUE, the number of cores to be registered for parallel processing. Must be <= the number of cores in the machine running the function. There is no benefit to setting the number of cores greater than the number of predictors in the model.

sampleSites

The fraction (0-1, though a value of 0 would be silly, wouldn't it?) of sites to retain from the full site-pair table. If less than 1, this argument will completely remove a fraction of sites such that they are not used in the permutation routines.

sampleSitePairs

The fraction (0-1) of site-pairs (i.e., rows) to retain from the full site-pair table. In other words, all sites will be used in the permutation routines (assuming sampleSites = 1), but not all site-pair combinations. In the case where both the sampleSites and the sampleSitePairs arguments have values less than 1, sites will first be removed using the sampleSites argument, followed by removal of site-pairs using the sampleSitePairs argument. Note that the number of site-pairs removed is based on the fraction of the resulting site-pair table after sites have been removed, not on the size of the full site-pair table.

outFile

An optional character string to write the object returned by the function to disk as an .RData object (".RData" is not required as part of the file name). The .RData object will contain a single list with the name of "outObject". The default is NULL, meaning that no file will be written.
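The combined effect of sampleSites and sampleSitePairs on table size can be worked through with simple arithmetic. The sketch below assumes a site-pair table holding every pairwise combination of the 94 sites in the southwest data and assumes that counts are rounded to the nearest integer; the package may round differently.

```r
n_sites         <- 94    # sites in the southwest example data
sampleSites     <- 0.7
sampleSitePairs <- 0.4

# Assumption: the number of retained sites is rounded to the nearest integer.
sites_kept <- round(n_sites * sampleSites)

# Rows in the full table, after removing sites, and after removing site-pairs.
pairs_full        <- choose(n_sites, 2)
pairs_after_sites <- choose(sites_kept, 2)
pairs_kept        <- round(pairs_after_sites * sampleSitePairs)

c(full = pairs_full, after_sites = pairs_after_sites, kept = pairs_kept)
```

Because the number of pairs grows with the square of the number of sites, removing sites shrinks the table much faster than removing the same fraction of rows.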

Details

To test model significance, first a model is fit using all predictors and un-permuted environmental data. Any predictor whose I-spline coefficients sum to zero is preemptively removed. Next, the environmental data are permuted nPerm times (by randomizing the order of the rows) and a GDM is fit to each permuted table. Model significance is determined by comparing the deviance explained by the GDM fit to the un-permuted table to the distribution of deviance explained values from the GDMs fit to the nPerm permuted tables. To assess predictor significance, this process is repeated for each predictor individually (i.e., only the data for the predictor being tested are permuted rather than the entire environmental table). Predictor importance is quantified as the percent change in deviance explained between a model fit with and without that predictor permuted. If predSelect=TRUE, this process continues by next permuting the site-pair table nPerm times, but removing one predictor at a time and reassessing predictor importance and significance. At each step, the least important predictor is dropped (backward elimination) and the process continues until all non-significant predictors are removed, with the significance level set by the user through the pValue argument.
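The permutation logic described above can be sketched in base R. This is not gdm.varImp itself: a linear model's R-squared stands in for GDM deviance explained, the data are simulated, and the p-value is taken as the simple proportion of permuted fits that match or beat the observed statistic, which is an assumption about the exact formula.

```r
set.seed(1)
n   <- 60
x1  <- runif(n)
x2  <- runif(n)
y   <- 2 * x1 + rnorm(n, sd = 0.3)   # only x1 carries signal
dat <- data.frame(y, x1, x2)

# Stand-in for "deviance explained": R-squared of a linear model.
fit_stat <- function(d) summary(lm(y ~ x1 + x2, data = d))$r.squared

nPerm    <- 50
observed <- fit_stat(dat)

# Model significance: permute the rows of the whole predictor table.
model_perm <- replicate(nPerm, {
  d <- dat
  d[, c("x1", "x2")] <- d[sample(n), c("x1", "x2")]
  fit_stat(d)
})
p_model <- mean(model_perm >= observed)

# Predictor importance: permute one predictor at a time and record the
# percent drop in the statistic relative to the un-permuted fit.
importance <- sapply(c("x1", "x2"), function(v) {
  perm <- replicate(nPerm, {
    d <- dat
    d[[v]] <- sample(d[[v]])
    fit_stat(d)
  })
  100 * (observed - mean(perm)) / observed
})
```

In this toy setting permuting x1 removes most of the explained variation, while permuting x2 changes it very little, which is the pattern the importance measure is meant to expose.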

Value

A list of four tables. The first table summarizes full model deviance, percent deviance explained by the full model, the p-value of the full model, and the number of permutations used to calculate the statistics for each fitted model (i.e., the full model and each model with predictors removed in succession during the backward elimination procedure if predSelect=T). The remaining three tables summarize (1) predictor importance, (2) predictor significance, and (3) the number of permutations used to calculate the statistics for that model, which is provided because some GDMs may fail to converge for some permutations / predictor combinations and you might want to know how many permutations were used when calculating statistics. Or maybe you don't, you decide.

Predictor importance is measured as the percent decrease in deviance explained between the full model and the deviance explained by a model fit with that predictor permuted. Significance is estimated using the bootstrapped p-value when the predictor has been permuted. For most cases, the number of permutations will equal the nPerm argument. However, the value may be less should any of the models fit to the permuted tables fail to converge.

If predSelect=FALSE, the tables will have values only in the first column.

Author(s)

Matt Fitzpatrick and Karel Mokany

References

Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.

Fitzpatrick MC, Sanders NJ, Ferrier S, Longino JT, Weiser MD, Dunn RR (2011) Forecasting the future of biodiversity: a test of single- and multi-species models for ants in North America. Ecography 34, 836-847.

Examples

##fit table environmental data
##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site", predData=envTab)

## not run
#modTest <- gdm.varImp(sitePairTab, geo=T, nPerm=50, parallel=T, cores=10, predSelect=T)
#barplot(sort(modTest$`Predictor Importance`[,1], decreasing=T))

An example biological dissimilarity matrix

Description

Pairwise Bray-Curtis dissimilarity calculated using the species occurrence data from the southwest data set.

Usage

gdmDissim

Format

A data frame with 94 rows and 95 columns (extra column holds site IDs):
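For readers unfamiliar with the measure, the sketch below computes pairwise Bray-Curtis dissimilarity in base R and arranges it in the same layout as gdmDissim, a site-ID column followed by one column per site. The community matrix is invented and the helper name bray_curtis is hypothetical; this is not how the packaged data set was produced.

```r
# Toy site-by-species abundance matrix (invented values).
comm <- matrix(c(5, 0, 3,
                 2, 4, 0,
                 0, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("site1", "site2", "site3"),
                               c("spA", "spB", "spC")))

# Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

n    <- nrow(comm)
diss <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    diss[i, j] <- bray_curtis(comm[i, ], comm[j, ])
  }
}

# Site-ID column plus one column per site: n rows and n + 1 columns.
dissTab <- data.frame(site = rownames(comm), diss, row.names = NULL)
```

In practice a tested implementation such as vegan::vegdist with method = "bray" is a safer choice than a hand-written loop.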


Extract I-spline Values From a Fitted Generalized Dissimilarity Model.

Description

Extracts the I-spline values from a gdm object. There is one I-spline for each predictor that has at least one non-zero coefficient in the fitted model.

Usage

isplineExtract(model)

Arguments

model

A gdm object from gdm.

Value

A list with two items. The first item contains the x-values (actual values of the predictors) of the I-splines and the second item contains the y-values (partial ecological distances) of the fitted I-splines.
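One common use of the returned list is to compare predictors by the maximum height of their I-splines. The sketch below uses an invented stand-in for the list (two matrices with one column per predictor) rather than a real isplineExtract result, so the numbers and the element names x and y are illustrative only.

```r
# Toy stand-in for the list returned by isplineExtract(); values are invented.
exSplines <- list(
  x = cbind(bio5 = c(20, 25, 30),  bio19 = c(100, 200, 300)),
  y = cbind(bio5 = c(0, 0.10, 0.15), bio19 = c(0, 0.40, 0.90))
)

# Maximum partial ecological distance reached by each spline.
spline_height <- apply(exSplines[[2]], 2, max)

# Predictors ranked from largest to smallest spline height.
ranked <- sort(spline_height, decreasing = TRUE)
```

A taller spline indicates more compositional turnover along that gradient when the other predictors are held constant, so the ranking gives a quick first look at relative predictor effects.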

References

Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.

Fitzpatrick MC, Sanders NJ, Normand S, Svenning J-C, Ferrier S, Gove AD, Dunn RR (2013) Environmental and historical imprints on beta diversity: insights from variation in rates of species turnover along gradients. Proceedings of the Royal Society: Series B 280, art. 1768.

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,14,13)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat", sppColumn="species",
                              siteColumn="site", predData=envTab)

##create GDM
gdmMod <- gdm(sitePairTab, geo=TRUE)

##extracts splines
exSplines <- isplineExtract(gdmMod)

##plot spline(s)
#spline for winter precip (bio19)
plot(exSplines[[1]][,"bio19"], exSplines[[2]][,"bio19"], type="l",
     lwd=3, xlab="Winter precipitation (mm)", ylab="Partial Ecological Distance")

Permutate Site-Pair Table Rows, Internal Function

Description

A function which randomizes the rows of a given site-pair table. This function is called from the gdm.varImp function and does not need to be called directly by the user.

Usage

permutateSitePair(spTab, siteVarTab, indexTab, vNames)

Arguments

spTab

A site-pair table.

siteVarTab

A site x variable table.

indexTab

A table of index values for the site-pair table.

vNames

Vector of variable names in both the site-pair and site x variable tables.

Value

A new site-pair table with rows randomized.

Note

This function is called from the gdm.varImp function and is not needed by the user.

See Also

gdm.varImp


Plot Model Fit and I-splines from a Fitted Generalized Dissimilarity Model.

Description

plot is used to plot the I-splines and fit of a generalized dissimilarity model created using the gdm function.

Usage

## S3 method for class 'gdm'
plot(x, plot.layout = c(2, 2), plot.color = "blue",
  plot.linewidth = 2, include.rug = FALSE, rug.sitepair = NULL, ...)

Arguments

x

A gdm model object returned from gdm.

plot.layout

This argument specifies the row and column layout for the plots, including: (1) a single page plot of observed response data against the raw linear predictor (ecological distance) from the model, (2) a single page plot of the observed response against the predicted response from the model, i.e. after applying the link function, 1.0 - exp(-y), to the linear predictor, and (3) the I-splines fitted to the individual predictors. Default is 2 rows by 2 columns. To produce one predictor plot per page set plot.layout to c(1,1). The first two model plots are always produced on a single page each and therefore the layout parameter affects only the layout of the I-spline plots for those predictors that featured in the model fitting process (i.e., predictors with all-zero I-spline coefficients are not plotted).

plot.color

Color of the data points that are plotted for the overall plots.

plot.linewidth

The line width for the regression line over-plotted in the two overall plots to optimize the display of the line over the data points.

include.rug

Whether or not to include a rug plot of the predictor values used to fit the gdm in the I-spline plots. When set to TRUE, a site-pair table must be supplied for the rug.sitepair argument. Default is FALSE.

rug.sitepair

A site-pair table used to add a rug plot of the predictor values used to fit the gdm in the I-spline plots. This should be the same site-pair table used to fit the gdm model being plotted. The function does not check whether the supplied site-pair table matches that used in model fitting.

...

Ignored.
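The link function mentioned under plot.layout, 1.0 - exp(-y), converts the unbounded linear predictor (ecological distance) into a predicted dissimilarity between 0 and 1. The short sketch below applies it, and its inverse, to a few invented ecological distances; the variable names are illustrative.

```r
# Invented ecological distances (values of the linear predictor).
ecological_distance <- c(0, 0.5, 1, 2, 5)

# Link function: predicted dissimilarity rises from 0 and approaches 1.
predicted_dissimilarity <- 1 - exp(-ecological_distance)

# Inverse link: recover ecological distance from a dissimilarity below 1.
recovered_distance <- -log(1 - predicted_dissimilarity)
```

The curvature of this function is why the first overall plot (observed response against ecological distance) is curved while the second (observed against predicted response) is expected to follow a straight line.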

Value

plot returns NULL. Use summary.gdm to obtain a synopsis of the model object.

References

Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.

See Also

isplineExtract

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site",
                              predData=envTab)

##create GDM
gdmMod <- gdm(sitePairTab, geo=TRUE)

##plot GDM
plot(gdmMod, plot.layout=c(3,3))

Plot I-splines With Error Bands Using Bootstrapping.

Description

This function estimates uncertainty in the fitted I-splines by fitting many GDMs using a subsample of the data. The function can run in parallel on multicore machines to reduce computation time (recommended for a large number of iterations). I-spline plots with error bands (+/- one standard deviation) are produced showing (1) the variance of I-spline coefficients and (2) a rug plot indicating how sites used in model fitting are distributed along each gradient. The function result can optionally be saved to disk as a csv for custom plotting, etc. The result output table will have 6 columns per predictor, three each for the x and y values containing the lower bound, full model, and upper bound.
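The error bands can be pictured as a summary across many fitted curves. The sketch below is a base R illustration under stated assumptions, not the package's code: random monotone curves stand in for I-splines from refitted GDMs, and the central line is taken as the mean across iterations, whereas the saved output described above reports the full model as the central column.

```r
set.seed(42)
x       <- seq(0, 1, length.out = 5)   # positions along one gradient
bsIters <- 20

# Rows are bootstrap iterations, columns are positions along the gradient.
# Each row is an invented monotone curve starting at zero.
fits <- t(replicate(bsIters, cumsum(c(0, runif(length(x) - 1, 0, 0.3)))))

mean_y <- colMeans(fits)
sd_y   <- apply(fits, 2, sd)

# Band of plus or minus one standard deviation around the central line.
bands <- data.frame(x = x,
                    lower = mean_y - sd_y,
                    mean  = mean_y,
                    upper = mean_y + sd_y)
```

Wide bands at one end of a gradient usually mean that few sites sample that part of the gradient, which is why the rug plot is drawn alongside the splines.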

Usage

plotUncertainty(spTable, sampleSites, bsIters, geo=FALSE,
  splines=NULL, knots=NULL, splineCol="blue", errCol="grey80",
  plot.linewidth=2.0, plot.layout=c(2,2), parallel=FALSE, cores=2, save=FALSE,
  fileName="gdm.plotUncertainy.csv")

Arguments

spTable

A site-pair table, same as used to fit a gdm.

sampleSites

The fraction (0-1) of sites to retain from the full site-pair table when subsampling.

bsIters

The number of bootstrap iterations to perform.

geo

Same as the gdm geo argument.

splines

Same as the gdm splines argument.

knots

Same as the gdm knots argument.

splineCol

The color of the plotted mean spline. The default is "blue".

errCol

The color of shading for the error bands (+/- one standard deviation around the mean line). The default is "grey80".

plot.linewidth

The line width of the plotted mean spline line. The default is 2.

plot.layout

Same as the plot.gdm plot.layout argument.

parallel

Perform the uncertainty assessment using multiple cores? Default = FALSE.

cores

When the parallel argument is set to TRUE, the number of cores to be registered for the foreach loop. Must be <= the number of cores in the machine running the function.

save

Save the function result (e.g., for custom plotting)? Default=FALSE.

fileName

Name of the csv file to save the data frame that contains the function result. Default = gdm.plotUncertainy.csv. Ignored if save=FALSE.

Value

plotUncertainty returns NULL. Saves a csv to disk if save=TRUE.

References

Shryock DF, Havrilla CA, DeFalco LA, Esque TC, Custer NA, Wood TE (2015) Landscape genomics of Sphaeralcea ambigua in the Mojave Desert: a multivariate, spatially-explicit approach to guide ecological restoration. Conservation Genetics 16, 1303-1317.

See Also

plot.gdm, formatsitepair, subsample.sitepair

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site", predData=envTab)

##plot GDM uncertainty using one core
#not run
#plotUncertainty(sitePairTab, sampleSites=0.70, bsIters=5, geo=TRUE, plot.layout=c(3,3))

##plot GDM uncertainty in parallel
#not run
#plotUncertainty(sitePairTab, sampleSites=0.70, bsIters=50, geo=TRUE, plot.layout=c(3,3),
                 #parallel=T, cores=10)

Predict Biological Dissimilarities Between Sites or Times Using a Fitted Generalized Dissimilarity Model

Description

This function predicts biological distances between sites or times using a model object returned from gdm. Predictions between site pairs require a data frame containing the values of predictors for pairs of locations, formatted as follows: distance, weights, s1.X, s1.Y, s2.X, s2.Y, s1.Pred1, s1.Pred2, ..., s1.PredN, s2.Pred1, s2.Pred2, ..., s2.PredN. Predictions of biological change through time require two raster stacks or bricks for environmental conditions at two time periods, each with a layer for each environmental predictor in the fitted model.

Usage

## S3 method for class 'gdm'
predict(object, data, time=FALSE, predRasts=NULL, filename="", ...)

Arguments

object

A gdm model object resulting from a call to gdm.

data

Either a data frame containing the values of predictors for pairs of sites, in the same format and structure as used to fit the model using gdm, or a raster stack if a prediction of biological change through time is needed.

For a data frame, the first two columns - distance and weights - are required by the function but are not used in the prediction and can therefore be filled with dummy data (e.g. all zeros). If geo is TRUE, then the s1.X, s1.Y and s2.X, s2.Y columns will be used for calculating the geographical distance between each pair of sites for inclusion of the geographic predictor term in the GDM. If geo is FALSE, then the s1.X, s1.Y, s2.X and s2.Y data columns are ignored. However, these columns are still REQUIRED and can be filled with dummy data (e.g. all zeros). The remaining columns hold the N predictors for Site 1 followed by the N predictors for Site 2. The order of the columns must match that of the site-pair table used to fit the model.

A raster stack should be provided only when time=T and should contain one layer for each environmental predictor in the same order as the columns in the site-pair table used to fit the model.

time

TRUE/FALSE: Is the model prediction for biological change through time?

predRasts

A raster stack characterizing environmental conditions for a different time in the past or future, with the same extent, resolution, and layer order as the data object. Required only if time=T.

filename

character. Output filename for rasters. When provided, the raster layers are written to file directly.

...

additional arguments to pass to the terra predict function.
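The data frame layout described for the data argument can be assembled by hand when predictions are wanted for new site pairs. The sketch below builds such a table in base R with placeholder distance and weights columns. All values are invented, and the column names follow the pattern given in the Description; in practice the names and order must match the site-pair table used to fit the model.

```r
# Invented coordinates and predictor values for the two sites of three pairs.
s1 <- data.frame(X = c(115.1, 116.0, 117.2), Y = c(-31.0, -32.5, -33.1),
                 bio5 = c(30, 31, 29), bio19 = c(250, 180, 300))
s2 <- data.frame(X = c(118.0, 115.5, 116.4), Y = c(-30.2, -34.0, -31.7),
                 bio5 = c(33, 28, 30), bio19 = c(120, 320, 210))

predTab <- data.frame(
  distance = 0,   # placeholder: required but not used in prediction
  weights  = 0,   # placeholder: required but not used in prediction
  s1.X = s1$X, s1.Y = s1$Y, s2.X = s2$X, s2.Y = s2$Y,
  s1.bio5 = s1$bio5, s1.bio19 = s1$bio19,
  s2.bio5 = s2$bio5, s2.bio19 = s2$bio19
)

# With a fitted model (here called gdmMod, an assumption) one would then call:
# predDiss <- predict(gdmMod, predTab)
```

Keeping all Site 1 predictors together before all Site 2 predictors matters more than the column names themselves, since the prediction relies on column position.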

Value

predict returns either a response vector with the same length as the number of rows in the input data frame or a raster depicting change through time across the study region.

See Also

gdm.transform

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,14,13)]
envTab <- southwest[, c(2:ncol(southwest))]

# remove soils (no rasters for these)
envTab <- envTab[,-c(2:6)]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat", sppColumn="species",
                              siteColumn="site", predData=envTab)

# create GDM
gdmMod <- gdm(sitePairTab, geo=TRUE)

##predict GDM
predDiss <- predict(gdmMod, sitePairTab)

##time example
rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
envRast <- terra::rast(rastFile)

##make some fake climate change data
futRasts <- envRast
##reduce winter precipitation by 25%
futRasts[[3]] <- futRasts[[3]]*0.75

timePred <- predict(gdmMod, envRast, time=TRUE, predRasts=futRasts)
terra::plot(timePred)

Species and Environmental Data from Southwestern Australia.

Description

A data set containing species occurrence and associated environmental data at 94 sites in southwestern Australia.

Usage

southwest

Format

A data frame with 29364 rows and 14 variables:

species

species name

site

site name

awcA

plant-available water capacity in soil horizon A

phTotal

soil pH

sandA

percent sand content in soil horizon A

shcA

saturated hydraulic conductivity in soil horizon A

solumDepth

soil depth to unweathered parent material

bio5

maximum temperature of the warmest month

bio6

minimum temperature of the coldest month

bio15

precipitation seasonality

bio18

precipitation of warmest quarter

bio19

precipitation of coldest quarter

Lat

latitude

Long

longitude


Remove Sites at Random from a Site-Pair Table

Description

Randomly selects a number of sites from a given site-pair table and removes them from the site-pair table. It will remove all instances of the sites randomly selected to be removed in both s1 and s2 positions.

Usage

subsample.sitepair(spTable, sampleSites)

Arguments

spTable

A site-pair table, same as used to fit a gdm.

sampleSites

The fraction (0-1, though a value of 0 would be silly, wouldn't it?) of sites to retain from the full site-pair table. If less than 1, this argument will completely remove a fraction of sites such that they are not used in the permutation routines.

Value

A site-pair table, such as one created by formatsitepair, ideally smaller than the one given. In the very rare case where the function determines not to remove any sites, or should the sampleSites argument be 1, then the function will return the full site-pair table.

Note

This function removes sites, not just site-pairs (rows), from the site-pair table. This function is called from several of the other functions within the gdm package, including the plotUncertainty and gdm.varImp functions, for the purposes of subsampling the sites in the site-pair table.
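The distinction between removing sites and removing rows can be illustrated with a small base R sketch. This is not the package's implementation: it assumes sites can be identified by their coordinate pair, uses invented data for six sites, and rounds the number of retained sites to the nearest integer.

```r
set.seed(7)

# Six invented sites and every pairwise combination of them (15 rows).
sites <- data.frame(id = 1:6, X = 1:6, Y = 6:1)
idx   <- t(combn(sites$id, 2))
spt   <- data.frame(distance = runif(nrow(idx)), weights = 1,
                    s1.X = sites$X[idx[, 1]], s1.Y = sites$Y[idx[, 1]],
                    s2.X = sites$X[idx[, 2]], s2.Y = sites$Y[idx[, 2]])

sampleSites <- 0.5

# Assumption: a site is identified by its coordinate pair.
site_key <- function(x, y) paste(x, y)
key1     <- site_key(spt$s1.X, spt$s1.Y)
key2     <- site_key(spt$s2.X, spt$s2.Y)
all_keys <- unique(c(key1, key2))

# Choose the sites to retain, then keep only rows where BOTH sites survive.
keep    <- sample(all_keys, size = round(length(all_keys) * sampleSites))
spt_sub <- spt[key1 %in% keep & key2 %in% keep, ]
```

Retaining 3 of 6 sites leaves 3 of the original 15 rows, so halving the sites removes far more than half of the site-pairs.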

See Also

formatsitepair

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,13,14)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat", sppColumn="species",
                              siteColumn="site", predData=envTab)

subsample.sitepair(sitePairTab, sampleSites=0.7)

Summarize a Fitted Generalized Dissimilarity Model

Description

This function summarizes the gdm model object returned from gdm.

Usage

## S3 method for class 'gdm'
summary(object, ...)

Arguments

object

A gdm model object resulting from a call to gdm.

...

Ignored.

Value

summary prints its output to the R Console window and returns no value.

See Also

gdm

Examples

##set up site-pair table using the southwest data set
sppData <- southwest[, c(1,2,14,13)]
envTab <- southwest[, c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat", sppColumn="species",
                              siteColumn="site", predData=envTab)

##create GDM
gdmMod <- gdm(sitePairTab, geo=TRUE)

##summary of GDM
summary(gdmMod)
