| Type: | Package |
| Title: | Ergonomic Methods for Assessing Spatial Models |
| Version: | 0.6.3 |
| Description: | Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with "hot spots" of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the 'tidymodels' framework. Methods include Moran's I ('Moran' (1950) <doi:10.2307/2332142>), Geary's C ('Geary' (1954) <doi:10.2307/2986645>), Getis-Ord's G ('Ord' and 'Getis' (1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>), agreement coefficients from 'Ji' and Gallo (2006) (<doi:10.14358/PERS.72.7.823>), agreement metrics from 'Willmott' (1981) (<doi:10.1080/02723646.1981.10642213>) and 'Willmott' 'et' 'al'. (2012) (<doi:10.1002/joc.2419>), an implementation of the area of applicability methodology from 'Meyer' and 'Pebesma' (2021) (<doi:10.1111/2041-210X.13650>), and an implementation of multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010) (<doi:10.1016/j.rse.2010.05.010>). |
| License: | MIT + file LICENSE |
| URL: | https://github.com/ropensci/waywiser,https://docs.ropensci.org/waywiser/ |
| BugReports: | https://github.com/ropensci/waywiser/issues |
| Depends: | R (≥ 4.0) |
| Imports: | dplyr (≥ 1.1.0), fields, FNN, glue, hardhat, Matrix, purrr,rlang (≥ 1.1.0), sf (≥ 1.0-0), spdep (≥ 1.1-9), stats,tibble, tidyselect, vctrs, yardstick (≥ 1.2.0) |
| Suggests: | applicable, caret, CAST, covr, exactextractr, ggplot2, knitr,modeldata, recipes, rmarkdown, rsample, spatialsample, terra,testthat (≥ 3.0.0), tidymodels, tidyr, tigris, units, vip,whisker, withr |
| VignetteBuilder: | knitr |
| Config/Needs/website: | kableExtra |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| Encoding: | UTF-8 |
| Language: | en-US |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2025-04-15 23:57:54 UTC; mikemahoney218 |
| Author: | Michael Mahoney |
| Maintainer: | Michael Mahoney <mike.mahoney.218@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-04-16 00:10:02 UTC |
waywiser: Ergonomic Methods for Assessing Spatial Models
Description

Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with "hot spots" of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the 'tidymodels' framework. Methods include Moran's I ('Moran' (1950)doi:10.2307/2332142), Geary's C ('Geary' (1954)doi:10.2307/2986645), Getis-Ord's G ('Ord' and 'Getis' (1995)doi:10.1111/j.1538-4632.1995.tb00912.x), agreement coefficients from 'Ji' and Gallo (2006) (doi: 10.14358/PERS.72.7.823), agreement metrics from 'Willmott' (1981) (doi: 10.1080/02723646.1981.10642213) and 'Willmott' 'et' 'al'. (2012) (doi: 10.1002/joc.2419), an implementation of the area of applicability methodology from 'Meyer' and 'Pebesma' (2021) (doi:10.1111/2041-210X.13650), and an implementation of multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010) (doi:10.1016/j.rse.2010.05.010).
Author(s)
Maintainer: Michael Mahoneymike.mahoney.218@gmail.com (ORCID)
Other contributors:
Lucas Johnsonlucas.k.johnson03@gmail.com (ORCID) [contributor]
Virgilio Gómez-Rubio (Virgilio reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>) [reviewer]
Jakub Nowosad (Jakub reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>) [reviewer]
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs athttps://github.com/ropensci/waywiser/issues
Guerry "Moral Statistics" (1830s)
Description
This data and description are taken from the geodaData R package.Classic social science foundational study by Andre-Michel Guerry on crime, suicide, literacy and other “moral statistics” in 1830s France. Data from the R package Guerry (Michael Friendly and Stephane Dray).
Usage
guerryFormat
An sf data frame with 85 rows, 23 variables, and a geometry column:
- dept
Department ID: Standard numbers for the departments
- Region
Region of France ('N'='North', 'S'='South', 'E'='East', 'W'='West', 'C'='Central'). Corsica is coded as NA.
- Dprtmnt
Department name: Departments are named according to usage in 1830, but without accents. A factor with levelsAinAisneAllier ...Vosges Yonne
- Crm_prs
Population per Crime against persons.
- Crm_prp
Population per Crime against property.
- Litercy
Percent of military conscripts who can read and write.
- Donatns
Donations to the poor.
- Infants
Population per illegitimate birth.
- Suicids
Population per suicide.
- Maincty
Size of principal city ('1:Sm', '2:Med', '3:Lg'), used as a surrogate for population density. Large refers to the top 10, small to the bottom 10; all the rest are classed Medium.
- Wealth
Per capita tax on personal property. A ranked index based on taxes on personal and movable property per inhabitant.
- Commerc
Commerce and Industry, measured by the rank of the number of patents / population.
- Clergy
Distribution of clergy, measured by the rank of the number of Catholic priests in active service population.
- Crim_prn
Crimes against parents, measured by the rank of the ratio of crimes against parents to all crimes – Average for the years 1825-1830.
- Infntcd
Infanticides per capita. A ranked ratio of number of infanticides to population – Average for the years 1825-1830.
- Dntn_cl
Donations to the clergy. A ranked ratio of the number of bequests and donations inter vivios to population – Average for the years 1815-1824.
- Lottery
Per capita wager on Royal Lottery. Ranked ratio of the proceeds bet on the royal lottery to population — Average for the years 1822-1826.
- Desertn
Military desertion, ratio of number of young soldiers accused of desertion to the force of the military contingent, minus the deficit produced by the insufficiency of available billets – Average of the years 1825-1827.
- Instrct
Instruction. Ranks recorded from Guerry's map of Instruction. Note: this is inversely related to Literacy
- Prsttts
Number of prostitutes registered in Paris from 1816 to 1834, classified by the department of their birth
- Distanc
Distance to Paris (km). Distance of each department centroid to the centroid of the Seine (Paris)
- Area
Area (1000 km^2).
- Pop1831
Population in 1831, in 1000s
Details
Sf object, units in m. EPSG 27572: NTF (Paris) / Lambert zone II.
Source
Angeville, A. (1836). Essai sur la Statistique de la Population française Paris: F. Doufour.
Guerry, A.-M. (1833). Essai sur la statistique morale de la France Paris: Crochard. English translation: Hugh P. Whitt and Victor W. Reinking, Lewiston, N.Y. : Edwin Mellen Press, 2002.
Parent-Duchatelet, A. (1836). De la prostitution dans la ville de Paris, 3rd ed, 1857, p. 32, 36
https://geodacenter.github.io/data-and-lab/Guerry/
Examples
if (requireNamespace("sf", quietly = TRUE)) { library(sf) data(guerry) plot(guerry["Donatns"])}Number of trees and aboveground biomass for Forest Inventory and Analysis plots in New York State
Description
The original data is derived from the Forest Inventory and Analysis program,implemented by the US Department of Agriculture's Forest Service.
Usage
ny_treesFormat
An sf object using EPSG 5070: NAD83 / Conus Albers (in meters), with 5,303 rows and 5 columns:
- yr
The year measurements were taken.
- plot
A unique identifier signifying the plot measurements were taken at.
- n_trees
The number of trees present on a plot.
- agb
The total aboveground biomass at the plot location, in pounds.
- geometry
The centroid of the plot location.
Predict from aww_area_of_applicability
Description
Predict from aww_area_of_applicability
Usage
## S3 method for class 'ww_area_of_applicability'predict(object, new_data, ...)Arguments
object | A |
new_data | A data frame or matrix of new samples. |
... | Not used. |
Details
The function computes the distance indices of the new data andwhether or not they are "inside" the area of applicability.
Value
A tibble of predictions, with two columns:di, numeric, contains the"dissimilarity index" of each point innew_data, whileaoa, logical,contains whether a row is inside (TRUE) or outside (FALSE) the area ofapplicability.
Note that this function is often called usingterra::predict(), in which caseaoa will be converted to numericimplicitly;1 values correspond to cells "inside" the area of applicabilityand0 corresponds to cells "outside" the AOA.
The number of rows in the tibble is guaranteedto be the same as the number of rows innew_data. Rows withNA predictorvalues will haveNAdi andaoa values.
See Also
Other area of applicability functions:ww_area_of_applicability()
Examples
library(vip)train <- gen_friedman(1000, seed = 101) # ?vip::gen_friedmantest <- train[701:1000, ]train <- train[1:700, ]pp <- stats::ppr(y ~ ., data = train, nterms = 11)metric_name <- ifelse( packageVersion("vip") > package_version("0.3.2"), "rsq", "rsquared")importance <- vip::vi_permute( pp, target = "y", metric = metric_name, pred_wrapper = predict, train = train)aoa <- ww_area_of_applicability(y ~ ., train, test, importance = importance)predict(aoa, test)Print number of predictors and area-of-applicability threshold
Description
Print number of predictors and area-of-applicability threshold
Usage
## S3 method for class 'ww_area_of_applicability'print(x, digits = getOption("digits"), ...)Arguments
x | A |
digits | The number of digits to print, used when rounding the AOA threshold. |
... | These dots are for future extensions and must be empty. |
Examples
library(vip)trn <- gen_friedman(500, seed = 101) # ?vip::gen_friedmanpp <- ppr(y ~ ., data = trn, nterms = 11)metric_name <- ifelse( packageVersion("vip") > package_version("0.3.2"), "rsq", "rsquared")importance <- vip::vi_permute( pp, target = "y", metric = metric_name, pred_wrapper = predict, train = trn)ww_area_of_applicability(trn[2:11], importance = importance)Simulated data based on WorldClim Bioclimatic variables
Description
This data is adapted from the CAST vignettevignette("cast02-AOA-tutorial", package = "CAST").The original data is derived from the Worldclim global climate variables.
Usage
worldclim_simulationFormat
An sf object with 10,000 rows and 6 columns:
- bio2
Mean Diurnal Range (Mean of monthly (max temp - min temp))
- bio10
Mean Temperature of Warmest Quarter
- bio13
Precipitation of Wettest Month
- bio19
Precipitation of Coldest Quarter
- geometry
The location of the sampled point.
- response
A virtual species distribution, generated using the
generateSpFromPCA()function from thevirtualspeciespackage.
Source
Agreement coefficients and related methods
Description
These functions calculate the agreement coefficient and mean productdifference (MPD), as well as their systematic and unsystematic components,from Ji and Gallo (2006). Agreement coefficients provides a usefulmeasurement of agreement between two data sets which is bounded, symmetrical,and can be decomposed into systematic and unsystematic components;however, it assumes a linear relationship between the two data sets andtreats both "truth" and "estimate" as being of equal quality, and as such maynot be a useful metric in all scenarios.
Usage
ww_agreement_coefficient(data, ...)## S3 method for class 'data.frame'ww_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)ww_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)ww_systematic_agreement_coefficient(data, ...)## S3 method for class 'data.frame'ww_systematic_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)ww_systematic_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)ww_unsystematic_agreement_coefficient(data, ...)## S3 method for class 'data.frame'ww_unsystematic_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)ww_unsystematic_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)ww_unsystematic_mpd(data, ...)## S3 method for class 'data.frame'ww_unsystematic_mpd(data, truth, estimate, na_rm = TRUE, ...)ww_unsystematic_mpd_vec(truth, estimate, na_rm = TRUE, ...)ww_systematic_mpd(data, ...)## S3 method for class 'data.frame'ww_systematic_mpd(data, truth, estimate, na_rm = TRUE, ...)ww_systematic_mpd_vec(truth, estimate, na_rm = TRUE, ...)ww_unsystematic_rmpd(data, ...)## S3 method for class 'data.frame'ww_unsystematic_rmpd(data, truth, estimate, na_rm = TRUE, ...)ww_unsystematic_rmpd_vec(truth, estimate, na_rm = TRUE, ...)ww_systematic_rmpd(data, ...)## S3 method for class 'data.frame'ww_systematic_rmpd(data, truth, estimate, na_rm = TRUE, ...)ww_systematic_rmpd_vec(truth, estimate, na_rm = TRUE, ...)Arguments
data | A |
... | Not currently used. |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
na_rm | A |
Details
Agreement coefficient values range from 0 to 1, with 1 indicating perfectagreement.truth andestimate must be the same length. This function isnot explicitly spatial and as such can be applied to data with any number ofdimensions and any coordinate reference system.
Value
A tibble with columns .metric, .estimator, and .estimate and 1 row of values.For grouped data frames, the number of rows returned will be the same as the number of groups.For_vec() functions, a single value (or NA).
References
Ji, L. and Gallo, K. 2006. "An Agreement Coefficient for Image Comparison."Photogrammetric Engineering & Remote Sensing 72(7), pp 823–833,doi: 10.14358/PERS.72.7.823.
See Also
Other agreement metrics:ww_willmott_d()
Other yardstick metrics:ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i(),ww_willmott_d()
Examples
# Calculated values match Ji and Gallo 2006:x <- c(6, 8, 9, 10, 11, 14)y <- c(2, 3, 5, 5, 6, 8)ww_agreement_coefficient_vec(x, y)ww_systematic_agreement_coefficient_vec(x, y)ww_unsystematic_agreement_coefficient_vec(x, y)ww_systematic_mpd_vec(x, y)ww_unsystematic_mpd_vec(x, y)ww_systematic_rmpd_vec(x, y)ww_unsystematic_rmpd_vec(x, y)example_df <- data.frame(x = x, y = y)ww_agreement_coefficient(example_df, x, y)ww_systematic_agreement_coefficient(example_df, x, y)ww_unsystematic_agreement_coefficient(example_df, x, y)ww_systematic_mpd(example_df, x, y)ww_unsystematic_mpd(example_df, x, y)ww_systematic_rmpd(example_df, x, y)ww_unsystematic_rmpd(example_df, x, y)Find the area of applicability
Description
This function calculates the "area of applicability" of a model, asintroduced by Meyer and Pebesma (2021). While the initial paper introducingthis method focused on spatial models, there is nothing inherently spatialabout the method; it can be used with any type of data (and, because it doesnot care about the spatial arrangement of your data, can be used with 2D or3D spatial data, and with geographic or projected CRS).
Usage
ww_area_of_applicability(x, ...)## S3 method for class 'data.frame'ww_area_of_applicability(x, testing = NULL, importance, ..., na_rm = FALSE)## S3 method for class 'matrix'ww_area_of_applicability(x, testing = NULL, importance, ..., na_rm = FALSE)## S3 method for class 'formula'ww_area_of_applicability( x, data, testing = NULL, importance, ..., na_rm = FALSE)## S3 method for class 'recipe'ww_area_of_applicability( x, data, testing = NULL, importance, ..., na_rm = FALSE)## S3 method for class 'rset'ww_area_of_applicability(x, y = NULL, importance, ..., na_rm = FALSE)Arguments
x | Either a data frame, matrix, formula(specifying predictor terms on the right-hand side), recipe(from If |
... | Not currently used. |
testing | A data frame or matrix containing the data used tovalidate your model. This should be the same data as used to calculate allmodel accuracy metrics. If this argument is |
importance | Either:
All variables in the training data ( |
na_rm | A logical of length 1, indicating whether observations (in bothtraining and testing) with |
data | The data frame representing your "training" data, when using theformula or recipe methods. |
y | Optional: a recipe (from If |
Details
Predictions made on points "inside" the area of applicability should be asaccurate as predictions made on the data provided totesting.That means that generallytesting should be your final hold-outset so that predictions on points inside the area of applicability areaccurately described by your reported model metrics.When passing anrset object tox, predictions made on points "inside" thearea of applicability instead should be as accurate as predictions made onthe assessment sets during cross-validation.
This method assumes your model was fit using dummy variables in the place ofany non-numeric predictor, and that you have one importance score perdummy variable. Having non-numeric predictors will cause this function tofail.
Value
Aww_area_of_applicability object, which can be used withpredict() tocalculate the distance of new data to the original training data, anddetermine if new data is within a model's area of applicability.
Differences from CAST
This implementation differs fromMeyer and Pebesma (2021) (and therefore from CAST) when using cross-validateddata in order to minimize data leakage. Namely, in order to calculatethe dissimilarity indexDI_{k}, CAST:
Rescales all data used for cross validation at once, lumping assessmentfolds in with analysis data.
Calculates a single
\bar{d}as the mean distance between all pointsin the rescaled data set, including between points in the same assessmentfold.For each point
kthat's used in an assessment fold, calculatesd_{k}as the minimum distance betweenkand any point in itscorresponding analysis fold.Calculates
DI_{k}by dividingd_{k}by\bar{d}(whichwas partially calculated as the distance betweenkand the rest ofthe rescaled data).
Because assessment data is used to calculate constants for rescaling analysisdata and\bar{d}, the assessment data may appear too "similar" tothe analysis data when calculatingDI_{k}. As such, waywiser treatseach fold in anrset independently:
Each analysis set is rescaled independently.
Separate
\bar{d}are calculated for each fold, as the mean distancebetween all points in the analysis set for that fold.Identically to CAST,
d_{k}is the minimum distance between a pointkin the assessment fold and any point in thecorresponding analysis fold.DI_{k}is then found by dividingd_{k}by\bar{d},which was calculated independently fromk.
Predictions are made using the full training data set, rescaled once (inthe same way as CAST), and the mean\bar{d} across folds, under theassumption that the "final" model in use will be retrained using the entiredata set.
In practice, this means waywiser produces very slightly higher\bar{d}values than CAST and a slightly higher area of applicability threshold thanCAST when usingrset objects.
References
H. Meyer and E. Pebesma. 2021. "Predicting into unknown space? Estimatingthe area of applicability of spatial prediction models," Methods in Ecologyand Evolution 12(9), pp 1620 - 1633, doi: 10.1111/2041-210X.13650.
See Also
Other area of applicability functions:predict.ww_area_of_applicability()
Examples
train <- vip::gen_friedman(1000, seed = 101) # ?vip::gen_friedmantest <- train[701:1000, ]train <- train[1:700, ]pp <- stats::ppr(y ~ ., data = train, nterms = 11)metric_name <- ifelse( packageVersion("vip") > package_version("0.3.2"), "rsq", "rsquared")importance <- vip::vi_permute( pp, target = "y", metric = metric_name, pred_wrapper = predict, train = train)aoa <- ww_area_of_applicability(y ~ ., train, test, importance = importance)predict(aoa, test)# Equivalent methods for calculating AOA:ww_area_of_applicability(train[2:11], test[2:11], importance)ww_area_of_applicability( as.matrix(train[2:11]), as.matrix(test[2:11]), importance)Make 'nb' objects from sf objects
Description
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Usage
ww_build_neighbors(data, nb = NULL, ..., call = rlang::caller_env())Arguments
data | An sf object (of class "sf" or "sfc"). |
nb | An object of class "nb" (in which case it will be returnedunchanged), or a function to create an object of class "nb" from |
... | Arguments passed to the neighbor-creating function. |
call | The execution environment of a currently runningfunction, e.g. You only need to supply Can also be For more information about error calls, seeIncluding function calls in error messages. |
Details
Whennb = NULL, the method used to create neighbors fromdata isdependent on what geometry typedata is:
If
nb = NULLanddatais a point geometry(classes "sfc_POINT" or "sfc_MULTIPOINT") the "nb" object will be createdusingww_make_point_neighbors().If
nb = NULLanddatais a polygon geometry(classes "sfc_POLYGON" or "sfc_MULTIPOLYGON") the "nb" object will be createdusingww_make_polygon_neighbors().If
nb = NULLanddatais any other geometry type, the "nb" object willbe created using the centroids of the data as points, with a warning.
Value
An object of class "nb".
Examples
ww_build_neighbors(guerry)Build "listw" objects of spatial weights
Description
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Usage
ww_build_weights(x, wt = NULL, include_self = FALSE, ...)Arguments
x | Either an sf object or a "nb" neighbors list object.If an sf object, will be converted into a neighbors list via |
wt | Either a "listw" object (which will be returned unchanged),a function for creating a "listw" object from |
include_self | Include each region itself in its own list of neighbors? |
... | Arguments passed to the weight constructing function. |
Value
Alistw object.
Examples
ww_build_weights(guerry)Global Geary's C statistic
Description
Calculate the global Geary's C statistic for model residuals.ww_global_geary_c() returns the statistic itself, whileww_global_geary_pvalue() returns the associated p value.These functions are meant to help assess model predictions, for instance byidentifying if there are clusters of higher residuals than expected. Forstatistical testing and inference applications, usespdep::geary.test() instead.
Usage
ww_global_geary_c(data, ...)ww_global_geary_c_vec(truth, estimate, wt, na_rm = FALSE, ...)ww_global_geary_pvalue(data, ...)ww_global_geary_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)Arguments
data | A |
... | Additional arguments passed to |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
wt | A |
na_rm | A |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
A tibble with columns .metric, .estimator, and .estimate and 1 row of values.For grouped data frames, the number of rows returned will be the same as thenumber of groups.For_vec() functions, a single value (or NA).
References
Geary, R. C. (1954). "The Contiguity Ratio and Statistical Mapping". TheIncorporated Statistician. 5 (3): 115–145. doi:10.2307/2986645.
Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, p. 17.
See Also
Other autocorrelation metrics:ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i(),ww_willmott_d()
Examples
guerry_model <- guerryguerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)guerry_model$predictions <- predict(guerry_lm, guerry_model)ww_global_geary_c(guerry_model, Crm_prs, predictions)ww_global_geary_pvalue(guerry_model, Crm_prs, predictions)wt <- ww_build_weights(guerry_model)ww_global_geary_c_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)ww_global_geary_pvalue_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)Global Moran's I statistic
Description
Calculate the global Moran's I statistic for model residuals.ww_global_moran_i() returns the statistic itself, whileww_global_moran_pvalue() returns the associated p value.These functions are meant to help assess model predictions, for instance byidentifying if there are clusters of higher residuals than expected. Forstatistical testing and inference applications, usespdep::moran.test() instead.
Usage
ww_global_moran_i(data, ...)ww_global_moran_i_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)ww_global_moran_pvalue(data, ...)ww_global_moran_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)Arguments
data | A |
... | Additional arguments passed to |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
wt | A |
na_rm | A |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
A tibble with columns .metric, .estimator, and .estimate and 1 row of values.For grouped data frames, the number of rows returned will be the same as thenumber of groups.For_vec() functions, a single value (or NA).
References
Moran, P.A.P. (1950). "Notes on Continuous Stochastic Phenomena." Biometrika,37(1/2), pp 17. doi: 10.2307/2332142
Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, p. 17.
See Also
Other autocorrelation metrics:ww_global_geary_c(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_geary_c(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i(),ww_willmott_d()
Examples
guerry_model <- guerryguerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)guerry_model$predictions <- predict(guerry_lm, guerry_model)ww_global_moran_i(guerry_model, Crm_prs, predictions)ww_global_moran_pvalue(guerry_model, Crm_prs, predictions)wt <- ww_build_weights(guerry_model)ww_global_moran_i_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)ww_global_moran_pvalue_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)Local Geary's C statistic
Description
Calculate the local Geary's C statistic for model residuals.ww_local_geary_c() returns the statistic itself, whileww_local_geary_pvalue() returns the associated p value.These functions are meant to help assess model predictions, for instance byidentifying clusters of higher residuals than expected. For statisticaltesting and inference applications, usespdep::localC_perm() instead.
Usage
ww_local_geary_c(data, ...)ww_local_geary_c_vec(truth, estimate, wt, na_rm = FALSE, ...)ww_local_geary_pvalue(data, ...)ww_local_geary_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)Arguments
data | A |
... | Additional arguments passed to |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
wt | A |
na_rm | A |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
A tibble with columns .metric, .estimator, and .estimate andnrow(data)rows of values.For_vec() functions, a numeric vector oflength(truth) (or NA).
References
Anselin, L. 1995. Local indicators of spatial association, GeographicalAnalysis, 27, pp 93–115. doi: 10.1111/j.1538-4632.1995.tb00338.x.
Anselin, L. 2019. A Local Indicator of Multivariate Spatial Association:Extending Geary's C. Geographical Analysis, 51, pp 133-150.doi: 10.1111/gean.12164
See Also
Other autocorrelation metrics:ww_global_geary_c(),ww_global_moran_i(),ww_local_getis_ord_g(),ww_local_moran_i()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_geary_c(),ww_global_moran_i(),ww_local_getis_ord_g(),ww_local_moran_i(),ww_willmott_d()
Examples
guerry_model <- guerryguerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)guerry_model$predictions <- predict(guerry_lm, guerry_model)ww_local_geary_c(guerry_model, Crm_prs, predictions)ww_local_geary_pvalue(guerry_model, Crm_prs, predictions)wt <- ww_build_weights(guerry_model)ww_local_geary_c_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)ww_local_geary_pvalue_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)Local Getis-Ord G and G* statistic
Description
Calculate the local Getis-Ord G and G* statistic for model residuals.ww_local_getis_ord_g() returns the statistic itself, whileww_local_getis_ord_pvalue() returns the associated p value.These functions are meant to help assess model predictions, for instance byidentifying clusters of higher residuals than expected. For statisticaltesting and inference applications, usespdep::localG_perm() instead.
Usage
ww_local_getis_ord_g(data, ...)ww_local_getis_ord_g_vec(truth, estimate, wt, na_rm = FALSE, ...)ww_local_getis_ord_g_pvalue(data, ...)ww_local_getis_ord_g_pvalue_vec(truth, estimate, wt, na_rm = FALSE, ...)Arguments
data | A |
... | Additional arguments passed to |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
wt | A |
na_rm | A |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
A tibble with columns .metric, .estimator, and .estimate andnrow(data)rows of values.For_vec() functions, a numeric vector oflength(truth) (or NA).
References
Ord, J. K. and Getis, A. 1995. Local spatial autocorrelation statistics:distributional issues and an application. Geographical Analysis, 27, 286–306.doi: 10.1111/j.1538-4632.1995.tb00912.x
See Also
Other autocorrelation metrics:ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_moran_i()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_moran_i(),ww_willmott_d()
Examples
guerry_model <- guerryguerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)guerry_model$predictions <- predict(guerry_lm, guerry_model)ww_local_getis_ord_g(guerry_model, Crm_prs, predictions)ww_local_getis_ord_g_pvalue(guerry_model, Crm_prs, predictions)wt <- ww_build_weights(guerry_model)ww_local_getis_ord_g_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)ww_local_getis_ord_g_pvalue_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)Local Moran's I statistic
Description
Calculate the local Moran's I statistic for model residuals.ww_local_moran_i() returns the statistic itself, whileww_local_moran_pvalue() returns the associated p value.These functions are meant to help assess model predictions, for instance byidentifying clusters of higher residuals than expected. For statisticaltesting and inference applications, usespdep::localmoran_perm() instead.
Usage
ww_local_moran_i(data, ...)ww_local_moran_i_vec(truth, estimate, wt, na_rm = FALSE, ...)ww_local_moran_pvalue(data, ...)ww_local_moran_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)Arguments
data | A |
... | Additional arguments passed to |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
wt | A |
na_rm | A |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
A tibble with columns .metric, .estimator, and .estimate andnrow(data)rows of values.For_vec() functions, a numeric vector oflength(truth) (or NA).
References
Anselin, L. 1995. Local indicators of spatial association, GeographicalAnalysis, 27, pp 93–115. doi: 10.1111/j.1538-4632.1995.tb00338.x.
Sokal, R. R, Oden, N. L. and Thomson, B. A. 1998. Local SpatialAutocorrelation in a Biological Model. Geographical Analysis, 30, pp 331–354.doi: 10.1111/j.1538-4632.1998.tb00406.x
See Also
Other autocorrelation metrics:ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_willmott_d()
Examples
guerry_model <- guerryguerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)guerry_model$predictions <- predict(guerry_lm, guerry_model)ww_local_moran_i(guerry_model, Crm_prs, predictions)ww_local_moran_pvalue(guerry_model, Crm_prs, predictions)wt <- ww_build_weights(guerry_model)ww_local_moran_i_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)ww_local_moran_pvalue_vec( guerry_model$Crm_prs, guerry_model$predictions, wt = wt)Make 'nb' objects from point geometries
Description
This function usesspdep::knearneigh() andspdep::knn2nb() tocreate a "nb" neighbors list.
Usage
ww_make_point_neighbors(data, k = 1, sym = FALSE, ...)Arguments
data | An |
k | How many nearest neighbors to use in |
sym | Force the output neighbors list (from |
... | Other arguments passed to |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
An object of class "nb"
Examples
ww_make_point_neighbors(ny_trees)Make 'nb' objects from polygon geometries
Description
This function is an extremely thin wrapper aroundspdep::poly2nb(),renamed to use the waywiser "ww" prefix.
Usage
ww_make_polygon_neighbors(data, ...)Arguments
data | An |
... | Additional arguments passed to |
Details
These functions can be used for geographic or projected coordinate referencesystems and expect 2D data.
Value
An object of class "nb"
Examples
ww_make_polygon_neighbors(guerry)Evaluate metrics at multiple scales of aggregation
Description
Evaluate metrics at multiple scales of aggregation
Usage
ww_multi_scale( data = NULL, truth, estimate, metrics = list(yardstick::rmse, yardstick::mae), grids = NULL, ..., na_rm = TRUE, aggregation_function = "mean", autoexpand_grid = TRUE, progress = TRUE)Arguments
data | Either: a point geometry |
truth,estimate | If |
metrics | Either a |
grids | Optionally, a list of pre-computed |
... | Arguments passed to |
na_rm | Boolean: Should polygons with NA values be removed beforecalculating metrics? Note that this doesnot impact how values areaggregated to polygons: if you want to remove NA values before aggregating,provide a function to |
aggregation_function | The function to use to aggregate predictions andtrue values at various scales, by default |
autoexpand_grid | Boolean: if |
progress | Boolean: if |
Value
A tibble with six columns:.metric, with the nameof the metric that the row describes;.estimator, with the name of theestimator used,.estimate, with the output of the metric function;.grid_args, with the arguments passed tosf::st_make_grid() via...(if any),.grid, containing the grids used to aggregate predictions,as well as the aggregated values oftruth andestimate as well as thecount of non-NA values for each, and.notes, which (ifdata is ansfobject) will indicate any observations which were not used in a givenassessment.
Raster inputs
Ifdata isNULL, thentruth andestimate should both beSpatRasterobjects, as created viaterra::rast(). These rasters will then beaggregated to each grid usingexactextractr::exact_extract(). Ifdatais aSpatRaster object, thentruth andestimate should be indices toselect the appropriate layers of the raster viaterra::subset().
Grids are calculated using the bounding box oftruth, under the assumptionthat you may have extrapolated into regions which do not have matching "true"values. This function does not check thattruth andestimate overlap atall, or that they are at all contained within the grid.
Creating grid blocks
The grid blocks can be controlled by passing arguments tosf::st_make_grid() via.... Some particularly useful arguments include:
cellsize: Target cellsize, expressed as the "diameter" (shorteststraight-line distance between opposing sides; two times the apothem)of each block, in map units.n: The number of grid blocks in the x and y direction (columns, rows).square: A logical value indicating whether to create square (TRUE) orhexagonal (FALSE) cells.
If bothcellsize andn are provided, then the number of blocks requestedbyn of sizes specified bycellsize will be returned, likely notlining up with the bounding box ofdata. If onlycellsizeis provided, this function will return as many blocks of sizecellsize as fit inside the bounding box ofdata. If onlyn is provided,thencellsize will be automatically adjusted to create the requestednumber of cells.
Grids are created by mapping over each argument passed via...simultaneously, in a similar manner tomapply() orpurrr::pmap(). Thismeans that, for example, passingn = list(c(1, 2)) will create a single1x2 grid, while passingn = c(1, 2) will create a 1x1 gridand a 2x2grid. It also means that arguments will be recycled using R's standardvector recycling rules, so that passingn = c(1, 2) andsquare = FALSEwill create two separate grids of hexagons.
This function can be used for geographic or projected coordinate referencesystems and expects 2D data.
References
Riemann, R., Wilson, B. T., Lister, A., and Parks, S. (2010). "An effectiveassessment protocol for continuous geospatial datasets of forestcharacteristics using USFS Forest Inventory and Analysis (FIA) data."Remote Sensing of Environment 114(10), pp 2337-2352,doi: 10.1016/j.rse.2010.05.010 .
Examples
data(ames, package = "modeldata")ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)ames_model <- lm(Sale_Price ~ Lot_Area, data = ames_sf)ames_sf$predictions <- predict(ames_model, ames_sf)ww_multi_scale( ames_sf, Sale_Price, predictions, n = list( c(10, 10), c(1, 1) ), square = FALSE)# or, mostly equivalently# (there will be a slight difference due to `autoexpand_grid = TRUE`)grids <- list( sf::st_make_grid(ames_sf, n = c(10, 10), square = FALSE), sf::st_make_grid(ames_sf, n = c(1, 1), square = FALSE))ww_multi_scale(ames_sf, Sale_Price, predictions, grids = grids)Willmott's d and related values
Description
These functions calculate Willmott's d value, a proposed replacement for R2which better differentiates between types and magnitudes of possiblecovariations. Additional functions calculate systematic and unsystematiccomponents of MSE and RMSE; the sum of the systematic and unsystematiccomponents of MSE equal total MSE (though the same is not true for RMSE).
Usage
ww_willmott_d(data, ...)## S3 method for class 'data.frame'ww_willmott_d(data, truth, estimate, na_rm = TRUE, ...)ww_willmott_d_vec(truth, estimate, na_rm = TRUE, ...)ww_willmott_d1(data, ...)## S3 method for class 'data.frame'ww_willmott_d1(data, truth, estimate, na_rm = TRUE, ...)ww_willmott_d1_vec(truth, estimate, na_rm = TRUE, ...)ww_willmott_dr(data, ...)## S3 method for class 'data.frame'ww_willmott_dr(data, truth, estimate, na_rm = TRUE, ...)ww_willmott_dr_vec(truth, estimate, na_rm = TRUE, ...)ww_systematic_mse(data, ...)## S3 method for class 'data.frame'ww_systematic_mse(data, truth, estimate, na_rm = TRUE, ...)ww_systematic_mse_vec(truth, estimate, na_rm = TRUE, ...)ww_unsystematic_mse(data, ...)## S3 method for class 'data.frame'ww_unsystematic_mse(data, truth, estimate, na_rm = TRUE, ...)ww_unsystematic_mse_vec(truth, estimate, na_rm = TRUE, ...)ww_systematic_rmse(data, ...)## S3 method for class 'data.frame'ww_systematic_rmse(data, truth, estimate, na_rm = TRUE, ...)ww_systematic_rmse_vec(truth, estimate, na_rm = TRUE, ...)ww_unsystematic_rmse(data, ...)## S3 method for class 'data.frame'ww_unsystematic_rmse(data, truth, estimate, na_rm = TRUE, ...)ww_unsystematic_rmse_vec(truth, estimate, na_rm = TRUE, ...)Arguments
data | A |
... | Not currently used. |
truth | The column identifier for the true results(that is |
estimate | The column identifier for the predictedresults (that is also |
na_rm | A |
Details
Values of d and d1 range from 0 to 1, with 1 indicating perfect agreement.Values ofdr range from -1 to 1, with 1 similarly indicating perfect agreement. Valuesof RMSE are in the same units astruth andestimate, while values of MSEare in squared units.truth andestimate must be the same length. Thisfunction is not explicitly spatial and as such can be applied to data withany number of dimensions and any coordinate reference system.
Value
A tibble with columns .metric, .estimator, and .estimate and 1 row of values.For grouped data frames, the number of rows returned will be the same as the number of groups.For_vec() functions, a single value (or NA).
References
Willmott, C. J. 1981. "On the Validation of Models". Physical Geography 2(2),pp 184-194, doi: 10.1080/02723646.1981.10642213.
Willmott, C. J. 1982. "Some Comments on the Evaluation of Model Performance".Bulletin of the American Meteorological Society 63(11), pp 1309-1313,doi: 10.1175/1520-0477(1982)063<1309:SCOTEO>2.0.CO;2.
Willmott C. J., Ackleson S. G., Davis R. E., Feddema J. J., Klink K. M.,Legates D. R., O’Donnell J., Rowe C. M. 1985. "Statistics for theevaluation of model performance." Journal of Geophysical Research90(C5): 8995–9005, doi: 10.1029/jc090ic05p08995
Willmott, C. J., Robeson, S. M., and Matsuura, K. "A refined index of modelperformance". International Journal of Climatology 32, pp 2088-2094, doi:10.1002/joc.2419.
See Also
Other agreement metrics:ww_agreement_coefficient()
Other yardstick metrics:ww_agreement_coefficient(),ww_global_geary_c(),ww_global_moran_i(),ww_local_geary_c(),ww_local_getis_ord_g(),ww_local_moran_i()
Examples
x <- c(6, 8, 9, 10, 11, 14)y <- c(2, 3, 5, 5, 6, 8)ww_willmott_d_vec(x, y)ww_willmott_d1_vec(x, y)ww_willmott_dr_vec(x, y)ww_systematic_mse_vec(x, y)ww_unsystematic_mse_vec(x, y)ww_systematic_rmse_vec(x, y)ww_unsystematic_rmse_vec(x, y)example_df <- data.frame(x = x, y = y)ww_willmott_d(example_df, x, y)ww_willmott_d1(example_df, x, y)ww_willmott_dr(example_df, x, y)ww_systematic_mse(example_df, x, y)ww_unsystematic_mse(example_df, x, y)ww_systematic_rmse(example_df, x, y)ww_unsystematic_rmse(example_df, x, y)