Type:

Package

Title:

Ergonomic Methods for Assessing Spatial Models

Version:

0.6.3

Description:

Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with "hot spots" of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the 'tidymodels' framework. Methods include Moran's I ('Moran' (1950) <doi:10.2307/2332142>), Geary's C ('Geary' (1954) <doi:10.2307/2986645>), Getis-Ord's G ('Ord' and 'Getis' (1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>), agreement coefficients from 'Ji' and Gallo (2006) (<doi:10.14358/PERS.72.7.823>), agreement metrics from 'Willmott' (1981) (<doi:10.1080/02723646.1981.10642213>) and 'Willmott' 'et' 'al'. (2012) (<doi:10.1002/joc.2419>), an implementation of the area of applicability methodology from 'Meyer' and 'Pebesma' (2021) (<doi:10.1111/2041-210X.13650>), and an implementation of multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010) (<doi:10.1016/j.rse.2010.05.010>).

License:

MIT + file LICENSE

URL:

https://github.com/ropensci/waywiser, https://docs.ropensci.org/waywiser/

BugReports:

https://github.com/ropensci/waywiser/issues

Depends:

R (≥ 4.0)

Imports:

dplyr (≥ 1.1.0), fields, FNN, glue, hardhat, Matrix, purrr, rlang (≥ 1.1.0), sf (≥ 1.0-0), spdep (≥ 1.1-9), stats, tibble, tidyselect, vctrs, yardstick (≥ 1.2.0)

Suggests:

applicable, caret, CAST, covr, exactextractr, ggplot2, knitr, modeldata, recipes, rmarkdown, rsample, spatialsample, terra, testthat (≥ 3.0.0), tidymodels, tidyr, tigris, units, vip, whisker, withr

VignetteBuilder:

knitr

Config/Needs/website:

kableExtra

Config/testthat/edition:

Config/testthat/parallel:

true

Encoding:

UTF-8

Language:

en-US

LazyData:

true

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-04-15 23:57:54 UTC; mikemahoney218

Author:

Michael Mahoney [aut, cre], Lucas Johnson [ctb], Virgilio Gómez-Rubio [rev] (Virgilio reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>), Jakub Nowosad [rev] (Jakub reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>), Posit Software, PBC [cph, fnd]

Maintainer:

Michael Mahoney <mike.mahoney.218@gmail.com>

Repository:

CRAN

Date/Publication:

2025-04-16 00:10:02 UTC

waywiser: Ergonomic Methods for Assessing Spatial Models

Description

Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with "hot spots" of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the 'tidymodels' framework. Methods include Moran's I ('Moran' (1950) doi:10.2307/2332142), Geary's C ('Geary' (1954) doi:10.2307/2986645), Getis-Ord's G ('Ord' and 'Getis' (1995) doi:10.1111/j.1538-4632.1995.tb00912.x), agreement coefficients from 'Ji' and Gallo (2006) (doi:10.14358/PERS.72.7.823), agreement metrics from 'Willmott' (1981) (doi:10.1080/02723646.1981.10642213) and 'Willmott' 'et' 'al'. (2012) (doi:10.1002/joc.2419), an implementation of the area of applicability methodology from 'Meyer' and 'Pebesma' (2021) (doi:10.1111/2041-210X.13650), and an implementation of multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010) (doi:10.1016/j.rse.2010.05.010).

Author(s)

Maintainer: Michael Mahoney <mike.mahoney.218@gmail.com> (ORCID)

Other contributors:

Lucas Johnson <lucas.k.johnson03@gmail.com> (ORCID) [contributor]
Virgilio Gómez-Rubio (Virgilio reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>) [reviewer]
Jakub Nowosad (Jakub reviewed the package (v. 0.2.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/571>) [reviewer]
Posit Software, PBC [copyright holder, funder]

Guerry "Moral Statistics" (1830s)

Description

This data and description are taken from the geodaData R package. Classic social science foundational study by Andre-Michel Guerry on crime, suicide, literacy and other “moral statistics” in 1830s France. Data from the R package Guerry (Michael Friendly and Stephane Dray).

Usage

guerry

Format

An sf data frame with 85 rows, 23 variables, and a geometry column:

dept: Department ID: Standard numbers for the departments
Region: Region of France ('N'='North', 'S'='South', 'E'='East', 'W'='West', 'C'='Central'). Corsica is coded as NA.
Dprtmnt: Department name: Departments are named according to usage in 1830, but without accents. A factor with levels Ain Aisne Allier ... Vosges Yonne
Crm_prs: Population per Crime against persons.
Crm_prp: Population per Crime against property.
Litercy: Percent of military conscripts who can read and write.
Donatns: Donations to the poor.
Infants: Population per illegitimate birth.
Suicids: Population per suicide.
Maincty: Size of principal city ('1:Sm', '2:Med', '3:Lg'), used as a surrogate for population density. Large refers to the top 10, small to the bottom 10; all the rest are classed Medium.
Wealth: Per capita tax on personal property. A ranked index based on taxes on personal and movable property per inhabitant.
Commerc: Commerce and Industry, measured by the rank of the number of patents / population.
Clergy: Distribution of clergy, measured by the rank of the number of Catholic priests in active service / population.
Crim_prn: Crimes against parents, measured by the rank of the ratio of crimes against parents to all crimes – Average for the years 1825-1830.
Infntcd: Infanticides per capita. A ranked ratio of number of infanticides to population – Average for the years 1825-1830.
Dntn_cl: Donations to the clergy. A ranked ratio of the number of bequests and donations inter vivos to population – Average for the years 1815-1824.
Lottery: Per capita wager on Royal Lottery. Ranked ratio of the proceeds bet on the royal lottery to population – Average for the years 1822-1826.
Desertn: Military desertion, ratio of number of young soldiers accused of desertion to the force of the military contingent, minus the deficit produced by the insufficiency of available billets – Average of the years 1825-1827.
Instrct: Instruction. Ranks recorded from Guerry's map of Instruction. Note: this is inversely related to Literacy
Prsttts: Number of prostitutes registered in Paris from 1816 to 1834, classified by the department of their birth
Distanc: Distance to Paris (km). Distance of each department centroid to the centroid of the Seine (Paris)
Area: Area (1000 km^2).
Pop1831: Population in 1831, in 1000s

Details

Sf object, units in m. EPSG 27572: NTF (Paris) / Lambert zone II.

Source

Angeville, A. (1836). Essai sur la Statistique de la Population française. Paris: F. Doufour.
Guerry, A.-M. (1833). Essai sur la statistique morale de la France. Paris: Crochard. English translation: Hugh P. Whitt and Victor W. Reinking, Lewiston, N.Y.: Edwin Mellen Press, 2002.
Parent-Duchatelet, A. (1836). De la prostitution dans la ville de Paris, 3rd ed, 1857, p. 32, 36.

https://geodacenter.github.io/data-and-lab/Guerry/

Examples

if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  data(guerry)

  plot(guerry["Donatns"])
}

Number of trees and aboveground biomass for Forest Inventory and Analysis plots in New York State

Description

The original data is derived from the Forest Inventory and Analysis program, implemented by the US Department of Agriculture's Forest Service.

Usage

ny_trees

Format

An sf object using EPSG 5070: NAD83 / Conus Albers (in meters), with 5,303 rows and 5 columns:

yr: The year measurements were taken.
plot: A unique identifier signifying the plot measurements were taken at.
n_trees: The number of trees present on a plot.
agb: The total aboveground biomass at the plot location, in pounds.
geometry: The centroid of the plot location.

Predict from a ww_area_of_applicability

Description

Predict from a ww_area_of_applicability

Usage

## S3 method for class 'ww_area_of_applicability'
predict(object, new_data, ...)

Arguments

object

A ww_area_of_applicability object.

new_data

A data frame or matrix of new samples.

...

Not used.

Details

The function computes the distance indices of the new data and whether or not they are "inside" the area of applicability.

Value

A tibble of predictions, with two columns: di, numeric, contains the "dissimilarity index" of each point in new_data, while aoa, logical, contains whether a row is inside (TRUE) or outside (FALSE) the area of applicability.

Note that this function is often called using terra::predict(), in which case aoa will be converted to numeric implicitly; values of 1 correspond to cells "inside" the area of applicability and values of 0 correspond to cells "outside" the AOA.

The number of rows in the tibble is guaranteed to be the same as the number of rows in new_data. Rows with NA predictor values will have NA di and aoa values.
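The shape of the returned values can be sketched in base R without fitting anything. The snippet below is only an illustration: the di values and the threshold are made up, and it assumes that "inside" means a dissimilarity index at or below the threshold, which is not stated in this help page.

```r
# Hypothetical dissimilarity indices for four rows of new_data;
# the third row stands in for a row with NA predictor values.
di <- c(0.05, 0.40, NA, 0.12)

# Hypothetical AOA threshold (assumption, for illustration only).
threshold <- 0.25

# Assumption: a row is "inside" when di is at or below the threshold.
preds <- data.frame(di = di, aoa = di <= threshold)

# Implicit conversion to numeric, as happens with raster-style prediction:
# TRUE becomes 1, FALSE becomes 0, and NA stays NA.
aoa_numeric <- as.numeric(preds$aoa)
aoa_numeric
```

The result keeps one row per input row, and the NA row stays NA in both columns.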

Examples

library(vip)
train <- gen_friedman(1000, seed = 101) # ?vip::gen_friedman
test <- train[701:1000, ]
train <- train[1:700, ]
pp <- stats::ppr(y ~ ., data = train, nterms = 11)
metric_name <- ifelse(
  packageVersion("vip") > package_version("0.3.2"),
  "rsq",
  "rsquared"
)
importance <- vip::vi_permute(
  pp,
  target = "y",
  metric = metric_name,
  pred_wrapper = predict,
  train = train
)

aoa <- ww_area_of_applicability(y ~ ., train, test, importance = importance)
predict(aoa, test)

Print number of predictors and area-of-applicability threshold

Description

Print number of predictors and area-of-applicability threshold

Usage

## S3 method for class 'ww_area_of_applicability'
print(x, digits = getOption("digits"), ...)

Arguments

x

A ww_area_of_applicability object.

digits

The number of digits to print, used when rounding the AOA threshold.

...

These dots are for future extensions and must be empty.

Examples

library(vip)
trn <- gen_friedman(500, seed = 101) # ?vip::gen_friedman
pp <- ppr(y ~ ., data = trn, nterms = 11)
metric_name <- ifelse(
  packageVersion("vip") > package_version("0.3.2"),
  "rsq",
  "rsquared"
)
importance <- vip::vi_permute(
  pp,
  target = "y",
  metric = metric_name,
  pred_wrapper = predict,
  train = trn
)

ww_area_of_applicability(trn[2:11], importance = importance)

Simulated data based on WorldClim Bioclimatic variables

Description

This data is adapted from the CAST vignette vignette("cast02-AOA-tutorial", package = "CAST"). The original data is derived from the Worldclim global climate variables.

Usage

worldclim_simulation

Format

An sf object with 10,000 rows and 6 columns:

bio2: Mean Diurnal Range (Mean of monthly (max temp - min temp))
bio10: Mean Temperature of Warmest Quarter
bio13: Precipitation of Wettest Month
bio19: Precipitation of Coldest Quarter
geometry: The location of the sampled point.
response: A virtual species distribution, generated using the generateSpFromPCA() function from the virtualspecies package.

Source

https://www.worldclim.org

Agreement coefficients and related methods

Description

These functions calculate the agreement coefficient and mean product difference (MPD), as well as their systematic and unsystematic components, from Ji and Gallo (2006). Agreement coefficients provide a useful measurement of agreement between two data sets which is bounded, symmetrical, and can be decomposed into systematic and unsystematic components; however, they assume a linear relationship between the two data sets and treat both "truth" and "estimate" as being of equal quality, and as such may not be a useful metric in all scenarios.

Usage

ww_agreement_coefficient(data, ...)

## S3 method for class 'data.frame'
ww_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)

ww_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)

ww_systematic_agreement_coefficient(data, ...)

## S3 method for class 'data.frame'
ww_systematic_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)

ww_systematic_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_agreement_coefficient(data, ...)

## S3 method for class 'data.frame'
ww_unsystematic_agreement_coefficient(data, truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_agreement_coefficient_vec(truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_mpd(data, ...)

## S3 method for class 'data.frame'
ww_unsystematic_mpd(data, truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_mpd_vec(truth, estimate, na_rm = TRUE, ...)

ww_systematic_mpd(data, ...)

## S3 method for class 'data.frame'
ww_systematic_mpd(data, truth, estimate, na_rm = TRUE, ...)

ww_systematic_mpd_vec(truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_rmpd(data, ...)

## S3 method for class 'data.frame'
ww_unsystematic_rmpd(data, truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_rmpd_vec(truth, estimate, na_rm = TRUE, ...)

ww_systematic_rmpd(data, ...)

## S3 method for class 'data.frame'
ww_systematic_rmpd(data, truth, estimate, na_rm = TRUE, ...)

ww_systematic_rmpd_vec(truth, estimate, na_rm = TRUE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Not currently used.

truth

The column identifier for the true results (that is numeric). This should be an unquoted column name although this argument is passed by expression and supports quasiquotation (you can unquote column names). For _vec() functions, a numeric vector.

estimate

The column identifier for the predicted results (that is also numeric). As with truth this can be specified in different ways but the primary method is to use an unquoted variable name. For _vec() functions, a numeric vector.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

Agreement coefficient values range from 0 to 1, with 1 indicating perfect agreement. truth and estimate must be the same length. This function is not explicitly spatial and as such can be applied to data with any number of dimensions and any coordinate reference system.
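To make the statistic concrete, here is a minimal base R sketch of the overall agreement coefficient. It assumes the Ji and Gallo (2006) definition AC = 1 - SSD / SPOD, where SSD is the sum of squared differences and SPOD is the sum of potential differences; it is not the package's internal code, and the variable names (ssd, spod, mean_gap) are chosen here for illustration.

```r
# Example vectors, the same ones used in the Examples for this topic.
x <- c(6, 8, 9, 10, 11, 14)
y <- c(2, 3, 5, 5, 6, 8)

# Sum of squared differences between the two data sets.
ssd <- sum((x - y)^2)

# Sum of potential differences: for each pair, the absolute gap between
# the two means plus each value's absolute deviation from its own mean.
mean_gap <- abs(mean(x) - mean(y))
spod <- sum((mean_gap + abs(x - mean(x))) * (mean_gap + abs(y - mean(y))))

# Agreement coefficient under the assumed definition.
ac <- 1 - ssd / spod
ac
```

Under this definition the example vectors give a value of roughly 0.475. Swapping x and y leaves the result unchanged, which is the symmetry property mentioned in the Description.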

Value

A tibble with columns .metric, .estimator, and .estimate and 1 row of values. For grouped data frames, the number of rows returned will be the same as the number of groups. For _vec() functions, a single value (or NA).

References

Ji, L. and Gallo, K. 2006. "An Agreement Coefficient for Image Comparison." Photogrammetric Engineering & Remote Sensing 72(7), pp 823–833, doi: 10.14358/PERS.72.7.823.

Examples

# Calculated values match Ji and Gallo 2006:
x <- c(6, 8, 9, 10, 11, 14)
y <- c(2, 3, 5, 5, 6, 8)

ww_agreement_coefficient_vec(x, y)
ww_systematic_agreement_coefficient_vec(x, y)
ww_unsystematic_agreement_coefficient_vec(x, y)
ww_systematic_mpd_vec(x, y)
ww_unsystematic_mpd_vec(x, y)
ww_systematic_rmpd_vec(x, y)
ww_unsystematic_rmpd_vec(x, y)

example_df <- data.frame(x = x, y = y)
ww_agreement_coefficient(example_df, x, y)
ww_systematic_agreement_coefficient(example_df, x, y)
ww_unsystematic_agreement_coefficient(example_df, x, y)
ww_systematic_mpd(example_df, x, y)
ww_unsystematic_mpd(example_df, x, y)
ww_systematic_rmpd(example_df, x, y)
ww_unsystematic_rmpd(example_df, x, y)

Find the area of applicability

Description

This function calculates the "area of applicability" of a model, as introduced by Meyer and Pebesma (2021). While the initial paper introducing this method focused on spatial models, there is nothing inherently spatial about the method; it can be used with any type of data (and, because it does not care about the spatial arrangement of your data, can be used with 2D or 3D spatial data, and with geographic or projected CRS).

Usage

ww_area_of_applicability(x, ...)

## S3 method for class 'data.frame'
ww_area_of_applicability(x, testing = NULL, importance, ..., na_rm = FALSE)

## S3 method for class 'matrix'
ww_area_of_applicability(x, testing = NULL, importance, ..., na_rm = FALSE)

## S3 method for class 'formula'
ww_area_of_applicability(
  x,
  data,
  testing = NULL,
  importance,
  ...,
  na_rm = FALSE
)

## S3 method for class 'recipe'
ww_area_of_applicability(
  x,
  data,
  testing = NULL,
  importance,
  ...,
  na_rm = FALSE
)

## S3 method for class 'rset'
ww_area_of_applicability(x, y = NULL, importance, ..., na_rm = FALSE)

Arguments

x

Either a data frame, matrix, formula (specifying predictor terms on the right-hand side), recipe (from recipes::recipe()), or rset object, produced by resampling functions from rsample or spatialsample.

If x is a recipe, it should be the same one used to pre-process the data used in your model. If the recipe used to build the area of applicability doesn't match the one used to build the model, the returned area of applicability won't be correct.

...

Not currently used.

testing

A data frame or matrix containing the data used to validate your model. This should be the same data as used to calculate all model accuracy metrics.

If this argument is NULL, then this function will use the training data (from x or data) to calculate within-sample distances. This may result in the area of applicability threshold being set too high, with the result that too many points are classed as "inside" the area of applicability.

importance

Either:

A data.frame with two columns: term, containing the names of each variable in the training and testing data, and estimate, containing the (raw or scaled) feature importance for each variable.
An object of class vi with at least two columns, Variable and Importance.

All variables in the training data (x or data, depending on the context) must have a matching importance estimate, and all terms with importance estimates must be in the training data.
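The data.frame form of importance and the matching requirement can be sketched in base R. The predictor names (x1, x2) and importance values below are made up for illustration, and the check shown is only a way to verify the requirement before calling the function, not the package's own validation code.

```r
# Hypothetical training predictors.
train <- data.frame(
  x1 = c(0.1, 0.4, 0.9),
  x2 = c(2, 5, 3)
)

# Importance scores in the two-column data.frame form described above.
importance <- data.frame(
  term = c("x1", "x2"),
  estimate = c(0.7, 0.3)
)

# Every training variable needs an importance score ...
missing_importance <- setdiff(names(train), importance$term)
# ... and every scored term must exist in the training data.
extra_terms <- setdiff(importance$term, names(train))

terms_match <- length(missing_importance) == 0 && length(extra_terms) == 0
terms_match
```

Running a check like this first gives a clearer picture of which names are out of sync than an error raised deeper in a modelling pipeline.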

na_rm

A logical of length 1, indicating whether observations (in both training and testing) with NA values in predictors should be removed. Only predictor variables are considered, and this value has no impact on predictions (where NA values produce NA predictions). If na_rm = FALSE and NA values are found, this function returns an error.

data

The data frame representing your "training" data, when using the formula or recipe methods.

y

Optional: a recipe (from recipes::recipe()) or formula.

If y is a recipe, it should be the same one used to pre-process the data used in your model. If the recipe used to build the area of applicability doesn't match the one used to build the model, the returned area of applicability won't be correct.

Details

Predictions made on points "inside" the area of applicability should be as accurate as predictions made on the data provided to testing. That means that generally testing should be your final hold-out set so that predictions on points inside the area of applicability are accurately described by your reported model metrics. When passing an rset object to x, predictions made on points "inside" the area of applicability instead should be as accurate as predictions made on the assessment sets during cross-validation.

This method assumes your model was fit using dummy variables in the place of any non-numeric predictor, and that you have one importance score per dummy variable. Having non-numeric predictors will cause this function to fail.

Value

A ww_area_of_applicability object, which can be used with predict() to calculate the distance of new data to the original training data, and determine if new data is within a model's area of applicability.

Differences from CAST

This implementation differs from Meyer and Pebesma (2021) (and therefore from CAST) when using cross-validated data in order to minimize data leakage. Namely, in order to calculate the dissimilarity index $DI_{k}$, CAST:

Rescales all data used for cross validation at once, lumping assessment folds in with analysis data.
Calculates a single $\bar{d}$ as the mean distance between all points in the rescaled data set, including between points in the same assessment fold.
For each point $k$ that's used in an assessment fold, calculates $d_{k}$ as the minimum distance between $k$ and any point in its corresponding analysis fold.
Calculates $DI_{k}$ by dividing $d_{k}$ by $\bar{d}$ (which was partially calculated as the distance between $k$ and the rest of the rescaled data).

Because assessment data is used to calculate constants for rescaling analysis data and $\bar{d}$, the assessment data may appear too "similar" to the analysis data when calculating $DI_{k}$. As such, waywiser treats each fold in an rset independently:

Each analysis set is rescaled independently.
Separate $\bar{d}$ are calculated for each fold, as the mean distance between all points in the analysis set for that fold.
Identically to CAST, $d_{k}$ is the minimum distance between a point $k$ in the assessment fold and any point in the corresponding analysis fold.
$DI_{k}$ is then found by dividing $d_{k}$ by $\bar{d}$, which was calculated independently from $k$.

Predictions are made using the full training data set, rescaled once (in the same way as CAST), and the mean $\bar{d}$ across folds, under the assumption that the "final" model in use will be retrained using the entire data set.

In practice, this means waywiser produces very slightly higher $\bar{d}$ values than CAST and a slightly higher area of applicability threshold than CAST when using rset objects.
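The per-fold calculation described above can be sketched for a single fold in base R. This is a simplified illustration rather than the package's implementation: the data are random, the object names are hypothetical, rescaling is plain centering and scaling with scale(), and the weighting of predictors by importance is left out entirely.

```r
set.seed(42)

# Hypothetical analysis and assessment sets for one fold (3 predictors).
analysis <- matrix(rnorm(60), ncol = 3)
assessment <- matrix(rnorm(15), ncol = 3)

# Rescale the analysis set on its own, then apply the same constants to the
# assessment set so that assessment data never influence the rescaling.
analysis_scaled <- scale(analysis)
assessment_scaled <- scale(
  assessment,
  center = attr(analysis_scaled, "scaled:center"),
  scale = attr(analysis_scaled, "scaled:scale")
)

# d-bar: mean distance between all pairs of points in the analysis set only.
d_bar <- mean(dist(analysis_scaled))

# d_k: for each assessment point, the minimum distance to any analysis point.
n_assess <- nrow(assessment_scaled)
all_dist <- as.matrix(dist(rbind(assessment_scaled, analysis_scaled)))
cross_dist <- all_dist[seq_len(n_assess), -seq_len(n_assess), drop = FALSE]
d_k <- apply(cross_dist, 1, min)

# DI_k: d_k divided by a d-bar that was computed independently of point k.
di_k <- d_k / d_bar
di_k
```

Because d_bar uses only analysis-set distances, no assessment point contributes to its own denominator, which is the leakage the per-fold treatment is meant to avoid.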

References

H. Meyer and E. Pebesma. 2021. "Predicting into unknown space? Estimating the area of applicability of spatial prediction models," Methods in Ecology and Evolution 12(9), pp 1620 - 1633, doi: 10.1111/2041-210X.13650.

Examples

train <- vip::gen_friedman(1000, seed = 101) # ?vip::gen_friedman
test <- train[701:1000, ]
train <- train[1:700, ]
pp <- stats::ppr(y ~ ., data = train, nterms = 11)
metric_name <- ifelse(
  packageVersion("vip") > package_version("0.3.2"),
  "rsq",
  "rsquared"
)
importance <- vip::vi_permute(
  pp,
  target = "y",
  metric = metric_name,
  pred_wrapper = predict,
  train = train
)

aoa <- ww_area_of_applicability(y ~ ., train, test, importance = importance)
predict(aoa, test)

# Equivalent methods for calculating AOA:
ww_area_of_applicability(train[2:11], test[2:11], importance)
ww_area_of_applicability(
  as.matrix(train[2:11]),
  as.matrix(test[2:11]),
  importance
)

Make 'nb' objects from sf objects

Description

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Usage

ww_build_neighbors(data, nb = NULL, ..., call = rlang::caller_env())

Arguments

data

An sf object (of class "sf" or "sfc").

nb

An object of class "nb" (in which case it will be returned unchanged), or a function to create an object of class "nb" from data and ..., or NULL. See details.

...

Arguments passed to the neighbor-creating function.

call

The execution environment of a currently running function, e.g. call = caller_env(). The corresponding function call is retrieved and mentioned in error messages as the source of the error.

You only need to supply call when throwing a condition from a helper function which wouldn't be relevant to mention in the message.

Can also be NULL or a defused function call to respectively not display any call or hard-code a code to display.

For more information about error calls, see "Including function calls in error messages" in the rlang documentation.

Details

When nb = NULL, the method used to create neighbors from data is dependent on what geometry type data is:

If nb = NULL and data is a point geometry (classes "sfc_POINT" or "sfc_MULTIPOINT") the "nb" object will be created using ww_make_point_neighbors().
If nb = NULL and data is a polygon geometry (classes "sfc_POLYGON" or "sfc_MULTIPOLYGON") the "nb" object will be created using ww_make_polygon_neighbors().
If nb = NULL and data is any other geometry type, the "nb" object will be created using the centroids of the data as points, with a warning.

Value

An object of class "nb".

Examples

ww_build_neighbors(guerry)

Build "listw" objects of spatial weights

Description

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Usage

ww_build_weights(x, wt = NULL, include_self = FALSE, ...)

Arguments

x

Either an sf object or a "nb" neighbors list object. If an sf object, will be converted into a neighbors list via ww_build_neighbors().

wt

Either a "listw" object (which will be returned unchanged), a function for creating a "listw" object from x, or NULL, in which case weights will be constructed via spdep::nb2listw().

include_self

Include each region itself in its own list of neighbors?

...

Arguments passed to the weight constructing function.

Value

A listw object.

Examples

ww_build_weights(guerry)

Global Geary's C statistic

Description

Calculate the global Geary's C statistic for model residuals. ww_global_geary_c() returns the statistic itself, while ww_global_geary_pvalue() returns the associated p value. These functions are meant to help assess model predictions, for instance by identifying if there are clusters of higher residuals than expected. For statistical testing and inference applications, use spdep::geary.test() instead.

Usage

ww_global_geary_c(data, ...)

ww_global_geary_c_vec(truth, estimate, wt, na_rm = FALSE, ...)

ww_global_geary_pvalue(data, ...)

ww_global_geary_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Additional arguments passed to spdep::geary() (for ww_global_geary_c()) or spdep::geary.test() (for ww_global_geary_pvalue()).

truth

estimate

wt

A listw object, for instance as created with ww_build_weights(). For data.frame input, may also be a function that takes data and returns a listw object.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.
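As a rough illustration of what the global statistic measures, the sketch below computes Geary's C on residuals by hand in base R, using the textbook formula C = ((n - 1) / (2 * S0)) * sum_ij w_ij (z_i - z_j)^2 / sum_i z_i^2. The toy values and the binary weights matrix w are assumptions made for this example; it does not reproduce how waywiser or spdep build weights, so results from the package on real data will generally differ.

```r
# Toy data: four locations along a line (hypothetical values).
truth <- c(3, 4, 1, 2)
estimate <- c(2, 3, 2, 3)
resid <- truth - estimate # 1, 1, -1, -1

# Binary contiguity weights: each location neighbors the adjacent ones.
w <- matrix(0, nrow = 4, ncol = 4)
w[cbind(1:3, 2:4)] <- 1
w[cbind(2:4, 1:3)] <- 1

z <- resid - mean(resid) # centered residuals
n <- length(z)
s0 <- sum(w) # sum of all weights

# Weighted sum of squared differences between neighboring residuals,
# relative to the overall variation in the residuals.
geary_c <- ((n - 1) / (2 * s0)) * sum(w * outer(z, z, "-")^2) / sum(z^2)
geary_c
```

Here similar residuals sit next to each other at both ends of the line, so the statistic comes out at 0.5, below the value of 1 expected when there is no spatial autocorrelation.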

Value

A tibble with columns .metric, .estimator, and .estimate and 1 row of values. For grouped data frames, the number of rows returned will be the same as the number of groups. For _vec() functions, a single value (or NA).

References

Geary, R. C. (1954). "The Contiguity Ratio and Statistical Mapping". The Incorporated Statistician. 5 (3): 115–145. doi:10.2307/2986645.

Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, p. 17.

Examples

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

ww_global_geary_c(guerry_model, Crm_prs, predictions)
ww_global_geary_pvalue(guerry_model, Crm_prs, predictions)

wt <- ww_build_weights(guerry_model)

ww_global_geary_c_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
ww_global_geary_pvalue_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)

Global Moran's I statistic

Description

Calculate the global Moran's I statistic for model residuals. ww_global_moran_i() returns the statistic itself, while ww_global_moran_pvalue() returns the associated p value. These functions are meant to help assess model predictions, for instance by identifying if there are clusters of higher residuals than expected. For statistical testing and inference applications, use spdep::moran.test() instead.

Usage

ww_global_moran_i(data, ...)

ww_global_moran_i_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)

ww_global_moran_pvalue(data, ...)

ww_global_moran_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Additional arguments passed to spdep::moran() (for ww_global_moran_i()) or spdep::moran.test() (for ww_global_moran_pvalue()).

truth

estimate

wt

A listw object, for instance as created with ww_build_weights(). For data.frame input, may also be a function that takes data and returns a listw object.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.
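For intuition about the global statistic, the following base R sketch computes Moran's I on residuals by hand with the textbook formula I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2. The toy values and the binary weights matrix w are assumptions for this example only; the package works with listw objects, whose weights may be standardized differently, so this is not a substitute for ww_global_moran_i().

```r
# Toy data: four locations along a line (hypothetical values).
truth <- c(3, 4, 1, 2)
estimate <- c(2, 3, 2, 3)
resid <- truth - estimate # 1, 1, -1, -1

# Binary contiguity weights: each location neighbors the adjacent ones.
w <- matrix(0, nrow = 4, ncol = 4)
w[cbind(1:3, 2:4)] <- 1
w[cbind(2:4, 1:3)] <- 1

z <- resid - mean(resid) # centered residuals
n <- length(z)
s0 <- sum(w) # sum of all weights

# Weighted cross-products of neighboring residuals, relative to their variance.
moran_i <- (n / s0) * sum(w * outer(z, z)) / sum(z^2)
moran_i
```

The positive result (one third for these values) reflects that over-predictions and under-predictions each cluster at one end of the line, which is the kind of spatially structured error these functions are meant to flag.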

Value

A tibble with columns .metric, .estimator, and .estimate and 1 row of values. For grouped data frames, the number of rows returned will be the same as the number of groups. For _vec() functions, a single value (or NA).

References

Moran, P.A.P. (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37(1/2), pp 17. doi: 10.2307/2332142

Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, p. 17.

Examples

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

ww_global_moran_i(guerry_model, Crm_prs, predictions)
ww_global_moran_pvalue(guerry_model, Crm_prs, predictions)

wt <- ww_build_weights(guerry_model)

ww_global_moran_i_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
ww_global_moran_pvalue_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)

Local Geary's C statistic

Description

Calculate the local Geary's C statistic for model residuals. ww_local_geary_c() returns the statistic itself, while ww_local_geary_pvalue() returns the associated p value. These functions are meant to help assess model predictions, for instance by identifying clusters of higher residuals than expected. For statistical testing and inference applications, use spdep::localC_perm() instead.

Usage

ww_local_geary_c(data, ...)

ww_local_geary_c_vec(truth, estimate, wt, na_rm = FALSE, ...)

ww_local_geary_pvalue(data, ...)

ww_local_geary_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Additional arguments passed to spdep::localC() (for ww_local_geary_c()) or spdep::localC_perm() (for ww_local_geary_pvalue()).

truth

estimate

wt

A listw object, for instance as created with ww_build_weights(). For data.frame input, may also be a function that takes data and returns a listw object.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Value

A tibble with columns .metric, .estimator, and .estimate and nrow(data) rows of values. For _vec() functions, a numeric vector of length(truth) (or NA).

References

Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, pp 93–115. doi: 10.1111/j.1538-4632.1995.tb00338.x.

Anselin, L. 2019. A Local Indicator of Multivariate Spatial Association: Extending Geary's C. Geographical Analysis, 51, pp 133-150. doi: 10.1111/gean.12164

Examples

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

ww_local_geary_c(guerry_model, Crm_prs, predictions)
ww_local_geary_pvalue(guerry_model, Crm_prs, predictions)

wt <- ww_build_weights(guerry_model)

ww_local_geary_c_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
ww_local_geary_pvalue_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)

Local Getis-Ord G and G* statistic

Description

Calculate the local Getis-Ord G and G* statistic for model residuals. ww_local_getis_ord_g() returns the statistic itself, while ww_local_getis_ord_g_pvalue() returns the associated p value. These functions are meant to help assess model predictions, for instance by identifying clusters of higher residuals than expected. For statistical testing and inference applications, use spdep::localG_perm() instead.

Usage

ww_local_getis_ord_g(data, ...)
ww_local_getis_ord_g_vec(truth, estimate, wt, na_rm = FALSE, ...)
ww_local_getis_ord_g_pvalue(data, ...)
ww_local_getis_ord_g_pvalue_vec(truth, estimate, wt, na_rm = FALSE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Additional arguments passed to spdep::localG() (for ww_local_getis_ord_g()) or spdep::localG_perm() (for ww_local_getis_ord_g_pvalue()).

truth

estimate

wt

A listw object, for instance as created with ww_build_weights(). For data.frame input, may also be a function that takes data and returns a listw object.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Value

A tibble with columns .metric, .estimator, and .estimate and nrow(data) rows of values. For _vec() functions, a numeric vector of length(truth) (or NA).

References

Ord, J. K. and Getis, A. 1995. Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27, pp 286–306. doi: 10.1111/j.1538-4632.1995.tb00912.x.

Examples

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

ww_local_getis_ord_g(guerry_model, Crm_prs, predictions)
ww_local_getis_ord_g_pvalue(guerry_model, Crm_prs, predictions)

wt <- ww_build_weights(guerry_model)

ww_local_getis_ord_g_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
ww_local_getis_ord_g_pvalue_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)

Local Moran's I statistic

Description

Calculate the local Moran's I statistic for model residuals. ww_local_moran_i() returns the statistic itself, while ww_local_moran_pvalue() returns the associated p value. These functions are meant to help assess model predictions, for instance by identifying clusters of higher residuals than expected. For statistical testing and inference applications, use spdep::localmoran_perm() instead.

Usage

ww_local_moran_i(data, ...)
ww_local_moran_i_vec(truth, estimate, wt, na_rm = FALSE, ...)
ww_local_moran_pvalue(data, ...)
ww_local_moran_pvalue_vec(truth, estimate, wt = NULL, na_rm = FALSE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Additional arguments passed to spdep::localmoran().

truth

estimate

wt

A listw object, for instance as created with ww_build_weights(). For data.frame input, may also be a function that takes data and returns a listw object.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Value

A tibble with columns .metric, .estimator, and .estimate and nrow(data) rows of values. For _vec() functions, a numeric vector of length(truth) (or NA).

References

Anselin, L. 1995. Local indicators of spatial association. Geographical Analysis, 27, pp 93–115. doi: 10.1111/j.1538-4632.1995.tb00338.x.

Sokal, R. R., Oden, N. L. and Thomson, B. A. 1998. Local Spatial Autocorrelation in a Biological Model. Geographical Analysis, 30, pp 331–354. doi: 10.1111/j.1538-4632.1998.tb00406.x.

Examples

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

ww_local_moran_i(guerry_model, Crm_prs, predictions)
ww_local_moran_pvalue(guerry_model, Crm_prs, predictions)

wt <- ww_build_weights(guerry_model)

ww_local_moran_i_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
ww_local_moran_pvalue_vec(
  guerry_model$Crm_prs,
  guerry_model$predictions,
  wt = wt
)
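To make the statistic itself more concrete, the following base R sketch computes local Moran's I by hand for a handful of residuals. It is an illustration under stated assumptions rather than the package's implementation: the toy values, the chain-shaped neighbourhood, and the row-standardised weights matrix are all hypothetical, and residuals are assumed to be truth minus estimate.

```r
# Hypothetical toy data: five locations on a line, each neighbouring the
# locations immediately before and after it.
truth    <- c(10, 12, 11, 20, 22)
estimate <- c(11, 11, 12, 15, 16)
resid <- truth - estimate  # assumed definition of the residuals

n <- length(resid)
adjacency <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  adjacency[i, i + 1] <- 1
  adjacency[i + 1, i] <- 1
}
# Row-standardise so that each row of weights sums to 1.
weights <- adjacency / rowSums(adjacency)

z  <- resid - mean(resid)         # deviations from the mean residual
m2 <- sum(z^2) / n                # second moment of the deviations
lag_z <- as.vector(weights %*% z) # weighted average of neighbouring deviations

# Local Moran's I: positive where a residual resembles its neighbours.
local_i <- (z / m2) * lag_z
local_i
```

With row-standardised weights, the mean of the local values equals the global Moran's I, which is a quick way to sanity-check a hand calculation like this one.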

Make 'nb' objects from point geometries

Description

This function uses spdep::knearneigh() and spdep::knn2nb() to create a "nb" neighbors list.

Usage

ww_make_point_neighbors(data, k = 1, sym = FALSE, ...)

Arguments

data

An sfc_POINT or sfc_MULTIPOINT object.

k

How many nearest neighbors to use in spdep::knearneigh().

sym

Force the output neighbors list (from spdep::knn2nb()) to symmetry.

...

Other arguments passed to spdep::knearneigh().

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Value

An object of class "nb"

Examples

ww_make_point_neighbors(ny_trees)
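As a rough picture of what the k and sym arguments control, the base R sketch below builds a k-nearest-neighbor list by hand and then symmetrizes it. It does not call spdep and is not how the package works internally; the coordinates are hypothetical and distances are plain Euclidean.

```r
# Hypothetical coordinates for five points (not the ny_trees data).
coords <- cbind(x = c(0, 1, 5, 6, 10), y = c(0, 0, 0, 0, 0))
k <- 1

d <- as.matrix(dist(coords))
diag(d) <- Inf  # a point is never its own neighbor

# For each point, the indices of its k nearest neighbors.
knn <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])

# Symmetrize: if j is a neighbor of i, make i a neighbor of j as well.
knn_sym <- knn
for (i in seq_along(knn)) {
  for (j in knn[[i]]) {
    knn_sym[[j]] <- sort(union(knn_sym[[j]], i))
  }
}

knn      # point 5 lists point 4, but point 4 does not list point 5
knn_sym  # after symmetrizing, point 4 lists both 3 and 5
```

Without symmetrizing, every point has exactly k neighbors; after symmetrizing, some points may have more.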

Make 'nb' objects from polygon geometries

Description

This function is an extremely thin wrapper around spdep::poly2nb(), renamed to use the waywiser "ww" prefix.

Usage

ww_make_polygon_neighbors(data, ...)

Arguments

data

An sfc_POLYGON or sfc_MULTIPOLYGON object.

...

Additional arguments passed to spdep::poly2nb().

Details

These functions can be used for geographic or projected coordinate reference systems and expect 2D data.

Value

An object of class "nb"

Examples

ww_make_polygon_neighbors(guerry)

Evaluate metrics at multiple scales of aggregation

Description

Evaluate metrics at multiple scales of aggregation

Usage

ww_multi_scale(
  data = NULL,
  truth,
  estimate,
  metrics = list(yardstick::rmse, yardstick::mae),
  grids = NULL,
  ...,
  na_rm = TRUE,
  aggregation_function = "mean",
  autoexpand_grid = TRUE,
  progress = TRUE
)

Arguments

data

Either: a point geometry sf object containing the columns specified by the truth and estimate arguments; a SpatRaster from the terra package containing layers specified by the truth and estimate arguments; or NULL if truth and estimate are SpatRaster objects.

truth, estimate

If data is an sf object, the names (optionally unquoted) for the columns in data containing the true and predicted values, respectively. If data is a SpatRaster object, either (quoted) layer names or indices which will select the true and predicted layers, respectively, via terra::subset(). If data is NULL, SpatRaster objects with a single layer containing the true and predicted values, respectively.

metrics

Either a yardstick::metric_set() object, or a list of functions which will be used to construct a yardstick::metric_set() object specifying the performance metrics to evaluate at each scale.

grids

Optionally, a list of pre-computed sf or sfc objects specifying polygon boundaries to use for assessments.

...

Arguments passed to sf::st_make_grid(). You almost certainly should provide these arguments as lists. For instance, passing n = list(c(1, 2)) will create a single 1x2 grid; passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid.

na_rm

Boolean: Should polygons with NA values be removed before calculating metrics? Note that this does not impact how values are aggregated to polygons: if you want to remove NA values before aggregating, provide a function to aggregation_function which will remove NA values.

aggregation_function

The function to use to aggregate predictions and true values at various scales, by default mean(). For the sf method, you can pass any function which takes a single vector and returns a scalar. For raster methods, any function accepted by exactextractr::exact_extract() (note that built-in function names must be quoted). Note that this function does not pay attention to the value of na_rm; any NA handling you want to do during aggregation should be handled by this function directly.

autoexpand_grid

Boolean: if data is in geographic coordinates and grids aren't provided, the grids generated by sf::st_make_grid() may not contain all observations. If TRUE, this function will automatically expand generated grids by a tiny factor to attempt to capture all observations.

progress

Boolean: if data is NULL, should aggregation via exactextractr::exact_extract() show a progress bar? Separate progress bars will be shown for each time truth and estimate are aggregated.
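Because aggregation ignores na_rm, any NA handling has to live inside the aggregation function. The base R sketch below shows the kind of function that could be supplied as aggregation_function for the sf method; mean_no_na and the cell values are hypothetical names and data used only for illustration.

```r
# Hypothetical values falling inside a single grid cell, one of them missing.
cell_values <- c(10, NA, 14)

mean(cell_values)  # NA: a plain mean propagates missing values

# An aggregation function that drops NA values before averaging.
mean_no_na <- function(x) mean(x, na.rm = TRUE)
mean_no_na(cell_values)  # 12
```

A function like this takes a single vector and returns a scalar, which is the shape the sf method expects.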

Value

A tibble with six columns: .metric, with the name of the metric that the row describes; .estimator, with the name of the estimator used; .estimate, with the output of the metric function; .grid_args, with the arguments passed to sf::st_make_grid() via ... (if any); .grid, containing the grids used to aggregate predictions, along with the aggregated values of truth and estimate and the count of non-NA values for each; and .notes, which (if data is an sf object) will indicate any observations which were not used in a given assessment.

Raster inputs

If data is NULL, then truth and estimate should both be SpatRaster objects, as created via terra::rast(). These rasters will then be aggregated to each grid using exactextractr::exact_extract(). If data is a SpatRaster object, then truth and estimate should be indices to select the appropriate layers of the raster via terra::subset().

Grids are calculated using the bounding box of truth, under the assumption that you may have extrapolated into regions which do not have matching "true" values. This function does not check that truth and estimate overlap at all, or that they are at all contained within the grid.

Creating grid blocks

The grid blocks can be controlled by passing arguments to sf::st_make_grid() via .... Some particularly useful arguments include:

cellsize: Target cellsize, expressed as the "diameter" (shortest straight-line distance between opposing sides; two times the apothem) of each block, in map units.
n: The number of grid blocks in the x and y direction (columns, rows).
square: A logical value indicating whether to create square (TRUE) or hexagonal (FALSE) cells.

If both cellsize and n are provided, then the number of blocks requested by n of sizes specified by cellsize will be returned, likely not lining up with the bounding box of data. If only cellsize is provided, this function will return as many blocks of size cellsize as fit inside the bounding box of data. If only n is provided, then cellsize will be automatically adjusted to create the requested number of cells.

Grids are created by mapping over each argument passed via ... simultaneously, in a similar manner to mapply() or purrr::pmap(). This means that, for example, passing n = list(c(1, 2)) will create a single 1x2 grid, while passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid. It also means that arguments will be recycled using R's standard vector recycling rules, so that passing n = c(1, 2) and square = FALSE will create two separate grids of hexagons.
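The difference between a list and a bare vector can be seen with mapply() alone. In the sketch below, describe_grid() is a hypothetical stand-in for a grid-building call: it builds nothing and only reports the arguments it receives on each iteration.

```r
# Hypothetical stand-in for a grid-building call: it only reports its inputs.
describe_grid <- function(n, square) {
  n <- rep_len(n, 2)  # treat a single number as the count in both directions
  paste0(n[1], "x", n[2], if (square) " squares" else " hexagons")
}

# A list with one element: one call, with n = c(1, 2).
one_grid <- mapply(describe_grid, n = list(c(1, 2)), square = FALSE)

# A bare vector: two calls, with n = 1 and then n = 2; square is recycled.
two_grids <- mapply(describe_grid, n = c(1, 2), square = FALSE)

one_grid
two_grids
```

Wrapping each argument in list() makes the number of grids explicit: one list element per grid.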

This function can be used for geographic or projected coordinate reference systems and expects 2D data.
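To illustrate what assessing predictions at multiple scales of aggregation involves, the base R sketch below bins simulated points into square cells, averages truth and estimate within each cell, and computes RMSE on the cell means. It is a simplified illustration and not the package's implementation: the data are simulated, the helper names rmse() and rmse_at_scale() are hypothetical, and cells are built with cut() rather than sf::st_make_grid().

```r
set.seed(1)
# Hypothetical point data on the unit square.
pts <- data.frame(x = runif(200), y = runif(200))
pts$truth    <- 10 * pts$x + rnorm(200)
pts$estimate <- 10 * pts$x + rnorm(200)

rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

rmse_at_scale <- function(pts, n) {
  # Assign each point to one of n x n square cells.
  breaks <- seq(0, 1, length.out = n + 1)
  cell <- interaction(
    cut(pts$x, breaks, include.lowest = TRUE),
    cut(pts$y, breaks, include.lowest = TRUE),
    drop = TRUE
  )
  # Aggregate truth and estimate to cell means, then score the means.
  rmse(tapply(pts$truth, cell, mean), tapply(pts$estimate, cell, mean))
}

by_scale <- sapply(
  c(`1x1` = 1, `2x2` = 2, `10x10` = 10),
  function(n) rmse_at_scale(pts, n)
)
by_scale
rmse(pts$truth, pts$estimate)  # point-level RMSE, for comparison
```

With a single cell, the metric reduces to the absolute difference between the overall means of truth and estimate; finer grids move the result toward the point-level value.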

References

Riemann, R., Wilson, B. T., Lister, A., and Parks, S. (2010). "An effective assessment protocol for continuous geospatial datasets of forest characteristics using USFS Forest Inventory and Analysis (FIA) data." Remote Sensing of Environment 114(10), pp 2337-2352, doi: 10.1016/j.rse.2010.05.010.

Examples

data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
ames_model <- lm(Sale_Price ~ Lot_Area, data = ames_sf)
ames_sf$predictions <- predict(ames_model, ames_sf)

ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  n = list(
    c(10, 10),
    c(1, 1)
  ),
  square = FALSE
)

# or, mostly equivalently
# (there will be a slight difference due to `autoexpand_grid = TRUE`)
grids <- list(
  sf::st_make_grid(ames_sf, n = c(10, 10), square = FALSE),
  sf::st_make_grid(ames_sf, n = c(1, 1), square = FALSE)
)
ww_multi_scale(ames_sf, Sale_Price, predictions, grids = grids)

Willmott's d and related values

Description

These functions calculate Willmott's d value, a proposed replacement for R² which better differentiates between types and magnitudes of possible covariations. Additional functions calculate systematic and unsystematic components of MSE and RMSE; the systematic and unsystematic components of MSE sum to the total MSE (though the same is not true for RMSE).

Usage

ww_willmott_d(data, ...)

## S3 method for class 'data.frame'
ww_willmott_d(data, truth, estimate, na_rm = TRUE, ...)

ww_willmott_d_vec(truth, estimate, na_rm = TRUE, ...)

ww_willmott_d1(data, ...)

## S3 method for class 'data.frame'
ww_willmott_d1(data, truth, estimate, na_rm = TRUE, ...)

ww_willmott_d1_vec(truth, estimate, na_rm = TRUE, ...)

ww_willmott_dr(data, ...)

## S3 method for class 'data.frame'
ww_willmott_dr(data, truth, estimate, na_rm = TRUE, ...)

ww_willmott_dr_vec(truth, estimate, na_rm = TRUE, ...)

ww_systematic_mse(data, ...)

## S3 method for class 'data.frame'
ww_systematic_mse(data, truth, estimate, na_rm = TRUE, ...)

ww_systematic_mse_vec(truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_mse(data, ...)

## S3 method for class 'data.frame'
ww_unsystematic_mse(data, truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_mse_vec(truth, estimate, na_rm = TRUE, ...)

ww_systematic_rmse(data, ...)

## S3 method for class 'data.frame'
ww_systematic_rmse(data, truth, estimate, na_rm = TRUE, ...)

ww_systematic_rmse_vec(truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_rmse(data, ...)

## S3 method for class 'data.frame'
ww_unsystematic_rmse(data, truth, estimate, na_rm = TRUE, ...)

ww_unsystematic_rmse_vec(truth, estimate, na_rm = TRUE, ...)

Arguments

data

A data.frame containing the columns specified by the truth and estimate arguments.

...

Not currently used.

truth

estimate

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

Details

Values of d and d1 range from 0 to 1, with 1 indicating perfect agreement. Values of dr range from -1 to 1, with 1 similarly indicating perfect agreement. Values of RMSE are in the same units as truth and estimate, while values of MSE are in squared units. truth and estimate must be the same length. This function is not explicitly spatial and as such can be applied to data with any number of dimensions and any coordinate reference system.
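For readers who want to see where these ranges come from, the base R sketch below computes d, d1, and dr by hand from the formulas in the Willmott references, using the same numbers as the Examples. It is an illustration written for this page, not the package's source code; the variable names are hypothetical, and the scaling constant of 2 in dr is the value used in the 2012 paper.

```r
obs  <- c(6, 8, 9, 10, 11, 14)  # plays the role of truth
pred <- c(2, 3, 5, 5, 6, 8)     # plays the role of estimate
obs_mean <- mean(obs)

# d: squared errors relative to the squared "potential error".
d <- 1 - sum((pred - obs)^2) /
  sum((abs(pred - obs_mean) + abs(obs - obs_mean))^2)

# d1: the same idea using absolute values instead of squares.
d1 <- 1 - sum(abs(pred - obs)) /
  sum(abs(pred - obs_mean) + abs(obs - obs_mean))

# dr: refined index, with a scaling constant of 2.
abs_err    <- sum(abs(pred - obs))
scaled_dev <- 2 * sum(abs(obs - obs_mean))
dr <- if (abs_err <= scaled_dev) {
  1 - abs_err / scaled_dev
} else {
  scaled_dev / abs_err - 1
}

c(d = d, d1 = d1, dr = dr)
```

Here the total absolute error exceeds the scaled deviation of the observations, so dr takes its second branch and comes out negative, which d and d1 cannot do.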

Value

References

Willmott, C. J. 1981. "On the Validation of Models". Physical Geography 2(2), pp 184-194, doi: 10.1080/02723646.1981.10642213.

Willmott, C. J. 1982. "Some Comments on the Evaluation of Model Performance". Bulletin of the American Meteorological Society 63(11), pp 1309-1313, doi: 10.1175/1520-0477(1982)063<1309:SCOTEO>2.0.CO;2.

Willmott, C. J., Ackleson, S. G., Davis, R. E., Feddema, J. J., Klink, K. M., Legates, D. R., O’Donnell, J., Rowe, C. M. 1985. "Statistics for the evaluation of model performance". Journal of Geophysical Research 90(C5), pp 8995–9005, doi: 10.1029/jc090ic05p08995.

Willmott, C. J., Robeson, S. M., and Matsuura, K. 2012. "A refined index of model performance". International Journal of Climatology 32, pp 2088-2094, doi: 10.1002/joc.2419.

Examples

x <- c(6, 8, 9, 10, 11, 14)
y <- c(2, 3, 5, 5, 6, 8)

ww_willmott_d_vec(x, y)
ww_willmott_d1_vec(x, y)
ww_willmott_dr_vec(x, y)
ww_systematic_mse_vec(x, y)
ww_unsystematic_mse_vec(x, y)
ww_systematic_rmse_vec(x, y)
ww_unsystematic_rmse_vec(x, y)

example_df <- data.frame(x = x, y = y)
ww_willmott_d(example_df, x, y)
ww_willmott_d1(example_df, x, y)
ww_willmott_dr(example_df, x, y)
ww_systematic_mse(example_df, x, y)
ww_unsystematic_mse(example_df, x, y)
ww_systematic_rmse(example_df, x, y)
ww_unsystematic_rmse(example_df, x, y)
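The claim that the systematic and unsystematic components of MSE sum to the total MSE can be checked by hand. The sketch below follows the decomposition described by Willmott (1981), in which predictions are regressed on observations by ordinary least squares; it is a base R illustration that assumes, rather than confirms, that the package computes the components the same way, and the variable names are hypothetical.

```r
obs  <- c(6, 8, 9, 10, 11, 14)  # plays the role of truth
pred <- c(2, 3, 5, 5, 6, 8)     # plays the role of estimate

# Ordinary least squares fit of predictions on observations.
fit <- lm(pred ~ obs)
pred_hat <- fitted(fit)

# Systematic error: how far the fitted line sits from the observations.
mse_systematic <- mean((pred_hat - obs)^2)
# Unsystematic error: scatter of the predictions around the fitted line.
mse_unsystematic <- mean((pred - pred_hat)^2)
mse_total <- mean((pred - obs)^2)

all.equal(mse_systematic + mse_unsystematic, mse_total)  # TRUE

# The square roots do not add up in the same way.
sqrt(mse_systematic) + sqrt(mse_unsystematic) > sqrt(mse_total)  # TRUE
```

The MSE components add up exactly because least squares residuals are uncorrelated with the fitted values; taking square roots breaks that additivity, which is why the RMSE components do not sum to the total RMSE.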

