The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments.
The package blockCV offers a range of functions for generating train and test folds for k-fold and leave-one-out (LOO) cross-validation (CV). It allows for separation of data spatially and environmentally, with various options for block construction. Additionally, it includes a function for assessing the level of spatial autocorrelation in response or raster covariates, to aid in selecting an appropriate distance band for data separation. The blockCV package is suitable for the evaluation of a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling (SDM). It also provides support for different SDM scenarios, including presence-absence and presence-background species data, rare and common species, and raster data for predictor variables.
- There are four blocking methods: spatial, clustering, buffers, and NNDM (Nearest Neighbour Distance Matching) blocks
- Several ways to construct spatial blocks
- The assignment of the spatial blocks to cross-validation folds can be done in three different ways: random, systematic, and checkerboard pattern
- The spatial blocks can be assigned to cross-validation folds to have evenly distributed records for binary (e.g. species presence-absence/background) or multi-class responses (e.g. land cover classes for remote sensing image classification)
- The buffering and NNDM functions can account for presence-absence and presence-background data types
- Using geostatistical techniques to inform the choice of a suitable distance band by which to separate the data sets
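
To make the three block-to-fold assignment strategies more concrete, here is a minimal base-R sketch on a toy grid of rectangular blocks. It is an illustration only and does not reproduce blockCV's internal code: the grid dimensions, the value of `k`, and the column names are assumptions made for the example, and the checkerboard pattern is shown with two folds.

```r
# Toy illustration of assigning blocks to folds (not blockCV's implementation)
n_row <- 4   # assumed number of block rows
n_col <- 6   # assumed number of block columns
k <- 3       # assumed number of folds

# one record per block, ordered column by column
blocks <- expand.grid(row = seq_len(n_row), col = seq_len(n_col))
n_blocks <- nrow(blocks)

set.seed(1)
# random: shuffle a balanced vector of fold ids across the blocks
blocks$random <- sample(rep_len(seq_len(k), n_blocks))

# systematic: cycle through the fold ids in block order
blocks$systematic <- rep_len(seq_len(k), n_blocks)

# checkerboard: alternate two folds like the squares of a chessboard
blocks$checkerboard <- (blocks$row + blocks$col) %% 2 + 1

# number of blocks per fold under each strategy
table(blocks$random)
table(blocks$systematic)
table(blocks$checkerboard)
```

In the package itself, the strategy is chosen through the `selection` argument of `cv_spatial`, as in the usage example further down.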
The latest version of blockCV (v3.0) features significant updates and changes. All functions have been renamed to more general names beginning with `cv_*`. Although the previous functions (version 2.x) will continue to work, they will be removed in a future update after being available for an extended period. We highly recommend updating your code to use the new functions listed below.
Some new updates:
- Function names have been changed, with all functions now starting with `cv_`
- The CV blocking functions are now: `cv_spatial`, `cv_cluster`, `cv_buffer`, and `cv_nndm`
- Spatial blocks now support hexagonal (now the default), rectangular, and user-defined blocks
- A fast C++ implementation of the Nearest Neighbour Distance Matching (NNDM) algorithm (Milà et al. 2022) has been added
- The NNDM algorithm can handle species presence-background data and other types of data
- The `cv_cluster` function generates blocks based on k-means clustering. It now works on both environmental rasters and the spatial coordinates of sample points
- The `cv_spatial_autocor` function now calculates the spatial autocorrelation range for both the response (i.e. binary or continuous data) and a set of continuous raster covariates
- The new `cv_plot` function allows for visualization of folds from all blocking strategies using ggplot facets
- The `terra` package is now used for all raster processing and supports both `stars` and `raster` objects, as well as files on disk
- The new `cv_similarity` function provides measures of possible extrapolation to testing folds
To install the latest update of the package from GitHub, use:

    remotes::install_github("rvalavi/blockCV", dependencies = TRUE)

Or install it from CRAN:

    install.packages("blockCV", dependencies = TRUE)

For practical examples of the package, see:
- blockCV introduction: how to create block cross-validation folds
- Block cross-validation for species distribution modelling
- Using blockCV with the `caret` and `tidymodels` packages (coming soon!)
This code snippet showcases some of the package's functionality. For more comprehensive tutorials, please refer to the vignettes included with the package (listed above).

    # loading the package
    library(blockCV)
    library(sf) # working with spatial vector data
    library(terra) # working with spatial raster data

    # load raster data; the pipe operator |> is available for R v4.1 or higher
    myrasters <- system.file("extdata/au/", package = "blockCV") |>
      list.files(full.names = TRUE) |>
      terra::rast()

    # load species presence-absence data and convert to sf
    pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
      sf::st_as_sf(coords = c("x", "y"), crs = 7845)

    # spatial blocking by specified range and random assignment
    sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data)
                     column = "occ", # the response column (binary or multi-class)
                     r = myrasters, # a raster for background (optional)
                     size = 450000, # size of the blocks in metres
                     k = 5, # number of folds
                     hexagon = TRUE, # use hexagonal blocks (the default)
                     selection = "random", # random blocks-to-fold assignment
                     iteration = 100, # to find evenly dispersed folds
                     biomod2 = TRUE) # also create folds for biomod2

Or create spatial clusters for k-fold cross-validation:

    # create spatial clusters
    set.seed(6)
    sc <- cv_cluster(x = pa_data,
                     column = "occ", # optionally count data in folds (binary or multi-class)
                     k = 5)

    # now plot the created folds
    cv_plot(cv = sc, # a blockCV object
            x = pa_data, # sample points
            r = myrasters[[1]], # optionally add a raster background
            points_alpha = 0.5,
            nrow = 2)

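
Once folds have been created, they are typically looped over to fit and evaluate a model. The sketch below shows one way this might look. It assumes that the object returned by `cv_spatial` exposes a `folds_list` element in which each fold stores the training indices first and the testing indices second; the `fold_sizes` summary and the reduced `iteration` value are choices made only for this example, and the model-fitting step is left as a placeholder.

```r
library(blockCV)
library(sf)

# same example data as in the snippet above
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# spatial blocks without a background raster; output suppressed for brevity
sb <- cv_spatial(x = pa_data, column = "occ", size = 450000, k = 5,
                 selection = "random", iteration = 50,
                 progress = FALSE, report = FALSE, plot = FALSE)

# assumption: folds_list[[i]][[1]] = training indices, [[2]] = testing indices
fold_sizes <- do.call(rbind, lapply(seq_along(sb$folds_list), function(i) {
  train_set <- pa_data[sb$folds_list[[i]][[1]], ]
  test_set <- pa_data[sb$folds_list[[i]][[2]], ]
  # fit a model on train_set and evaluate it on test_set here
  data.frame(fold = i, n_train = nrow(train_set), n_test = nrow(test_set))
}))
fold_sizes
```

Subsetting by index keeps the original `sf` object untouched, so the same folds can be reused across several candidate models.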
Investigate spatial autocorrelation in the landscape to choose a suitable size for spatial blocks:

    # exploring the effective range of spatial autocorrelation in raster covariates or sample data
    cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files
                       num_sample = 5000, # number of cells to be used
                       plot = TRUE)

Alternatively, you can manually choose the size of spatial blocks in an interactive session using a Shiny app.

    # shiny app to aid selecting a size for spatial blocks
    cv_block_size(r = myrasters[[1]],
                  x = pa_data, # optionally add sample points
                  column = "occ",
                  min_size = 2e5,
                  max_size = 9e5)

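
The leave-one-out methods mentioned earlier, buffering and NNDM, take a distance in the same way. The following is a hedged sketch of how `cv_buffer` and `cv_nndm` might be called on the example data. The `size` of 350000 metres and the sampling settings are illustrative assumptions rather than recommendations, and the use of `folds_list` on the returned objects is assumed to mirror the other blocking functions.

```r
library(blockCV)
library(sf)
library(terra)

# same example data as in the snippets above
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# buffered leave-one-out: training points within `size` of the test point are excluded
# size is an illustrative value in metres, not a recommendation
bloo <- cv_buffer(x = pa_data,
                  column = "occ",
                  size = 350000)

# NNDM leave-one-out: the raster r defines the area to be predicted
nncv <- cv_nndm(x = pa_data,
                column = "occ",
                r = myrasters,
                size = 350000, # assumed autocorrelation range in metres
                num_sample = 5000,
                sampling = "regular",
                min_train = 0.1,
                plot = FALSE)

# number of folds produced by each method
length(bloo$folds_list)
length(nncv$folds_list)
```

In practice, the distance would normally come from an autocorrelation analysis such as the `cv_spatial_autocor` call shown above rather than from a fixed number.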
Please report issues at: https://github.com/rvalavi/blockCV/issues
To cite package blockCV in publications, please use:
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol Evol. 2019; 10:225–232. https://doi.org/10.1111/2041-210X.13107