The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See

rvalavi/blockCV

Spatial and environmental blocking for k-fold and LOO cross-validation

The blockCV package offers a range of functions for generating train and test folds for k-fold and leave-one-out (LOO) cross-validation (CV). It allows for separation of data spatially and environmentally, with various options for block construction. Additionally, it includes a function for assessing the level of spatial autocorrelation in response or raster covariates, to aid in selecting an appropriate distance band for data separation. The blockCV package is suitable for the evaluation of a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling (SDM). It also provides support for different SDM scenarios, including presence-absence and presence-background species data, rare and common species, and raster data for predictor variables.

Main features

  • There are four blocking methods: spatial, clustering, buffers, and NNDM (Nearest Neighbour Distance Matching) blocks
  • Several ways to construct spatial blocks
  • The assignment of the spatial blocks to cross-validation folds can be done in three different ways: random, systematic and checkerboard pattern
  • The spatial blocks can be assigned to cross-validation folds to have evenly distributed records for binary (e.g. species presence-absence/background) or multi-class responses (e.g. land cover classes for remote sensing image classification)
  • The buffering and NNDM functions can account for presence-absence and presence-background data types
  • Using geostatistical techniques to inform the choice of a suitable distance band by which to separate the data sets
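As a rough illustration of the three fold-assignment schemes listed above, the base R sketch below labels a hypothetical 4 x 6 grid of blocks. The grid, the variable names, and the arithmetic are assumptions made for this example; they do not reproduce blockCV's internal implementation. In the package itself, the scheme is chosen through the selection argument of cv_spatial.

```r
# Hypothetical grid of blocks (illustration only, not blockCV internals)
n_row <- 4
n_col <- 6
k <- 3 # number of folds for the random and systematic schemes

# one row per block; `col` varies fastest
blocks <- expand.grid(col = seq_len(n_col), row = seq_len(n_row))
n_blocks <- nrow(blocks)

set.seed(42)

# random: shuffle a balanced vector of fold ids across the blocks
blocks$random <- sample(rep(seq_len(k), length.out = n_blocks))

# systematic: fold ids 1..k repeat in block order
blocks$systematic <- rep(seq_len(k), length.out = n_blocks)

# checkerboard: exactly two folds, alternating between neighbouring blocks
blocks$checkerboard <- (blocks$row + blocks$col) %% 2 + 1

# number of blocks per fold under each scheme
table(blocks$random)
table(blocks$systematic)
table(blocks$checkerboard)
```

Balancing the number of blocks per fold does not by itself balance the number of records per fold, which is presumably why cv_spatial can repeat the random assignment (its iteration argument) to look for evenly dispersed folds.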

New updates in version 3.0

The latest version of blockCV (v3.0) features significant updates and changes. All function names have been revised to more general names, beginning with cv_*. The previous functions (version 2.x) will continue to work for an extended period, but they will be removed in a future update. It is highly recommended to update your code to use the new functions listed below.

Some new updates:

  • Function names have been changed, with all functions now starting with cv_
  • The CV blocking functions are now: cv_spatial, cv_cluster, cv_buffer, and cv_nndm
  • Spatial blocks now support hexagonal (now the default), rectangular, and user-defined blocks
  • A fast C++ implementation of the Nearest Neighbour Distance Matching (NNDM) algorithm (Milà et al. 2022) is now added
  • The NNDM algorithm can handle species presence-background data and other types of data
  • The cv_cluster function generates blocks based on kmeans clustering. It now works on both environmental rasters and the spatial coordinates of sample points
  • The cv_spatial_autocor function now calculates the spatial autocorrelation range for both the response (i.e. binary or continuous data) and a set of continuous raster covariates
  • The new cv_plot function allows for visualization of folds from all blocking strategies using ggplot facets
  • The terra package is now used for all raster processing and supports both stars and raster objects, as well as files on disk
  • The new cv_similarity function provides measures of possible extrapolation to testing folds

Installation

To install the latest update of the package from GitHub use:

remotes::install_github("rvalavi/blockCV",dependencies=TRUE)

Or installing from CRAN:

install.packages("blockCV",dependencies=TRUE)

Vignettes

For practical examples of the package, see:

  1. blockCV introduction: how to create block cross-validation folds
  2. Block cross-validation for species distribution modelling
  3. Using blockCV with the caret and tidymodels (coming soon!)

Basic usage

The code snippets below showcase some of the package's functionality. For more comprehensive tutorials, please refer to the vignettes included with the package (listed above).

# loading the package
library(blockCV)
library(sf) # working with spatial vector data
library(terra) # working with spatial raster data

# load raster data; the pipe operator |> is available for R v4.1 or higher
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()

# load species presence-absence data and convert to sf
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# spatial blocking by specified range and random assignment
sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data)
                 column = "occ", # the response column (binary or multi-class)
                 r = myrasters, # a raster for background (optional)
                 size = 450000, # size of the blocks in metres
                 k = 5, # number of folds
                 hexagon = TRUE, # use hexagonal blocks - default
                 selection = "random", # random blocks-to-fold
                 iteration = 100, # to find evenly dispersed folds
                 biomod2 = TRUE) # also create folds for biomod2

Or create spatial clusters for k-fold cross-validation:

# create spatial clusters
set.seed(6)
sc <- cv_cluster(x = pa_data,
                 column = "occ", # optionally count data in folds (binary or multi-class)
                 k = 5)

# now plot the created folds
cv_plot(cv = sc, # a blockCV object
        x = pa_data, # sample points
        r = myrasters[[1]], # optionally add a raster background
        points_alpha = 0.5,
        nrow = 2)
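The objects returned by cv_spatial and cv_cluster (sb and sc above) store the folds in a folds_list element: a list with one entry per fold, each holding the training row indices first and the testing row indices second. The sketch below shows one way such a list could drive a cross-validation loop. The helper evaluate_folds, the toy data, and the accuracy score are hypothetical and written for this example; they are not part of blockCV.

```r
# Hypothetical helper (not part of blockCV): fit on the training rows and
# score on the testing rows of every fold.
# `folds_list` is a list of folds; each fold is list(train_idx, test_idx).
evaluate_folds <- function(data, folds_list, fit_fun, score_fun) {
  vapply(folds_list, function(fold) {
    train <- data[fold[[1]], , drop = FALSE]
    test <- data[fold[[2]], , drop = FALSE]
    model <- fit_fun(train)
    score_fun(model, test)
  }, numeric(1))
}

# Toy data and hand-made folds, so that the sketch runs without blockCV
set.seed(1)
toy <- data.frame(temp = rnorm(40))
toy$occ <- rbinom(40, 1, plogis(toy$temp))
toy_folds <- list(list(1:20, 21:40), list(21:40, 1:20))

scores <- evaluate_folds(
  data = toy,
  folds_list = toy_folds,
  fit_fun = function(train) glm(occ ~ temp, family = binomial, data = train),
  score_fun = function(model, test) {
    p <- predict(model, newdata = test, type = "response")
    mean((p > 0.5) == (test$occ == 1)) # proportion classified correctly
  }
)
scores # one accuracy value per fold
```

With real data, the same helper would be called with sb$folds_list (or sc$folds_list) and a data frame whose rows are in the same order as the points passed to the blocking function, for example covariate values extracted at those points.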

Investigate spatial autocorrelation in the landscape to choose a suitable size for spatial blocks:

# exploring the effective range of spatial autocorrelation in raster covariates or sample data
cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files
                   num_sample = 5000, # number of cells to be used
                   plot = TRUE)
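One possible way to use the result is to pass the suggested range on as the block size. The sketch below assumes that the object returned by cv_spatial_autocor exposes this value as a range element, that cv_spatial accepts plot = FALSE, and that blockCV and its suggested dependencies are installed. The names sac and sb_sac are made up for this example.

```r
library(blockCV)
library(sf)
library(terra)

# same example data as in the snippets above
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# estimate the autocorrelation range of the raster covariates
sac <- cv_spatial_autocor(r = myrasters,
                          num_sample = 5000,
                          plot = FALSE)

# suggested block size (assumed to be stored in `range`)
sac$range

# reuse the suggested range as the block size
sb_sac <- cv_spatial(x = pa_data,
                     column = "occ",
                     size = sac$range,
                     k = 5,
                     selection = "random",
                     iteration = 50,
                     plot = FALSE)
```

Whether the suggested range is a sensible block size still depends on the data: very large blocks can leave too few blocks to fill k folds evenly.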

Alternatively, you can manually choose the size of spatial blocks in an interactive session using a Shiny app.

# shiny app to aid selecting a size for spatial blocks
cv_block_size(r = myrasters[[1]],
              x = pa_data, # optionally add sample points
              column = "occ",
              min_size = 2e5,
              max_size = 9e5)

Reporting issues

Please report issues at: https://github.com/rvalavi/blockCV/issues

Citation

To cite the blockCV package in publications, please use:

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol Evol. 2019; 10:225-232. https://doi.org/10.1111/2041-210X.13107
