InterpolateR is one of the most efficient R packages for spatial interpolation. It leverages vectorized operations to maximize computational efficiency while maintaining high accuracy in interpolation tasks. The package integrates both traditional and advanced interpolation methods, making it a versatile tool for a wide range of spatial analysis applications.
Among the traditional interpolation techniques included are:
- **Inverse Distance Weighting (IDW)**: A widely used method that assigns weights to observations based on their inverse distance.
- **Cressman Objective Analysis Method (Cressman)**: A powerful approach that iteratively refines interpolated values based on surrounding observations.
Additionally, InterpolateR incorporates advanced machine learning-based interpolation techniques:
- **RFmerge**: Methodology developed by Baez-Villanueva et al. (2020) for the fusion of satellite precipitation datasets with ground-based observations.
- **RFplus**: A novel approach that combines Random Forest with Quantile Mapping to refine interpolated estimates and minimize bias.
The package is continuously evolving, with new interpolation methods to be added in future releases. Its highly optimized implementation ensures fast computations, making it an excellent choice for researchers and practitioners working with large spatial datasets.
Install the stable version from CRAN:

``` r
install.packages("InterpolateR")
```

You can install the development version from GitHub:

``` r
devtools::install_github("Jonnathan-Landi/InterpolateR")
```

Note: the development version may include errors due to the incorporation of new functionalities.
InterpolateR is designed to maintain a consistent data structure across all functions. The input datasets `BD_Obs` and `BD_Coord` must follow specific formats to ensure smooth functionality.

`BD_Obs` can be a `data.frame` or a `data.table`, and its structure should follow this format:
| Date | Est_1 | Est_2 | Est_3 | … | Est_n |
|---|---|---|---|---|---|
| 2024-01-01 | 12.4 | 15.3 | 10.2 | … | 18.1 |
| 2024-01-02 | 14.1 | 16.8 | 11.0 | … | 19.5 |
| 2024-01-03 | 13.5 | 14.9 | 10.8 | … | 16.4 |
Each column (except the first one) represents a different station (`Est_1`, `Est_2`, etc.), while the rows correspond to different time stamps (dates).
BD_Coord must have the following structure:
| Cod | X | Y | Z |
|---|---|---|---|
| Est_1 | 720227 | 9680901 | 2668 |
| Est_2 | 732260 | 9682763 | 2527 |
| Est_3 | 714574 | 967587 | 2956 |
| … | … | … | … |
| Est_n | 734574 | 9683686 | 2622 |
- `X` and `Y` are the coordinates of each station (easting and northing) in the UTM system.
- `Z` is the altitude at which the station is located.
- `Cod` is the identifier that matches the column names in `BD_Obs`.

For correct operation, make sure that the first column of `BD_Obs` contains the date of each observation. For `BD_Coord`, it is important to keep the columns in this order: `Cod`, `X`, `Y`, `Z`. Also, make sure that each `Cod` you assign matches the corresponding column in `BD_Obs`.
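To illustrate the expected layout, the two objects can be built by hand as follows (a minimal sketch with made-up values; real data would normally be read from files):

``` r
# Observations: first column holds the dates, the rest are stations
BD_Obs <- data.frame(
  Date  = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
  Est_1 = c(12.4, 14.1, 13.5),
  Est_2 = c(15.3, 16.8, 14.9),
  Est_3 = c(10.2, 11.0, 10.8)
)

# Coordinates: Cod, X, Y, Z in that order; Cod matches the station columns
# (coordinate values here are hypothetical UTM easting/northing/altitude)
BD_Coord <- data.frame(
  Cod = c("Est_1", "Est_2", "Est_3"),
  X   = c(720227, 732260, 714574),
  Y   = c(9680901, 9682763, 9679500),
  Z   = c(2668, 2527, 2956)
)

# Sanity check: every Cod must have a matching column in BD_Obs
stopifnot(all(BD_Coord$Cod %in% names(BD_Obs)[-1]))
```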
By ensuring that BD_Obs and BD_Coord follow these structures, you canseamlessly use any interpolation method provided by InterpolateR.
Inverse Distance Weighting (IDW) is a deterministic interpolation method that assumes that values closer to an unknown point have a greater influence than those farther away. This method assigns weights to sample points so that their influence decreases as the distance to the unknown point increases.

IDW estimates the value at an unknown point using a weighted average of the known values, where the weights are inversely proportional to the distances between the prediction point and the known points. The basic formula, as defined by Shepard, is:
\[\hat{Z}(s_0) = \frac{\sum_{i=1}^{N} w_i Z(s_i)}{\sum_{i=1}^{N} w_i}\]
where:

- \(\hat{Z}(s_0)\) is the estimated value at the unknown location \(s_0\),
- \(Z(s_i)\) is the observed value at the known point \(s_i\),
- \(w_i\) is the weight assigned to observation \(i\),
- \(N\) is the number of known points used in the interpolation.
The weights are calculated using:
\[w_i = \frac{1}{d(s_0, s_i)^p}\]
where:

- \(d(s_0, s_i)\) is the distance between the prediction point \(s_0\) and the known point \(s_i\),
- \(p\) is the power parameter that controls how quickly the influence of a point decays with distance.
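To make the formulas concrete, here is a small hand computation of an IDW estimate in base R (an illustrative sketch, not the package's internal implementation):

``` r
z <- c(12.4, 15.3, 10.2)  # observed values at three known points
d <- c(2, 5, 10)          # distances from the prediction point to each point
p <- 2                    # power parameter: higher p concentrates weight nearby

w     <- 1 / d^p              # inverse-distance weights
z_hat <- sum(w * z) / sum(w)  # weighted average = IDW estimate
round(z_hat, 2)               # 12.71 -- dominated by the nearest point
```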
``` r
library(InterpolateR)

# Load data from on-site observations
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the study area where the interpolation is performed
shapefile <- terra::vect(system.file("extdata/study_area.shp", package = "InterpolateR"))

# Perform the interpolation
Interpolated_data <- IDW(BD_Obs, BD_Coord, shapefile,
                         grid_resolution = 5, p = 2, n_round = 1,
                         training = 1, Rain_threshold = NULL,
                         stat_validation = "M001",
                         save_model = FALSE, name_save = NULL)

# Summary of model performance
print(Interpolated_data$Validation)
```

Since the algorithms are designed to work in the UTM system, make sure that `grid_resolution` is entered in km.
The Cressman objective analysis computes values at grid points as a weighted average of the surrounding observations, with weights that decrease as the distance between the observation and the grid point increases.
The Cressman method is defined by the following equation:
\[Z_{ij}^a = Z_{ij}^b + \frac{\sum_{k=1}^{n} w_k (Z_k^o -Z_k^b)}{\sum_{k=1}^{n} w_k}\]
where:

- \(Z_{ij}^a\) is the analysis value at grid point \(i,j\).
- \(Z_{ij}^b\) is the background value at grid point \(i,j\).
- \(Z_k^o\) is the observed value at station \(k\).
- \(Z_k^b\) is the background value interpolated to station \(k\).
- \(w_k\) is the weight assigned to station \(k\).
- \(n\) is the total number of stations used.
The weight \(w_k\) depends on the influence radius \(R\) and the distance \(r\) between the observation point and the grid point. It is defined as:

\[w_k = \frac{R^2 - r^2}{R^2 + r^2}\]

where:

- \(R\) is the influence (search) radius.
- \(r\) is the distance from the observation to the grid point.
The `search_radius` parameter defines the influence range for the Cressman interpolation method. It determines the maximum distance (in kilometers) within which observational data points contribute to the interpolated value at a given location. A larger radius results in smoother interpolated fields but may oversmooth local variations, while a smaller radius preserves finer details but may introduce noise.

The Cressman method typically applies an iterative approach, where the search radius is progressively reduced to refine the interpolation. Each iteration recalculates interpolated values with a smaller radius, allowing a better representation of small-scale features in the dataset.
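As an illustration of how the weight behaves, the snippet below evaluates \(w_k\) for several distances under a single influence radius (a base R sketch, not the package internals):

``` r
# Cressman weight: (R^2 - r^2) / (R^2 + r^2), set to zero beyond the radius
cressman_weight <- function(r, R) {
  ifelse(r > R, 0, (R^2 - r^2) / (R^2 + r^2))
}

R <- 20                   # influence radius in km
r <- c(0, 5, 10, 20, 30)  # observation-to-grid-point distances in km
round(cressman_weight(r, R), 3)  # 1.000 0.882 0.600 0.000 0.000
```

The weight falls smoothly from 1 at the grid point to 0 at the edge of the radius, which is why shrinking the radius across iterations sharpens local detail.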
``` r
# Load data from on-site observations
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the study area where the interpolation is performed
shapefile <- terra::vect(system.file("extdata/study_area.shp", package = "InterpolateR"))

# Perform the interpolation
Interpolated_Cressman <- Cressman(BD_Obs, BD_Coord, shapefile,
                                  grid_resolution = 5,
                                  search_radius = c(20, 10),
                                  training = 1, stat_validation = "M001",
                                  Rain_threshold = NULL, save_model = FALSE)

# Summary of model performance
print(Interpolated_Cressman$Validation)
```

`search_radius` should be defined as a numeric vector representing the influence range in kilometers (km) for each interpolation iteration, for example `search_radius = c(50, 20, 10)`. `grid_resolution` is entered in km.

The RFplus package implements a novel spatial extrapolation and bias correction framework, integrating Random Forest (RF) and Quantile Mapping (QM) in a multi-stage process to improve the accuracy of satellite precipitation estimates. The methodology consists of three key stages:
1. **Spatial Extrapolation of Precipitation**: The first stage employs a Random Forest model to extrapolate the spatial distribution of precipitation. The model is trained using in-situ measurements as the response variable and a diverse set of satellite precipitation products and environmental covariates as predictors. This approach enables the generation of an initial precipitation field that extends observed precipitation patterns across unmonitored regions with high spatial flexibility, allowing applications at different temporal scales (e.g., daily, monthly, or annual).
2. **Residual Correction through a Secondary RF Model**: To enhance predictive accuracy, a second Random Forest model is trained to estimate residual errors from the initial predictions. The residuals are defined as the difference between observed and modeled precipitation values at station locations. By modeling these residuals as a function of the same covariates used in the first stage, systematic biases are identified and corrected iteratively. The corrected precipitation estimate is obtained by adding the residual predictions to the initial RF-based precipitation estimates, leading to a refined precipitation product with reduced bias and improved spatial coherence.
3. **Bias Adjustment via Non-Parametric Quantile Mapping (QM)**: In the third stage, a nonparametric quantile mapping (QM) is applied to adapt the distribution of each time series to the in situ observations of the nearest station. The QM correction is applied only to those pixels that meet the proximity criterion, i.e., pixels within a predefined radius of influence (e.g., ≤15 km).
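The first two stages can be sketched on a toy data frame as follows (a schematic using the randomForest package and invented covariate names; the actual RFplus implementation works on raster covariates and adds the QM stage):

``` r
library(randomForest)

set.seed(123)
# Toy station table: observed precipitation (y) plus satellite/terrain covariates
df <- data.frame(
  y      = runif(100, 0, 30),
  mswep  = runif(100, 0, 30),
  chirps = runif(100, 0, 30),
  dem    = runif(100, 0, 4000)
)

# Stage 1: Random Forest predicts precipitation from the covariates
rf1   <- randomForest(y ~ mswep + chirps + dem, data = df, ntree = 200)
pred1 <- predict(rf1, df)

# Stage 2: a second Random Forest models the residuals with the same covariates
df$res <- df$y - pred1
rf2    <- randomForest(res ~ mswep + chirps + dem, data = df, ntree = 200)

# Corrected estimate = stage-1 prediction + predicted residual
corrected <- pred1 + predict(rf2, df)
```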
The RFplus package is designed to be highly adaptable and can be utilized across various satellite precipitation products and geographic regions. Although initially developed for precipitation bias correction, its methodology is applicable to other environmental variables such as temperature, wind speed, and soil moisture. This versatility makes RFplus a powerful tool for enhancing the accuracy of remote sensing-based estimations across diverse environmental conditions.
``` r
# Load the in-situ data and the coordinates
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the covariates
Covariates <- list(
  MSWEP  = terra::rast(system.file("extdata/MSWEP.nc", package = "InterpolateR")),
  CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "InterpolateR")),
  DEM    = terra::rast(system.file("extdata/DEM.nc", package = "InterpolateR"))
)

# Apply the model
Model_rfplus <- RFplus(BD_Obs, BD_Coord, Covariates,
                       n_round = 1, wet.day = 0.1, ntree = 2000,
                       seed = 123, training = 1,
                       stat_validation = c("M001"),
                       Rain_threshold = NULL, method = "none",
                       ratio = 15, save_model = FALSE, name_save = NULL)

# Summary of model performance
print(Model_rfplus$Validation)
```

RFmerge is a methodology developed by Baez-Villanueva et al. (2020) for the fusion of satellite precipitation datasets with ground-based observations, with the objective of improving the accuracy and spatial representativeness of the data. This package implements RFmerge using Random Forest as a machine learning technique to correct biases and adjust the distribution of satellite products to in situ measurements.
``` r
# Load the in-situ data and the coordinates
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the covariates
cov <- list(
  MSWEP  = terra::rast(system.file("extdata/MSWEP.nc", package = "InterpolateR")),
  CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "InterpolateR")),
  DEM    = terra::rast(system.file("extdata/DEM.nc", package = "InterpolateR"))
)

# Apply RFmerge
model_RFmerge <- RFmerge(BD_Obs, BD_Coord, cov, mask = NULL,
                         n_round = 1, ntree = 2000, seed = 123,
                         training = 1, stat_validation = c("M001"),
                         Rain_threshold = NULL,
                         save_model = FALSE, name_save = NULL)

# Summary of model performance
print(model_RFmerge$Validation)
```

In all interpolation methods it is possible to perform a validation of the results at the point-to-pixel level. To carry out this validation, parameters such as `training`, `stat_validation`, and `Rain_threshold` must be correctly configured. These parameters define how the data are divided for training and validation.
`training`: This parameter defines the proportion of stations to be used to train the model. The value must be between 0 and 1. For example, if you want to use 80% of the stations for training and the remaining 20% for validation, set `training = 0.8`. This means that 80% of the stations will be used to train the interpolation model, while the remaining 20% will be used to validate it. If `training` is 1, all stations will be used for training and no validation will be performed.
`stat_validation`: This parameter allows you to perform a manual validation, i.e., to select the specific stations to be used for validation. Instead of a random selection, you explicitly set the stations you want to use for this purpose. For example, to use the stations "Est_1", "Est_5" and "Est_7" for validation, set `stat_validation = c("Est_1", "Est_5", "Est_7")`.
`Rain_threshold`: This parameter is used exclusively when performing precipitation validation. It is critical for calculating categorical metrics that help evaluate the performance of the interpolation model, such as the Critical Success Index (CSI), Probability of Detection (POD), False Alarm Rate (FAR) and Success Ratio (SR).

These metrics are essential for assessing the quality of precipitation prediction by comparing estimates with actual precipitation observations.
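For reference, these categorical scores come from a standard contingency table of hits, misses and false alarms; below is a minimal base R sketch of the idea (not InterpolateR's internal code):

``` r
obs  <- c(0.0, 3.2, 10.5, 0.4, 7.1, 0.0)  # observed precipitation
pred <- c(0.2, 2.8,  0.6, 1.5, 6.4, 0.1)  # interpolated precipitation
thr  <- 1                                  # event threshold in mm

obs_event  <- obs  >= thr
pred_event <- pred >= thr

hits         <- sum(obs_event & pred_event)   # observed and predicted
misses       <- sum(obs_event & !pred_event)  # observed but not predicted
false_alarms <- sum(!obs_event & pred_event)  # predicted but not observed

POD <- hits / (hits + misses)                 # 2/3
FAR <- false_alarms / (hits + false_alarms)   # 1/3
CSI <- hits / (hits + misses + false_alarms)  # 1/2
```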
Parameter format
When using the `Rain_threshold` parameter, it should be entered as a named list, where each element represents a precipitation category. The element name is the category name, and the value associated with each category is a numeric vector with two values: the lower limit and the upper limit of the category. For example:
``` r
Rain_threshold = list(
  no_rain       = c(0, 1),
  light_rain    = c(1, 5),
  moderate_rain = c(5, 20),
  heavy_rain    = c(20, 40),
  violent_rain  = c(40, Inf)
)
```

By entering these values, InterpolateR will classify the precipitation values into these categories and can calculate the corresponding validation metrics to evaluate model performance.
InterpolateR calculates two main types of metrics to assess the accuracy of interpolation predictions: goodness-of-fit metrics and categorical metrics.

These metrics evaluate the overall agreement between predicted and observed values. Metrics calculated by InterpolateR include:
These metrics are used to evaluate the model's ability to predict extreme precipitation events by ranking values according to their intensity. Categorical metrics calculated by InterpolateR include:

These metrics allow for a more detailed and specific assessment of the quality of precipitation predictions based on intensity categories.