InterpolateR is one of the most efficient R packages for spatial interpolation. It leverages vectorized operations to maximize computational efficiency while maintaining high accuracy in interpolation tasks. The package integrates both traditional and advanced interpolation methods, making it a versatile tool for a wide range of spatial analysis applications.
Among the traditional interpolation techniques included are:
- **Inverse Distance Weighting (IDW)**: A widely used method that assigns weights to observations based on their inverse distance.
- **Cressman Objective Analysis Method (Cressman)**: A powerful approach that iteratively refines interpolated values based on surrounding observations.
Additionally, InterpolateR incorporates advanced machine learning-based interpolation techniques:
- **RFmerge**: Methodology developed by Baez-Villanueva et al. (2020) for the fusion of satellite precipitation datasets with ground-based observations.
- **RFplus**: A novel approach that combines Random Forest with Quantile Mapping to refine interpolated estimates and minimize bias.
The package is continuously evolving, with new interpolation methods to be added in future releases. Its highly optimized implementation ensures fast computations, making it an excellent choice for researchers and practitioners working with large spatial datasets.
Install the stable version from CRAN:

``` r
install.packages("InterpolateR")
```

You can install the development version from GitHub:

``` r
devtools::install_github("Jonnathan-Landi/InterpolateR")
```

Note: the development version may include errors due to the incorporation of new functionalities.
InterpolateR is designed to maintain a consistent data structure across all functions. The input datasets `BD_Obs` and `BD_Coord` must follow specific formats to ensure smooth functionality.

`BD_Obs` can be a `data.frame` or a `data.table`, and its structure should follow this format:
| Date | Est_1 | Est_2 | Est_3 | … | Est_n |
|---|---|---|---|---|---|
| 2024-01-01 | 12.4 | 15.3 | 10.2 | … | 18.1 |
| 2024-01-02 | 14.1 | 16.8 | 11.0 | … | 19.5 |
| 2024-01-03 | 13.5 | 14.9 | 10.8 | … | 16.4 |
Each column (except the first one) represents a different station (`Est_1`, `Est_2`, etc.), while the rows correspond to different time stamps (dates).
BD_Coord must have the following structure:
| Cod | X | Y | Z |
|---|---|---|---|
| Est_1 | 720227 | 9680901 | 2668 |
| Est_2 | 732260 | 9682763 | 2527 |
| Est_3 | 714574 | 967587 | 2956 |
| … | … | … | … |
| Est_n | 734574 | 9683686 | 2622 |
- `X` and `Y` are the coordinates of each station (easting and northing) in the UTM system.
- `Z` is the altitude at which the station is located.
- `Cod` is the identifier that matches the column names in `BD_Obs`.

For correct operation, make sure that the first column of `BD_Obs` contains the date of each observation. For `BD_Coord`, it is important to keep the columns in this order: `Cod`, `X`, `Y`, `Z`. Also, make sure that each `Cod` you assign matches the corresponding column in `BD_Obs`.
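To illustrate the expected layout, the two objects can be built by hand as follows (a minimal sketch with made-up values; real data would normally be read from files):

``` r
# Observations: first column holds the dates, the rest are stations
BD_Obs <- data.frame(
  Date  = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
  Est_1 = c(12.4, 14.1, 13.5),
  Est_2 = c(15.3, 16.8, 14.9),
  Est_3 = c(10.2, 11.0, 10.8)
)

# Coordinates: Cod, X, Y, Z in that order; Cod matches the station columns
# (coordinate values here are hypothetical UTM easting/northing/altitude)
BD_Coord <- data.frame(
  Cod = c("Est_1", "Est_2", "Est_3"),
  X   = c(720227, 732260, 714574),
  Y   = c(9680901, 9682763, 9679500),
  Z   = c(2668, 2527, 2956)
)

# Sanity check: every Cod must have a matching column in BD_Obs
stopifnot(all(BD_Coord$Cod %in% names(BD_Obs)[-1]))
```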
By ensuring that BD_Obs and BD_Coord follow these structures, you canseamlessly use any interpolation method provided by InterpolateR.
Inverse Distance Weighting (IDW) is a deterministic interpolation method that assumes that values closer to an unknown point have a greater influence than those farther away. This method assigns weights to sample points so that their influence decreases as the distance to the unknown point increases.

IDW estimates the value at an unknown point using a weighted average of the known values, where the weights are inversely proportional to the distances between the prediction point and the known points. The basic formula, as defined by Shepard, is:
\[\hat{Z}(s_0) = \frac{\sum_{i=1}^{N} w_i Z(s_i)}{\sum_{i=1}^{N} w_i}\]
where:

- \(\hat{Z}(s_0)\) is the estimated value at the unknown location \(s_0\),
- \(Z(s_i)\) is the observed value at the known point \(s_i\),
- \(w_i\) is the weight assigned to observation \(i\),
- \(N\) is the number of known points used in the interpolation.
The weights are calculated using:
\[w_i = \frac{1}{d(s_0, s_i)^p}\]
where:

- \(d(s_0, s_i)\) is the distance between the prediction point \(s_0\) and the known point \(s_i\),
- \(p\) is the power parameter that controls how quickly the influence of a point decays with distance.
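To make the formulas concrete, here is a small hand computation of an IDW estimate in base R (an illustrative sketch, not the package's internal implementation):

``` r
z <- c(12.4, 15.3, 10.2)  # observed values at three known points
d <- c(2, 5, 10)          # distances from the prediction point to each point
p <- 2                    # power parameter: higher p concentrates weight nearby

w     <- 1 / d^p              # inverse-distance weights
z_hat <- sum(w * z) / sum(w)  # weighted average = IDW estimate
round(z_hat, 2)               # 12.71 -- dominated by the nearest point
```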
``` r
library(InterpolateR)

# Load data from on-site observations
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the study area where the interpolation is performed
shapefile <- terra::vect(system.file("extdata/study_area.shp", package = "InterpolateR"))

# Perform the interpolation
Interpolated_data <- IDW(BD_Obs, BD_Coord, shapefile,
                         grid_resolution = 5, p = 2, n_round = 1,
                         training = 1, Rain_threshold = NULL,
                         stat_validation = "M001",
                         save_model = FALSE, name_save = NULL)

# Summary of model performance
print(Interpolated_data$Validation)
```

Since the algorithms are designed to work in the UTM system, make sure that `grid_resolution` is entered in km.
The Cressman objective analysis computes values at grid points as a weighted average of the surrounding observations, with weights that decrease as the distance between the observation and the grid point increases.
The Cressman method is defined by the following equation:
\[Z_{ij}^a = Z_{ij}^b + \frac{\sum_{k=1}^{n} w_k (Z_k^o -Z_k^b)}{\sum_{k=1}^{n} w_k}\]
where:

- \(Z_{ij}^a\) is the analysis value at grid point \(i,j\).
- \(Z_{ij}^b\) is the background value at grid point \(i,j\).
- \(Z_k^o\) is the observed value at station \(k\).
- \(Z_k^b\) is the background value interpolated to station \(k\).
- \(w_k\) is the weight assigned to station \(k\).
- \(n\) is the total number of stations used.
The weight \(w_k\) depends on the influence radius \(R\) and the distance \(r\) between the observation point and the grid point. It is defined as:

\[w_k = \frac{R^2 - r^2}{R^2 + r^2}\]

where:

- \(R\) is the influence (search) radius.
- \(r\) is the distance from the observation to the grid point.
The `search_radius` parameter defines the influence range for the Cressman interpolation method. It determines the maximum distance (in kilometers) within which observational data points contribute to the interpolated value at a given location. A larger radius results in smoother interpolated fields but may oversmooth local variations, while a smaller radius preserves finer details but may introduce noise.

The Cressman method typically applies an iterative approach, where the search radius is progressively reduced to refine the interpolation. Each iteration recalculates interpolated values with a smaller radius, allowing a better representation of small-scale features in the dataset.
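As an illustration of how the weight behaves, the snippet below evaluates \(w_k\) for several distances under a single influence radius (a base R sketch, not the package internals):

``` r
# Cressman weight: (R^2 - r^2) / (R^2 + r^2), set to zero beyond the radius
cressman_weight <- function(r, R) {
  ifelse(r > R, 0, (R^2 - r^2) / (R^2 + r^2))
}

R <- 20                   # influence radius in km
r <- c(0, 5, 10, 20, 30)  # observation-to-grid-point distances in km
round(cressman_weight(r, R), 3)  # 1.000 0.882 0.600 0.000 0.000
```

The weight falls smoothly from 1 at the grid point to 0 at the edge of the radius, which is why shrinking the radius across iterations sharpens local detail.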
``` r
# Load data from on-site observations
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the study area where the interpolation is performed
shapefile <- terra::vect(system.file("extdata/study_area.shp", package = "InterpolateR"))

# Perform the interpolation
Interpolated_Cressman <- Cressman(BD_Obs, BD_Coord, shapefile,
                                  grid_resolution = 5,
                                  search_radius = c(20, 10),
                                  training = 1, stat_validation = "M001",
                                  Rain_threshold = NULL, save_model = FALSE)

# Summary of model performance
print(Interpolated_Cressman$Validation)
```

`search_radius` should be defined as a numeric vector representing the influence range in kilometers (km) for each interpolation iteration, for example `search_radius = c(50, 20, 10)`. `grid_resolution` is entered in km.

The RFplus package implements a novel spatial extrapolation and bias correction framework, integrating Random Forest (RF) and Quantile Mapping (QM) in a multi-stage process to improve the accuracy of satellite precipitation estimates. The methodology consists of three key stages:
1. **Spatial Extrapolation of Precipitation**: The first stage employs a Random Forest model to extrapolate the spatial distribution of precipitation. The model is trained using in-situ measurements as the response variable and a diverse set of satellite precipitation products and environmental covariates as predictors. This approach enables the generation of an initial precipitation field that extends observed precipitation patterns across unmonitored regions with high spatial flexibility, allowing applications at different temporal scales (e.g., daily, monthly, or annual).
2. **Residual Correction through a Secondary RF Model**: To enhance predictive accuracy, a second Random Forest model is trained to estimate residual errors from the initial predictions. The residuals are defined as the difference between observed and modeled precipitation values at station locations. By modeling these residuals as a function of the same covariates used in the first stage, systematic biases are identified and corrected iteratively. The corrected precipitation estimate is obtained by adding the residual predictions to the initial RF-based precipitation estimates, leading to a refined precipitation product with reduced bias and improved spatial coherence.
3. **Bias Adjustment via Non-Parametric Quantile Mapping (QM)**: In the third stage, a nonparametric quantile mapping (QM) is applied to adapt the distribution of each time series to the in situ observations of the nearest station. The QM correction is applied only to those pixels that meet the proximity criterion, i.e., pixels within a predefined radius of influence (e.g., ≤15 km).
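The first two stages can be sketched on a toy data frame as follows (a schematic using the randomForest package and invented covariate names; the actual RFplus implementation works on raster covariates and adds the QM stage):

``` r
library(randomForest)

set.seed(123)
# Toy station table: observed precipitation (y) plus satellite/terrain covariates
df <- data.frame(
  y      = runif(100, 0, 30),
  mswep  = runif(100, 0, 30),
  chirps = runif(100, 0, 30),
  dem    = runif(100, 0, 4000)
)

# Stage 1: Random Forest predicts precipitation from the covariates
rf1   <- randomForest(y ~ mswep + chirps + dem, data = df, ntree = 200)
pred1 <- predict(rf1, df)

# Stage 2: a second Random Forest models the residuals with the same covariates
df$res <- df$y - pred1
rf2    <- randomForest(res ~ mswep + chirps + dem, data = df, ntree = 200)

# Corrected estimate = stage-1 prediction + predicted residual
corrected <- pred1 + predict(rf2, df)
```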
The RFplus package is designed to be highly adaptable and can be utilized across various satellite precipitation products and geographic regions. Although initially developed for precipitation bias correction, its methodology is applicable to other environmental variables such as temperature, wind speed, and soil moisture. This versatility makes RFplus a powerful tool for enhancing the accuracy of remote sensing-based estimations across diverse environmental conditions.
``` r
# Load the in-situ data and the coordinates
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the covariates
Covariates <- list(
  MSWEP  = terra::rast(system.file("extdata/MSWEP.nc", package = "InterpolateR")),
  CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "InterpolateR")),
  DEM    = terra::rast(system.file("extdata/DEM.nc", package = "InterpolateR"))
)

# Apply the model
Model_rfplus <- RFplus(BD_Obs, BD_Coord, Covariates,
                       n_round = 1, wet.day = 0.1, ntree = 2000,
                       seed = 123, training = 1,
                       stat_validation = c("M001"),
                       Rain_threshold = NULL, method = "none",
                       ratio = 15, save_model = FALSE, name_save = NULL)

# Summary of model performance
print(Model_rfplus$Validation)
```

RFmerge is a methodology developed by Baez-Villanueva et al. (2020) for the fusion of satellite precipitation datasets with ground-based observations, with the objective of improving the accuracy and spatial representativeness of the data. This package implements RFmerge using Random Forest as a machine learning technique to correct biases and adjust the distribution of satellite products to in situ measurements.
``` r
# Load the in-situ data and the coordinates
data("BD_Obs", package = "InterpolateR")
data("BD_Coord", package = "InterpolateR")

# Load the covariates
cov <- list(
  MSWEP  = terra::rast(system.file("extdata/MSWEP.nc", package = "InterpolateR")),
  CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "InterpolateR")),
  DEM    = terra::rast(system.file("extdata/DEM.nc", package = "InterpolateR"))
)

# Apply RFmerge
model_RFmerge <- RFmerge(BD_Obs, BD_Coord, cov, mask = NULL,
                         n_round = 1, ntree = 2000, seed = 123,
                         training = 1, stat_validation = c("M001"),
                         Rain_threshold = NULL,
                         save_model = FALSE, name_save = NULL)

# Summary of model performance
print(model_RFmerge$Validation)
```

In all interpolation methods it is possible to perform a validation of the results at the point-to-pixel level. To carry out this validation, parameters such as `training`, `stat_validation`, and `Rain_threshold` must be correctly configured. These parameters define how the data are divided for training and validation.
`training`: This parameter defines the proportion of stations to be used to train the model. The value must be between 0 and 1. For example, if you want to use 80% of the stations for training and the remaining 20% for validation, set `training = 0.8`. This means that 80% of the stations will be used to train the interpolation model, while the remaining 20% will be used to validate it. If `training` is 1, all stations will be used for training and no validation will be performed.
`stat_validation`: This parameter allows you to perform a manual validation, i.e., to select the specific stations to be used for validation. Instead of a random selection, you explicitly set the stations you want to use for this purpose. For example, to use the stations "Est_1", "Est_5" and "Est_7" for validation, set `stat_validation = c("Est_1", "Est_5", "Est_7")`.
`Rain_threshold`: This parameter is used exclusively when performing precipitation validation. It is critical for calculating categorical metrics that help evaluate the performance of the interpolation model, such as the Critical Success Index (CSI), Probability of Detection (POD), False Alarm Rate (FAR) and Success Ratio (SR).

These metrics are essential for assessing the quality of precipitation prediction by comparing estimates with actual precipitation observations.
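For reference, these categorical scores come from a standard contingency table of hits, misses and false alarms; below is a minimal base R sketch of the idea (not InterpolateR's internal code):

``` r
obs  <- c(0.0, 3.2, 10.5, 0.4, 7.1, 0.0)  # observed precipitation
pred <- c(0.2, 2.8,  0.6, 1.5, 6.4, 0.1)  # interpolated precipitation
thr  <- 1                                  # event threshold in mm

obs_event  <- obs  >= thr
pred_event <- pred >= thr

hits         <- sum(obs_event & pred_event)   # observed and predicted
misses       <- sum(obs_event & !pred_event)  # observed but not predicted
false_alarms <- sum(!obs_event & pred_event)  # predicted but not observed

POD <- hits / (hits + misses)                 # 2/3
FAR <- false_alarms / (hits + false_alarms)   # 1/3
CSI <- hits / (hits + misses + false_alarms)  # 1/2
```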
Parameter format
When using the `Rain_threshold` parameter, it should be entered as a named list, where each element represents a precipitation category. The element name is the category name, and the value associated with each category is a numeric vector with two values: the lower limit and the upper limit of the category. For example:
``` r
Rain_threshold = list(
  no_rain       = c(0, 1),
  light_rain    = c(1, 5),
  moderate_rain = c(5, 20),
  heavy_rain    = c(20, 40),
  violent_rain  = c(40, Inf)
)
```

By entering these values, InterpolateR will classify the precipitation values into these categories and can calculate the corresponding validation metrics to evaluate model performance.
InterpolateR calculates two main types of metrics to assess the accuracy of interpolation predictions: goodness-of-fit metrics and categorical metrics.

These metrics evaluate the overall agreement between predicted and observed values. Metrics calculated by InterpolateR include:
These metrics are used to evaluate the model's ability to predict extreme precipitation events by ranking values according to their intensity. Categorical metrics calculated by InterpolateR include:

These metrics allow for a more detailed and specific assessment of the quality of precipitation predictions based on intensity categories.