| Type: | Package |
| Title: | Memory-Based Learning in Spectral Chemometrics |
| Version: | 2.2.5 |
| Date: | 2025-10-16 |
| Maintainer: | Leonardo Ramirez-Lopez <ramirez.lopez.leo@gmail.com> |
| BugReports: | https://github.com/l-ramirez-lopez/resemble/issues |
| Description: | Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>. |
| License: | MIT + file LICENSE |
| URL: | http://l-ramirez-lopez.github.io/resemble/ |
| Depends: | R (≥ 3.5.0) |
| Imports: | foreach, iterators, Rcpp (≥ 1.0.3), mathjaxr (≥ 1.0), magrittr (≥ 1.5.0), lifecycle (≥ 0.2.0), data.table (≥ 1.9.8) |
| Suggests: | prospectr, parallel, doParallel, testthat, formatR, rmarkdown, bookdown, knitr |
| LinkingTo: | Rcpp, RcppArmadillo |
| RdMacros: | mathjaxr |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Repository: | CRAN |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| Config/VersionName: | dstatements |
| Packaged: | 2025-10-17 18:47:43 UTC; leo |
| Author: | Leonardo Ramirez-Lopez |
| Date/Publication: | 2025-10-17 19:20:02 UTC |
Overview of the functions in the resemble package
Description
Functions for memory-based learning

Details
This is version 2.2.5 – dstatements of the package. It implements a number of functions useful for modeling complex spectral data (e.g. NIR, IR). The package includes functions for dimensionality reduction, computing spectral dissimilarity matrices, nearest neighbor search, and modeling spectral data using memory-based learning. This package builds upon the methods presented in Ramirez-Lopez et al. (2013) doi:10.1016/j.geoderma.2012.12.014.
Development versions can be found in the GitHub repository of the package at https://github.com/l-ramirez-lopez/resemble.
The functions available for dimensionality reduction are:
The functions available for computing dissimilarity matrices are:
The functions available for evaluating dissimilarity matrices are:
The functions available for nearest neighbor search:
The functions available for modeling spectral data:
Other supplementary functions:
Author(s)
Maintainer / Creator: Leonardo Ramirez-Lopez ramirez.lopez.leo@gmail.com
Authors:
Leonardo Ramirez-Lopez (ORCID)
Antoine Stevens (ORCID)
Claudio Orellano
Raphael Viscarra Rossel (ORCID)
Zefang Shen
Craig Lobsey (ORCID)
Alex Wadoux (ORCID)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
See Also
Useful links:
Report bugs at https://github.com/l-ramirez-lopez/resemble/issues
Print method for an object of class local_ortho_diss
Description
prints the subsets of local_ortho_diss objects
Usage
## S3 method for class 'local_ortho_diss'
x[rows, columns, drop = FALSE, ...]
Arguments
x | an object of class local_ortho_diss. |
rows | the indices of the rows |
columns | the indices of the columns |
drop | drop argument |
... | not used |
checks the pc_selection argument
Description
internal
Usage
check_pc_arguments( n_rows_x, n_cols_x, pc_selection, default_max_comp = 40, default_max_cumvar = 0.99, default_max_var = 0.01)
Correlation and moving correlation dissimilarity measurements (cor_diss)
Description
Computes correlation and moving correlation dissimilarity matrices.
Usage
cor_diss(Xr, Xu = NULL, ws = NULL, center = TRUE, scale = FALSE)Arguments
Xr | a matrix. |
Xu | an optional matrix containing data of a second set of observations. |
ws | for moving correlation dissimilarity, an odd integer value which specifies the window size. If ws = NULL (the default), the correlation dissimilarity is computed over the whole spectrum. |
center | a logical indicating if the spectral data must be centered. |
scale | a logical indicating if the data must be scaled. |
Details
The correlation dissimilarity \(d\) between two observations \(x_i\) and \(x_j\) is based on the Pearson correlation coefficient (\(\rho\)) and it can be computed as follows:
\[d(x_i, x_j) = \frac{1}{2}(1 - \rho(x_i, x_j))\]
The above formula is used when ws = NULL. On the other hand (when ws != NULL), the moving correlation dissimilarity between two observations \(x_i\) and \(x_j\) is computed as follows:
\[d(x_i, x_j; ws) = \frac{1}{2\,ws}\sum_{k=1}^{p - ws}\left(1 - \rho(x_{i,(k:k+ws)}, x_{j,(k:k+ws)})\right)\]
where \(ws\) represents a given window size which rolls sequentially from 1 up to \(p - ws\) and \(p\) is the number of variables of the observations.
The function does not accept input data containing missing values.
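As a quick numerical check of the scalar formula above, the correlation dissimilarity can be reproduced with base R. The following is a minimal sketch (not part of the package examples); it assumes centering and scaling are switched off so that cor_diss and the manual computation operate on the same raw values:
library(resemble)
# a small synthetic matrix: 5 observations (rows), 20 variables (columns)
set.seed(1)
X <- matrix(rnorm(5 * 20), nrow = 5)
# manual correlation dissimilarity: d = (1 - rho) / 2, with rho the Pearson
# correlation between pairs of observations (rows)
d_manual <- (1 - cor(t(X))) / 2
# cor_diss with centering/scaling off, so both computations see the raw values
d_pkg <- cor_diss(Xr = X, center = FALSE, scale = FALSE)
# the two matrices should agree up to numerical precision
max(abs(d_manual - d_pkg))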
Value
a matrix of the computed dissimilarities.
Author(s)
Antoine Stevens and Leonardo Ramirez-Lopez
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
cor_diss(Xr = Xr)
cor_diss(Xr = Xr, Xu = Xu)
cor_diss(Xr = Xr, ws = 41)
cor_diss(Xr = Xr, Xu = Xu, ws = 41)
From dissimilarity matrix to neighbors
Description
internal
Usage
diss_to_neighbors( diss_matrix, k = NULL, k_diss = NULL, k_range = NULL, spike = NULL, return_dissimilarity = FALSE, skip_first = FALSE)
Arguments
diss_matrix | a matrix representing the dissimilarities between observations in a matrix |
k | an integer value indicating the k-nearest neighbors of each observation in |
k_diss | an integer value indicating a dissimilarity threshold. For each observation in |
k_range | an integer vector of length 2 which specifies the minimum (first value) and the maximum (second value) number of neighbors to be retained when the |
spike | a vector of integers indicating what observations in |
return_dissimilarity | logical indicating if the input dissimilarity must be mirrored in the output. |
skip_first | a logical indicating whether to skip the first neighbor or not. Default is |
Dissimilarity computation between matrices
Description
This is a wrapper to integrate the different dissimilarity functions offered by the package. It computes the dissimilarities between observations in numerical matrices by using a specified dissimilarity measure.
Usage
dissimilarity(Xr, Xu = NULL, diss_method = c("pca", "pca.nipals", "pls", "mpls", "cor", "euclid", "cosine", "sid"), Yr = NULL, gh = FALSE, pc_selection = list("var", 0.01), return_projection = FALSE, ws = NULL, center = TRUE, scale = FALSE, documentation = character(), ...)
Arguments
Xr | a matrix containing |
Xu | an optional matrix containing data of a second set of observations with |
diss_method | a character string indicating the method to be used tocompute the dissimilarities between observations. Options are:
|
Yr | a numeric matrix of
|
gh | a logical indicating if the Mahalanobis distance (in the pls score space) between each observation and the pls centre/mean must be computed. |
pc_selection | a list of length 2 to be passed onto the
The default is list("var", 0.01). Optionally, the |
return_projection | a logical indicating if the projection(s) must be returned. Projections are used if the |
ws | an odd integer value which specifies the window size, when |
center | a logical indicating if |
scale | a logical indicating if |
documentation | an optional character string that can be used to describe anything related to the |
... | other arguments passed to the dissimilarity functions( |
Details
This function is a wrapper for ortho_diss, cor_diss, f_diss and sid. Check the documentation of these functions for further details.
Value
A list with the following components:
dissimilarity: the resulting dissimilarity matrix.
projection: an ortho_projection object. Only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals", diss_method = "pls" or diss_method = "mpls". This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see ortho_diss function for further details).
gh: a list containing the GH distances as well as the pls projection used to compute the GH.
A short usage sketch showing how to access these components is given below.
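For example (assuming Xr, Xu and Yr prepared as in the Examples section below):
d <- dissimilarity(
  Xr = Xr, Xu = Xu,
  diss_method = "pca",
  Yr = Yr, gh = TRUE,
  pc_selection = list("opc", 30),
  return_projection = TRUE
)
dim(d$dissimilarity)  # dissimilarities between the Xr and Xu observations
class(d$projection)   # the projection used ("pca"/"pca.nipals"/"pls"/"mpls" methods)
names(d$gh)           # the GH distances and the pls projection used to compute them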
Author(s)
References
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.
Westerhaus, M. 2014. Eastern Analytical Symposium Award for outstanding achievements in near infrared spectroscopy: my contributions to near infrared spectroscopy. NIR news, 25(8), 16-20.
See Also
Examples
library(prospectr)
data(NIRsoil)
# Filter the data using the first derivative with Savitzky and Golay
# smoothing filter and a window size of 15 spectral variables and a
# polynomial order of 4
sg <- savitzkyGolay(NIRsoil$spc, m = 1, p = 4, w = 15)
# Replace the original spectra with the filtered ones
NIRsoil$spc <- sg
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Xr <- Xr[!is.na(Yr), ]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]
dsm_pca <- dissimilarity(
  Xr = Xr, Xu = Xu,
  diss_method = c("pca"),
  Yr = Yr, gh = TRUE,
  pc_selection = list("opc", 30),
  return_projection = TRUE
)
A function for transforming a matrix from its Euclidean space to its Mahalanobis space
Description
For internal use only
Usage
euclid_to_mahal(X, sm_method = c("svd", "eigen"))
evaluation of multiple distances obtained with multiple PCs
Description
internal
Usage
eval_multi_pc_diss( scores, side_info, from = 1, to = ncol(scores), steps = 1, method = c("pc", "pls"), check_dims = TRUE)
Euclidean, Mahalanobis and cosine dissimilarity measurements
Description
This function is used to compute the dissimilarity between observations based on Euclidean or Mahalanobis distance measures or on cosine dissimilarity measures (a.k.a. spectral angle mapper).
Usage
f_diss(Xr, Xu = NULL, diss_method = "euclid", center = TRUE, scale = FALSE)
Arguments
Xr | a matrix containing the (reference) data. |
Xu | an optional matrix containing data of a second set of observations(samples). |
diss_method | the method for computing the dissimilarity between observations. Options are |
center | a logical indicating if the spectral data must be centered. |
scale | a logical indicating if the data must be scaled. |
Details
The results obtained for Euclidean dissimilarity are equivalent to those returned by the stats::dist() function, but are scaled differently. However, f_diss is considerably faster (which can be advantageous when computing dissimilarities for very large matrices). The final scaling of the dissimilarity scores in f_diss differs in that the number of variables is used to scale the squared dissimilarity scores. See the examples section for a comparison between stats::dist() and f_diss.
In the case of both the Euclidean and Mahalanobis distances, the scaled dissimilarity matrix \(D\) between observations in a given matrix \(X\) is computed as follows:
\[d(x_i, x_j)^{2} = \sum (x_i - x_j)M^{-1}(x_i - x_j)^{\mathrm{T}}\]
\[d_{scaled}(x_i, x_j) = \sqrt{\frac{1}{p}d(x_i, x_j)^{2}}\]
where \(p\) is the number of variables in \(X\), \(M\) is the identity matrix in the case of the Euclidean distance and the variance-covariance matrix of \(X\) in the case of the Mahalanobis distance. The Mahalanobis distance can also be viewed as the Euclidean distance after applying a linear transformation of the original variables. Such a linear transformation is done by using a factorization of the inverse covariance matrix as \(M^{-1} = W^{T}W\), where \(W\) is merely the square root of \(M^{-1}\) which can be found by using a singular value decomposition.
Note that when attempting to compute the Mahalanobis distance on a dataset with highly correlated variables (i.e. spectral variables), the variance-covariance matrix may result in a singular matrix which cannot be inverted and therefore the distance cannot be computed. This is also the case when the number of observations in the dataset is smaller than the number of variables.
For the computation of the Mahalanobis distance, the method described above is used.
The cosine dissimilarity \(c\) between two observations \(x_i\) and \(x_j\) is computed as follows:
\[c(x_i, x_j) = cos^{-1}{\frac{\sum_{k=1}^{p}x_{i,k} x_{j,k}}{\sqrt{\sum_{k=1}^{p} x_{i,k}^{2}} \sqrt{\sum_{k=1}^{p} x_{j,k}^{2}}}}\]
where \(p\) is the number of variables of the observations. The function does not accept input data containing missing values. NOTE: The computed distances are divided by the number of variables/columns in Xr.
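The linear-transformation view of the Mahalanobis distance described above can be verified with a few lines of base R. This is a minimal sketch independent of f_diss (note that f_diss additionally scales the squared distances by the number of variables):
set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30)      # 30 observations, 5 variables
S_inv <- solve(cov(X))                      # M^-1 in the formulas above
# factorize M^-1 = t(W) %*% W via an eigen-decomposition
e <- eigen(S_inv, symmetric = TRUE)
W <- diag(sqrt(e$values)) %*% t(e$vectors)
# Mahalanobis distance between observations 1 and 2, computed directly...
d_direct <- sqrt(drop((X[1, ] - X[2, ]) %*% S_inv %*% (X[1, ] - X[2, ])))
# ...and as the Euclidean distance after transforming the variables with W
Z <- X %*% t(W)
d_transformed <- sqrt(sum((Z[1, ] - Z[2, ])^2))
all.equal(d_direct, d_transformed)          # TRUE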
Value
a matrix of the computed dissimilarities.
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
# Euclidean distances between all the observations in Xr
ed <- f_diss(Xr = Xr, diss_method = "euclid")
# Equivalence with the dist() function of R base
ed_dist <- (as.matrix(dist(Xr))^2 / ncol(Xr))^0.5
round(ed_dist - ed, 5)
# Comparing the computational time
iter <- 20
tm <- proc.time()
for (i in 1:iter) {
  f_diss(Xr)
}
f_diss_time <- proc.time() - tm
tm_2 <- proc.time()
for (i in 1:iter) {
  dist(Xr)
}
dist_time <- proc.time() - tm_2
f_diss_time
dist_time
# Euclidean distances between observations in Xr and observations in Xu
ed_xr_xu <- f_diss(Xr, Xu)
# Mahalanobis distance computed on the first 20 spectral variables
md_xr_xu <- f_diss(Xr[, 1:20], Xu[, 1:20], "mahalanobis")
# Cosine dissimilarity matrix
cdiss_xr_xu <- f_diss(Xr, Xu, "cosine")
A fast distance algorithm for two matrices written in C++
Description
Computes distances between two data matrices using "euclid", "cor" or "cosine".
Usage
fast_diss(X, Y, method)
Arguments
X | a matrix |
Y | a matrix |
method | a |
Value
a distance matrix
Author(s)
Antoine Stevens and Leonardo Ramirez-Lopez
A fast algorithm of (squared) Euclidean cross-distance for vectors written in C++
Description
A fast (parallel for linux) algorithm of (squared) Euclidean cross-distance for vectors written in C++
Usage
fast_diss_vector(X)
Arguments
X | a vector. |
Details
used internally in ortho_projection
Value
a vector of distance (lower triangle of the distance matrix, stored by column)
Author(s)
Antoine Stevens
Local multivariate regression
Description
internal
Usage
fit_and_predict( x, y, pred_method, scale = FALSE, weights = NULL, newdata, pls_c = NULL, CV = FALSE, tune = FALSE, number = 10, p = 0.75, group = NULL, noise_variance = 0.001, range_prediction_limits = TRUE, pls_max_iter = 1, pls_tol = 1e-06, modified = FALSE, seed = NULL)
format internal messages
Description
internal
Usage
format_xr_xu_indices(xr_xu_names)
Arguments
xr_xu_names | the names of Xr and Xu |
Cross validation for Gaussian process regression
Description
internal
Usage
gaussian_pr_cv( x, y, scale, weights = NULL, p = 0.75, number = 10, group = NULL, noise_variance = 0.001, retrieve = c("final_model", "none"), seed = NULL)
Gaussian process regression with linear kernel (gaussian_process)
Description
Carries out a gaussian process regression with a linear kernel (dot product). For internal use only!
Usage
gaussian_process(X, Y, noisev, scale)
Arguments
X | a matrix of predictor variables |
Y | a matrix with a single response variable |
noisev | a value indicating the variance of the noise for Gaussian process regression. Default is 0.001. |
scale | a logical indicating whether both the predictors and the response variable must be scaled to zero mean and unit variance. |
Value
a list containing the following elements:
b: the regression coefficients.
Xz: the (final transformed) matrix of predictor variables.
alpha: the alpha matrix.
is.scaled: logical indicating whether both the predictors and response variable were scaled to zero mean and unit variance.
Xcenter: if the matrix of predictors was scaled, the centering vector used for X.
Xscale: if the matrix of predictors was scaled, the scaling vector used for X.
Ycenter: if the matrix of predictors was scaled, the centering vector used for Y.
Yscale: if the matrix of predictors was scaled, the scaling vector used for Y.
Author(s)
Leonardo Ramirez-Lopez
Internal Cpp function for performing leave-group-out cross-validations for Gaussian process
Description
For internal use only!.
Usage
gaussian_process_cv(X, Y, mindices, pindices, noisev = 0.001, scale = TRUE, statistics = TRUE)
Arguments
X | a matrix of predictor variables. |
Y | a matrix of a single response variable. |
mindices | a matrix with |
pindices | a matrix with |
scale | a logical indicating whether both the predictors and the response variable must be scaled to zero mean and unit variance. |
statistics | a logical value indicating whether the precision and accuracy statistics are to be returned, otherwise the predictions for each validation segment are retrieved. |
Value
a list containing the following one-row matrices:
rmse.seg: the RMSEs.
st.rmse.seg: the standardized RMSEs.
rsq.seg: the coefficients of determination.
Author(s)
Leonardo Ramirez-Lopez
Function for identifying the column in a matrix with the largest standard deviation
Description
Identifies the column with the largest standard deviation. For internal use only!
Usage
get_col_largest_sd(X)
Arguments
X | a matrix. |
Value
a value indicating the index of the column with the largest standard deviation.
Author(s)
Leonardo Ramirez-Lopez
Standard deviation of columns
Description
For internal use only!
Usage
get_col_sds(x)
Function for computing the mean of each column in a matrix
Description
Computes the mean of each column in a matrix. For internal use only!
Usage
get_column_means(X)
Arguments
X | a matrix. |
Value
a vector of mean values.
Author(s)
Leonardo Ramirez-Lopez
Function for computing the standard deviation of each column in a matrix
Description
Computes the standard deviation of each column in a matrix. For internal use only!
Usage
get_column_sds(X)
Arguments
X | a matrix. |
Value
a vector of standard deviation values.
Author(s)
Leonardo Ramirez-Lopez
Function for computing the sum of each column in a matrix
Description
Computes the sum of each column in a matrix. For internal use only!
Usage
get_column_sums(X)
Arguments
X | a matrix. |
Value
a vector of column sums.
Author(s)
Leonardo Ramirez-Lopez
get the evaluation results for categorical data
Description
internal
Usage
get_eval_categorical(y, indices_closest)
get the evaluation results for continuous data
Description
internal
Usage
get_eval_continuous(y, indices_closest)
A function to obtain the local neighbors based on dissimilarity matrices from orthogonal projections.
Description
internal function. This function is used to obtain the local neighbors based on dissimilarity matrices from orthogonal projections. These neighbors are obtained from an orthogonal projection on a set of precomputed neighbors. This function is used internally by the mbl function. ortho_diss(, .local = TRUE) operates in the same way; however, for mbl it is more efficient to do the re-search of the neighbors inside its main for loop.
Usage
get_ith_local_neighbors( ith_xr, ith_xu, ith_yr, ith_yu = NULL, diss_usage = "none", ith_neig_indices, k = NULL, k_diss = NULL, k_range = NULL, spike = NULL, diss_method, pc_selection, ith_group = NULL, center, scale, ...)
Arguments
ith_xr | the set of neighbors of a Xu observation found in Xr |
ith_xu | the Xu observation |
ith_yr | the response values of the set of neighbors of the Xu observation found in Xr |
ith_yu | the response value of the xu observation |
diss_usage | a character string indicating if the dissimilarity data will be used as predictors ("predictors") or not ("none"). |
ith_neig_indices | a vector of the original indices of the Xr neighbors. |
k | the number of nearest neighbors to select from the already identified neighbors |
k_diss | the distance threshold to select the neighbors from the already identified neighbors |
k_range | a min and max number of allowed neighbors when |
spike | a vector with the indices of the observations forced to be retained as neighbors. They have to be present in all the neighborhoods and at the top of |
diss_method | the ortho_diss() method |
pc_selection | the pc_selection argument as in ortho_diss() |
ith_group | the vector containing the group labels of |
center | center the data in the local diss computation? |
scale | scale the data in the local diss computation? |
Value
a list:
ith_xr: the new Xr data of the neighbors for the ith observation (if diss_usage = "predictors", this data is combined with the local dissimilarity scores of the neighbors of Xu).
ith_yr: the new Yr data of the neighbors for the ith observation.
ith_xu: the ith Xu observation (if diss_usage = "predictors", this data is combined with the local dissimilarity scores to its Xr neighbors).
ith_yu: the ith Yu observation.
ith_neigh_diss: the new dissimilarity scores of the neighbors for the ith observation.
ith_group: the group labels for the new ith_xr.
n_k: the number of neighbors.
ith_components: the number of components used.
Author(s)
Leonardo Ramirez-Lopez
Internal Cpp function for computing the weights of the PLS components necessary for weighted average PLS
Description
For internal use only!.
Usage
get_local_pls_weights(projection_mat, xloadings, coefficients, new_x, min_component, max_component, scale, Xcenter, Xscale)
Arguments
projection_mat | the projection matrix generated either by the |
xloadings | the matrix of X loadings. |
coefficients | the matrix of regression coefficients. |
new_x | a matrix of one row containing the new spectrum to be predicted. |
min_component | an integer indicating the minimum number of pls components. |
max_component | an integer indicating the maximum number of pls components. |
scale | a logical indicating whether the matrix of predictors used to create the regression model was scaled. |
Xcenter | a matrix of one row with the values that must be used for centering |
Xscale | if |
Value
a matrix of one row with the weights for each component between the max. and min. specified.
Author(s)
Leonardo Ramirez-Lopez
A function to get the neighbor information
Description
This function gathers information on all the neighborhoods of the Xu observations found in Xr. This information is required during local regressions.
Usage
get_neighbor_info( Xr, Xu, diss_method, Yr = NULL, k = NULL, k_diss = NULL, k_range = NULL, spike = NULL, pc_selection, return_dissimilarity, center, scale, gh, diss_usage, allow_parallel = FALSE, ...)
Details
For local pca and pls distances, the local dissimilarity matrices are not computed, as it is cheaper to compute them during the local regressions. Instead, the global distances (required for the later local dissimilarity matrix computation) are output.
Extract predictions from an object of class mbl
Description
Extract predictions from an object of class mbl
Usage
get_predictions(object)
Arguments
object | an object of class |
Value
a data.table of predicted values according to either k or k_diss
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
See Also
A function to assign values to sample distribution strata
Description
for internal use only! This function takes a continuous variable, creates n strata based on its distribution and assigns the corresponding stratum to every value.
Usage
get_sample_strata(y, n = NULL, probs = NULL)
Arguments
y | a matrix of one column with the response variable. |
n | the number of strata. |
Value
a data table with the input y and the corresponding stratum for every value.
A function for stratified calibration/validation sampling
Description
for internal use only! This function selects samples based on provided strata.
Usage
get_samples_from_strata( y, original_order, strata, samples_per_strata, sampling_for = c("calibration", "validation"), replacement = FALSE)
Arguments
original_order | a matrix of one column with the response variable. |
strata | the number of strata. |
sampling_for | sampling to select the calibration samples ("calibration")or sampling to select the validation samples ("validation"). |
replacement | logical indicating if sampling with replacement must bedone. |
Value
a list with the indices of the calibration and validation samples.
Internal function for computing the weights of the PLS components necessary for weighted average PLS
Description
internal
Usage
get_wapls_weights(pls_model, original_x, type = "w1", new_x = NULL, pls_c)
Arguments
pls_model | either an object returned by the |
original_x | the original spectral matrix which was used for calibrating the pls model. |
type | type of weight to be computed. The only available option (for the moment) is |
new_x | a vector of a new spectral observation. When "w1" is selected, new_x must be specified. |
pls_c | a vector of length 2 which contains both the minimum and maximum number of PLS components for which the weights must be computed. |
Value
get_wapls_weights returns a vector of weights for each PLS component specified
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
Computes the weights for pls regressions
Description
This is an internal function that computes the weights required for obtaining each vector of pls scores. Implementation is done in C++ for improved performance.
Usage
get_weights(X, Y, algorithm = "pls", xls_min_w = 3L, xls_max_w = 15L)
Arguments
X | a numeric matrix of spectral data. |
Y | a matrix of one column with the response variable. |
algorithm | a character string indicating what method to use. Options are: |
xls_min_w | an integer indicating the minimum window size for the "xls" method. Only used if |
xls_max_w | an integer indicating the maximum window size for the "xls" method. Only used if |
Value
a matrix of one column containing the weights.
Author(s)
Leonardo Ramirez-Lopez and Claudio Orellano
References
Shenk, J. S., & Westerhaus, M. O. (1991). Population structuring of near infrared spectra and modified partial least squares regression. Crop Science, 31(6), 1548-1555.
Westerhaus, M. (2014). Eastern Analytical Symposium Award for outstanding achievements in near infrared spectroscopy: my contributions to near infrared spectroscopy. NIR news, 25(8), 16-20.
An iterator for local prediction data in mbl
Description
internal function. It collects only the data necessary to execute a local prediction for the mbl function based on a list of neighbors. Not valid for local dissimilarity (e.g. for ortho_diss(...., .local = TRUE)).
Usage
ith_mbl_neighbor( Xr, Xu = NULL, Yr, Yu = NULL, diss_usage = "none", neighbor_indices, neighbor_diss = NULL, diss_xr_xr = NULL, group = NULL)
Arguments
Xr | the Xr matrix in mbl. |
Xu | the Xu matrix in mbl. Default |
Yr | the Yr matrix in mbl. |
Yu | the Yu matrix in mbl. Default |
diss_usage | a character string indicating if the dissimilarity data will be used as predictors ("predictors") or not ("none"). |
neighbor_indices | a matrix with the indices of neighbors of every Xu found in Xr. |
neighbor_diss | a matrix with the dissimilarity scores for the neighbors of every Xu found in Xr. This matrix is organized in the same way as |
diss_xr_xr | a dissimilarity matrix between samples in Xr. |
group | a factor representing the group labels of Xr. |
Details
isubset will look at the order of knn in each column of D and re-organize the rows of x accordingly
Value
an object of class iterator giving the following list:
ith_xr: the Xr data of the neighbors for the ith observation (if diss_usage = "predictors", this data is combined with the local dissimilarity scores of the neighbors of Xu (or Xr if Xu was not provided)).
ith_yr: the Yr data of the neighbors for the ith observation.
ith_xu: the ith Xu observation (or Xr if Xu was not provided). If diss_usage = "predictors", this data is combined with the local dissimilarity scores to its Xr neighbors.
ith_yu: the ith Yu observation (or Yr observation if Xu was not provided).
ith_neigh_diss: the dissimilarity scores of the neighbors for the ith observation.
ith_group: the group labels for ith_xr.
n_k: the number of neighbors.
Author(s)
Leonardo Ramirez-Lopez
iterator for nearest neighbor subsets
Description
internal
Usage
ith_subsets_ortho_diss(x, xu = NULL, y, kindx, na_rm = FALSE)
Arguments
x | a reference matrix |
xu | a second matrix |
y | a matrix of side information |
kindx | a matrix of nearest neighbor indices |
na_rm | logical indicating whether NAs must be removed. |
Local fit functions
Description
These functions define the way in which each local fit/prediction is done within each iteration in the mbl function.
Usage
local_fit_pls(pls_c, modified = FALSE, max_iter = 100, tol = 1e-6)
local_fit_wapls(min_pls_c, max_pls_c, modified = FALSE, max_iter = 100, tol = 1e-6)
local_fit_gpr(noise_variance = 0.001)
Arguments
pls_c | an integer indicating the number of pls components to be used in the local regressions when the partial least squares ( |
modified | a logical indicating whether the modified version of the pls algorithm (Shenk and Westerhaus, 1991; Westerhaus, 2014) is to be used. Default is FALSE. |
max_iter | an integer indicating the maximum number of iterations in case |
tol | a numeric value indicating the convergence tolerance for calculating the scores. Default is 1e-6. |
min_pls_c | an integer indicating the minimum number of pls components to be used in the local regressions when the weighted average partial least squares ( |
max_pls_c | an integer indicating the maximum number of pls components to be used in the local regressions when the weighted average partial least squares ( |
noise_variance | a numeric value indicating the variance of the noise for Gaussian process local regressions ( |
Details
These functions are used to indicate how to fit the regression models within the mbl function.
There are three possible options for performing these regressions:
Partial least squares (pls, local_fit_pls): It uses the orthogonal scores (non-linear iterative partial least squares, nipals) algorithm. The only parameter which needs to be optimized is the number of pls components.
Weighted average pls (wapls, local_fit_wapls): This method was developed by Shenk et al. (1997) and it is used as the regression method in the widely known LOCAL algorithm. It uses multiple models generated by multiple pls components (i.e. between a minimum and a maximum number of pls components). At each local partition the final predicted value is an ensemble (weighted average) of all the predicted values generated by the multiple pls models. The weight for each component is calculated as follows:
\[w_{j} = \frac{1}{s_{1:j}\times g_{j}}\]
where \(s_{1:j}\) is the root mean square of the spectral reconstruction error of the unknown (or target) observation(s) when a total of \(j\) pls components are used and \(g_{j}\) is the root mean square of the squared regression coefficients corresponding to the \(j\)th pls component (see Shenk et al., 1997 for more details).
Gaussian process with dot product covariance (local_fit_gpr): Gaussian process regression is a probabilistic and non-parametric Bayesian method. It is commonly described as a collection of random variables which have a joint Gaussian distribution and it is characterized by both a mean and a covariance function (Rasmussen and Williams, 2006). The covariance function used in the implemented method is the dot product. The only parameter to be taken into account in this method is the noise. In this method, the process for predicting the response variable of a new sample (\(y_u\)) from its predictor variables (\(x_u\)) is carried out first by computing a prediction vector (\(A\)). It is derived from a set of reference/training observations containing both a response vector (\(Y\)) and predictors (\(X\)) as follows:
\[A = (X X^{T} + \sigma^2 I)^{-1} Y\]
where \(\sigma^2\) denotes the variance of the noise and \(I\) the identity matrix (with dimensions equal to the number of observations in \(X\)). The prediction of \(y_{u}\) is then done as follows:
\[\hat{y}_{u} = (x_{u} X^{T}) A\]
(A short numeric sketch of this prediction is given after these details.)
The modified argument in the pls methods (local_fit_pls() and local_fit_wapls()) is used to indicate if a modified version of the pls algorithm (modified pls or mpls) is to be used. The modified pls was proposed by Shenk and Westerhaus (1991, see also Westerhaus, 2014) and it differs from the standard pls method in the way the weights of the predictors (used to compute the matrix of scores) are obtained. While pls uses the covariance between response(s) and predictors (and later their deflated versions corresponding to each pls component iteration) to obtain these weights, the modified pls uses the correlation as weights. The authors indicate that by using correlation, a larger portion of the response variable(s) can be explained.
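The numeric sketch of the local_fit_gpr prediction referred to above, written in base R. It only illustrates the equations (single response, no centering or scaling) and is not the package's internal implementation:
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50)        # training predictors
Y <- X %*% rnorm(10) + rnorm(50, sd = 0.1)    # training response (one column)
x_u <- matrix(rnorm(10), nrow = 1)            # one new observation
sigma2 <- 0.001                                # noise variance
# prediction vector: A = (X X' + sigma^2 I)^-1 Y
A <- solve(X %*% t(X) + sigma2 * diag(nrow(X)), Y)
# prediction for the new observation: y_u_hat = x_u X' A
y_u_hat <- drop(x_u %*% t(X) %*% A)
y_u_hat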
Value
An object of class local_fit mirroring the input arguments.
Author(s)
References
Shenk, J. S., & Westerhaus, M. O. 1991. Population structuring of near infrared spectra and modified partial least squares regression. Crop Science, 31(6), 1548-1555.
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCALcalibration procedure for near infrared instruments. Journal of Near InfraredSpectroscopy, 5, 223-232.
Rasmussen, C.E., Williams, C.K. Gaussian Processes for Machine Learning.Massachusetts Institute of Technology: MIT-Press, 2006.
Westerhaus, M. 2014. Eastern Analytical Symposium Award for outstanding achievements in near infrared spectroscopy: my contributions to near infrared spectroscopy. NIR news, 25(8), 16-20.
See Also
Examples
local_fit_wapls(min_pls_c = 3, max_pls_c = 12)
local ortho dissimilarity matrices initialized by a global dissimilarity matrix
Description
internal
Usage
local_ortho_diss( k_index_matrix, Xr, Yr, Xu, diss_method, pc_selection, center, scale, allow_parallel, ...)
Arguments
k_index_matrix | a matrix of nearest neighbor indices |
Xr | argument passed to ortho_projection |
Yr | argument passed to ortho_projection |
Xu | argument passed to ortho_projection |
diss_method | argument passed to ortho_projection |
pc_selection | argument passed to ortho_projection |
center | argument passed to ortho_projection |
scale | argument passed to ortho_projection |
A function for memory-based learning (mbl)
Description
This function is implemented for memory-based learning (a.k.a. instance-based learning or local regression) which is a non-linear lazy learning approach for predicting a given response variable from a set of predictor variables. For each observation in a prediction set, a specific local regression is carried out based on a subset of similar observations (nearest neighbors) selected from a reference set. The local model is then used to predict the response value of the target (prediction) observation. Therefore this function does not yield a global regression model.
Usage
mbl(Xr, Yr, Xu, Yu = NULL, k, k_diss, k_range, spike = NULL, method = local_fit_wapls(min_pls_c = 3, max_pls_c = min(dim(Xr), 15)), diss_method = "pca", diss_usage = "predictors", gh = TRUE, pc_selection = list(method = "opc", value = min(dim(Xr), 40)), control = mbl_control(), group = NULL, center = TRUE, scale = FALSE, verbose = TRUE, documentation = character(), seed = NULL, ...)
Arguments
Xr | a matrix of predictor variables of the reference data(observations in rows and variables in columns). |
Yr | a numeric matrix of one column containing the values of theresponse variable corresponding to the reference data. |
Xu | a matrix of predictor variables of the data to be predicted(observations in rows and variables in columns). |
Yu | an optional matrix of one column containing the values of the response variable corresponding to the data to be predicted. Default is |
k | a vector of integers specifying the sequence of k-nearest neighbors to be tested. Either |
k_diss | a numeric vector specifying the sequence of dissimilarity thresholds to be tested for the selection of the nearest neighbors found in |
k_range | an integer vector of length 2 which specifies the minimum (first value) and the maximum (second value) number of neighbors to be retained when the |
spike | an integer vector (with positive and/or negative values) indicating the indices of observations in |
method | an object of class |
diss_method | a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbors of each observation. Options are:
Alternatively, a matrix of dissimilarities can also be passed to this argument. This matrix is supposed to be a user-defined matrix representing the dissimilarities between observations in |
diss_usage | a character string specifying how the dissimilarity information shall be used. The possible options are: |
gh | a logical indicating if the global Mahalanobis distance (in the pls score space) between each observation and the pls mean (centre) must be computed. This metric is known as the GH distance in the literature. Note that this computation is based on the number of pls components determined by using the |
pc_selection | a list of length 2 used for the computation of GH (if
The list |
control | a list created with the |
group | an optional factor (or character vector that can be coerced to |
center | a logical if the predictor variables must be centred at each local segment (before regression). In addition, if |
scale | a logical indicating if the predictor variables must be scaled to unit variance at each local segment (before regression). In addition, if |
verbose | a logical indicating whether or not to print a progress bar for each observation to be predicted. Default is |
documentation | an optional character string that can be used to describe anything related to the |
seed | an integer value containing the random number generator (RNG) state for random number generation. This argument can be used for reproducibility purposes (for random sampling) in the cross-validation results. Default is |
... | further arguments to be passed to the |
Details
The argument spike can be used to indicate which reference observations in Xr must be kept in the neighborhood of every single Xu observation. If a vector of length \(m\) is passed to this argument, this means that the \(m\) original neighbors with the largest dissimilarities to the target observations will be forced out of the neighborhood. Spiking might be useful in cases where some reference observations are known to be somehow related to the ones in Xu and therefore might be relevant for fitting the local models. See Guerrero et al. (2010) for an example on the benefits of spiking.
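As a usage sketch only (assuming train_x, train_y and test_x prepared as in the Examples section), the first three reference observations could be forced into every neighborhood as follows:
spiked_mbl <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x,
  k = seq(40, 100, by = 20),
  spike = c(1, 2, 3),   # keep Xr observations 1, 2 and 3 in every neighborhood
  method = local_fit_gpr(),
  control = mbl_control(validation_type = "NNv")
)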
The mbl function uses the dissimilarity function to compute the dissimilarities between Xr and Xu. The dissimilarity method to be used is specified in the diss_method argument. Arguments to dissimilarity as well as further arguments to the functions used inside dissimilarity (i.e. ortho_diss, cor_diss, f_diss, sid) can be passed to those functions by using ....
The diss_usage argument is used to specify whether the dissimilarity information must be used within the local regressions and, if so, how. When diss_usage = "predictors" the local (square symmetric) dissimilarity matrix corresponding to the selected neighborhood is used as a source of additional predictors (i.e. the columns of this local matrix are treated as predictor variables). In some cases this results in an improvement of the prediction performance (Ramirez-Lopez et al., 2013a). If diss_usage = "weights", the neighbors of the query point (\(xu_{j}\)) are weighted according to their dissimilarity to \(xu_{j}\) before carrying out each local regression. The following tricubic function (Cleveland and Delvin, 1988; Naes et al., 1990) is used for computing the final weights based on the measured dissimilarities:
\[W_{j} = (1 - v_{j}^{3})^{3}\]
where if \({xr_{i} \in }\) neighbors of \(xu_{j}\):
\[v_{j}(xu_{j}) = d(xr_{i}, xu_{j})\]
otherwise:
\[v_{j}(xu_{j}) = 0\]
In the above formulas \(d(xr_{i}, xu_{j})\) represents the dissimilarity between the query point and each object in \(Xr\). When diss_usage = "none" is chosen the dissimilarity information is not used.
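The tricubic weighting can be illustrated with a short base R sketch (not the package's internal code; rescaling the dissimilarities to the [0, 1) interval within the neighborhood is an assumption made here purely for illustration):
# dissimilarities between a query observation and its five neighbors
d_neigh <- c(0.05, 0.10, 0.22, 0.35, 0.48)
# rescale within the neighborhood so the values lie in [0, 1)
v <- d_neigh / (max(d_neigh) * 1.001)
# tricube weights: the closest neighbors get weights near 1
w <- (1 - v^3)^3
round(w, 3)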
The global Mahalanobis distance (a.k.a. GH) is computed based on the scores of a pls projection. A pls projection model is built with Yr and Xr, and this model is used to obtain the pls scores of the Xu observations. The Mahalanobis distance between each Xu observation (in the pls space) and the centre of Xr is then computed. The number of pls components is optimized based on the parameters passed to the pc_selection argument. In addition, the mbl function also reports the GH distance for the observations in Xr.
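The Mahalanobis-distance-in-score-space idea behind the GH values can be sketched with a generic score matrix. The sketch below uses a pca projection (prcomp) purely for illustration; mbl uses a pls projection whose size is governed by pc_selection:
set.seed(1)
Xr_toy <- matrix(rnorm(40 * 12), nrow = 40)   # reference observations
Xu_toy <- matrix(rnorm(10 * 12), nrow = 10)   # new observations
# a projection fitted on the reference data (pca here, pls in mbl)
proj <- prcomp(Xr_toy, center = TRUE, scale. = FALSE)
t_r <- proj$x[, 1:3]                          # reference scores
t_u <- scale(Xu_toy, center = proj$center, scale = FALSE) %*% proj$rotation[, 1:3]
# Mahalanobis distance of each new observation to the centre of the
# reference scores; the GH values reported by mbl follow this idea
gh_u <- sqrt(mahalanobis(t_u, colMeans(t_r), cov(t_r)))
gh_u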
Some aspects of the mbl process, such as the type of internal validation, parameter tuning, what extra objects to return, permission for parallel execution, prediction limits, etc., can be specified by using the mbl_control function.
By using the group argument one can specify groups of observations that have something in common (e.g. observations with very similar origin). The purpose of group is to avoid biased cross-validation results due to pseudo-replication. This argument allows to select calibration points that are independent from the validation ones. In this regard, when validation_type = "local_cv" (used in the mbl_control function), the p argument refers to the percentage of groups of observations (rather than single observations) to be retained in each sampling iteration at each local segment.
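A usage sketch of the group argument (assuming train_x, train_y and test_x as in the Examples section, and a hypothetical vector of site identifiers for the reference observations):
site_id <- factor(rep(1:31, length.out = nrow(train_x)))  # hypothetical grouping
grouped_mbl <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x,
  k = seq(40, 100, by = 20),
  group = site_id,   # observations from the same site stay together in local_cv
  method = local_fit_wapls(min_pls_c = 3, max_pls_c = 15),
  control = mbl_control(validation_type = "local_cv", number = 10, p = 0.75)
)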
Value
a list of class mbl with the following components (sorted either by k or k_diss):
call: the call to mbl.
cntrl_param: the list with the control parameters passed to control.
Xu_neighbors: a list containing two elements: a matrix of Xr indices corresponding to the neighbors of Xu and a matrix of dissimilarities between each Xu observation and its corresponding neighbor in Xr.
dissimilarities: a list with the method used to obtain the dissimilarity matrices and the dissimilarity matrix corresponding to \(D(Xr, Xu)\). This object is returned only if the return_dissimilarity argument in the control list was set to TRUE.
n_predictions: the total number of observations predicted.
gh: if gh = TRUE, a list containing the global Mahalanobis distance values for the observations in Xr and Xu as well as the results of the global pls projection object used to obtain the GH values.
validation_results: a list of validation results for "local cross validation" (returned if the validation_type in the control list was set to "local_cv"), "nearest neighbor validation" (returned if the validation_type in the control list was set to "NNv") and "Yu prediction statistics" (returned if Yu was supplied).
results: a list of data tables containing the results of the predictions for each k or k_diss. Each data table contains the following columns:
o_index: The index of the predicted observation.
k_diss: This column is only output if the k_diss argument is used. It indicates the corresponding dissimilarity threshold for selecting the neighbors.
k_original: This column is only output if the k_diss argument is used. It indicates the number of neighbors that were originally found when the given dissimilarity threshold is used.
k: This column indicates the final number of neighbors used.
npls: This column is only output if the pls regression method was used. It indicates the final number of pls components used.
min_pls: This column is only output if the wapls regression method was used. It indicates the final number of minimum pls components used. If no optimization was set, it retrieves the original minimum pls components passed to the method argument.
max_pls: This column is only output if the wapls regression method was used. It indicates the final number of maximum pls components used. If no optimization was set, it retrieves the original maximum pls components passed to the method argument.
yu_obs: The input values given in Yu (the response variable corresponding to the data to be predicted). If Yu = NULL, then NAs are retrieved.
pred: The predicted Yu values.
yr_min_obs: The minimum reference value (of the response variable) in the neighborhood.
yr_max_obs: The maximum reference value (of the response variable) in the neighborhood.
index_nearest_in_Xr: The index of the nearest neighbor found in Xr.
index_farthest_in_Xr: The index of the farthest neighbor found in Xr.
y_nearest: The reference value (Yr) corresponding to the nearest neighbor found in Xr.
y_nearest_pred: This column is only output if the validation method in the object passed to control was set to "NNv". It represents the predicted value of the nearest neighbor observation found in Xr. This prediction comes from a model fitted with the remaining observations in the neighborhood of the target observation in Xu.
loc_rmse_cv: This column is only output if the validation method in the object passed to control was set to 'local_cv'. It represents the RMSE of the cross-validation computed for the neighborhood of the target observation in Xu.
loc_st_rmse_cv: This column is only output if the validation method in the object passed to control was set to 'local_cv'. It represents the standardized RMSE of the cross-validation computed for the neighborhood of the target observation in Xu.
dist_nearest: The distance to the nearest neighbor.
dist_farthest: The distance to the farthest neighbor.
loc_n_components: This column is only output if the dissimilarity method used is one of "pca", "pca.nipals" or "pls" and in addition the dissimilarities are requested to be computed locally by passing .local = TRUE to the mbl function. See the .local argument in the ortho_diss function.
seed: a value mirroring the one passed to seed.
documentation: a character string mirroring the one provided in the documentation argument.
When the k_diss argument is used, the printed results show a table with a column named 'p_bounded'. It represents the percentage of observations for which the neighbors selected by the given dissimilarity threshold were outside the boundaries specified in the k_range argument.
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
References
Cleveland, W. S., and Devlin, S. J. 1988. Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596-610.
Guerrero, C., Zornoza, R., Gómez, I., Mataix-Beneyto, J. 2010. Spiking of NIR regional models using observations from target sites: Effect of model size on prediction accuracy. Geoderma, 158(1-2), 66-77.
Naes, T., Isaksson, T., Kowalski, B. 1990. Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry 62, 664-673.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
Rasmussen, C.E., Williams, C.K. Gaussian Processes for Machine Learning. Massachusetts Institute of Technology: MIT-Press, 2006.
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.
See Also
mbl_control, f_diss, cor_diss, sid, ortho_diss, search_neighbors, local_fit
Examples
library(prospectr)
data(NIRsoil)
# Preprocess the data using detrend plus first derivative with Savitzky and
# Golay smoothing filter
sg_det <- savitzkyGolay(
  detrend(NIRsoil$spc,
    wav = as.numeric(colnames(NIRsoil$spc))
  ),
  m = 1, p = 1, w = 7
)
NIRsoil$spc_pr <- sg_det
# split into training and testing sets
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0 & !is.na(NIRsoil$CEC), ]
test_y <- NIRsoil$CEC[NIRsoil$train == 0 & !is.na(NIRsoil$CEC)]
train_y <- NIRsoil$CEC[NIRsoil$train == 1 & !is.na(NIRsoil$CEC)]
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1 & !is.na(NIRsoil$CEC), ]
# Example 1
# A mbl implemented in Ramirez-Lopez et al. (2013,
# the spectrum-based learner)
# Example 1.1
# An example where Yu is supposed to be unknown, but the Xu
# (spectral variables) are known
my_control <- mbl_control(validation_type = "NNv")
# The neighborhood sizes to test
ks <- seq(40, 140, by = 20)
sbl <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x,
  k = ks,
  method = local_fit_gpr(),
  control = my_control,
  scale = TRUE
)
sbl
plot(sbl)
get_predictions(sbl)
# Example 1.2
# If Yu is actually known...
sbl_2 <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x, Yu = test_y,
  k = ks,
  method = local_fit_gpr(),
  control = my_control
)
sbl_2
plot(sbl_2)
# Example 2
# the LOCAL algorithm (Shenk et al., 1997)
local_algorithm <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x, Yu = test_y,
  k = ks,
  method = local_fit_wapls(min_pls_c = 3, max_pls_c = 15),
  diss_method = "cor",
  diss_usage = "none",
  control = my_control
)
local_algorithm
plot(local_algorithm)
# Example 3
# A variation of the LOCAL algorithm (using the optimized pc
# dissimilarity matrix) and dissimilarity matrix as source of
# additional predictors
local_algorithm_2 <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x, Yu = test_y,
  k = ks,
  method = local_fit_wapls(min_pls_c = 3, max_pls_c = 15),
  diss_method = "pca",
  diss_usage = "predictors",
  control = my_control
)
local_algorithm_2
plot(local_algorithm_2)
# Example 4
# Running the mbl function in parallel with example 2
n_cores <- 2
if (parallel::detectCores() < 2) {
  n_cores <- 1
}
# Alternatively:
# n_cores <- parallel::detectCores() - 1
# if (n_cores == 0) {
#   n_cores <- 1
# }
library(doParallel)
clust <- makeCluster(n_cores)
registerDoParallel(clust)
# Alternatively:
# library(doSNOW)
# clust <- makeCluster(n_cores, type = "SOCK")
# registerDoSNOW(clust)
# getDoParWorkers()
local_algorithm_par <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x, Yu = test_y,
  k = ks,
  method = local_fit_wapls(min_pls_c = 3, max_pls_c = 15),
  diss_method = "cor",
  diss_usage = "none",
  control = my_control
)
local_algorithm_par
registerDoSEQ()
try(stopCluster(clust))
# Example 5
# Using local pls distances
with_local_diss <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x, Yu = test_y,
  k = ks,
  method = local_fit_wapls(min_pls_c = 3, max_pls_c = 15),
  diss_method = "pls",
  diss_usage = "predictors",
  control = my_control,
  .local = TRUE,
  pre_k = 150
)
with_local_diss
plot(with_local_diss)
A function that controls a few aspects of the memory-based learning process in the mbl function
Description
This function is used to further control some aspects of the memory-based learning process in the mbl function.
Usage
mbl_control( return_dissimilarity = FALSE, validation_type = c("NNv", "local_cv"), tune_locally = TRUE, number = 10, p = 0.75, range_prediction_limits = TRUE, progress = TRUE, allow_parallel = TRUE)
Arguments
return_dissimilarity | a logical indicating if the dissimilarity matrix between |
validation_type | a character vector which indicates the (internal) validation method(s) to be used for assessing the global performance of the local models. Possible options are: |
tune_locally | a logical. It only applies when |
number | an integer indicating the number of sampling iterations at each local segment when |
p | a numeric value indicating the percentage of observations to be retained at each sampling iteration at each local segment when |
range_prediction_limits | a logical. It indicates whether the prediction limits at each local regression are determined by the range of the response variable within each neighborhood. When the predicted value is outside this range, it will be automatically replaced with the value of the nearest range value. If |
progress | a logical indicating whether or not to print a progress bar for each observation to be predicted. Default is |
allow_parallel | a logical indicating if parallel execution is allowed. If |
Details
The validation methods available for assessing the predictive performance of the memory-based learning method used are described as follows:
Leave-nearest-neighbor-out cross-validation ("NNv"): From the group of neighbors of each observation to be predicted, the nearest observation (i.e. the most similar observation) is excluded and then a local model is fitted using the remaining neighbors. This model is then used to predict the value of the target response variable of the nearest observation. These predicted values are finally cross validated with the actual values (see Ramirez-Lopez et al. (2013a) for additional details). This method is faster than "local_cv". A minimal sketch of this idea is given after this list.
Local leave-group-out cross-validation ("local_cv"): The group of neighbors of each observation to be predicted is partitioned into different equal size subsets. Each partition is selected based on a stratified random sampling which takes into account the values of the response variable of the corresponding set of neighbors. The selected local subset is used as local validation subset and the remaining observations are used for fitting a model. This model is used to predict the target response variable values of the local validation subset and the local root mean square error is computed. This process is repeated \(m\) times and the final local error is computed as the average of the local root mean square error of all the \(m\) iterations. In the mbl function \(m\) is controlled by the number argument and the size of the subsets is controlled by the p argument which indicates the percentage of observations to be selected from the subset of nearest neighbours. The global error of the predictions is computed as the average of the local root mean square errors.
No validation ("none"): No validation is carried out. If "none" is selected along with "NNv" and/or "local_cv", then it will be ignored and the respective validation(s) will be carried out.
Value
a list mirroring the specified parameters
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
f_diss, cor_diss, sid, ortho_diss, mbl
Examples
# A control list with the default parameters
mbl_control()
Moving/rolling correlation distance of two matrices
Description
Computes a moving window correlation distance between two data matrices
Usage
moving_cor_diss(X, Y, w)
Arguments
X | a matrix |
Y | a matrix |
w | window size (must be odd) |
Value
a matrix of correlation distance
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
orthogonal scores algorithm of partial least squares (opls)
Description
Computes orthogonal scores partial least squares (opls) regressions with the NIPALS algorithm. It allows multiple response variables. It does not return the variance information of the components. NOTE: For internal use only!
Usage
opls(X, Y, ncomp, scale, maxiter, tol, algorithm = "pls", xls_min_w = 3, xls_max_w = 15)
Arguments
X | a matrix of predictor variables. |
Y | a matrix of either a single or multiple response variables. |
ncomp | the number of pls components. |
scale | logical indicating whether |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
algorithm | (for weights computation) a character string indicating what method to use. Options are: |
xls_min_w | (for weights computation) an integer indicating the minimum window size for the "xls" method. Only used if |
xls_max_w | (for weights computation) an integer indicating the maximum window size for the "xls" method. Only used if |
Value
a list containing the following elements:
coefficients: the matrix of regression coefficients.
bo: a matrix of one row containing the intercepts for each component.
scores: the matrix of scores.
X_loadings: the matrix of X loadings.
Y_loadings: the matrix of Y loadings.
projection_mat: the projection matrix.
Y: the Y input.
transf: a list containing two objects: Xcenter and Xscale.
weights: the matrix of weights.
Author(s)
Leonardo Ramirez-Lopez
Internal Cpp function for performing leave-group-out cross-validations for pls regression
Description
For internal use only!.
Usage
opls_cv_cpp(X, Y, scale, method, mindices, pindices, min_component, ncomp, new_x, maxiter, tol, wapls_grid, algorithm, statistics = TRUE)
Arguments
X | a matrix of predictor variables. |
Y | a matrix of a single response variable. |
scale | a logical indicating whether the matrix of predictors( |
method | the method used for regression. One of the following options: |
mindices | a matrix with |
pindices | a matrix with |
min_component | an integer indicating the number of minimum pls components (if the |
ncomp | an integer indicating the number of pls components. |
new_x | a matrix of one row corresponding to the observation to be predicted (if the |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
wapls_grid | the grid on which the search for the best combination of minimum and maximum pls factors of |
algorithm | either pls ( |
statistics | a logical value indicating whether the precision andaccuracy statistics are to be returned, otherwise the predictions for eachvalidation segment are retrieved. |
Value
if statistics = TRUE, a list containing the following one-row matrices:
rmse_seg: the RMSEs.
st_rmse_seg: the standardized RMSEs.
rsq_seg: the coefficients of determination.
if statistics = FALSE, a list containing the following one-row matrices:
predictions: the predictions of each of the validation segments in pindices. Each column in pindices contains the validation indices of a segment.
st_rmse_seg: the standardized RMSEs.
rsq_seg: the coefficients of determination.
If method = "wapls", data of the pls weights are output in this list (compweights).
If method = "completewapls1", data of all the combinations of components passed in wapls_grid are output in this list (complete_compweights).
Author(s)
Leonardo Ramirez-Lopez
Orthogonal scores algorithm of partial least squares (opls) projection
Description
Computes orthogonal scores partial least squares (opls) projection with the NIPALS algorithm. It allows multiple response variables. Although the main use of the function is for projection, it also retrieves regression coefficients. NOTE: For internal use only!
Usage
opls_for_projection(X, Y, ncomp, scale, maxiter, tol, pcSelmethod = "var", pcSelvalue = 0.01, algorithm = "pls", xls_min_w = 3, xls_max_w = 15)Arguments
X | a matrix of predictor variables. |
Y | a matrix of either a single or multiple response variables. |
ncomp | the number of pls components. |
scale | logical indicating whether |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
pcSelmethod | if |
pcSelvalue | a numerical value that complements the selected method( |
algorithm | (for weights computation) a character string indicatingwhat method to use. Options are: |
xls_min_w | (for weights computation) an integer indicating the minimum window size for the "xls"method. Only used if |
xls_max_w | (for weights computation) an integer indicating the maximum window size for the "xls"method. Only used if |
Value
a list containing the following elements:
coefficients: the matrix of regression coefficients.
bo: a matrix of one row containing the intercepts for each component.
scores: the matrix of scores.
X_loadings: the matrix of X loadings.
Y_loadings: the matrix of Y loadings.
projection_mat: the projection matrix.
Y: the Y input.
variance: a list containing two objects: x_var and y_var. These objects contain information on the explained variance for the X and Y matrices respectively.
transf: a list containing two objects: Xcenter and Xscale.
weights: the matrix of weights.
Author(s)
Leonardo Ramirez-Lopez
Orthogonal scores algorithm of partial least squares (opls_get_all)
Description
Computes orthogonal scores partial least squares (opls_get_all) regressions with the NIPALS algorithm. It retrieves a comprehensive set of pls outputs (e.g. vip and selectivity ratio). It allows multiple response variables. NOTE: For internal use only!
Usage
opls_get_all(X, Y, ncomp, scale, maxiter, tol, algorithm = "pls", xls_min_w = 3, xls_max_w = 15)Arguments
X | a matrix of predictor variables. |
Y | a matrix of either a single or multiple response variables. |
ncomp | the number of pls components. |
scale | logical indicating whether |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
algorithm | (for weights computation) a character string indicatingwhat method to use. Options are: |
xls_min_w | (for weights computation) an integer indicating the minimum window size for the "xls"method. Only used if |
xls_max_w | (for weights computation) an integer indicating the maximum window size for the "xls"method. Only used if |
Value
a list containing the following elements:
ncomp: the number of components used.
coefficients: the matrix of regression coefficients.
bo: a matrix of one row containing the intercepts for each component.
scores: the matrix of scores.
X_loadings: the matrix of X loadings.
Y_loadings: the matrix of Y loadings.
vip: the variable importance in projection (vip) scores.
selectivity_ratio: the matrix of selectivity ratios (see Rajalahti, Tarja, et al. 2009).
Y: the Y input.
variance: a list containing two objects: x_var and y_var. These objects contain information on the explained variance for the X and Y matrices respectively.
transf: a list containing two objects: Xcenter and Xscale.
weights: the matrix of weights.
Author(s)
Leonardo Ramirez-Lopez
Fast orthogonal scores algorithm of partial least squares (opls)
Description
Computes orthogonal scores partial least squares (opls) regressions with the NIPALS algorithm. It allows multiple response variables. In contrast to the opls function, this one does not compute data that is unnecessary for (local) regression. For internal use only!
Usage
opls_get_basics(X, Y, ncomp, scale, maxiter, tol, algorithm = "pls", xls_min_w = 3, xls_max_w = 15)Arguments
X | a matrix of predictor variables. |
Y | a matrix of either a single or multiple response variables. |
ncomp | the number of pls components. |
scale | logical indicating whether |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
algorithm | (for weights computation) a character string indicatingwhat method to use. Options are: |
xls_min_w | (for weights computation) an integer indicating the minimum window size for the "xls"method. Only used if |
xls_max_w | (for weights computation) an integer indicating the maximum window size for the "xls"method. Only used if |
Value
a list containing the following elements:
coefficients: the matrix of regression coefficients.
bo: a matrix of one row containing the intercepts for each component.
Y_loadings: the matrix of Y loadings.
projection_mat: the projection matrix.
transf: a list containing two objects: Xcenter and Xscale.
Author(s)
Leonardo Ramirez-Lopez
Orthogonal scores algorithm of partial least squares (opls)
Description
Computes orthogonal scores partial least squares (opls) regressions with the NIPALS algorithm. It allows multiple response variables. It does not return the variance information of the components. NOTE: For internal use only!
Usage
opls_gs(Xr, Yr, Xu, ncomp, scale, response = FALSE, reconstruction = TRUE, similarity = TRUE, fresponse = TRUE, algorithm = "pls")Arguments
Xr | a matrix of predictor variables for the training set. |
Yr | a matrix of a single response variable for the training set. |
Xu | a matrix of predictor variables for the test set. |
ncomp | the number of pls components. |
scale | logical indicating whether |
response | logical indicating whether to compute the prediction of |
reconstruction | logical indicating whether to compute the reconstruction error of |
similarity | logical indicating whether to compute the distance score between |
fresponse | logical indicating whether to compute the score of the variance not explained for |
algorithm | (for weights computation) a character string indicatingwhat method to use. Options are: |
Value
a list containing the following elements:
ncomp: the number of components.
pred_response: the response predictions for Xu.
rmse_reconstruction: the rmse of the reconstruction for Xu.
score_dissimilarity: the distance score between Xr and Xu.
Author(s)
Leonardo Ramirez-Lopez
A function to construct optimal strata for the samples, based on the distribution of the given y
Description
For internal use only! This function computes the optimal strata from the distribution of the given y.
Usage
optim_sample_strata(y, n)Arguments
y | a matrix of one column with the response variable. |
n | number of samples that must be sampled. |
Value
a list with two data.table objects: sample_strata contains the optimal strata, whereas samples_to_get contains information on how many samples per stratum are supposed to be drawn.
A function for computing dissimilarity matrices from orthogonalprojections (ortho_diss)
Description
This function computes dissimilarities (in an orthogonal space) between either observations in a given set or between observations in two different sets. The dissimilarities are computed based on either principal component projection or partial least squares projection of the data. After projecting the data, the Mahalanobis distance is applied.
Usage
ortho_diss(Xr, Xu = NULL, Yr = NULL,
  pc_selection = list(method = "var", value = 0.01),
  diss_method = "pca", .local = FALSE, pre_k, center = TRUE, scale = FALSE,
  compute_all = FALSE, return_projection = FALSE, allow_parallel = TRUE, ...)
Arguments
Xr | a matrix containing |
Xu | an optional matrix containing data of a second set of observationswith |
Yr | a matrix of
|
pc_selection | a list of length 2 which specifies the method to be usedfor optimizing the number of components (principal components or pls factors)to be retained. This list must contain two elements (in the following order):
Default is Optionally, the |
diss_method | a character value indicating the type of projection on whichthe dissimilarities must be computed. This argument is equivalent to
See the |
.local | a logical indicating whether or not to compute the dissimilaritieslocally (i.e. projecting locally the data) by using the |
pre_k | if |
center | a logical indicating if the |
scale | a logical indicating if the |
compute_all | a logical. In case |
return_projection | a logical. If |
allow_parallel | a logical (default TRUE). It allows parallel computingof the local distance matrices (i.e. when |
... | additional arguments to be passed to the |
Details
When .local = TRUE, first a global dissimilarity matrix is computed based on the parameters specified. Then, using this matrix, a given set of nearest neighbors (pre_k) is identified for each target observation. These neighbors (together with the target observation) are projected (from the original data space) onto a (local) orthogonal space (using the same parameters specified in the function). In this projected space the Mahalanobis distance between the target observation and its neighbors is recomputed. A missing value is assigned to the observations that do not belong to this set of neighbors (non-neighbor observations). In this case the dissimilarity matrix cannot be considered as a distance metric since it does not necessarily satisfy the symmetry condition for distance matrices (i.e. given two observations \(x_i\) and \(x_j\), the local dissimilarity (\(d\)) between them is relative since generally \(d(x_i, x_j) \neq d(x_j, x_i)\)). On the other hand, when .local = FALSE, the dissimilarity matrix obtained can be considered as a distance matrix.
In the cases where "Yr" is required to compute the dissimilarities and if .local = TRUE, care must be taken as some neighborhoods might not have enough observations with non-missing "Yr" values, which might yield unreliable dissimilarity computations.
If "opc" or "manual" are used in pc_selection$method and .local = TRUE, the minimum number of observations with non-missing "Yr" values at each neighborhood is determined by pc_selection$value (i.e. the maximum number of components to compute).
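As a hedged illustration of the local mode just described (re-using the Xr, Xu and Yr objects prepared in the Examples section below; the values of pre_k and pc_selection here are arbitrary choices, not recommendations):
# Sketch only: local dissimilarities computed on 200 pre-selected neighbors
local_diss <- ortho_diss(
  Xr = Xr, Xu = Xu, Yr = Yr,
  pc_selection = list("opc", 30),
  diss_method = "pca",
  .local = TRUE,
  pre_k = 200
)
# each column of the resulting 'local_ortho_diss' matrix holds the dissimilarities
# between one target observation and its pre_k neighbors; non-neighbors are NA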
Value
alist of classortho_diss with the following elements:
n_components: the number of components (either principal components or partial least squares components) used for computing the global dissimilarities.
global_variance_info: the information about the explained variance(s) of the projection. When .local = TRUE, the information corresponds to the global projection done prior to computing the local projections.
local_n_components: if .local = TRUE, a data.table which specifies the number of local components (either principal components or partial least squares components) used for computing the dissimilarity between each target observation and its neighbor observations.
dissimilarity: the computed dissimilarity matrix. If .local = FALSE a distance matrix. If .local = TRUE a matrix of class local_ortho_diss. In this case, each column represents the dissimilarity between a target observation and its neighbor observations.
projection: if return_projection = TRUE, an ortho_projection object.
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M.,Scholten, T. 2013a. The spectrum-based learner: A new local approach formodeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte,J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for usewith soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil[!as.logical(NIRsoil$train), "CEC", drop = FALSE]
Yr <- NIRsoil[as.logical(NIRsoil$train), "CEC", drop = FALSE]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu), , drop = FALSE]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr), , drop = FALSE]
# Computation of the orthogonal dissimilarity matrix using the
# default parameters
pca_diss <- ortho_diss(Xr, Xu)
# Computation of a principal component dissimilarity matrix using
# the "opc" method for the selection of the principal components
pca_diss_optim <- ortho_diss(
  Xr, Xu, Yr,
  pc_selection = list("opc", 40),
  compute_all = TRUE
)
# Computation of a partial least squares (PLS) dissimilarity
# matrix using the "opc" method for the selection of the PLS
# components
pls_diss_optim <- ortho_diss(
  Xr = Xr, Xu = Xu, Yr = Yr,
  pc_selection = list("opc", 40),
  diss_method = "pls"
)
Orthogonal projections using principal component analysis and partial least squares
Description
Functions to perform orthogonal projections of high dimensional data matrices using principal component analysis (pca) and partial least squares (pls).
Usage
ortho_projection(Xr, Xu = NULL, Yr = NULL, method = "pca",
  pc_selection = list(method = "var", value = 0.01), center = TRUE,
  scale = FALSE, ...)
pc_projection(Xr, Xu = NULL, Yr = NULL,
  pc_selection = list(method = "var", value = 0.01), center = TRUE,
  scale = FALSE, method = "pca", tol = 1e-6, max_iter = 1000, ...)
pls_projection(Xr, Xu = NULL, Yr,
  pc_selection = list(method = "opc", value = min(dim(Xr), 40)),
  scale = FALSE, method = "pls", tol = 1e-6, max_iter = 1000, ...)
## S3 method for class 'ortho_projection'
predict(object, newdata, ...)
Arguments
Xr | a matrix of observations. |
Xu | an optional matrix containing data of a second set of observations. |
Yr | if the method used in the |
method | the method for projecting the data. Options are:
|
pc_selection | a list of length 2 which specifies the method to be usedfor optimizing the number of components (principal components or pls factors)to be retained. This list must contain two elements (in the following order):
The list |
center | a logical indicating if the data |
scale | a logical indicating if |
... | additional arguments to be passedto |
tol | tolerance limit for convergence of the algorithm in the nipalsalgorithm (default is 1e-06). In the case of PLS this applies only to Yr withmore than one variable. |
max_iter | maximum number of iterations (default is 1000). In the case of |
object | object of class |
newdata | an optional data frame or matrix in which to look for variableswith which to predict. If omitted, the scores are used. It must contain thesame number of columns, to be used in the same order. |
Details
In the case of method = "pca", the algorithm used is the singular value decomposition, in which a given data matrix (\(X\)) is factorized as follows:
\[X = U D V^{\mathrm{T}}\]
where \(U\) and \(V\) are orthogonal matrices containing the left and right singular vectors of \(X\) respectively, and \(D\) is a diagonal matrix containing the singular values of \(X\). The matrix of principal component scores is obtained by a matrix multiplication of \(U\) and \(D\), and the matrix of principal component loadings is equivalent to the matrix \(V\).
Whenmethod = "pca.nipals", the algorithm used for principal componentanalysis is the non-linear iterative partial least squares (nipals).
In the case of the partial least squares projection (a.k.a. projection to latent structures) the nipals regression algorithm is used by default. Details on the "nipals" algorithm are presented in Martens (1991). Another method called modified pls ('mpls') can also be used. The modified pls was proposed by Shenk and Westerhaus (1991, see also Westerhaus, 2014) and it differs from the standard pls method in the way the weights of Xr (used to compute the matrix of scores) are obtained. While pls uses the covariance between Yr and Xr (and later their deflated versions at each pls component iteration) to obtain these weights, the modified pls uses the correlation as weights, as sketched below. The authors indicate that by using correlation, a larger portion of the response variable(s) can be explained.
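The difference between the two weighting schemes can be sketched in a few lines of R (a conceptual illustration only, not the package's internal code; X is assumed to be a centered predictor matrix and y a centered response vector):
# Conceptual sketch: first-component weights for "pls" vs modified pls ("mpls")
pls_weights  <- function(X, y) {w <- crossprod(X, y); w / sqrt(sum(w^2))}  # covariance-based
mpls_weights <- function(X, y) {w <- cor(X, y);       w / sqrt(sum(w^2))}  # correlation-based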
When method = "opc", the selection of the components is carried out by using an iterative method based on the side information concept (Ramirez-Lopez et al. 2013a, 2013b). First, let \(P\) be a sequence of retained components (so that \(P = 1, 2, ..., k\)). At each iteration, the function computes a dissimilarity matrix retaining \(p_i\) components. The side information values of the observations are compared against the side information values of their most spectrally similar observations (closest Xr observation). The optimal number of components retrieved by the function is the one that minimizes the root mean squared differences (RMSD) in the case of continuous variables, or maximizes the kappa index in the case of categorical variables. In this process, the sim_eval function is used. Note that for the "opc" method Yr is required (i.e. the side information of the observations).
Value
alist of classortho_projection with the followingcomponents:
scores: a matrix of scores corresponding to the observations in Xr (and Xu if it was provided). The components retrieved correspond to the ones optimized or specified.
X_loadings: a matrix of loadings corresponding to the explanatory variables. The components retrieved correspond to the ones optimized or specified.
Y_loadings: a matrix of partial least squares loadings corresponding to Yr. The components retrieved correspond to the ones optimized or specified. This object is only returned if the partial least squares algorithm was used.
weights: a matrix of partial least squares ("pls") weights. This object is only returned if the "pls" algorithm was used.
projection_mat: a matrix that can be used to project new data onto a "pls" space. This object is only returned if the "pls" algorithm was used.
variance: a list with information on the original variance and the explained variances. This list contains a matrix indicating the amount of variance explained by each component (var), the ratio between the explained variance of each single component and the original variance (explained_var) and the cumulative ratio of explained variance (cumulative_explained_var). The amount of variance explained by each component is computed by multiplying its score vector by its corresponding loading vector and calculating the variance of the result. These values are computed based on the observations used to create the projection matrices. For example, if the "pls" method was used, these values are computed based only on the data that contains information on Yr (i.e. the Xr data). If the principal component method is used, this data is computed on the basis of Xr and Xu (if it applies) since both matrices are employed in the computation of the projection matrix (loadings in this case).
sdv: the standard deviation of the retrieved scores. This vector can be different from the "sd" in variance.
n_components: the number of components (either principal components or partial least squares components) used for computing the global dissimilarity scores.
opc_evaluation: a matrix containing the statistics computed for optimizing the number of principal components based on the variable(s) specified in the Yr argument. If Yr was a continuous vector or matrix, this object indicates the root mean square of differences (rmse) for each number of components. If Yr was a categorical variable, this object indicates the kappa values for each number of components. This object is returned only if "opc" was used within the pc_selection argument. See the sim_eval function for more details.
method: the ortho_projection method used.
predict.ortho_projection returns a matrix of scores projected for newdata.
Author(s)
References
Martens, H. (1991). Multivariate calibration. John Wiley & Sons.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M.,Scholten, T. 2013a. The spectrum-based learner: A new local approach formodeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte,J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for usewith soil vis-NIR spectra. Geoderma 199, 43-53.
Shenk, J. S., & Westerhaus, M. O. 1991. Populations structuring ofnear infrared spectra and modified partial least squares regression.Crop Science, 31(6), 1548-1555.
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCALcalibration procedure for near infrared instruments. Journal of Near InfraredSpectroscopy, 5, 223-232.
Westerhaus, M. 2014. Eastern Analytical Symposium Award for outstanding achievements in near infrared spectroscopy: my contributions to near infrared spectroscopy. NIR news, 25(8), 16-20.
See Also
Examples
library(prospectr)
data(NIRsoil)
# Preprocess the data using detrend plus first derivative with Savitzky and
# Golay smoothing filter
sg_det <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = as.numeric(colnames(NIRsoil$spc))),
  m = 1, p = 1, w = 7
)
NIRsoil$spc_pr <- sg_det
# split into training and testing sets
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0 & !is.na(NIRsoil$CEC), ]
test_y <- NIRsoil$CEC[NIRsoil$train == 0 & !is.na(NIRsoil$CEC)]
train_y <- NIRsoil$CEC[NIRsoil$train == 1 & !is.na(NIRsoil$CEC)]
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1 & !is.na(NIRsoil$CEC), ]
# A principal component analysis using 5 components
pca_projected <- ortho_projection(train_x, pc_selection = list("manual", 5))
pca_projected
# A principal components projection using the "opc" method
# for the selection of the optimal number of components
pca_projected_2 <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pca",
  pc_selection = list("opc", 40)
)
pca_projected_2
plot(pca_projected_2)
# A partial least squares projection using the "opc" method
# for the selection of the optimal number of components
pls_projected <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pls",
  pc_selection = list("opc", 40)
)
pls_projected
plot(pls_projected)
# A partial least squares projection using the "cumvar" method
# for the selection of the optimal number of components
pls_projected_2 <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pls",
  pc_selection = list("cumvar", 0.99)
)
Function for computing the overall variance of a matrix
Description
Computes the variance of a matrix. For internal use only!
Usage
overall_var(X)Arguments
X | a matrix. |
Value
a vector of standard deviation values.
Author(s)
Leonardo Ramirez-Lopez
Principal components based on the non-linear iterative partial least squares (nipals) algorithm
Description
Computes orthogonal scores partial least squares (opls) regressions with the NIPALS algorithm. It allows multiple response variables. For internal use only!
Usage
pca_nipals(X, ncomp, center, scale, maxiter, tol, pcSelmethod = "var", pcSelvalue = 0.01)Arguments
X | a matrix of predictor variables. |
ncomp | the number of pls components. |
scale | logical indicating whether |
maxiter | maximum number of iterations. |
tol | limit for convergence of the algorithm in the nipals algorithm. |
pcSelmethod | the method for selecting the number of components.Options are: |
pcSelvalue | a numerical value that complements the selected method ( |
Value
a list containing the following elements:
pc_scores: a matrix of principal component scores.
pc_loadings: a matrix of principal component loadings.
variance: a matrix of the variance of the principal components.
scale: a list containing two objects: center and scale, which correspond to the vectors used to center and scale the input matrix.
Author(s)
Leonardo Ramirez-Lopez
Get the package version info
Description
returns package info.
Usage
pkg_info(pkg = "resemble")Arguments
pkg | the package name i.e "resemble" |
Plot method for an object of class mbl
Description
Plots the content of an object of class mbl
Usage
## S3 method for class 'mbl'
plot(x, g = c("validation", "gh"), param = "rmse", pls_c = c(1, 2), ...)
Arguments
x | an object of class |
g | a character vector indicating what results shall be plotted.Options are: |
param | a character string indicating what validation statistics shall beplotted. The following options are available: |
pls_c | a numeric vector of length one or two indicating the pls factors to beplotted. Default is |
... | some arguments to be passed to the plot methods. |
Details
For plotting the pls scores from the pls score matrix (of more than one column), this matrix is first transformed from the Euclidean space to the Mahalanobis space. This is done by multiplying the score matrix by the square root of its covariance matrix. The square root of this matrix is estimated using a singular value decomposition.
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
See Also
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
ctrl <- mbl_control(validation_type = "NNv")
ex_1 <- mbl(
  Yr = Yr, Xr = Xr, Xu = Xu,
  diss_method = "cor",
  diss_usage = "none",
  gh = TRUE,
  mblCtrl = ctrl,
  k = seq(50, 250, 30)
)
plot(ex_1)
plot(ex_1, g = "gh", pls_c = c(2, 3))
Plot method for an object of class ortho_projection
Description
Plots objects of class ortho_projection
Usage
## S3 method for class 'ortho_projection'plot(x, col = "dodgerblue", ...)Arguments
x | an object of class |
col | the color of the plots (default is "dodgerblue") |
... | arguments to be passed to methods. |
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
See Also
Cross validation for PLS regression
Description
for internal use only!
Usage
pls_cv(x, y, ncomp, method = c("pls", "wapls"), center = TRUE, scale,
  min_component = 1, new_x = matrix(0, 1, 1), weights = NULL, p = 0.75,
  number = 10, group = NULL, retrieve = TRUE, tune = TRUE, max_iter = 1,
  tol = 1e-06, seed = NULL, modified = FALSE)
Prediction function for the gaussian_process function (Gaussian process regression with dot product covariance)
Description
Predicts response values based on a model generated by the gaussian_process function (Gaussian process regression with dot product covariance). For internal use only!
Usage
predict_gaussian_process(Xz, alpha, newdata, scale, Xcenter, Xscale, Ycenter, Yscale)Arguments
newdata | a matrix containing the predictor variables |
scale | a logical indicating whether the matrix of predictors used to create the regression model(in the |
Xcenter | if |
Xscale | if |
Ycenter | if |
Yscale | if |
Value
a matrix of predicted values
Author(s)
Leonardo Ramirez-Lopez
Prediction function for the opls and fopls functions
Description
Predicts response values based on a model generated by either the opls or the fopls functions. For internal use only!
Usage
predict_opls(bo, b, ncomp, newdata, scale, Xscale)Arguments
bo | a numeric value indicating the intercept. |
b | the matrix of regression coefficients. |
ncomp | an integer value indicating how may components must be used in the prediction. |
newdata | a matrix containing the predictor variables. |
scale | a logical indicating whether the matrix of predictors used to create the regression model was scaled. |
Xscale | if |
Value
a matrix of predicted values.
Author(s)
Leonardo Ramirez-Lopez
Print method for an object of class local_fit
Description
Prints the contents of an object of class local_fit
Usage
## S3 method for class 'local_fit'print(x, ...)Arguments
x | an object of class |
... | not yet functional. |
Author(s)
Leonardo Ramirez-Lopez
Print method for an object of class ortho_diss
Description
Prints the content of an object of class ortho_diss
Usage
## S3 method for class 'local_ortho_diss'print(x, ...)Arguments
x | an object of class |
... | arguments to be passed to methods (not yet functional). |
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
Print method for an object of class mbl
Description
Prints the content of an object of class mbl
Usage
## S3 method for class 'mbl'print(x, ...)Arguments
x | an object of class |
... | arguments to be passed to methods (not functional). |
Author(s)
Leonardo Ramirez-Lopez and Antoine Stevens
Print method for an object of class ortho_projection
Description
Prints the contents of an object of class ortho_projection
Usage
## S3 method for class 'ortho_projection'print(x, ...)Arguments
x | an object of class |
... | arguments to be passed to methods (not yet functional). |
Author(s)
Leonardo Ramirez-Lopez
Projection function for theopls function
Description
Projects new spectra onto a PLS space based on a model generated by either the opls or the opls2 functions. For internal use only!
Usage
project_opls(projection_mat, ncomp, newdata, scale, Xcenter, Xscale)Arguments
projection_mat | the projection matrix generated by the |
ncomp | an integer value indicating how may components must be used in the prediction. |
newdata | a matrix containing the predictor variables. |
scale | a logical indicating whether the matrix of predictors used to create the regression model was scaled. |
Xcenter | a matrix of one row with the values that must be used for centering |
Xscale | if |
Value
a matrix corresponding to the new spectra projected onto the PLS space
Author(s)
Leonardo Ramirez-Lopez
Projection to pls and then re-construction
Description
Projects spectra onto a PLS space and then reconstructs it back.
Usage
reconstruction_error(x, projection_mat, xloadings, scale, Xcenter, Xscale, scale_back = FALSE)Arguments
x | a matrix to project. |
projection_mat | the projection matrix generated by the |
xloadings | the loadings matrix generated by the |
scale | logical indicating if scaling is required |
Xcenter | a matrix of one row with the centering values |
Xscale | a matrix of one row with the scaling values |
scale_back | compute the reconstruction error after de-centering thedata and de-scaling it. |
Value
a matrix of 1 row and 1 column.
Author(s)
Leonardo Ramirez-Lopez
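A conceptual sketch of the computation implied by the arguments above (an assumption about the internal logic, not the exact code): project x with the projection matrix, reconstruct it from the X loadings and summarize the discrepancy as a root mean squared error.
# Hypothetical illustration (x: matrix to project, W: projection matrix, P: X loadings)
scores <- x %*% W                      # project onto the PLS space
x_hat  <- scores %*% P                 # reconstruct back (a transpose of P may be
                                       # needed depending on how loadings are stored)
rmse_reconstruction <- sqrt(mean((x - x_hat)^2))  # single-value reconstruction error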
A function to create calibration and validation sample sets forleave-group-out cross-validation
Description
For internal use only! This is stratified sampling based on the values of a continuous response variable (y). If group is provided, the sampling is done based on the groups and the average of y per group. This function is used to create calibration and validation groups for leave-group-out cross-validations (or leave-group-of-groups-out cross-validation if the group argument is provided).
Usage
sample_stratified(y, p, number, group = NULL, replacement = FALSE, seed = NULL)Arguments
y | a matrix of one column with the response variable. |
p | the percentage of samples (or groups if group argument is used) toretain in the validation_indices set |
number | the number of sample groups to be created |
group | the labels for each sample in |
replacement | a logical indicating whether sample replacement for the calibration set is required. |
seed | an integer for random number generator (default |
Value
a list with two matrices (hold_in and hold_out) giving the indices of the observations in each column. The number of columns represents the number of sampling repetitions.
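A rough sketch of the stratification idea described above (an independent illustration, not the internal algorithm): y is cut into as many strata as hold-out samples required and one observation is drawn per stratum.
# Conceptual stratified hold-out sketch (hypothetical, for illustration only)
set.seed(1)
y <- rnorm(100)
n_val <- 20                                   # number of hold-out samples
breaks <- quantile(y, probs = seq(0, 1, length.out = n_val + 1))
strata <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
hold_out <- vapply(split(seq_along(y), strata),
                   function(idx) idx[sample.int(length(idx), 1)], integer(1))
hold_in <- setdiff(seq_along(y), hold_out)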
A function for searching in a given reference set the neighbors ofanother given set of observations (search_neighbors)
Description
This function searches in a reference set the neighbors of the observationsprovided in another set.
Usage
search_neighbors(Xr, Xu,
  diss_method = c("pca", "pca.nipals", "pls", "mpls", "cor", "euclid", "cosine", "sid"),
  Yr = NULL, k, k_diss, k_range, spike = NULL,
  pc_selection = list("var", 0.01), return_projection = FALSE,
  return_dissimilarity = FALSE, ws = NULL, center = TRUE, scale = FALSE,
  documentation = character(), ...)
Arguments
Xr | a matrix of reference (spectral) observations where the neighborsearch is to be conducted. See details. |
Xu | an optional matrix of (spectral) observations for which itsneighbors are to be searched in |
diss_method | a character string indicating the spectral dissimilarity metricto be used in the selection of the nearest neighbors of each observation.
|
Yr | a numeric matrix of
|
k | an integer value indicating the k-nearest neighbors of eachobservation in |
k_diss | an integer value indicating a dissimilarity threshold. For each observation in |
k_range | an integer vector of length 2 which specifies the minimum(first value) and the maximum (second value) number of neighbors to beretained when the |
spike | a vector of integers (with positive and/or negative values)indicating what observations in |
pc_selection | a list of length 2 to be passed onto the
The default is Optionally, the |
return_projection | a logical indicating if the projection(s) must bereturned. Projections are used if the |
return_dissimilarity | a logical indicating if the dissimilarity matrixused for neighbor search must be returned. |
ws | an odd integer value which specifies the window size, when |
center | a logical indicating if the |
scale | a logical indicating if the |
documentation | an optional character string that can be used todescribe anything related to the |
... | further arguments to be passed to the |
Details
This function may be especially useful when the reference set (Xr) is very large. In some cases the number of observations in the reference set can be reduced by removing irrelevant observations (i.e. observations that are not neighbors of a particular target set). For example, this function can be used to reduce the size of the reference set before running the mbl function.
This function uses the dissimilarity function to compute the dissimilarities between Xr and Xu. Arguments to dissimilarity, as well as further arguments to the functions used inside dissimilarity (i.e. ortho_diss, cor_diss, f_diss, sid), can be passed to those functions as additional arguments (i.e. ...).
If no matrix is passed to Xu, the neighbor search is conducted for the observations in Xr within that same matrix. If a matrix is passed to Xu, the neighbors of Xu are searched in the Xr matrix. A sketch of how the results can be used to reduce the reference set follows below.
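For instance, the unique_neighbors element of the output can be used to shrink the reference set before fitting local models (a sketch re-using the objects created in the Examples section below; the value of k is arbitrary):
# Sketch: keep only the reference observations that are neighbors of Xu
nn <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pca",
  Yr = Yr, k = 100,
  pc_selection = list("opc", 30)
)
Xr_reduced <- Xr[nn$unique_neighbors, ]
Yr_reduced <- Yr[nn$unique_neighbors]
# Xr_reduced and Yr_reduced can then be passed to mbl() in place of Xr and Yr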
Value
alist containing the following elements:
neighbors_diss: a matrix of the Xr dissimilarity scores corresponding to the neighbors of each Xr observation (or Xu observation, in case Xu was supplied). The neighbor dissimilarity scores are organized by columns and are sorted in ascending order.
neighbors: a matrix of the Xr indices corresponding to the neighbors of each observation in Xu. The neighbor indices are organized by columns and are sorted in ascending order by their dissimilarity score.
unique_neighbors: a vector of the indices in Xr identified as neighbors of any observation in Xr (or in Xu, in case it was supplied). This is obtained by converting the neighbors matrix into a vector and applying the unique function.
k_diss_info: a data.table that is returned only if the k_diss argument was used. It comprises three columns: the first one (Xr_index or Xu_index) indicates the index of the observations in Xr (or in Xu, in case it was supplied), the second column (n_k) indicates the number of neighbors found in Xr and the third column (final_n_k) indicates the final number of neighbors selected, bounded by the k_range argument.
dissimilarity: if return_dissimilarity = TRUE, the dissimilarity object used (as computed by the dissimilarity function).
projection: an ortho_projection object. Only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals" or diss_method = "pls".
This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see the ortho_diss function for further details).
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M.,Scholten, T. 2013a. The spectrum-based learner: A new local approach formodeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R.,Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-searchmetrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
dissimilarityortho_disscor_dissf_disssidmbl
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
# Identify the neighbor observations using the correlation dissimilarity and
# default parameters
# (In this example all the observations in Xr belong at least to the
# first 100 neighbors of one observation in Xu)
ex1 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "cor",
  k = 40
)
# Identify the neighbor observations using principal component (PC)
# and partial least squares (PLS) dissimilarities, and using the "opc"
# approach for selecting the number of components
ex2 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pca",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
seq(1, nrow(Xr))[!seq(1, nrow(Xr)) %in% ex2$unique_neighbors]
ex3 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
seq(1, nrow(Xr))[!seq(1, nrow(Xr)) %in% ex3$unique_neighbors]
# Identify the neighbor observations using local PC dissimilarities
# Here, 150 neighbors are used to compute a local dissimilarity matrix
# and then this matrix is used to select 50 neighbors
ex4 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE,
  .local = TRUE,
  pre_k = 150
)
A function for computing the spectral information divergence between spectra (sid)
Description
This function computes the spectral information divergence/dissimilarity between spectra based on the Kullback-Leibler divergence algorithm (see details).
Usage
sid(Xr, Xu = NULL, mode = "density", center = FALSE, scale = FALSE,
  kernel = "gaussian", n = if (mode == "density") round(0.5 * ncol(Xr)),
  bw = "nrd0", reg = 1e-04, ...)
Arguments
Xr | a matrix containing the spectral (reference) data. |
Xu | an optional matrix containing the spectral data of a second set ofobservations. |
mode | the method to be used for computing the spectral informationdivergence. Options are |
center | a logical indicating if the computations must be carried out onthe centred |
scale | a logical indicating if the computations must be carried out onthe variance scaled |
kernel | if |
n | if |
bw | if |
reg | a numerical value larger than 0 which indicates a regularizationparameter. Values (probabilities) below this threshold are replaced by thisvalue for numerical stability. Default is 1e-4. |
... | additional arguments to be passed to the |
Details
This function computes the spectral information divergence (distance) between spectra. When mode = "density", the function first computes the probability distribution of each spectrum, which results in a matrix of density distribution estimates. The density distributions of all the observations in the datasets are compared based on the Kullback-Leibler divergence algorithm. When mode = "feature", the Kullback-Leibler divergence between all the observations is computed directly on the spectral variables. The spectral information divergence (SID) algorithm (Chang, 2000) uses the Kullback-Leibler divergence (\(KL\)) or relative entropy (Kullback and Leibler, 1951) to account for the vis-NIR information provided by each spectrum. The SID between two spectra (\(x_{i}\) and \(x_{j}\)) is computed as follows:
\[SID(x_i, x_j) = KL(x_i \mid\mid x_j) + KL(x_j \mid\mid x_i)\]
\[SID(x_i, x_j) = \sum_{l=1}^{k} p_l \, log\left(\frac{p_l}{q_l}\right) + \sum_{l=1}^{k} q_l \, log\left(\frac{q_l}{p_l}\right)\]
where \(k\) represents the number of variables or spectral features, \(p\) and \(q\) are the probability vectors of \(x_{i}\) and \(x_{j}\) respectively, which are calculated as:
\[p = \frac{x_i}{\sum_{l=1}^{k} x_{i,l}}\]
\[q = \frac{x_j}{\sum_{l=1}^{k} x_{j,l}}\]
From the above equations it can be seen that the original SID algorithm assumes that all the components in the data matrices are nonnegative. Therefore centering cannot be applied when mode = "feature". If a data matrix with negative values is provided and mode = "feature", the sid function automatically scales the matrix as follows:
\[X_{s} = \frac{X-min(X)}{max(X)-min(X)}\]
or
\[X_{s} = \frac{X-min(X, Xu)}{max(X, Xu)-min(X, Xu)}\]
\[Xu_{s} = \frac{Xu-min(X, Xu)}{max(X, Xu)-min(X, Xu)}\]
if Xu is specified. The 0 values are replaced by a regularization parameter (reg argument) for numerical stability. The default of the sid function is to compute the SID based on the density distributions of the spectra (mode = "density"). For each spectrum in X the density distribution is computed using the density function of the stats package. The 0 values of the estimated density distributions of the spectra are replaced by a regularization parameter ("reg" argument) for numerical stability. Finally, the divergence between the computed spectral histograms is computed using the SID algorithm. Note that if mode = "density", the sid function will accept negative values and matrix centering will be possible. A short sketch of the "feature" mode is given below.
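A brief sketch of the "feature" mode described above (re-using the Xr object prepared in the Examples section below; this mode assumes the input data are nonnegative, as explained above):
# Sketch: SID computed directly on the spectral variables (no density estimation)
xr_sid_feature <- sid(Xr, mode = "feature")
# centering is not applied here because "feature" mode assumes nonnegative data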
Value
alist with the following components:
sid: if only "X" is specified (i.e. Xu = NULL), a square symmetric matrix of SID distances between all the components in "X". If both "X" and "Xu" are specified, a matrix of SID distances between the components in "X" and the components in "Xu", where the rows represent the objects in "X" and the columns represent the objects in "Xu".
Xr: the (centered and/or scaled if specified) spectral X matrix.
Xu: the (centered and/or scaled if specified) spectral Xu matrix.
densityDisXr: if mode = "density", the computed density distributions of Xr.
densityDisXu: if mode = "density", the computed density distributions of Xu.
Author(s)
Leonardo Ramirez-Lopez
References
Chang, C.I. 2000. An information theoretic-based approach tospectral variability, similarity and discriminability for hyperspectralimage analysis. IEEE Transactions on Information Theory 46, 1927-1932.
See Also
Examples
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Xr <- Xr[!is.na(Yr), ]
# Example 1
# Compute the SID distance between all the observations in Xr
xr_sid <- sid(Xr)
xr_sid
# Example 2
# Compute the SID distance between the observations in Xr and the observations
# in Xu
xr_xu_sid <- sid(Xr, Xu)
xr_xu_sid
A function for evaluating dissimilarity matrices (sim_eval)
Description
This function searches for the most similar observation (closest neighbor) of each observation in a given dataset based on a dissimilarity (e.g. distance) matrix. The observations are compared against their corresponding closest observations in terms of the side information provided. The root mean square of differences and the correlation coefficient are used for continuous variables and for discrete variables the kappa index is used.
Usage
sim_eval(d, side_info)
Arguments
d | a symmetric matrix of dissimilarity scores between observations of a given dataset. Alternatively, a vector with the dissimilarity scores of the lower triangle (without the diagonal values) can be used (see details). |
side_info | a matrix containing the side information corresponding tothe observations in the dataset from which the dissimilarity matrix wascomputed. It can be either a numeric matrix with one or multiplecolumns/variables or a matrix with one character variable (discrete variable).If it is numeric, the root mean square of differences is used for assessingthe similarity between the observations and their corresponding most similarobservations in terms of the side information provided. If it is a charactervariable, then the kappa index is used. See details. |
Details
For the evaluation of dissimilarity matrices this function uses side information (information about one variable which is available for a group of observations, Ramirez-Lopez et al., 2013). It is assumed that there is a (direct or indirect) correlation between this side informative variable and the variables from which the dissimilarity was computed. If side_info is numeric, the root mean square of differences (RMSD) is used for assessing the similarity between the observations and their corresponding most similar observations in terms of the side information provided. It is computed as follows:
\[j(i) = NN(xr_i, Xr^{-i})\]
\[RMSD = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - y_{j(i)})^2}\]
where \(NN(xr_i, Xr^{-i})\) represents a function to obtain the index of the nearest neighbor observation found in \(Xr\) (excluding the \(i\)th observation) for \(xr_i\), \(y_{i}\) is the value of the side variable of the \(i\)th observation, \(y_{j(i)}\) is the value of the side variable of the nearest neighbor of the \(i\)th observation and \(m\) is the total number of observations.
If side_info is a factor, the kappa index (\(\kappa\)) is used instead of the RMSD. It is computed as follows:
\[\kappa = \frac{p_o - p_e}{1 - p_e}\]
where both \(p_o\) and \(p_e\) are two different agreement indices between the side information of the observations and the side information of their corresponding nearest observations (i.e. most similar observations). While \(p_o\) is the relative agreement, \(p_e\) is the agreement expected by chance.
This function accepts vectors to be passed to argument d; in this case, the vector must represent the lower triangle of a dissimilarity matrix (e.g. as returned by the stats::dist() function), as illustrated below.
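A short illustration of this vector input (a sketch re-using the Xr and Yr objects from the Examples section below; the ordering is assumed to match the lower triangle produced by stats::dist):
# Sketch: pass the lower triangle of a distance matrix as a vector
d_vec <- as.vector(dist(Xr))   # lower-triangle distances, no diagonal
se_vec <- sim_eval(d_vec, side_info = as.matrix(Yr))
se_vec$eval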
Value
sim_eval returns a list with the following components:
eval: either the RMSD (and the correlation coefficient) or the kappa index.
first_nn: a matrix containing the original side informative variable in the first half of the columns, and the side informative values of the corresponding nearest neighbors in the second half of the columns.
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M.,Scholten, T. 2013a. The spectrum-based learner: A new local approach formodeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R.,Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-searchmetrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
Examples
library(prospectr)
data(NIRsoil)
sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
# Replace the original spectra with the filtered ones
NIRsoil$spc <- sg
Yr <- NIRsoil$Nt[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
# Example 1
# Compute a principal components distance
pca_d <- ortho_diss(Xr, pc_selection = list("manual", 8))$dissimilarity
# Example 1.1
# Evaluate the distance matrix on the basis of the
# side information (Yr) associated with Xr
se <- sim_eval(pca_d, side_info = as.matrix(Yr))
# The final evaluation results
se$eval
# The final values of the side information (Yr) and the values of
# the side information corresponding to the first nearest neighbors
# found by using the distance matrix
se$first_nn
# Example 1.2
# Evaluate the distance matrix on the basis of two side
# information variables (Yr and Yr_2) associated with Xr
Yr_2 <- NIRsoil$CEC[as.logical(NIRsoil$train)]
se_2 <- sim_eval(d = pca_d, side_info = cbind(Yr, Yr_2))
# The final evaluation results
se_2$eval
# The final values of the side information variables and the values
# of the side information variables corresponding to the first
# nearest neighbors found by using the distance matrix
se_2$first_nn
# Example 2
# Evaluate the distances produced by retaining different numbers of
# principal components (this is the same principle used in the
# optimized principal components approach ("opc"))
# first project the data
pca_2 <- ortho_projection(Xr, pc_selection = list("manual", 30))
results <- matrix(NA, pca_2$n_components, 3)
colnames(results) <- c("pcs", "rmsd", "r")
results[, 1] <- 1:pca_2$n_components
for (i in 1:pca_2$n_components) {
  ith_d <- f_diss(pca_2$scores[, 1:i, drop = FALSE], scale = TRUE)
  ith_eval <- sim_eval(ith_d, side_info = as.matrix(Yr))
  results[i, 2:3] <- as.vector(ith_eval$eval)
}
plot(results)
# Example 3
# Example 3.1
# Evaluate a dissimilarity matrix computed using the correlation
# method
cd <- cor_diss(Xr)
eval_corr_diss <- sim_eval(cd, side_info = as.matrix(Yr))
eval_corr_diss$eval
Square root of (square) symmetric matrices
Description
For internal use only
Usage
sqrt_sm(X, method = c("svd", "eigen"))
A function to compute row-wise index of minimum values of a square distance matrix
Description
For internal use only
Usage
which_min(X)
Arguments
X | a square matrix of distances |
Details
Used internally to find the nearest neighbors
Value
a vector of the indices of the minimum value in each row of the input matrix
Author(s)
Antoine Stevens
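A conceptual R equivalent of this helper (an assumption about its behavior: self-distances on the diagonal are presumably excluded, since the nearest neighbor of an observation should not be the observation itself):
# Hypothetical R sketch of which_min (for illustration only)
which_min_r <- function(X) {
  diag(X) <- Inf             # assumed: ignore self-distances
  apply(X, 1, which.min)     # row-wise index of the smallest remaining value
}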
A function to compute indices of minimum values of a distance vector
Description
For internal use only
Usage
which_min_vector(X)
Arguments
X | a vector of distances |
Details
Used internally to find the nearest neighbors. It searches in a lower (or upper) triangular matrix, so the input data must be in that format. The piece of code int len = (sqrt(X.size()*8+1)+1)/2 generated an error in CRAN since sqrt cannot be applied to integers.
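The expression mentioned above simply recovers the number of observations n from the length of a lower-triangle distance vector, which has n(n-1)/2 elements; in R this reads:
# length of the lower triangle (without diagonal) for n observations is n * (n - 1) / 2
len <- length(as.vector(dist(matrix(rnorm(20), nrow = 10))))  # here n = 10, so len = 45
n <- (sqrt(8 * len + 1) + 1) / 2                              # recovers n = 10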
Value
a vector of the indices of the nearest neighbors
Author(s)
Antoine Stevens