Added the function rf_importance(). It fits models with and without each predictor, compares them via spatial cross-validation with rf_evaluate(), and returns the increase or decrease in performance when a given variable is included in the model.
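The "with vs. without" idea behind rf_importance() can be illustrated with a small base R sketch. It is hypothetical and is NOT the spatialRF implementation: lm() stands in for a random forest, a single fixed train/test split stands in for spatial cross-validation with rf_evaluate(), and the helper name importance_by_inclusion() is made up for this example.

```r
# Hypothetical sketch, NOT spatialRF code: lm() replaces the random forest and
# one fixed train/test split replaces spatial cross-validation.
importance_by_inclusion <- function(data, response, predictors, train.rows) {
  # out-of-sample R squared (1 - SSE / SST) of a model using `vars`
  holdout_r2 <- function(vars) {
    if (length(vars) == 0) vars <- "1"  # intercept-only model
    fit <- stats::lm(stats::reformulate(vars, response = response),
                     data = data[train.rows, ])
    test <- data[-train.rows, ]
    obs <- test[[response]]
    pred <- stats::predict(fit, newdata = test)
    1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
  }
  full <- holdout_r2(predictors)
  # positive value: performance increases when the variable is included
  vapply(predictors, function(v) full - holdout_r2(setdiff(predictors, v)),
         numeric(1))
}

# usage with simulated data: y depends strongly on a, weakly on b
set.seed(1)
n <- 200
df <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
df$y <- 3 * df$a + df$b + rnorm(n, sd = 0.5)
importance_by_inclusion(df, "y", c("a", "b", "noise"), train.rows = 1:150)
```

The returned vector is named after the predictors; a large positive value means the model loses much of its predictive ability when that variable is left out.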
The default random seed for all functions has changed from NULL to 1 to facilitate reproducibility.
The function rf_evaluate() has a new argument named grow.testing.folds. When set to TRUE, it uses 1 - training.fraction instead of training.fraction to grow the spatial folds, and then flips the names of the training and testing folds. As a result, the testing folds are generally surrounded by the training folds (just the opposite of the default behavior of the function), which might be beneficial for particular spatial structures of the training data. Thanks to Aleksandra Kulawska for the suggestion!
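The flip performed by grow.testing.folds can be sketched in base R. This is a hypothetical illustration, not the rf_evaluate() code: it assumes that a fold is "grown" by taking the round(fraction * n) records closest to a center record, and the helpers grow_fold() and make_fold() are made up for this example.

```r
# Hypothetical sketch, not spatialRF code. Assumption: a fold is "grown" by
# taking the round(fraction * n) records closest to a center record.
grow_fold <- function(xy, center, fraction) {
  d <- sqrt((xy$x - xy$x[center])^2 + (xy$y - xy$y[center])^2)
  order(d)[seq_len(round(fraction * nrow(xy)))]
}

make_fold <- function(xy, center, training.fraction,
                      grow.testing.folds = FALSE) {
  all.rows <- seq_len(nrow(xy))
  if (!grow.testing.folds) {
    # default: the grown fold is used for training, the rest for testing
    training <- grow_fold(xy, center, training.fraction)
    testing <- setdiff(all.rows, training)
  } else {
    # grow with 1 - training.fraction, then flip the names: the grown
    # fold becomes the testing fold, surrounded by the training records
    testing <- grow_fold(xy, center, 1 - training.fraction)
    training <- setdiff(all.rows, testing)
  }
  list(training = sort(training), testing = sort(testing))
}

xy <- expand.grid(x = 1:10, y = 1:10)
default.fold <- make_fold(xy, center = 45, training.fraction = 0.75)
flipped.fold <- make_fold(xy, center = 45, training.fraction = 0.75,
                          grow.testing.folds = TRUE)
# both folds have 75 training records, but only in flipped.fold is the
# center record (45) part of the testing set
```

In both cases the proportion of training records is the same; only the spatial arrangement changes, with the testing records forming a compact block inside the training area when grow.testing.folds = TRUE.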
Overhaul of the methods used for parallelization. The functions rf_spatial(), rf_repeat(), rf_evaluate(), rf_tuning(), rf_compare(), and rf_interactions() can now accept a cluster definition generated with parallel::makeCluster() via the cluster argument. Also, models resulting from these functions and rf() carry the cluster definition with them in the slot model$cluster, so the cluster definition can be passed from function to function using a pipe, as shown below:
```r
library(spatialRF)
library(magrittr)

# loading the example data
data(plant_richness_df)
data("distance_matrix")
xy <- plant_richness_df[, c("x", "y")]
dependent.variable.name <- "richness_species_vascular"
predictor.variable.names <- colnames(plant_richness_df)[5:21]

# creating cluster
my.cluster <- parallel::makeCluster(
  4,
  type = "PSOCK"
)

# registering cluster (rf functions register it anyway)
doParallel::registerDoParallel(cl = my.cluster)

# fitting model
m <- rf(
  data = plant_richness_df,
  dependent.variable.name = dependent.variable.name,
  predictor.variable.names = predictor.variable.names,
  distance.matrix = distance_matrix,
  xy = xy,
  cluster = my.cluster
) %>%
  rf_spatial() %>%
  rf_tuning() %>%
  rf_evaluate() %>%
  rf_repeat()

# stopping cluster
parallel::stopCluster(cl = my.cluster)
```

The system works as follows: if cluster is not NULL and model is provided, the function looks into the model. If the model carries a cluster definition, it is used to parallelize computations, but the cluster is not stopped within the function. If there is no cluster in model, the function falls back to the argument n.cores to generate a cluster that is stopped when the function ends its operations.
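The decision rules above can be summarized as a small helper. This is a hypothetical sketch, not part of spatialRF: the name resolve_cluster() and its make.cluster argument are made up, and the handling of an explicit cluster argument when the model carries no cluster (rule 2) is an assumption not stated above. The make.cluster argument lets the logic be tried without starting worker processes.

```r
# Hypothetical helper, not part of spatialRF: decides which cluster to use
# and whether the calling function should stop it when it finishes.
resolve_cluster <- function(model = NULL, cluster = NULL, n.cores = 2,
                            make.cluster = function(n) {
                              parallel::makeCluster(n, type = "PSOCK")
                            }) {
  # 1. a cluster carried by the model (model$cluster) is reused, not stopped
  if (!is.null(model) && !is.null(model$cluster)) {
    return(list(cluster = model$cluster, source = "model", stop.cluster = FALSE))
  }
  # 2. assumption: an explicit cluster argument is also reused, not stopped
  if (!is.null(cluster)) {
    return(list(cluster = cluster, source = "argument", stop.cluster = FALSE))
  }
  # 3. otherwise fall back to n.cores and create a temporary cluster that
  #    the caller is expected to stop with parallel::stopCluster()
  list(cluster = make.cluster(n.cores), source = "n.cores", stop.cluster = TRUE)
}

# usage with a stand-in cluster maker, so no processes are started
fake.maker <- function(n) paste("new cluster with", n, "cores")
plan <- resolve_cluster(model = list(cluster = "stored cluster"),
                        make.cluster = fake.maker)
plan$source        # "model"
plan <- resolve_cluster(model = list(), n.cores = 4, make.cluster = fake.maker)
plan$stop.cluster  # TRUE
```

Inside a function, a temporary cluster would typically be released with something like `if (plan$stop.cluster) on.exit(parallel::stopCluster(plan$cluster))`, so that a cluster passed in through the pipe survives for the next function.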
These changes should improve performance when working with several functions in the same script, because these functions no longer have to waste time generating their own clusters.
The function rf_interactions() is now named the_feature_engineer().
The function cluster_definition() is now named beowulf_cluster(), and returns a cluster instead of a cluster definition to be used as input for parallel::makeCluster().
rf_repeat() now generates a proper “importance” slot for models fitted with rf_spatial(), and preserves the “evaluation” and “tuning” slots if they exist.
Simplified rf_spatial() by removing options to generate an rf_repeat() model on the fly. rf_repeat() should now only be used at the end of a workflow, as described in the documentation.
Fixed an issue with the area of the violin plots generated by plot_importance().
Improved the function rf_interactions() with a new type of interaction (first factor of a PCA between two predictors), added criteria to reduce multicollinearity among interactions and between interactions and predictors, and the function now returns data ready to fit models right away.
Added new residual diagnostics with the functions residuals_diagnostics() and plot_residuals_diagnostics(). This changed the name of the slot “spatial.autocorrelation.residuals” to “residuals”, which now stores all the information related to the residuals.
All plotting functions now allow changing the colors of their key components.
Changed the names of function arguments from ‘x’ to ‘model’ or ‘distance.matrix’ for consistency. This might break previously written code, but I hope the argument names are more self-explanatory now.
The function rf_spatial() now fits a non-spatial model first, and only generates spatial predictors for those values of distance.thresholds that show positive spatial autocorrelation.
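A sketch of how distance thresholds with positive spatial autocorrelation could be selected, using Moran's I computed in base R. This is hypothetical and is not the rf_spatial() code: it assumes inverse-distance weights set to zero beyond each threshold, treats "positive" simply as I > 0 without a significance test, and the helpers morans_i() and positive_thresholds() are made up for this example.

```r
# Hypothetical sketch, not spatialRF code. Assumptions: inverse-distance
# weights set to zero beyond the threshold, and "positive" means I > 0
# (no significance test).
morans_i <- function(x, distance.matrix, threshold) {
  w <- 1 / distance.matrix
  w[distance.matrix > threshold] <- 0
  diag(w) <- 0  # a record is not its own neighbor
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# keep only the thresholds at which the residuals are positively autocorrelated
positive_thresholds <- function(residuals, distance.matrix, distance.thresholds) {
  i <- vapply(distance.thresholds,
              function(t) morans_i(residuals, distance.matrix, t),
              numeric(1))
  distance.thresholds[i > 0]
}

xy <- expand.grid(x = 1:10, y = 1:10)
distance.matrix <- as.matrix(dist(xy))
gradient <- xy$x                    # smooth east-west trend
checkerboard <- (xy$x + xy$y) %% 2  # adjacent records always differ
positive_thresholds(gradient, distance.matrix, c(1, 2))         # 1 2
positive_thresholds(checkerboard, distance.matrix, c(1, 1.5))   # numeric(0)
```

With this approach, residuals showing a smooth trend would trigger spatial predictors at both thresholds, while a checkerboard pattern (negative autocorrelation) would trigger none.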
Added a new function named filter_spatial_predictors() that removes redundant spatial predictors within rf_spatial(). It shouldn’t lead to changes in the spatial models fitted with previous versions, but it will make them more parsimonious.
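One way to remove redundant spatial predictors is a greedy correlation filter. The base R sketch below is hypothetical and is not the actual filter_spatial_predictors() code: predictors are visited in order and each is kept only if its absolute Pearson correlation with every predictor already kept is below cor.threshold. The helper filter_redundant() and the 0.5 default are assumptions made for this example.

```r
# Hypothetical sketch, not the actual filter_spatial_predictors() code.
# The cor.threshold default of 0.5 is an assumption.
filter_redundant <- function(spatial.predictors, cor.threshold = 0.5) {
  kept <- character(0)
  for (v in colnames(spatial.predictors)) {
    r <- 0
    if (length(kept) > 0) {
      # absolute correlation with each predictor kept so far
      r <- abs(stats::cor(spatial.predictors[[v]], spatial.predictors[kept]))
    }
    if (all(r < cor.threshold)) kept <- c(kept, v)
  }
  spatial.predictors[kept]
}

set.seed(1)
a <- rnorm(100)
candidates <- data.frame(sp1 = a,
                         sp2 = a + rnorm(100, sd = 0.01),  # near copy of sp1
                         sp3 = rnorm(100))
colnames(filter_redundant(candidates))  # "sp1" "sp3"
```

Because the filter is greedy, the order of the columns matters: the first member of a group of redundant predictors is the one that survives.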
Changed the style of the package’s boxplots.
When using rf_repeat(), the median of the variable importance scores, performance scores, and Moran’s I is reported instead of the mean.
Added the functions plot_training_data() and plot_moran_training_data() to help explore the training data prior to modeling.
Also fixed an issue where response variables could be identified as binary by mistake.
A bug in the predictions generated by rf(), which affected every other function fitting models, has been fixed. Previously, the model predictions came from the “predictions” slot produced by ranger(). Those predictions are computed from the out-of-bag data during model training; they differ from the ones produced with predict() and lead to lower R squared values. Now the predictions yielded by rf() are generated with predict(), so you might notice that models fitted with spatialRF functions now perform better than before, because they do.
The function print_evaluation() no longer uses huxtable to print the evaluation results, and only shows the results of the testing model.
Added support for binary data (0 and 1). The function rf() now tests whether the data is binary and, if so, populates the case.weights argument of ranger with the new function case_weights() to minimize the side effects of unbalanced data.
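A base R sketch of the binary check and of a case-weighting scheme for unbalanced 0/1 data. It is hypothetical and not the actual rf() or case_weights() code: the helpers is_binary() and inverse_frequency_weights() are made up, and inverse class frequency (each class summing to a total weight of 1) is an assumed scheme that may differ from the one used by case_weights().

```r
# Hypothetical sketch, not the actual rf() / case_weights() code.
# a response is treated as binary if it is numeric and contains exactly 0 and 1
is_binary <- function(x) {
  is.numeric(x) && length(unique(x)) == 2 && all(x %in% c(0, 1))
}

# Assumption: inverse class frequency, so each class sums to a total weight
# of 1 no matter how many cases it has.
inverse_frequency_weights <- function(x) {
  ifelse(x == 1, 1 / sum(x == 1), 1 / sum(x == 0))
}

y <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)  # 2 presences, 8 absences
is_binary(y)                          # TRUE
w <- inverse_frequency_weights(y)     # 0.5 for presences, 0.125 for absences
```

A vector like w has one weight per case, which is the shape expected by the case.weights argument of ranger::ranger(); the rare class (here the presences) gets a larger weight per case, so both classes contribute equally overall.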
Fixed an issue where rf() applied an incorrect is.numeric() check to the response variable and the predictors, which caused problems with tibbles.
Removed the function scale_robust() from rf() and replaced it with scale(). It was causing more trouble than it was worth.
Simplified rf_spatial().
Modified rf_tuning() to better tune models fitted withrf_spatial().
Minor fixes in several other functions.
All ‘sf’ dependencies removed from the package.