
The optRF package provides tools for optimizing the number of trees in a random forest to improve model stability and reproducibility. Since random forest is a non-deterministic method, variable importance and prediction results can vary between runs. The optRF package estimates the stability of random forest as a function of the number of trees and helps users determine the optimal number of trees required for reliable predictions and variable selection.
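The run-to-run variability described above can be illustrated with a minimal sketch. It assumes the ranger package is installed and uses hypothetical simulated SNP-like data rather than anything shipped with optRF. Two forests that differ only in their random seed are trained, and the Pearson correlation between their test-set predictions is compared for a small and a large number of trees. Pearson correlation is used here only as a simple stand-in for stability; it is not necessarily the measure optRF computes internally.

```r
# Minimal sketch (assumes the ranger package is installed): compare the
# predictions of two forests that differ only in their random seed,
# once with few trees and once with many trees.
library(ranger)

set.seed(42)
n <- 200
p <- 50
# Hypothetical simulated SNP-like predictors coded 0/1/2 (columns V1..V50)
X <- as.data.frame(matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n))
# Hypothetical numeric response depending on the first two predictors
y <- X$V1 + 0.5 * X$V2 + rnorm(n)

train <- 1:150
test <- 151:200

# Pearson correlation between the test-set predictions of two runs
# that use the same data and settings but different seeds
prediction_agreement <- function(num_trees) {
  preds <- lapply(c(1, 2), function(s) {
    fit <- ranger(x = X[train, ], y = y[train], num.trees = num_trees, seed = s)
    predict(fit, data = X[test, ])$predictions
  })
  cor(preds[[1]], preds[[2]])
}

agreement <- sapply(c(5, 1000), prediction_agreement)
names(agreement) <- c("5 trees", "1000 trees")
print(agreement)
```

With only a few trees the two runs typically disagree noticeably, while with many trees their predictions become nearly identical; optRF automates the search for a number of trees that is large enough for stable results.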
To install the optRF R package from CRAN, run:

```r
install.packages("optRF")
```

R version >= 3.6 is required.
You can install the development version of optRF from GitHub using devtools with:

```r
devtools::install_github("tmlange/optRF")
```

The optRF package includes the SNPdata data set for demonstration purposes. The two main functions are:
- `opt_prediction` – Finds the optimal number of trees for stable predictions.
- `opt_importance` – Finds the optimal number of trees for stable variable importance estimates.

```r
library(optRF)

# Load example data set
data(SNPdata)

# Optimise random forest for predicting the first column in SNPdata
result_optpred <- opt_prediction(y = SNPdata[, 1], X = SNPdata[, -1])
summary(result_optpred)

# Optimise random forest for calculating variable importance
result_optimp <- opt_importance(y = SNPdata[, 1], X = SNPdata[, -1])
summary(result_optimp)
```

For detailed examples and explanations, refer to the package vignettes:
- optRF – General package overview
- opt_prediction – Optimizing random forest predictions
- opt_importance – Optimizing random forest variable importance estimation

If you use optRF in your research, please cite:
Lange, T.M., Gültas, M., Schmitt, A.O. & Heinrich, F. optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics 26, 95 (2025).