| Maintainer: | Martin Maechler |
| Contact: | Martin.Maechler at R-project.org |
| Version: | 2023-07-01 |
| URL: | https://CRAN.R-project.org/view=Robust |
| Source: | https://github.com/cran-task-views/Robust/ |
| Contributions: | Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide. |
| Citation: | Martin Maechler (2023). CRAN Task View: Robust Statistical Methods. Version 2023-07-01. URL https://CRAN.R-project.org/view=Robust. |
| Installation: | The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("Robust", coreOnly = TRUE) installs all the core packages or ctv::update.views("Robust") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details. |
Robust (or “resistant”) methods for statistical modelling have been available in S from the very beginning in the 1980s, and then in R in package stats. Examples are median(), mean(*, trim = .), mad(), IQR(), and also fivenum() (the statistic behind boxplot() in package graphics), as well as lowess() (and loess()) for robust nonparametric regression, which was complemented by runmed() in 2003. Much further important functionality is available in the recommended (and hence present in all R versions) package MASS (by Bill Venables and Brian Ripley, see the book Modern Applied Statistics with S). Most importantly, it provides rlm() for robust regression and cov.rob() for robust multivariate scatter and covariance.
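As a small illustration of the base R functions named above, the following sketch compares classical and robust summaries on a made-up sample with one gross outlier. The data vectors x and y are invented for illustration only; all functions come from package stats.

```r
# Contaminated sample: nine well-behaved values plus one gross outlier.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 100)

# Location: the mean is dragged toward the outlier, the robust versions are not.
loc <- c(mean    = mean(x),
         trimmed = mean(x, trim = 0.1),  # drop 10% of the data from each end
         median  = median(x))

# Scale: sd() is inflated by the outlier, mad() and IQR() barely change.
scl <- c(sd = sd(x), mad = mad(x), IQR = IQR(x))

print(round(loc, 3))
print(round(scl, 3))
print(fivenum(x))  # Tukey's five-number summary, the statistic behind boxplot()

# A running median of window width 3 removes an isolated spike in a series.
y <- c(1, 2, 3, 50, 5, 6, 7)
sm <- runmed(y, k = 3)
print(sm)
```

The trimmed mean and the median stay close to 10, whereas the plain mean does not; the same contrast holds between mad() and sd().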
This task view is about R add-on packages that provide newer, faster, or more efficient algorithms, notably for (the robustification of) new models.
Please send suggestions for additions and extensions via e-mail to the maintainer or submit an issue or pull request in the GitHub repository linked above.
An international group of scientists working in the field of robust statistics has made efforts (since October 2005) to coordinate several of the scattered developments and make the important ones available through a set of R packages complementing each other. These should build on a basic package with “Essentials”, named robustbase, with (potentially many) other packages building on top and extending the essential functionality to particular models or applications. Since 2020 and the 2nd edition of Robust Statistics: Theory and Methods, RobStatTM covers its estimators and examples, notably by importing from robustbase and rrcov. Further, there is the quite comprehensive package robust, a version of the robust library of S-PLUS, available as an R package and now GPL-licensed thanks to Insightful and Kjell Konis. Originally, there was much overlap between robustbase and robust; now robust depends on robustbase and rrcov, where robust provides convenient routines for the casual user while robustbase and rrcov contain the underlying functionality and provide the more advanced statistician with a large range of options for robust modeling.
We structure the packages roughly into the following topics, and typically will first mention functionality in packages robustbase, rrcov and robust.
Linear Regression: lmrob() (robustbase) and lmRob() (robust), where the former uses the latest of the fast-S algorithms and heteroscedasticity and autocorrelation corrected (HAC) standard errors, while the latter makes use of the M-S algorithm of Maronna and Yohai (2000), automatically when there are factors among the predictors (where S-estimators (and hence MM-estimators) based on resampling typically fail badly). The ltsReg() and lmrob.S() functions are available in robustbase, but rather for comparison purposes. rlm() from MASS was the first widely available implementation for robust linear models, and also one of the very first MM-estimation implementations. robustreg provides very simple M-estimates for linear regression (in pure R). Note that Koenker’s quantile regression package quantreg contains L1 (aka LAD, least absolute deviations) regression as a special case, doing so also for nonparametric regression via splines. Package mblm’s function mblm() fits median-based (Theil-Sen or Siegel’s repeated medians) simple linear models.
Note that a location (and scale) model is a regression with only an intercept and may be approached by, e.g., lmrob(y ~ 1). For very small samples, location robLoc() and scale robScale() estimators are also provided by revss.
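A minimal sketch of the robust linear regression functions discussed above, using the built-in stackloss data. The object names (fit_ls, fit_m, fit_mm, fit_rb, loc_rb) are chosen here for illustration. MASS ships with R; the robustbase part only runs if that add-on package happens to be installed.

```r
library(MASS)  # recommended package, present in all R installations

set.seed(1)  # the MM fit starts from a resampling-based S-estimate

# Classical least squares versus two robust fits.
fit_ls <- lm(stack.loss ~ ., data = stackloss)
fit_m  <- rlm(stack.loss ~ ., data = stackloss)                 # Huber M-estimate
fit_mm <- rlm(stack.loss ~ ., data = stackloss, method = "MM")  # MM-estimate

print(round(cbind(LS = coef(fit_ls), M = coef(fit_m), MM = coef(fit_mm)), 3))

# Final robustness weights: values near 0 flag downweighted observations.
print(round(fit_mm$w, 2))

# robustbase is an add-on package, so it is used only when available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  fit_rb <- robustbase::lmrob(stack.loss ~ ., data = stackloss)
  print(coef(fit_rb))
  # A location model is a regression with only an intercept.
  loc_rb <- robustbase::lmrob(stack.loss ~ 1, data = stackloss)
  print(coef(loc_rb))
}
```

Comparing the coefficient columns side by side is a quick way to see whether a few observations dominate the least squares fit.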
Generalized Linear Models (GLMs) for Regression:
GLMs are provided both via glmrob() (robustbase) and glmRob() (robust). drgee fits “Doubly Robust” Generalized Estimating Equations (GEEs), and complmrob does robust linear regression with compositional data as covariates.
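The following is a hedged sketch of a robust Poisson regression with glmrob(), compared with the classical glm() fit. The data are simulated here purely for illustration (including the five artificially inflated counts), and the robust fit is attempted only if robustbase is installed. The default fitting method of glmrob() is used.

```r
set.seed(1)

# Simulated Poisson data with true coefficients (1, 2) on the log scale.
n <- 100
x <- runif(n)
y <- rpois(n, lambda = exp(1 + 2 * x))
y[1:5] <- y[1:5] + 60          # contaminate five responses with gross outliers
d <- data.frame(x = x, y = y)

# Classical maximum likelihood fit, sensitive to the contaminated counts.
fit_glm <- glm(y ~ x, family = poisson, data = d)
print(coef(fit_glm))

# Robust fit, only if the add-on package robustbase is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  fit_rob <- robustbase::glmrob(y ~ x, family = poisson, data = d)
  print(coef(fit_rob))
}
```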
Generalized Smooth/Additive (GAM-like) Regression:
Package GJRM’s gamlss() function with option gamlss(*, robust = TRUE) allows fitting many model families robustly (wrapped inside the LSS “location-scale-shape” transformation scope).
Nonlinear / Smooth (Nonparametric Function) Regression:
Robust nonlinear model fitting is available through robustbase’s nlrob().
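As a rough sketch, nlrob() can be called much like nls(). The example below fits a logistic growth curve to the first run of the built-in DNase data; the starting values and object names are illustrative choices, and the robust fit is attempted only if robustbase is installed.

```r
# First assay run from the built-in DNase data set.
DNase1 <- DNase[DNase$Run == 1, ]

# Logistic model formula and starting values shared by both fits.
form  <- density ~ Asym / (1 + exp((xmid - log(conc)) / scal))
start <- list(Asym = 3, xmid = 0, scal = 1)

# Classical nonlinear least squares for reference.
fit_nls <- nls(form, data = DNase1, start = start)
print(coef(fit_nls))

# Robust nonlinear fit, only if the add-on package robustbase is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  fit_nlrob <- robustbase::nlrob(form, data = DNase1, start = start)
  print(coef(fit_nlrob))
}
```

On clean data such as this run, the two sets of coefficients should be close; differences become relevant once outliers are present.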
Mixed-Effects (Linear and Nonlinear) Regression:
Quantile regression (and hence L1 or LAD) for mixed effect models is available in package lqmm. Rank-based mixed effect fitting is available from package rlme, whereas an MM-like approach for robust linear mixed effects modeling is available from package robustlmm. More recently, skewlmm provides robust linear mixed-effects models (LMM) via scale mixtures of skew-normal distributions.
[…] Depends) on robustbase provides nice S4 class based methods, more methods for robust multivariate variance-covariance estimation, and adds robust PCA methodology.
[…] NA) data, and by rrcovHD, providing robust multivariate methods for High Dimensional data.
[…] princomp(), e.g., X <- stackloss; pc.rob <- princomp(X, covmat = MASS::cov.rob(X))
[…] covMcd() than robust’s fastmcd(), and similarly for covOGK(). On the other hand, robust’s covRob() has automatically chosen methods, notably pairwiseQC() for large dimensionality p. Package robustX, for experimental or other not yet established procedures, contains BACON() and covNCC(), the latter providing the nearest-neighbor variance estimation (NNVE) method of Wang and Raftery (2002), also available (slightly less optimized) in covRobust.
[…] FastQn().
[…] pam() implementing “partitioning around medoids” is partly robust (medoids instead of very unrobust k-means) but is not good enough, as, e.g., the k clusters could consist of k-1 outliers and one cluster for the bulk of the remaining data.
[…] BACON() (in robustX) should be applicable for larger (n,p) than traditional robust covariance based outlier detectors.
[…] boxplot.stats(), etc., mentioned above.
[…] runmed() provides most robust running median filtering.
[…] vcov(lmrob()) also uses a version of HAC standard errors for its robustly estimated linear models. See also the CRAN task view Econometrics.
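The princomp() one-liner quoted in the text can be expanded into a short, self-contained comparison of classical and robust PCA. This is only a sketch: the object name pc.cl and the seed are illustrative additions, and only packages stats and MASS are used.

```r
library(MASS)  # provides cov.rob(); recommended package shipped with R

set.seed(42)   # cov.rob() relies on random subsampling

X <- stackloss

# Classical PCA based on the sample covariance matrix.
pc.cl <- princomp(X)

# Robust PCA: pass a robust covariance estimate to princomp() via 'covmat'.
pc.rob <- princomp(X, covmat = MASS::cov.rob(X))

# Compare the standard deviations of the principal components.
print(round(rbind(classical = pc.cl$sdev, robust = pc.rob$sdev), 3))

# Loadings of the first two robust components.
print(round(unclass(loadings(pc.rob))[, 1:2], 3))
```

Because the robust covariance estimate downweights outlying rows, the leading robust components may point in a noticeably different direction than the classical ones.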