| NEWS | R Documentation |
News for Package caret
Changes in version 7.0-1
CRAN-mandated update.
caret will be 20 years old in March of 2026. The package is currently in maintenance mode; the author will fix bugs and make CRAN releases as needed, but there will not be any major features in the package. It will stay on CRAN long-term; it's not going away.
Changes in version 6.0-94
Bug fix in how some S3 signatures were declared (for R-devel).
Adrián Panella fixed a bug with method = "svmRadial" that occurred when the SVM probability model failed (issue 1327).
Theodore Pak fixed a bug in glmnet prediction when sparse matrices are used (issue 1315).
Carlos Cinelli fixed two bugs (one for GAMs and another for ranger models) where an error would occur when using trainControl(method = "none") (issue 1307) (issue 1308).
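As a sketch of the now-working single-fit workflow, a minimal example with trainControl(method = "none") (using lm for brevity rather than the GAM or ranger models named above):

```r
library(caret)

# method = "none" skips resampling entirely and fits one model; the
# tuning grid must contain exactly one candidate (lm has exactly one).
ctrl  <- trainControl(method = "none")
fit   <- train(Sepal.Length ~ ., data = iris, method = "lm",
               trControl = ctrl)
preds <- predict(fit, iris)
```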
Changes in version 6.0-93
CRAN-required changes for the legacy S typedef Sint.
Xabriel J. Collazo Mojica added a check for class probabilities when PR curves are requested (issue 1274).
Changes in version 6.0-92
Small maintenance release with required C changes for R 4.2.
Changes in version 6.0-91
Small maintenance release with required changes for R 4.2.
Changes in version 6.0-90
A ptype element was added to train objects that records the training set predictor columns and their types using a zero-row slice.
Updated the internal object containing the subsampling information; the old package dependencies were being used.
Changes in version 6.0-89
SMOTE subsampling is now computed via the themis package (issue 1226).
For SA feature selection, a better warning is given when there are too few iterations for computing differences (issue 1247).
Better error message when packages are missing (issue 1246).
e1071 was promoted from Suggests to Imports (issue 1238).
Changes in version 6.0-88
Fixed cases where the "corr" filter was not run in preProcess().
Prediction type bug for Poisson glm models was fixed (issue 1231).
Fixed a preProcess() bug related to a single PCA component (issue 1181).
Fixed random forest bugs related to rfe() (issue 1077) (issue 1062).
RuleFit was added via the pre package (issue 1218).
Bugs were fixed where MAE was not treated as a minimization metric (issue 1224).
Changes in version 6.0-87
The ordinalNet model was given an additional tuning parameter, modeltype.
Changes in version 6.0-86
Small release for stringsAsFactors = TRUE in R 4.0.
Changes in version 6.0-85
Internal changes required by r-devel for new matrix class structure.
Michael Mayer contributed a faster version of groupKFold() (issue 1108).
A typo in a variable name was fixed (issue 1087).
Removed some warnings related to contrasts.
Temporarily moved ROC calculations back to the pROC package, related to JackStat/ModelMetrics#30 (issue 1105).
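The groupKFold() mentioned above builds folds that keep all rows sharing a group value together, so a group is never split between the analysis and held-out sets; a small sketch:

```r
library(caret)

set.seed(42)
subjects <- rep(paste0("s", 1:6), each = 10)   # 6 subjects, 10 rows each
folds <- groupKFold(subjects, k = 3)           # list of training-row indices

# For each fold, no subject appears in both the training rows (idx)
# and the held-out rows (-idx):
overlap <- sapply(folds, function(idx)
  length(intersect(subjects[idx], subjects[-idx])))
```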
Changes in version 6.0-84
Another new version was related to character encodings.
Changes in version 6.0-83
A new version was requested by CRAN since en dashes were used in the documentation.
A bug was fixed where, for some recipes that involve class imbalance sampling, the resampling indices were computed incorrectly (issue 1030).
train now removes duplicate models from the tuning grid. Duplicates could occur for models with discrete parameters.
Changes in version 6.0-82
Immediate and required updates related to the different behavior of sample in R >= 3.6.
sbf, gafs, and safs now accept recipes as inputs.
A few sections of documentation were added to the bookdown site.
A bug was fixed in simulated annealing feature selection where the number of variables perturbed was based on the total number of variables instead of the more appropriate number of variables in the current subset.
Convenience functions ggplot.gafs and ggplot.safs were added.
learning_curve_dat now has a real name.
The earth and ctree models were updated to fix bugs (issue 1022) (issue 1018).
When a model has the same value for its resamples, plot.resamples and ggplot.resamples now produce an estimate and missing values for the intervals (instead of failing) (issue 1007).
Changes in version 6.0-81
The blackboost code gained a dependency on partykit due to changes in mboost.
The internals were updated to work with the latest version of the recipes package.
Jason Muhlenkamp added better error messages for misspecified tuning parameters (issue 956).
Two bugs in random forests with RFE were fixed (issue 942).
When correlation filters are used in preProcess, constant (i.e., zero-variance) columns are first removed to avoid errors (issue 966).
A bug was fixed that occurred when train models using weights were updated (issue 935).
Benjamin Allévius added more statistics to the output of thresholder (issue 938).
A bug was fixed that occurred when indexFinal was used: the recipe that was saved had been created using the entire training set (issue 928).
Changes in version 6.0-80
Two bugs associated with varImp, in lm (issue 858) and in bartMachine, were fixed by hadjipantelis.
SMOTE sampling now works with tibbles (issue 875).
rpart is now a dependency due to new CRAN policies.
Added a ggplot method for resamples that produces confidence intervals.
hadjipantelis added some fixes for mxnet models (issue 887).
Changes in version 6.0-79
keras and mxnet models have better initialization of parameters (PR 765).
The range preprocessing method can scale the data to an arbitrary range. Thanks to Sergey Korop.
The spatial sign transformation will now operate on all non-missing predictors. Thanks to Markus Peter Auer (issue 789).
A variety of small changes were made to work with the new version of the gam package (issue 828).
The package vignette was changed to HTML.
A bug was fixed for computing variable importance scores with the various PLS methods (issue 848).
Fixed a drop = FALSE bug that occurred when computing class probabilities (issue 849).
An issue with predicting probabilities with multinom and one observation was fixed (issue 827).
A bug in the threshold calculation for choosing the number of PCA components was resolved (issue 825).
Models mlpML and mlpWeightDecayML now ignore layers with zero units. For example, if the number of layers was specified to be c(5, 0, 3), a warning is issued and the architecture is converted to c(5, 3) (issue 829).
svmLinearWeights2 and svmLinear3 may have chosen the incorrect SVM loss function. This was found by Dirk Neumann (issue 826).
bnclassify models awtan and awnb were updated since they previously used deprecated functions. All bnclassify models now require version 0.3.3 of that package or greater (issue 815).
confusionMatrix.default now requires data and reference to be factors and will throw an error otherwise. Previously, the vectors were converted to factors, but this resulted in too many bug reports and misuse.
xyplot.resamples did not pass the dots to the underlying plot function (issue 853).
A bug with model xgbDART was fixed by hadjipantelis.
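The arbitrary-range scaling mentioned above can be sketched as follows; the argument name rangeBounds is an assumption here (the release notes only say the high and low values became user-settable):

```r
library(caret)

# Scale every numeric column of mtcars to [-1, 1]; rangeBounds (assumed
# argument name) sets the low and high values of the target range.
pp     <- preProcess(mtcars, method = "range", rangeBounds = c(-1, 1))
scaled <- predict(pp, mtcars)
```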
Changes in version 6.0-78
A number of changes were made to the underlying model code to repair problems caused by the previous version. In essence, unless the modeling package was formally loaded, the model code would fail in some cases. In the vast majority of cases, train will not load the package (but will load the namespace). There are some exceptions where this is not possible, including bam, earth, gam, gamLoess, gamSpline, logicBag, ORFlog, ORFpls, ORFridge, ORFsvm, plsRglm, RSimca, rrlda, spikeslab, and others. These are noted in ?models and in the model code itself. The regression tests now catch these issues.
The option to control the minimum node size for the ranger and Rborist models was added by hadjipantelis (issue 732).
The rule-based model GFS.GCCL was removed from the model library.
A bug was fixed affecting models using the sparsediscrim package (i.e., dda and rlda) where the class probability values were reversed (issue 761).
The keras models now clear the session prior to each model fit to avoid problems. Also, on the last fit, the model is serialized so that it can be used between sessions. The predict code will automatically undo this encoding so that the user does not have to manually intervene.
A bug in twoClassSummary that caused failures when a class level includes "y" was fixed (issue 770).
The preProcess function can now scale variables to a range where the user can set the high and low values (issue 730). Thanks to Sergey Korop.
Erwan Le Pennec fixed some issues when train was run using some parallel processing backends (e.g., doFuture and doAzureParallel) (issue 748).
Waleed Muhanna found and fixed a bug in twoClassSim when irrelevant variables were generated (issue 744).
hadjipantelis added the DART model (aka "Dropouts meet Multiple Additive Regression Trees") with the model code xgbDART (issue 742).
Vadim Khotilovich updated predict.dummyVars to run faster with large datasets with many factors (issue 727).
spatialSign now has the option of removing missing data prior to computing the norm (issue 789).
The various earth models have been updated to work with recent versions of that package, including multi-class glm models (issue 779).
Changes in version 6.0-77
Two neural network models (containing up to three hidden layers) using
mxnetwere added;mxnet(optimiser: SGD) andmxnetAdam(optimiser: ADAM).A new method was added for
trainso thatrecipes can be used to specify the model terms and preprocessing. Alexis Sardá provided a great deal of help converting the bootstrap optimism code to the new workflows. A new chapter was added to the package website related to recipes.The Yeo-Johnson transformation parameter estimation code was rewritten and not longer requires the
carpackage.The leave-one-out cross-validation workflow for
trainhas been harmonized with the other resampling methods in terms of fault tolerance and prediction trimming.trainnow uses different random numbers to make resamples. Previously, setting the seed prior to callingtrainshould result in getting the same resamples. However, iftrainloaded or imported a namespace from another package, and that startup process used random numbers, it could lead to different random numbers being used. See(issue 452) for details. Now,traingets a separate (and more reproducible) seed that will be used to generate the resamples. However, this may effect random number reproducibility between this version and previous versions. Otherwise, this change should increase the reproducibility of results.Erwan Le Pennec conducted the herculean task of modifying all of the model code to call by namespace (instead of fully loading each required package). This should reduce naming conflicts(issue 701).
MAE was added as output metric for regression tasks through
postResampleanddefaultSummaryby hadjipantelis. The function is now exposed to the users.(issue 657).More average precision/recall statistics were added to
multiClassSummary(issue 697).The package website code was updated to use version 4 of the D3 JS library and now usesheatmaply to make the interactive heatmap.
Added a
ggplotmethod for lift objects (and fixed a bug in thelatticeversion of the code) for(issue 656).Vadim Khotilovich made a change to speed up
predict.dummyVars(issue 727).The model code for
ordinalNetwas updated for recent changes to that package.oblique.treewas removed from the model library.The default grid generation for rotation forest models now provides better values of
K.The parameter ranges for
AdaBagandAdaBoost.M1were changed; the number of iterations in the default grids have been lowered.Switched to non-formula interface in ranger. Also, another tuning parameter was added to ranger (
splitrule) that can be used to change the splitting procedure and includes extremely randomized trees. This requires version 0.8.0 of theranger package.(issue 581)A simple "null model" was added. For classification, it predictors using the most prevalent level and, for regression, fits an intercept only model.(issue 694)
A function
thresholderwas added to analyze the resample results for two class problems to choose an appropriate probability cutoff a lahttps://topepo.github.io/caret//using-your-own-model-in-train.html#Illustration5(issue 224).Two neural network models (containing a single hidden layers) using
tensorflow/keraswere added.mlpKerasDecayuses standard weight decay whilemlpKerasDropoutuses dropout for regularization. Both use RMSProp optimizer and have a lot of tuning parameters. Two additional models,mlpKerasDecayCostandmlpKerasDropoutCost, are classification only and perform cost-sensitive learning. Note that these models will not run in parallel usingcaret's parallelism and also will not give reproducible results from run-to-run (seehttps://github.com/rstudio/keras3/issues/42).The range for one parameter (
gamma) was modified in themlpSGDmodel code.A bug in classification models with all missing predictions was fixed (found by andzandz11).(issue 684)
A bug preventing preprocessing to work properly when the preprocessing transformations are related to individual columns only fixed by Mateusz Kobos in(issue 679).
A prediction bug in
glm.nbthat was found by jpclemens0 was fixed(issue 688).A bug was fixed in Self-Organizing Maps via
xyffor regression models.A bug was fixed in
rpartCostrelated to how the tuning parameter grid was processed.A bug in negative-binomial GLM models (found by jpclemens0) was fixed(issue 688).
In
trainControl, ifrepeatsis used on methods other than"repeatedcv"or"adaptive_cv", a warning is issued. Also, for method other than these two, a new default (NA) is given torepeats.(issue 720).rfFuncsnow computes importance on the first and last model fit.(issue 723)
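The MAE addition above means postResample now returns three regression statistics; a quick sketch:

```r
library(caret)

obs  <- c(3, 5, 7, 9)
pred <- c(2.5, 5.5, 6, 9.5)
# Returns a named vector with RMSE, Rsquared, and (as of this release) MAE
res <- postResample(pred, obs)
```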
Changes in version 6.0-76
Monotone multi-layer perceptron neural network models from the monmlp package were added (issue 489).
A new resampling function (groupKFold) was added (issue 540).
The bootstrap optimism estimate was added by Alexis Sarda (issue 544).
Bugs were fixed in the glm, glm.nb, and lm variable importance methods that occur when a single variable is in the model (issue 543).
A bug in filterVarImp was fixed where the ROC curve AUC could be much less than 0.50 because the directionality of the predictor was not taken into account. This would artificially increase the importance of some non-informative predictors. However, the bug might report the AUC for an important predictor to be 0.20 instead of 0.80 (issue 565).
multiClassSummary now reports the average F score (issue 566).
RMSE and R2 are now (re)exposed to the users (issue 563).
A caret bug was discovered by Jiebiao Wang where glmboost, gamboost, and blackboost models incorrectly reported the class probabilities (issue 560).
Training data weights support was added to the xgbTree model by schistyakov.
Regularized logistic regression through LiblineaR (LiblineaR::LiblineaR) using L1 or L2 regularization was added by hadjipantelis.
A bug related to the ordering of axis labels in the heatmap plot of training results was fixed by Mateusz Dziedzic (issue 620).
A variable importance method for model-averaged neural networks was added.
More logic was added so that the predict method behaves well when a variable is subtracted from a model formula (issue 574).
More documentation was added for the class2ind function (issue 592).
Fixed the formatting of the design matrices in the dummyVars man file.
A note was added to ?trainControl about using custom resampling methods (issue 584).
A bug was fixed related to SMOTE and ROSE sampling with one predictor (issue 612).
Due to changes in the kohonen package, the bdk model is no longer available and the code behind the xyf model has changed substantially (including the tuning parameters). Also, when using xyf, a check is conducted to make sure that a recent version of the kohonen package is being used.
Changes to xgbTree and xgbLinear to help with sparse matrix inputs (issue 593). Sparse matrices are not allowed when preprocessing or subsampling are used.
Several PLS models were using the classical orthogonal scores algorithm when discriminant analysis was conducted (despite using simpls, widekernelpls, or kernelpls). Now, the PLSDA model estimation method is consistent with the method requested (issue 610).
Added Multi-Step Adaptive MCP-Net (method = "msaenet") (issue 561).
The variable importance score for linear regression was modified so that missing values in the coefficients are converted to zero.
In train, x is now required to have column names.
Changes in version 6.0-73
Negative binomial generalized linear models (MASS::glm.nb) were added (issue 476).
mnLogLoss now returns a named vector ((issue 514); bug found by Jay Qi).
A bunch of method/class-related bugs induced by the previous version were fixed.
Changes in version 6.0-72
The inverse hyperbolic sine transformation was added to preProcess (issue 56).
Tyler Hunt moved the ROC code from the pROC package to the ModelMetrics package, which should make the computations more efficient (issue 482).
train does a better job of respecting the original format of the input data (issue 474).
A bug in the bdk and xyf models was fixed so that the appropriate number of parameter combinations are tested during random search.
A bug in rfe related to neural networks, found by david-machinelearning, was fixed (issue 485).
Neural networks via stochastic gradient descent (method = "mlpSGD") were adapted for classification, and a variable importance calculation was added.
h2o versions of glmnet and gradient boosting machines were added with methods "glmnet_h2o" and "gbm_h2o". These methods are not currently optimized (issue 283).
The fuzzy rule-based models (WM, SLAVE, SBC, HYFIS, GFS.THRIFT, GFS.LT.RS, GFS.GCCL, GFS.FR.MOGUL, FS.HGD, FRBCS.W, FRBCS.CHI, FIR.DM, FH.GBML, DENFIS, and ANFIS) were modified so that the user can pass in the predictor ranges using the range.data argument to those functions (issue 498).
A variable importance method was added for boosted generalized linear models (issue 493).
preProcess now has an option to filter out highly correlated predictors. trainControl now has additional options to modify the parameters of the near-zero variance and correlation filters; see the preProcOptions argument.
The rotationForest and rotationForestCp methods were revised to evaluate only feasible values of the parameter K (the number of variable subsets). The underlying rotationForest function reduces this parameter until the value of K divides evenly into the number of parameters.
The skip option from createTimeSlices was added to trainControl (issue 491).
xgb.train's option subsample was added to the xgbTree model (issue 464).
Changes in version 6.0-71
Precision, recall, and F measure functions were added along with one called
prSummarythat is analogous totwoClassSummary. Also,confusionMatrixgains an argument calledmodethat dictates what output is shown.schistyakov added additional tuning parameters to the robust linear model code(issue 454). Also for
rlmandlmschistyakov added the ability to tune over the intercept/no intercept model.Generalized additive models for very large datasets (
baminmgcv) was added(issue 453)Two more linear SVM models were added from theLiblineaR package with model codes
svmLinear3andsvmLinearWeights2((issue 441))The
tauparameter was added to all of the least square SVM models ((issue 415))A new data set (called
scat) on animal droppings was added.A significant bug was fixed where the internals of how R creates a model matrix was ignoring
na.actionwhen the default was set tona.fail(issue 461). This means thattrainwill now immediately fail if there are any missing data. To use imputation, usena.action = na.passand the imputation method of your choice in thepreProcessargument. Also, a warning is issued if the user asks for imputation but uses the formula method and excludes missing data inna.action
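The na.action change above can be sketched as follows; "medianImpute" is one assumed choice of imputation method here (any preProcess imputation method should work the same way):

```r
library(caret)

iris_na <- iris
iris_na$Sepal.Width[c(5, 50)] <- NA   # introduce some missing values

# With the formula method, the default na.fail now errors on missing
# data; na.pass plus a preProcess imputation method works instead.
fit <- train(Sepal.Length ~ ., data = iris_na, method = "lm",
             na.action  = na.pass,
             preProcess = "medianImpute",
             trControl  = trainControl(method = "none"))
preds <- predict(fit, iris)
```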
Changes in version 6.0-70
Based on a comment by Alexis Sarda, method = "ctree2" does not fix mincriterion = 0 and tunes over this parameter. For a fixed depth, mincriterion can further prune the tree (issue 409).
A bug in KNN imputation was fixed (found by saviola777) that occurred when a factor predictor was in the data set (issue 404).
Infrastructure changes were made so that train tries harder to respect the original class of the outcome. For example, if an ordered factor is used as the outcome with a modeling function that treats it as an unordered factor, the model still produces an ordered factor during prediction.
The ranger code now allows for case weights (issue 414).
twoClassSim now has an option to compute ordered factors.
High-dimensional regularized discriminant analysis, regularized linear discriminant analysis, and several variants of diagonal discriminant analysis from the sparsediscrim package were added (method = "hdrda", method = "rlda", and method = "dda", respectively) (issue 313).
A neural network regression model optimized by stochastic gradient descent from the FCNN4R package was added. The model code is mlpSGD.
Several models for ordinal outcomes were added: rpartScore (from the rpartScore package), ordinalNet (ordinalNet), vglmAdjCat (VGAM), vglmContRatio (VGAM), and vglmCumulative (VGAM). Note that, for models that load VGAM, there is a conflict such that the predictors class code from caret is masked. To use that method, you can use caret:::predictors.train() instead of predictors().
Another high-performance random forest package (Rborist) was exposed through caret. The model code is method = "Rborist" (issue 418).
Xavier Robin fixed a bug related to the area under the ROC curve (issue 431).
A bug in print.train was fixed when LOO CV was used (issue 435).
With RFE, a better error message drafted by mikekaminsky is printed when the number of importance measures is off (issue 424).
Another bug was fixed in estimating the prediction time when the formula method was used (issue 420).
A linear SVM model was added that uses class weights.
The linear SVM model using the e1071 package (method = "svmLinear2") had the gamma parameter for the RBF kernel removed.
Xavier Robin committed changes to make sure that the area under the ROC is accurately estimated (issue 431).
Changes in version 6.0-68
print.train no longer shows the standard deviation of the resampled values unless the new option is used (print.train(, showSD = TRUE)). When shown, they are within parentheses (e.g., "4.24 (0.493)").
An adjustment to the innards of adaptive resampling was made so that the test for linear dependencies is more stringent.
A bug in the bootstrap 632 estimate was found and fixed by Alexis Sarda (issue 349) (issue 353).
The cforest module's oob element was modified based on another bug found by Alexis Sarda (issue 351).
The methods for bagEarth, bagEarthGCV, bagFDA, bagFDAGCV, earth, fda, and gcvEarth models have been updated so that case weights can be used.
The rda module contained a bug found by Eric Czech (issue 369).
A bug was fixed for printing out the resampling details with LGOCV, found by github user zsharpm (issue 366).
A new data set was added (data(Sacramento)) with sale prices of homes.
Another adaboost algorithm (method = "adaboost" from the fastAdaboost package) was added (issue 284).
Yet another boosting algorithm (method = "deepboost" from the deepboost package) was added (issue 388).
Alexis Sarda made changes to the confusion matrix code for train, rfe, and sbf objects that more rationally normalizes the resampled tables (issue 355).
A bug in how RSNNS perceptron models were tuned (found by github user smlek) was fixed (issue 392).
A bug in computing the bootstrap 632 estimate was fixed (found by Stu) (issue 382).
John Johnson contributed an update to xgbLinear (issue 372).
Resampled confusion matrices are not automatically computed when there are 50 or more classes due to the storage requirements (issue 356). However, the relevant functions have been updated to use the out-of-sample predictions instead (when the user asks for them to be returned by the function).
Some changes were made to predict.train to error trap (and fix) cases when predictions are requested without referencing a newdata object (issue 347).
Github user pverspeelt identified a bug in our model code for glmboost (and gamboost) related to the mstop function modifying the model object in memory. It was fixed (issue 396).
An option to select which samples are used to fit the final model, called indexFinal, was added to trainControl (issue 346).
For an issue found by JanLauGe, a bug was fixed in dummyVars related to the names of the resulting data set (issue 390).
Models rknn and rknnBel were removed since their package is no longer on CRAN.
Changes in version 6.0-66
Model averaged naive Bayes (method = "manb") from the bnclassify package was added.
blackboost was updated to work with outcomes with 3+ classes.
A new model rpart1SE was added. This has no tuning parameters and resamples the internal rpart procedure of pruning using the one-standard-error method.
Another model (svmRadialSigma) tunes over the cost parameter and the RBF kernel parameter sigma. In the latter case, using tuneLength will, at most, evaluate six values of the kernel parameter. This enables a broad search over the cost parameter and a relatively narrow search over sigma.
Additional model tags for "Accepts Case Weights", "Two Class Only", "Handle Missing Predictor Data", "Categorical Predictors Only", and "Binary Predictors Only" were added. In some cases, a new model element called "notes" was added to the model code.
A pre-processing method called "conditionalX" was added that eliminates predictors where the conditional distribution (X|Y) for that predictor has a single value. See the checkConditionalX function for details. This is only used for classification (issue 334).
A bug in the naive Bayes prediction code was found by github user pverspeelt and was fixed (issue 345).
Josh Brady (doublej2) found and fixed an issue with dummyVars (issue 344).
A bug related to recent changes to the ranger package was fixed (issue 320).
Dependencies on external software can now be checked in the model code. See pythonKnnReg for an example. This also removes the overall package dependency on rPython (issue 328).
The tuning parameter grids for enpls and enpls.fs were changed to avoid errors.
A bug was fixed (issue 342) where the data used for prediction was inappropriately converted from its original class.
Matt (aka washcycle) added an option to return column names from the nearZeroVar function.
Homer Strong fixed varImp for glmnet models so that they return the absolute value of the regression coefficients (issue 173) (issue 190).
The basic naive Bayes method (method = "nb") gained a tuning parameter, adjust, that adjusts the bandwidth (see ?density). The parameter is ignored when usekernel = FALSE.
Changes in version 6.0-62
From the randomGLM package, a model of the same name was added.
From the monomvn package, models for the Bayesian lasso and ridge regression were added. For the lasso, two methods were added. blasso creates predictions using the mean of the posterior distributions but sets some parameters specifically to zero based on the tuning parameter called sparsity. For example, when sparsity = .5, only coefficients where at least half the posterior estimates are nonzero are used. The other model, blassoAveraged, makes predictions across all of the realizations in the posterior distribution without coercing any coefficients to zero. This is more consistent with Bayesian model averaging, but is unlikely to produce very sparse solutions.
From the spikeslab package, a regression model was added that emulates the procedure used by cv.spikeslab, where the tuning variable is the number of retained predictors.
A bug was fixed in adaptive resampling (found by github user elephann) (issue 304).
Fixed another adaptive resampling bug flagged by github user elephann related to the latest version of the BradleyTerry2 package. Thanks to Heather Turner for the fix (issue 310).
Yuan (Terry) Tang added more tuning parameters to xgbTree models.
Model svmRadialWeights was updated to allow for class probabilities. Previously, kernlab did not change the probability estimates when weights were used.
A ggplot2 method for varImp.train was added (issue 231).
Changes were made for the package to work with the next version of ggplot2 (issue 317).
Github user fjeze added new models mlpML and mlpWeightDecayML that extend the existing RSNNS models to multiple layers. fjeze also added the gamma parameter to the svmLinear2 model.
A function for generating data for learning curves was added.
The range of SVM cost values explored in random search was expanded.
Changes in version 6.0-58
A major bug was fixed (found by Harlan Harris) where pre-processing objects created from versions of the package prior to 6.0-57 can give incorrect results when run with 6.0-57 (issue 282).
preProcess can now remove predictors using zero- and near-zero-variance filters (via method values of "zv" and "nzv"). When used, these filters are applied to numeric predictors prior to all other pre-processing operations.
train now throws an error for classification tasks where the outcome has a factor level with no observed data (issue 260).
Character outcomes passed to train are not converted to factors.
A bug was found and fixed in this package's class probability code for gbm models when a single multinomial observation is predicted (issue 274).
A new option to ggplot.train was added that highlights the optimal tuning parameter setting in cases where grid search is used (thanks to Balaji Iyengar (github: bdanalytics)).
In trainControl, the argument savePredictions can now be a character value ("final", "all", or "none"). Logicals can still be used and map to "all" or "none".
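The savePredictions change above can be sketched as follows (a minimal example; "final" keeps only the held-out predictions for the final tuning parameter combination):

```r
library(caret)

set.seed(7)
ctrl <- trainControl(method = "cv", number = 5, savePredictions = "final")
fit  <- train(Sepal.Length ~ ., data = iris, method = "lm",
              trControl = ctrl)
# fit$pred holds one held-out prediction per training row (lm has a
# single tuning parameter combination, so "final" keeps them all)
```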
Changes in version 6.0-57
Hyperparameter optimization via random search is now available. See the new help page for examples and syntax.
preProcess now allows (but ignores) non-numeric predictor columns.
Models for optimal weighted and stabilized nearest neighbor classifiers from the snn package were added with model codes snn and ownn.
Random forests using the excellent ranger package were added (method = "ranger").
An additional variation of rotation forests was added (rotationForest2) that also tunes over cp. Unfortunately, the sub-model trick can't be utilized in this instance.
Kernelized distance weighted discriminant analysis models from kerndwd were added (dwdLinear, dwdPoly, and dwdRadial).
A bug was fixed with rfe when train was used to generate a classification model but class probabilities were not (or could not be) generated (issue 234).
Can Candan added a python model sklearn.neighbors.KNeighborsRegressor that can be accessed via train using the rPython package. The python modules sklearn and pandas are required for this to run.
Jason Aizkalns fixed a bunch of typos.
MarwaNabil found a bug with lift and missing values (issue 225). This was fixed such that missing values are removed prior to the calculations (within each model).
Additional options were added to LPH07_1 so that two-class data can also be simulated and predictors are converted to factors.
The model-specific code for computing out-of-bag performance estimates was moved into the model code library (issue 230).
A variety of naive Bayes and tree-augmented naive Bayes classifiers from the bnclassify package were added. Variations include simple models (methods labeled "nbDiscrete" and "tan"), models using attribute weighting ("awnb" and "awtan"), and wrappers that use search methods to optimize the network structure ("nbSearch" and "tanSearch"). In each case, the predictors and outcomes must all be factor variables; for that reason, using the non-formula interface to train (e.g., train(x, y)) is critical to preserve the factor structure of the data.
A function called multiClassSummary was added to compute performance values for problems with three or more classes. It works with or without predicted class probabilities (issue 107).
confusionMatrix was modified to deal with name collisions between this package and RSNNS (issue 256).
A bug in how the LVQ tune grid is filtered was fixed.
A bug in preProcess for ICA and PCA was fixed.
Bugs in avNNet and pcaNNet when predicting class probabilities were fixed (issue 261).
Changes in version 6.0-52
A new model using the randomForest and inTrees packages, called rfRules, was added. A basic random forest model is used and then decomposed into rules (of user-specified complexity). The inTrees package is used to prune and optimize the rules. Thanks to Mirjam Jenny, who suggested the workflow.
Other new models (and their packages): bartMachine (bartMachine), rotationForest (rotationForest), sdwd (sdwd), loclda (klaR), nnls (nnls), svmLinear2 (e1071), rqnc (rqPen), and rqlasso (rqPen).
When specifying your own resampling indices, a value of method = "custom" can be used with trainControl for better printing.
Tim Lucas fixed a bug in avNNet when bag = TRUE.
Fixed a bug found by ruggerorossi in method = "dnn" with classification.
A new option called sampling was added to trainControl that allows users to subsample their data in the case of a class imbalance. Another help page was added to explain the features.
Class probabilities can now be computed for extraTrees models.
When PCA pre-processing is conducted, the variance trace is saved in an object called trace.
More error traps were added for common mistakes (e.g. bad factor levels in classification).
An internal function (class2ind) that can be used to make dummy variables for a single factor vector is now documented and exported.
A bug was fixed in xyplot.lift where the reference line was incorrectly computed. Thanks to Einat Sitbon for finding this.
A bug related to calculating the Box-Cox transformation, found by John Johnson, was fixed.
GitHub user EdwinTh developed a faster version of findCorrelation and found a bug in the original code. findCorrelation has two new arguments, one of which is called exact; it defaults to using the original (fixed) function. Using exact = FALSE uses the faster version. The fixed version of the "exact" code is, on average, 26-fold slower than the current version (for 250x250 matrices), although the average time for matrices of this size was only 26s. The exact version yields subsets that are, on average, 2.4 percent smaller than the other versions. This difference will be more significant for smaller matrices. The faster ("approximate") version of the code is 8-fold faster than the current version.
GitHub user slyuee found a bug in the gam model fitting code.
Chris Kennedy fixed a bug in the bartMachine variable importance code.
Changes in version 6.0-47
CHAID from the R-Forge package CHAID was added.
Models xgbTree and xgbLinear from the xgboost package were added. That package is not on CRAN and can be installed from GitHub using the devtools package and install_github('dmlc/xgboost', subdir = 'R-package').
GitHub user dratewka enabled rbf models for regression.
A summary function for the multinomial likelihood called mnLogLoss was added.
The total object size for preProcess objects that used bagged imputation was reduced almost 5-fold.
A new option to trainControl called trim was added which, if implemented, will reduce the model's footprint. However, features beyond simple prediction may not work.
A rarely occurring bug in gbm model code was fixed (thanks to Wade Cooper).
splom.resamples now respects the models argument.
A new argument to lift called cuts was added to allow more control over what thresholds are used to calculate the curve.
The cuts argument of calibration now accepts a vector of cut points.
Jason Schadewald noticed and fixed a bug in the man page for dummyVars.
Call objects were removed from the following models: avNNet, bagFDA, icr, knn3, knnreg, pcaNNet, and plsda.
An argument was added to createTimeSlices to thin the number of resamples.
The RFE-related functions lrFuncs, lmFuncs, and gamFuncs were updated so that rfe accepts a matrix x argument.
Using the default grid generation with train and glmnet, an initial glmnet fit is created with alpha = 0.50 to define the lambda values.
train models for "gbm", "gam", "gamSpline", and "gamLoess" now allow their respective arguments for the outcome probability distribution to be passed to the underlying function.
A bug in print.varImp.train was fixed.
train now returns an additional column called rowIndex that is exposed when calling the summary function during resampling.
The ability to compute class probabilities was removed from the rpartCost model since they are unlikely to agree with the class predictions.
extractProb no longer redundantly calls extractPrediction to generate the class predictions.
A new function called var_seq was added that finds a sequence of integers that can be useful for some tuning parameters, such as random forest's mtry. Model modules were updated to use the new function.
n.minobsinnode was added as a tuning parameter to gbm models.
For models using out-of-bag resampling, train now properly checks the metric argument against the names of the measured outcomes.
Both createDataPartition and createFolds were modified to better handle cases where one or more classes have very low numbers of data points.
Changes in version 6.0-41
The license was changed to GPL (>= 2) to accommodate new code from the GA package.
New feature selection functions gafs and safs, along with helper functions and objects, were added. The package HTML was updated to expand more on feature selection.
From the adabag package, two new models were added: AdaBag and AdaBoost.M1.
Weighted subspace random forests from the wsrf package were added.
Additional bagged FDA and MARS models (model codes bagFDAGCV and bagEarthGCV) were added that use the GCV statistic to prune the model. This leads to memory reductions during training.
The model code for ada had a bug fix applied, and the code was adapted to use the "sub-model trick" so it should train faster.
A bug was fixed related to imputation when the formula method is used with train.
The old drop = FALSE bug was fixed in getTrainPerf.
A bug was fixed for custom models with no labels.
A bug fix was made for bagged MARS models when predicting probabilities.
In train, the argument last was being incorrectly set for the last model.
Reynald Lescarbeau refactored findCorrelation to make it faster.
The apparent performance values are not reported by print.train when the bootstrap 632 estimate is used.
When a required package is missing, the code stops earlier with a more explicit error message.
Changes in version 6.0-37
Brenton Kenkel added ordered logistic or probit regression to train using method = "polr" from MASS.
LPH07_1 now encodes the noise variables as binary.
Both rfe and sbf get arguments for indexOut in their control functions.
A reworked version of nearZeroVar, based on code from Michael Benesty, was added that uses less memory and can be used in parallel; the old version is now called nzv.
The adaptive mixture discriminant model from the adaptDA package was added, as well as a robust mixture discriminant model from the robustDA package.
The multi-class discriminant model using binary predictors in the binda package was added.
Ensembles of partial least squares models (via the enpls package) were added.
A bug using gbm with Poisson data was fixed (thanks to user eriklampa).
sbfControl now has a multivariate option where all the predictors are exposed to the scoring function at once.
A function compare_models was added that is a simple comparison of models via diff.resamples.
The row names for the variables component of rfe objects were simplified.
Philipp Bergmeir found a bug, which was fixed, where bag would not run in parallel.
predictionBounds was not implemented during resampling.
Changes in version 6.0-35
A few bug fixes to preProcess were made related to KNN imputation.
The parameter labels for polynomial SVM models were fixed.
The tags for dnn models were fixed.
The following functions were removed from the package: generateExprVal.method.trimMean, normalize.AffyBatch.normalize2Reference, normalize2Reference, and PLS. The original code and the man files can be found at https://github.com/topepo/caret/tree/master/deprecated.
A number of changes were made to comply with section 1.1.3.1 of "Writing R Extensions".
Changes in version 6.0-34
For the input data x to train, we now respect the class of the input value to accommodate other data types (such as sparse matrices). There are some complications, though; for pre-processing, we throw a warning if the data are not simple matrices or data frames, since some infrastructure does not exist for other classes (e.g. complete.cases). We also throw a warning if returnData = TRUE and the data cannot be converted to a data frame. This allows sparse matrices and text corpora to be used as inputs into that function.
plsRglm was added.
From the frbs package, the following rule-based models were added: ANFIS, DENFIS, FH.GBML, FIR.DM, FRBCS.CHI, FRBCS.W, FS.HGD, GFS.FR.MOGAL, GFS.GCCL, GFS.LTS, GFS.THRIFT, HYFIS, SBC and WM. Thanks to Lala Riza for suggesting these and facilitating their addition to the package.
From the kernlab package, SVM models using string kernels were added: svmBoundrangeString, svmExpoString, svmSpectrumString.
A function update.rfe was added.
cluster.resamples was added to the namespace.
An option to choose the metric was added to summary.resamples.
prcomp.resamples now passes ... to prcomp. Also, the call to prcomp uses the formula method so that na.action can be used.
The function resamples was enhanced so that train and rfe models that used returnResamp = "all" subset the resamples to get the appropriate values and issue a warning. The function also fills in missing model names if one or more are not given.
Several regression simulation functions were added: SLC14_1, SLC14_2, LPH07_1 and LPH07_2.
print.train was refactored so that format.data.frame is now used. This should behave better when using knitr.
The error message in train.formula was improved to provide more helpful feedback in cases where there is at least one missing value in each row of the data set.
ggplot.train was modified so that groups are distinguished by color and shape.
An option called nameInStrip was added to plot.train and ggplot.train that will print the name and value of any tuning parameters shown in panels.
A bug was fixed by Jia Xu within the knn imputation code used by preProcess.
Changes in version 6.0-30
A missing piece of documentation in
trainControlfor adaptive models was filled in.A warning was added to
plot.trainandggplot.trainto note that the relationship between the resampled performance measures and the tuning parameters can be deceiving when using adaptive resampling.A check was added to
trainControlto make sure that a value ofminmakes sense when using adaptive resampling.
Changes in version 6.0-29
A man page with the list of models available via train was added back into the package. See ?models.
Thoralf Mildenberger found and fixed a bug in the variable importance calculation for neural network models.
The output of varImp for pamr models was updated to clarify the ordering of the importance scores.
getModelInfo was updated to generate a more informative error message if the user looks for a model that is not in the package's model library.
A bug was fixed related to how seeds were set inside of train.
The model "parRF" (parallel random forest) was added back into the library.
When case weights are specified in train, the hold-out weights are exposed when computing the summary function.
A check was made to convert a data.table given to train to a data frame (see https://stackoverflow.com/questions/23256177/r-caret-renames-column-in-data-table-after-training).
Changes in version 6.0-25
Changes were made that stop execution of train if there are no rows in the data (changes suggested by Andrew Ziem).
Andrew Ziem also helped improve the documentation.
Changes in version 6.0-24
Several models were updated to work with case weights.
A bug in rfe was found where the largest subset size had the same results as the full model. Thanks to Jose Seoane for reporting the bug.
Changes in version 6.0-22
For some parallel processing technologies, the package now exports more internal functions.
A bug was fixed in rfe that occurred when LOO CV was used.
Another bug was fixed that occurred for some models when tuneGrid contained only a single model.
Changes in version 6.0-21
A new system for user-defined models has been added.
When creating the grid of tuning parameter values, the column names no longer need to be preceded by a period. Periods can still be used as before but are not required. This isn't guaranteed to break backwards compatibility, but it may in some cases.
trainControl now has a method = "none" resampling option that bypasses model tuning and fits the model to the entire training set. Note that if more than one model is specified, an error will occur.
logicForest models were removed since the package is now archived.
CSimca and RSimca models from the rrcovHD package were added.
Model elm from the elmNN package was added.
Models rknn and rknnBel from the rknn package were added.
Model brnn from the brnn package was added.
panel.lift2 and xyplot.lift now have an argument called values that shows the percentages of samples found for the specified percentages of samples tested.
train, rfe and sbf should no longer throw a warning that "executing ...".
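The method = "none" option described above requires a single set of tuning values; a minimal sketch (assuming the caret and rpart packages are installed; the data set is illustrative only):

```r
library(caret)

## Skip resampling entirely and fit one model with a prespecified
## tuning value. tuneGrid must contain exactly one row.
ctrl <- trainControl(method = "none")
fit <- train(Species ~ ., data = iris,
             method = "rpart",
             trControl = ctrl,
             tuneGrid = data.frame(cp = 0.01))
```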
A ggplot method for train was added.
Imputation via medians was added to preProcess by Zachary Mayer.
A small change was made to rpart models. Previously, when the final model was determined, it would be fit by specifying the model using the cp argument of rpart.control. This could lead to duplicated Cp values in the final list of possible Cp values. The current version fits the final model slightly differently: an initial model is fit using cp = 0, then it is pruned using prune.rpart to the desired depth. This shouldn't make a difference for the vast majority of data sets. Thanks to Jeff Evans for pointing this out.
The method for estimating sigma for SVM and RVM models was slightly changed to make it consistent with how ksvm and rvm do the estimation.
The default behavior for returnResamp in rfeControl and sbfControl is now returnResamp = "final".
cluster was added as a general class with a specific method for resamples objects.
The refactoring of model code resulted in a number of packages being eliminated from the Depends field. Additionally, a few were moved to exports.
Changes in version 5.17-07
A bug in spatialSign was fixed for data frames with a single column.
Pre-processing was not applied to the training data set prior to grid creation. This is now done, but only for models that use the data when defining the grid. Thanks to Brad Buchsbaum for finding the bug.
Some code was added to rfe to truncate the subset sizes in case the user over-specified them.
A bug was fixed in gamFuncs for the rfe function.
Options in trainControl, rfeControl and sbfControl were added so that the user can set the seed at each resampling iteration (most useful for parallel processing). Thanks to Allan Engelhardt for the recommendation.
Some internal refactoring of the data was done to prepare for some upcoming resampling options.
predict.train now has an explicit na.action argument defaulted to na.omit. If imputation is used in train, then na.action = na.pass is recommended.
A bug was fixed in dummyVars that occurred when missing data were in newdata. The function contr.dummy is now deprecated and contr.ltfr should be used (if you are using it at all). Thanks to stackexchange user mchangun for finding the bug.
A check is now done inside dummyVars when levelsOnly = TRUE to see if any predictors share common levels.
A new option fullRank was added to dummyVars. When true, contr.treatment is used. Otherwise, contr.ltfr is used.
A bug in train was fixed with gbm models (thanks to stackoverflow user screechOwl for finding it).
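The fullRank option described above can be sketched as follows (assuming the caret package is installed; the toy data frame is illustrative only):

```r
library(caret)

## fullRank = TRUE uses contr.treatment, dropping one level per
## factor so the resulting design matrix is full rank.
df <- data.frame(y   = rnorm(4),
                 grp = factor(c("a", "b", "c", "a")))
dv <- dummyVars(y ~ ., data = df, fullRank = TRUE)
predict(dv, newdata = df)  # indicator columns for levels "b" and "c" only
```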
Changes in version 5.16-24
The
The protoclass function in the protoclass package was added. The model uses a distance matrix as input, and the train method also uses the proxy package to compute the distance using the Minkowski distance. The two tuning parameters are the neighborhood size (eps) and the Minkowski distance parameter (p).
A bug was (hopefully) fixed that occurred when some types of parallel processing were used with train. The problem is that the methods package was not being loaded in the workers. While reproducible, it is unknown why this occurs and why it is only for some technologies and systems. The methods package is now a formal dependency and we coerce the workers to load it remotely.
A bug was fixed where some calls were printed twice.
For rpart, C5.0 and ksvm, cost-sensitive versions of these models for two classes were added to train. The method values are rpartCost, C5.0Cost and svmRadialWeights.
The prediction code for the ksvm models was changed. There are some cases where the class predictions and the predicted class probabilities disagree. This usually happens when the probabilities are close to 0.50 (in the two-class case). A kernlab bug has been filed. In the meantime, if the ksvm model uses a probability model, the class probabilities are generated first and the predicted class is assigned to the probability with the largest value. Thanks to Kjell Johnson for finding that one.
print.train was changed so that tuning parameters that are logicals are printed well.
Changes in version 5.16-13
Added a few exemptions to the logic that determines whether a model call should be scrubbed.
An error trap was created to catch issues with missing importance scores in rfe.
Changes in version 5.16-03
A function twoClassSim was added for benchmarking classification models.
A bug was fixed in predict.nullModel related to predicted class probabilities.
The version requirement for gbm was updated.
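The twoClassSim simulator above returns a data frame with the outcome in a factor column named Class; a minimal sketch (assuming the caret package is installed):

```r
library(caret)

## Simulate a two-class benchmarking data set with 5 informative
## linear predictors and 2 pure-noise predictors.
set.seed(1)
dat <- twoClassSim(n = 200, linearVars = 5, noiseVars = 2)
table(dat$Class)
```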
The function getTrainPerf was made visible.
The automatic tuning grid for sda models from the sda package was changed to include lambda.
When randomForest is used with train and tuneLength == 1, the randomForest default value for mtry is used.
Maximum uncertainty linear discriminant analysis (Mlda) and factor-based linear discriminant analysis (RFlda) from the HiDimDA package were added to train.
Changes in version 5.15-87
Added the Yeo-Johnson power transformation from the car package to the preProcess function.
A train bug was fixed for the rrlda model (found by Tiago Branquinho Oliveira).
The extraTrees model in the extraTrees package was added.
The kknn.train model in the kknn package was added.
A bug was fixed in lrFuncs where the class threshold was improperly set (thanks to David Meyer).
A bug related to newer versions of the gbm package was fixed. Another gbm bug was fixed related to using non-Bernoulli distributions with two-class outcomes (thanks to Zachary Mayer).
The old function getTrainPerf was finally made visible.
Some models are created using "do.call" and may contain the entire data set in the call object. A function to "scrub" some model call objects was added to reduce their size.
The tuning process for sda:::sda models was changed to add the lambda parameter.
Changes in version 5.15-60
A bug in predictors.earth, discovered by Katrina Bennett, was fixed.
A bug induced by version 5.15-052 for the bootstrap 632 rule was fixed.
The DESCRIPTION file as of 5.15-048 should have used a version-specific lattice dependency.
lift can compute gain and lift charts (and defaults to gain).
The gbm model was updated to handle 3 or more classes.
For bagged trees using ipred, the code in train defaults to keepX = FALSE to save space. Pass in keepX = TRUE to use out-of-bag sampling for this model.
Changes were made to support vector machines for classification models due to bugs with class probabilities in the latest version of kernlab. The prob.model argument will default to the value of classProbs in the trControl function. If prob.model is passed in as an argument to train, this specification overrides the default. In other words, to avoid generating a probability model, set either classProbs = FALSE or prob.model = FALSE.
Changes in version 5.15-052
Added bayesglm from the arm package.
A few bugs were fixed in bag, thanks to Keith Woolner. Most notably, out-of-bag estimates are now computed when the prediction function includes a column called pred.
Parallel processing was implemented in bag and avNNet, which can be turned off using an optional argument.
train, rfe, sbf, bag and avNNet were given an additional argument in their respective control files called allowParallel that defaults to TRUE. When TRUE, the code will be executed in parallel if a parallel backend (e.g. doMC) is registered. When allowParallel = FALSE, the parallel backend is always ignored. The use case is when rfe or sbf calls train. If a parallel backend with P processors is being used, the combination of these functions will create P^2 processes. Since some operations benefit more from parallelization than others, the user has the ability to concentrate computing resources for specific functions.
A new resampling function called createTimeSlices was contributed by Tony Cooper that generates cross-validation indices for time series data.
A few more options were added to trainControl. initialWindow, horizon and fixedWindow are applicable when method = "timeslice". Another, indexOut, is an optional list of resampling indices for the hold-out set. By default, these values are the unique set of data points not in the training set.
A bug was fixed in multiclass glmnet models when generating class probabilities (thanks to Bradley Buchsbaum for finding it).
Changes in version 5.15-048
The three vignettes were removed and two things were added: a smaller vignette and a large collection of help pages.
Minkoo Seo found a bug where na.action was not being properly set with train.formula().
parallel.resamples was changed to properly account for missing values.
Some testing code was removed from probFunction and predictionFunction.
Fixed a bug in sbf exposed by a new version of plyr.
To be more consistent with recent versions of lattice, the parallel.resamples function was changed to parallelplot.resamples.
Since ksvm now allows probabilities when class weights are used, the default behavior in train is to set prob.model = TRUE unless the user explicitly sets it to FALSE. However, I have reported a bug in ksvm that gives inconsistent results with class weights, so this is not advised at this point in time.
Bugs were fixed in predict.bagEarth and predict.bagFDA.
When using rfeControl(saveDetails = TRUE) or sbfControl(saveDetails = TRUE), an additional column is added to object$pred called rowIndex. This indicates the row from the original data that is being held out.
Changes in version 5.15-045
A bug was fixed that induced NA values in SVM model predictions.
Changes in version 5.15-042
Many examples are wrapped in dontrun to speed up CRAN checking.
The scrda methods were removed from the package (on 6/30/12, R Core sent an email that "since we haven't got fixes for long standing warnings of the rda packages since more than half a year now, we set the package to ORPHANED.").
C50 was added (model codes C5.0, C5.0Tree and C5.0Rules).
Fixed a bug in train with NaiveBayes when fL != 0 was used.
The output of train with verboseIter = TRUE was modified to show the resample label as well as logging when the worker started and stopped the task (better when using parallel processing).
Added a long-hidden function downSample for class imbalances.
An upSample function was added for class imbalances.
A new file, aaa.R, was added to be compiled first that tries to eliminate the dreaded 'no visible binding for global variable' false positives. Specific namespaces were used with several functions to avoid similar warnings.
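The downSample and upSample functions above rebalance a classification outcome; a minimal sketch (assuming the caret package is installed; the simulated data are illustrative only):

```r
library(caret)

## downSample removes rows from the majority class; upSample samples
## the minority class with replacement. By default, both return a
## data frame with the outcome in a column named "Class".
set.seed(1)
y <- factor(rep(c("yes", "no"), times = c(90, 10)))
x <- data.frame(x1 = rnorm(100))
down <- downSample(x, y)
up   <- upSample(x, y)
table(down$Class)  # balanced at the minority class size
```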
A bug was fixed with icr.formula that was so ridiculous, I now know that nobody has ever used that function.
Fixed a bug when using method = "oob" with train.
Some exceptions were added to plot.train so that some tuning parameters are better labeled.
dotplot.resamples and bwplot.resamples now order the models using the first metric.
A few of the lattice plots for the resamples class were changed such that, when only one metric is shown, the strip is not shown and the x-axis label displays the metric.
When using trainControl(savePredictions = TRUE), an additional column is added to object$pred called rowIndex. This indicates the row from the original data that is being held out.
A variable importance function for nnet objects was created based on Gevrey, M., Dimopoulos, I., & Lek, S. (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160(3), 249-264.
The predictors function for glmnet was updated and a variable importance function was also added.
Raghu Nidagal found a bug in predict.avNNet that was fixed.
sensitivity and specificity were given an na.rm argument.
A first attempt at fault tolerance was added to train. If a model fit fails, the predictions are set to NA and a warning is issued (e.g. "model fit failed for Fold04: sigma=0.00392, C=0.25"). When verboseIter = TRUE, the warning is also printed to the log. Resampled performance is calculated on only the non-missing estimates. This can also be done during predictions, but must be done on a model-by-model basis. Fault tolerance was added for kernlab models only at this time.
lift was modified in two ways. First, cuts is no longer an argument. The function always uses cuts based on the number of unique probability estimates. Second, a new argument called label is available to use alternate names for the models (e.g. names that are not valid R variable names).
A bug in print.bag was fixed.
Class probabilities were not being generated for sparseLDA models.
Bugs were fixed in the new varImp methods for PART and RIPPER.
Started using namespaces for ctree and cforest to avoid conflicts between duplicate function names in the party and partykit packages.
A set of functions for RFE and logistic regression (lrFuncs) was added.
A bug in train with method = "glmStepAIC" was fixed so that direction and other stepAIC arguments were honored.
A bug was fixed in preProcess where the number of ICA components was not specified (thanks to Alexander Lebedev).
Another bug was fixed for oblique random forest methods in train (thanks to Alexander Lebedev).
Changes in version 5.15-023
The list of models that can accept factor inputs directly was expanded to include the RWeka models, ctree, cforest and custom models.
Added model lda2, which tunes by the number of functions used during prediction.
predict.train now allows probability predictions for custom models (thanks to Peng Zhang).
confusionMatrix.train was updated to use the default confusionMatrix code when norm = "none" and only a single hold-out was used.
Added variable importance metrics for PART and RIPPER in the RWeka package.
Vignettes were moved from /inst/doc to /vignettes.
Changes in version 5.14-023
The model details in ?train were changed to be more readable.
Added two models from the RRF package. RRF uses a penalty for each predictor based on the scaled variable importance scores from a prior random forest fit. RRFglobal sets a common, global penalty across all predictors.
Added two models from the KRLS package: krlsRadial and krlsPoly. Both have kernel parameters (sigma and degree) and a common regularization parameter lambda. The default for lambda is NA, letting the krls function estimate it internally. lambda can also be specified via tuneGrid.
twoClassSummary was modified to wrap the call to pROC:::roc in a try command. In cases where the hold-out data are only from one class, this produced an error. Now it generates NA values for the AUC when this occurs and a general warning is issued.
The underlying workflows for train were modified so that missing values for performance measures would not throw an error (but will issue a warning).
Changes in version 5.13-037
Models mlp, mlpWeightDecay, rbf and rbfDDA were added from RSNNS.
Functions roc, rocPoint and aucRoc finally met their end. The cake was a lie.
This NEWS file was converted over to Rd format.
Changes in version 5.13-020
lift was expanded into lift.formula, for calculating the plot points, and xyplot.lift, to create the plot.
The package vignettes were altered to stop loading external RData files.
A few match.call changes were made to pass new R CMD check tests.
calibration, calibration.formula and xyplot.calibration were created to make probability calibration plots.
Model types xyf and bdk from the kohonen package were added.
update.train was added so that tuning parameters can be manually set if the automated approach to setting their values is insufficient.
Changes in version 5.11-006
When using method = "pls" in train, the plsr function used the default PLS algorithm ("kernelpls"). Now, the full orthogonal scores method is used. This results in the same model, but a more extensive set of values is calculated that enables VIP calculations (without much of a loss in computational efficiency).
A check was added to preProcess to ensure valid values of method were used.
A new method, kernelpls, was added.
residuals and summary methods were added to train objects that pass the final model to their respective functions.
Changes in version 5.11-006
Bugs were fixed that prevented hold-out predictions from being returned.
Changes in version 5.11-003
A bug in roc was found when the classes were completely separable.
The ROC calculations for twoClassSummary and filterVarImp were changed to use the pROC package. This, and other changes, have increased efficiency. For filterVarImp on the cell segmentation data, this led to a 54-fold decrease in execution time. For the Glass data in the mlbench package, the speedup was 37-fold. Warnings were added for roc, aucRoc and rocPoint regarding their deprecation.
Random ferns (package rFerns) were added.
Another sparse LDA model (from the penalizedLDA package) was also added.
Changes in version 5.09-002
Fixed a bug which occurred when plsda models were used with class probabilities.
As of 8/15/11, the glmnet function was updated to return a character vector. Because of this, train required modification and a version requirement was put in the package description file.
Changes in version 5.09-006
Shea X made a suggestion and provided code to improve the speed of prediction when sequential parameters are used for gbm models.
Andrew Ziem suggested an error check with metric = "ROC" and classProbs = FALSE.
Andrew Ziem found a bug in how train obtained earth class probabilities.
Changes in version 5.08-011
Andrew Ziem found another small bug with parallel processing and train (functions in the caret namespace cannot be found).
Ben Hoffman found a bug in pickSizeTolerance that was fixed.
Jiaye Yu found (and fixed) a bug in getting predictions back from rfe.
Changes in version 5.07-024
Using saveDetails = TRUE in sbfControl or rfeControl will save the predictions on the hold-out sets (Jiaye Yu wins the prize for finding that one).
trainControl now has a logical to save the hold-out predictions.
Changes in version 5.07-005
type = "prob" was added for avNNet prediction.
A warning was added when a model from RWeka is used with train and (it appears that) multicore is being used for parallel processing. The session will crash, so don't do that.
A bug was fixed where the extrapolation limits were being applied in predict.train but not in extractPrediction. Thanks to Antoine Stevens for finding this.
Modifications were made to some of the workflow code to expose internal functions. When parallel processing was used with doMPI or doSMP, foreach did not find some caret internals (but doMC did).
Changes in version 5.07-001
Changed calls to predict.mvr since the pls package now has a namespace.
Changes in version 5.06-002
A beta version of custom models with train is included. The "caretTrain" vignette was updated with a new section that defines how to make custom models.
Changes in version 5.05-004
Laying some of the groundwork for custom models.
Updates to get away from deprecated functions (mean and sd on data frames).
The pre-processing bug in train from the last version was not entirely squashed. Now it is.
Changes in version 5.04-007
panel.lift was moved out of the examples in ?lift and into the package, along with another function, panel.lift2. lift now uses panel.lift2 by default.
Added robust regularized linear discriminant analysis from the rrlda package.
Added evtree from the evtree package.
A weird bug was fixed that occurred when some models were run with sequential parameters that were fixed to single values (thanks to Antoine Stevens for finding this issue).
Another bug was fixed where pre-processing with train could fail.
Changes in version 5.03-003
Pre-processing in train did not occur for the final model fit.
Changes in version 5.02-011
A function, lift, was added to create lattice objects for lift plots.
Several models were added from the obliqueRF package: 'ORFridge' (linear combinations created using L2 regularization), 'ORFpls' (using partial least squares), 'ORFsvm' (linear support vector machines), and 'ORFlog' (using logistic regression). As of now, the package only supports classification.
Added regression models simpls and widekernelpls. These are new models since both train and plsr have an argument called method, so the computational algorithm could not be passed through using the three dots.
Model rpart was added that uses cp as the tuning parameter. To make the model codes more consistent, rpart and ctree correspond to the nominal tuning parameters (cp and mincriterion, respectively) and rpart2 and ctree2 are the alternate versions using maxdepth.
The text for ctree's tuning parameter was changed to '1 - P-Value Threshold'.
The argument controls was not being properly passed through in models ctree and ctree2.
Changes in version 5.01-001
controls was not being set properly for cforest models in train.
The print methods for train, rfe and sbf did not recognize LOOCV.
avNNet sometimes failed with categorical outcomes with bag = FALSE.
A bug in preProcess was fixed that was triggered by matrices without dimnames (found by Allan Engelhardt).
Bagged MARS models with factor outcomes now work.
cforest was using the argument control instead of controls.
A few bugs for class probabilities were fixed for slda, hdda, glmStepAIC, nodeHarvest, avNNet and sda.
When looping over models and resamples, the foreach package is now being used. Now, when using parallel processing, the caret code stays the same and parallelism is invoked using one of the "do" packages (e.g. doMC, doMPI, etc.). This affects train, rfe and sbf. Their respective man pages have been revised to illustrate this change.
The order of the results produced by defaultSummary was changed so that the ROC AUC is first.
A few man and C files were updated to eliminate R CMD check warnings.
Now that we are using foreach, the verbose options in trainControl, rfeControl and sbfControl are now defaulted to FALSE.
rfe now returns the variable ranks in a single data frame (previously there were data frames in lists of lists) for ease of use. This will break code from previous versions. The built-in RFE functions were also modified.
confusionMatrix methods for rfe and sbf were added.
NULL values of 'method' in preProcess are no longer allowed.
A model for ridge regression was added (method = 'ridge') based on enet.
Changes in version 4.98
A bug was fixed in a few of the bagging aggregation functions (found by Harlan Harris).
Fixed a bug spotted by Richard Marchese Robinson in createFolds when the outcome was numeric. The issue is that createFolds tries to randomize n/4 numeric samples to k folds. With fewer than 40 samples, it could not always do this and would generate fewer than k folds in some cases. The change will adjust the number of groups based on n and k. For small sample sizes, it will not use stratification. For larger data sets, it will at most group the data into quartiles.
A function, confusionMatrix.train, was added to get averaged confusion matrices across resampled hold-outs when using the train function for classification.
Added another model, avNNet, that fits several neural networks via the nnet package using different seeds, then averages the predictions of the networks. There is an additional bagging option.
The default value of the 'var' argument of bag was changed.
As requested, most options can be passed from train to preProcess. The trainControl function was re-factored and several options (e.g. k, thresh) were combined into a single list option called preProcOptions. The default is consistent with the original configuration: preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5).
Another option was added to preProcess. The pcaComp option can be used to set exactly how many components are used (as opposed to just a threshold). It defaults to NULL so that the threshold method is still used by default, but a non-null value of pcaComp over-rides thresh.
When created within train, the call for preProcess is now modified to be a text string ("scrubbed") because the call could be very large.
Removed two deprecated functions: applyProcessing and processData.
A new version of the cell segmentation data was saved and the original version was moved to the package website (see ?segmentationData for location). First, several discrete versions of some of the predictors (with the suffix "Status") were removed. Second, there are several skewed predictors with minimum values of zero (that would benefit from some transformation, such as the log). A constant value of 1 was added to these fields: AvgIntenCh2, FiberAlign2Ch3, FiberAlign2Ch4, SpotFiberCountCh4 and TotalIntenCh2.
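The pcaComp option can be sketched like this (a hedged example on the iris data; the threshold method via thresh remains the default):

```r
library(caret)
# Request exactly two principal components instead of a variance threshold.
pp <- preProcess(iris[, 1:4], method = "pca", pcaComp = 2)
scores <- predict(pp, iris[, 1:4])
ncol(scores)  # two PCA score columns
```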
Changes in version 4.92
Some tweaks were made to plot.train in an effort to get the group key to look less horrid.
train, rfe and sbf are now able to estimate the time that these models take to predict new samples. Their respective control objects have a new option, timingSamps, that indicates how many of the training set samples should be used for prediction (the default of zero means do not estimate the prediction time).
xyplot.resamples was modified. A new argument, what, has values: "scatter" plots the resampled performance values for two models; "BlandAltman" plots the difference between two models by the average (aka an MA plot) for two models; "tTime", "mTime" and "pTime" plot the total model building and tuning time ("t"), the final model building time ("m"), or the time to produce predictions ("p") against a confidence interval for the average performance. 2+ models can be used.
Three new model types were added to train using regsubsets in the leaps package: "leapForward", "leapBackward" and "leapSeq". The tuning parameter, nvmax, is the maximum number of terms in the subset.
The seed was accidentally set when preProcess used ICA (spotted by Allan Engelhardt).
preProcess was always being called (even to do nothing) (found by Guozhu Wen).
Changes in version 4.91
Added a few new models associated with the bst package: bstTree, bstLs and bstSm.
A model denoted as "M5" was added that combines M5P and M5Rules from the RWeka package. This new model uses either of these functions depending on the tuning parameter "rules".
Changes in version 4.90
Fixed a bug with train and method = "penalized". Thanks to Fedor for finding it.
Changes in version 4.89
A new tuning parameter was added for M5Rules controlling smoothing. The Laplace correction value for Naive Bayes was also added as a tuning parameter.
varImp.RandomForest was updated to work. It now requires a recent version of the party package.
Changes in version 4.88
A variable importance method was created for Cubist models.
Changes in version 4.87
Altered the earth/MARS/FDA labels to be more exact.
Added cubist models from the Cubist package.
A new option to trainControl was added to allow users to constrain the possible predicted values of the model to the range seen in the training set or a user-defined range. One-sided ranges are also allowed.
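In current caret versions this option is the predictionBounds argument of trainControl (the name is an assumption here; check ?trainControl). A one-sided range is given by leaving one side NA:

```r
library(caret)
# Hedged sketch: floor regression predictions at zero and leave the
# upper side unconstrained (a one-sided range).
ctrl <- trainControl(method = "cv", number = 5,
                     predictionBounds = c(0, NA))
```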
Changes in version 4.85
Two typos fixed in print.rfe and print.sbf (thanks to Jan Lammertyn).
Changes in version 4.83
dummyVars failed with formulas using "." (all.vars does not handle this well).
tree2 was failing for some classification models.
When SVM classification models are used with class.weights, the option prob.model is automatically set to FALSE (otherwise, it is always set to TRUE). A warning is issued that the model will not be able to create class probabilities.
Also for SVM classification models, there are cases when the probability model generates negative class probabilities. In these cases, we assign a probability of zero then coerce the probabilities to sum to one.
Several typos in the help pages were fixed (thanks to Andrew Ziem).
Added a new model, svmRadialCost, that fits the SVM model and estimates the sigma parameter for each resample (to properly capture the uncertainty).
preProcess has a new method called "range" that scales the predictors to [0, 1] (which is approximate for new samples if the training set range is narrow in comparison).
A check was added to train to make sure that, when the user passes a data frame to tuneGrid, the names are correct and complete.
print.train prints the number of classes and levels for classification models.
Changes in version 4.78
Added a few bagging modules. See ?bag.
Added basic timings of the entire call to train, rfe and sbf as well as the fit time of the final model. These are stored in an element called "times".
The data files were updated to use better compression, which added a higher R version dependency.
plot.train was pretty much re-written to more effectively use trellis theme defaults and to allow arguments (e.g. axis labels, keys, etc.) to be passed in to over-ride the defaults.
Bug fix for the lda bagging function.
Bug fix for print.train when preProc is NULL.
predict.BoxCoxTrans would go all kablooey if there were missing values.
varImp.rpart was failing with some models (thanks to Maria Delgado).
Changes in version 4.77
A new class was added for estimating and applying the Box-Cox transformation to data, called BoxCoxTrans. This is also included as an option to transform predictor variables. Although the Box-Tidwell transformation was invented for this purpose, the Box-Cox transformation is more straightforward, less prone to numerical issues and just as effective. This method was also added to preProcess.
Fixed a mis-labelled x axis in plot.train when a transformation is applied for models with three tuning parameters.
When plotting a train object with method == "gbm" and multiple values of the shrinkage parameter, the ordering of panels was improved.
Fixed bugs for regression prediction using partDSA and qrf.
Another bug, reported by Jan Lammertyn, related to extractPrediction with a single predictor was also fixed.
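A small sketch of the BoxCoxTrans class on simulated skewed data:

```r
library(caret)
set.seed(1)
x <- rlnorm(100)           # skewed, strictly positive sample
bc <- BoxCoxTrans(x)       # estimate the lambda parameter
x_new <- predict(bc, x)    # apply the transformation
```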
Changes in version 4.76
Fixed a bug where linear SVM models were not working for classification
Changes in version 4.75
Added 'gcvEarth', which is the basic MARS model. The pruning procedure is the nominal one based on GCV; only the degree is tuned by train.
Added 'qrnn' for quantile regression neural networks from the qrnn package.
Added 'Boruta' for random forest models with feature selection via the Boruta package.
Changes in version 4.74
Some changes to print.train: the call is not automatically printed (but can be when print.train is explicitly invoked); the "Selected" column is also not automatically printed (but can be); non-table text now respects options("width"); only significant digits are now printed when tuning parameters are kept at a constant value.
Changes in version 4.73
Bug fixes to preProcess related to complete.cases and a single predictor.
For knn models (knn3 and knnreg), added automatic conversion of data frames to matrices.
Changes in version 4.72
A new function for rfe with gam was added.
"Down-sampling" was implemented with bag so that, for classification models, each class has the same number of samples as the smallest class.
Added a new class, dummyVars, that creates an entire set of binary dummy variables (instead of the reduced, full rank set). The initial code was suggested by Gabor Grothendieck on R-Help. The predict method is used to create dummy variables for any data set.
Added R2 and RMSE functions for evaluating regression models.
varImp.gam failed to recognize objects from mgcv.
A small fix to test a logical vector in filterVarImp.
When diff.resamples calculated the number of comparisons, the "models" argument was ignored.
predict.bag was ignoring type = "prob".
Minor updates to conform to R 2.13.0.
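A hedged sketch of dummyVars, which produces the full (less-than-full-rank) set of indicator columns and is applied to data via predict:

```r
library(caret)
df <- data.frame(y = 1:4, grp = factor(c("a", "b", "a", "c")))
dv <- dummyVars(y ~ grp, data = df)
predict(dv, newdata = df)  # one binary column per level of grp
```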
Changes in version 4.70
Added a warning to train when class levels are not valid R variable names.
Fixed a bug in the variable importance function for multinom objects.
Added p-value adjustments to summary.diff.resamples. Confidence intervals in dotplot.diff.resamples are adjusted accordingly if the Bonferroni correction is used.
For dotplot.resamples, no point was plotted when the upper and/or lower interval values were NaN. Now, the point is plotted but without the interval bars.
Updated print.rfe to correctly describe new resampling methods.
Changes in version 4.69
Fixed a bug in predict.rfe where an error was thrown even though the required predictors were in newdata.
Changed preProcess so that centering and scaling are both automatic when PCA or ICA are requested.
Changes in version 4.68
Added two functions, checkResamples and checkConditionalX, that identify predictor data with degenerate distributions when conditioned on a factor.
Added a high content screening data set (segmentationData) from Hill et al., "Impact of image segmentation on high-content screening data quality for SK-BR-3 cells", BMC Bioinformatics (2007) vol. 8 (1) pp. 340.
Fixed bugs in how sbf objects were printed (when using repeated CV) and in classification models with earth and classProbs = TRUE.
Changes in version 4.67
Added predict.rfe.
Added imputation using bagged regression trees to preProcess.
Fixed a bug in varImp.rfe that caused incorrect results (thanks to Lawrence Mosley for the find).
Changes in version 4.65
Fixed a bug where train would not allow knn imputation.
filterVarImp and roc now check for missing values and use complete data for each predictor (instead of case-wise deletion across all predictors).
Changes in version 4.64
Fixed a bug introduced in the last version with createDataPartition(..., list = FALSE).
Fixed a bug predicting class probabilities when using earth/glm models.
Fixed a bug that occurred when train was used with ctree or tree2 methods.
Fixed bugs in rfe and sbf when running in parallel; not all the resampling results were saved.
Changes in version 4.63
A p-value from McNemar's test was added to confusionMatrix.
Updated print.train so that constant parameters are not shown in the table (but a note is written below the table instead). Also, the output was changed slightly to be more easily read (I hope).
Expanded the tuning parameters for lvq.
Some of the examples in the Model Building vignette were changed.
Added the bootstrap 632 rule and repeated cross-validation to trainControl.
A new function, createMultiFolds, is used to generate indices for repeated CV.
The various resampling functions now have *named* lists as output (with prefixes "Fold" for cv and repeated cv and "Resample" otherwise).
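The repeated-CV additions can be sketched as (a hedged example; see ?createMultiFolds and ?trainControl):

```r
library(caret)
set.seed(1)
y <- rnorm(50)
folds <- createMultiFolds(y, k = 5, times = 3)  # 5 folds x 3 repeats = 15 index sets
ctrl  <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
```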
Pre-processing has been added to train with the preProcess argument. This has been tested when caret functions are used with rfe and sbf (via caretFuncs and caretSBF, respectively).
When preProcess(method = "spatialSign") is used, centering and scaling are done automatically too. Also, a bug was fixed that stopped the transformation from being executed.
knn imputation was added to preProcess. The RANN package is used to find the neighbors (the knn impute function in the impute library was consistently generating segmentation faults, so we wrote our own).
Changed the behavior of preProcess in situations where scaling is requested but there is no variation in the predictor. Previously, the method would fail. Now a warning is issued and the value of the standard deviation is coerced to be one (so that scaling has no effect).
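Passing pre-processing through train can be sketched as (a minimal example with a linear model; the method choice is illustrative):

```r
library(caret)
set.seed(1)
fit <- train(Sepal.Length ~ ., data = iris[, 1:4], method = "lm",
             preProcess = c("center", "scale"),
             trControl = trainControl(method = "cv", number = 5))
```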
Changes in version 4.62
Added gam from mgcv (with smoothing splines and feature selection) and gam from gam (with basic splines and loess smoothers). For these models, a formula is derived from the data where "near zero variance" predictors (see nearZeroVar) are excluded and predictors with less than 10 distinct values are entered as linear (i.e. unsmoothed) terms.
Changes in version 4.61
Changed the earth fit for classification models to use the glm argument with a binomial family.
Added varImp.multinom, which is based on the absolute values of the model coefficients.
Changes in version 4.60
The feature selection vignette was updated slightly (again).
Changes in version 4.59
Updated rfe and sbf to include class probabilities in performance calculations.
Also, the names of the resampling indices were harmonized across train, rfe and sbf.
The feature selection vignette was updated slightly.
Changes in version 4.58
Added the ability to include class probabilities in performance calculations. See trainControl and twoClassSummary.
Updated and restructured the main vignette.
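A minimal sketch of requesting class probabilities in the resampling summaries (twoClassSummary reports ROC-based statistics, so metric = "ROC" would then be used in train):

```r
library(caret)
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,            # keep hold-out probabilities
                     summaryFunction = twoClassSummary)
# then: train(..., trControl = ctrl, metric = "ROC")
```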
Changes in version 4.57
Internal changes related to how predictions from models arestored and summarized. With the exception of loo, the modelperformance values are calculated by the workers instead ofthe main program. This should reduce i/o and lay somegroundwork for upcoming changes.
The default grid for relaxo models was changed based on an initial model fit.
partDSA model predictions were modified; there were caseswhere the user might request X partitions, but the modelonly produced Y < X. In these cases, the partitions formissing models were replaced with the largest modelthat was fit.
The function modelLookup was put in the namespace and a man file was added.
The names of the resample indices are automatically reset, even if the user specified them.
Changes in version 4.56
Fixed a bug generated a few versions ago where varImp for plsda and fda objects crashed.
Changes in version 4.55
When computing the scale parameter for RBF kernels, the option to automatically scale the data was changed to TRUE.
Changes in version 4.54
Added logic.bagging in logicFS with method = "logicBag".
Changes in version 4.53
Fixed a bug in varImp.train related to nearest shrunken centroid models.
Added logic regression and logic forests.
Changes in version 4.51
Added an option to splom.resamples so that the variables in the scatter plots are models or metrics.
Changes in version 4.50
Added dotplot.resamples plus acknowledgements to Hothorn et al. (2005) and Eugster et al. (2008).
Changes in version 4.49
Enhanced the tuneGrid option to allow a function to be passed in.
Changes in version 4.48
Added a prcomp method for the resamples class.
Changes in version 4.47
Extended resamples to work with rfe and sbf.
Changes in version 4.46
Cleaned up some of the man files for the resamples class and added parallel.resamples.
Fixed a bug in diff.resamples where ... were not being passed to the test statistic function.
Added more log messages in train when running verbose.
Added the German credit data set.
Changes in version 4.45
Added a general framework for bagging models via the bag function. Also, model type "hdda" from the HDclassif package was added.
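A sketch of the bag framework using the built-in LDA module (ldaBag supplies the fit/pred/aggregate functions; the module name is taken from ?bag and is an assumption here):

```r
library(caret)
data(iris)
set.seed(1)
fit <- bag(iris[, 1:4], iris$Species, B = 10,
           bagControl = bagControl(fit = ldaBag$fit,
                                   predict = ldaBag$pred,
                                   aggregate = ldaBag$aggregate))
```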
Changes in version 4.44
Added neuralnet, quantregForest and rda (from rda) to train. Since there is a naming conflict with rda from mda, the rda model was given a method value of "scrda".
Changes in version 4.43
The resampling estimate of the standard deviation given by train since v 4.39 was wrong.
A new field was added to varImp.mvr called "estimate". In cases where the mvr model had multiple estimates of performance (e.g. training set, CV, etc.) the user can now select which estimate they want to be used in the importance calculation (thanks to Sophie Bréand for finding this).
Changes in version 4.42
Added predict.sbf and modified the structure of the sbf helper functions. The "score" function only computes the metric used to filter and the filter function does the actual filtering. This was changed so that FDR corrections or other operations that use all of the p-values can be computed.
Also, the formatting of p-values in print.confusionMatrix was changed.
An argument was added to maxDissim so that the variable name is returned instead of the index.
Independent component analysis was added to the list of pre-processing operations and a new model ("icr") was added to fit a pcr-like model with the ICA components.
Changes in version 4.40
Added hda and cleaned up the caret training vignette.
Changes in version 4.39
Added several classes for examining the resampling results. Thereare methods for estimating pair-wise differences and latticefunctions for visualization. The training vignette has a newsection describing the new features.
Changes in version 4.38
Added partDSA and stepAIC for linear models and generalized linear models.
Changes in version 4.37
Fixed a new bug in how resampling results are exported
Changes in version 4.36
Added penalized linear models from the foba package.
Changes in version 4.35
Added rocc classification and fixed a typo.
Changes in version 4.34
Added two new data sets: dhfr and cars.
Changes in version 4.33
Added GAMens (ensembles using gams)
Fixed a bug in roc that, for some data cases, would reverse the "positive" class and report sensitivity as specificity and vice-versa.
Changes in version 4.32
Added a parallel random forest method in train using the foreach package.
Also added penalized logistic regression using the plr function in the stepPlr package.
Changes in version 4.31
Added a new feature selection function, sbf (for selection by filter).
Fixed a bug in rfe that did not affect the results, but did produce a warning.
A new model function, nullModel, was added. This model fits either the mean-only model for regression or the majority class model for classification.
Also, ldaFuncs had a bug fixed.
Minor changes to Rd files
Changes in version 4.30
For whatever reason, there is now a function in the spls package by the name of splsda that does the same thing. A few functions and a man page were changed to ensure backwards compatibility.
Changes in version 4.29
Added stepwise variable selection for lda and qda using the stepclass function in klaR.
Changes in version 4.28
Added robust linear and quadratic discriminant analysis functions from rrcov.
Also added another column to the output of extractProb and extractPrediction that saves the name of the model object so that you can have multiple models of the same type and tell which predictions came from which model.
Changes were made to plotClassProbs: new parameters were added and densityplots can now be produced.
Changes in version 4.27
Added nodeHarvest.
Changes in version 4.26
Fixed a bug in caretFuncs that led to NaN variable rankings, so that the first k terms were always selected.
Changes in version 4.25
Added parallel processing functionality for rfe.
Changes in version 4.24
Added the ability to use custom metrics with rfe.
Changes in version 4.22
Many Rd changes to work with updated parser.
Changes in version 4.21
Re-saved data in more compressed format
Changes in version 4.20
Added pcr as a method.
Changes in version 4.19
A weights argument was added to train for models that accept weights.
Also, a bug was fixed for lasso regression (wrong lambda specification) and another for prediction in naive Bayes models with a single predictor.
Changes in version 4.18
Fixed a bug in the new nearZeroVar and updated format.earth so that it does not automatically print the formula.
Changes in version 4.17
Added a new version of nearZeroVar from Allan Engelhardt that is much faster.
Changes in version 4.16
Fixed bugs in extractProb (for glmnet) and filterVarImp.
For glmnet, the user can now pass in their own value of family to train (otherwise train will set it depending on the mode of the outcome). However, glmnet doesn't have much support for families at this time, so you can't change links or try other distributions.
Changes in version 4.15
Fixed a bug in createFolds when the smallest y value is more than 25% of the data.
Changes in version 4.14
Fixed a bug in print.train.
Changes in version 4.13
Added vbmp from the vbmp package.
Changes in version 4.12
Added an additional error check to confusionMatrix.
Fixed an absurd typo in print.confusionMatrix.
Changes in version 4.11
Added: linear kernels for svm, rvm and Gaussian processes; rlm from MASS; a knn regression model, knnreg.
A set of functions (class "classDist") to compute the class centroids and covariance matrix for a training set for determining Mahalanobis distances of samples to each class centroid was added.
A set of functions (rfe) for doing recursive feature selection (aka backwards selection) was added. A new vignette was added for more details.
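The new rfe function can be sketched with the built-in linear-model helpers (a hedged example on simulated data):

```r
library(caret)
set.seed(1)
x <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
y <- x[, 1] + rnorm(200)
prof <- rfe(x, y, sizes = 1:3,
            rfeControl = rfeControl(functions = lmFuncs,
                                    method = "cv", number = 5))
```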
Changes in version 4.10
Added OneR and PART from RWeka.
Changes in version 4.09
Fixed an error in the documentation for confusionMatrix. The old doc had "Detection Prevalence = A/(A+B)" and the new one has "Detection Prevalence = (A+B)/(A+B+C+D)". The underlying code was correct.
Added lars (fraction and step as parameters).
Changes in version 4.08
Updated train and bagEarth to allow earth for classification models.
Changes in version 4.07
Added glmnet models.
Changes in version 4.06
Added code for sparse PLS classification.
Fixed a bug in prediction for caTools::LogitBoost.
Changes in version 4.05
Updated again for more stringent R CMD check tests in R-devel 2.9
Changes in version 4.04
Updated for more stringent R CMD check tests in R-devel 2.9
Changes in version 4.03
Significant internal changes were made to how the models are fit. Now, the function used to compute the models is passed in as a parameter (defaulting to lapply). In this way, users can use their own parallel processing software without new versions of caret. Examples are given in train.
Also, fixed a bug where the MSE (instead of RMSE) was reported for random forest OOB resampling.
There are more examples in train.
Changes to confusionMatrix, sensitivity, specificity and the predictive value functions: each was made more generic with default and table methods; confusionMatrix "extractor" functions for matrices and tables were added; the pos/neg predicted value computations were changed to incorporate prevalence; prevalence was added as an option to several functions; detection rate and prevalence statistics were added to confusionMatrix; and the examples were expanded in the help files.
This version of caret will break compatibility with caretLSF and caretNWS. However, these packages will not be needed now and will be deprecated.
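The expanded confusionMatrix interface can be sketched as (the table method and prevalence-aware statistics are the ones described above):

```r
library(caret)
pred <- factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no"))
obs  <- factor(c("yes", "no",  "no", "no", "yes"), levels = c("yes", "no"))
cm <- confusionMatrix(table(pred, obs), positive = "yes")
cm$byClass[c("Sensitivity", "Pos Pred Value", "Detection Rate")]
```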
Changes in version 3.51
Updated the man files and manuals.
Changes in version 3.50
Added qda, mda and pda.
Changes in version 3.49
Fixed a bug in resampleHist. Also added a check in the train functions that error trapped with glm models and > 2 classes.
Changes in version 3.48
Added glms. Also, added varImp.bagEarth to the namespace.
Changes in version 3.47
Added sda from the sda package. There was a naming conflict between sda::sda and sparseLDA:::sda. The method value for sparseLDA was changed from "sda" to "sparseLDA".
Changes in version 3.46
Added spls from the spls package.
Changes in version 3.45
Added caching of RWeka objects so that they can be saved to the file system and used in other sessions (changes per Kurt Hornik on 2008-10-05).
Changes in version 3.44
Added sda from the sparseLDA package (not on CRAN).
Also, a bug was fixed where the ellipses were not passed into a few of the newer models (such as penalized and ppr).
Changes in version 3.43
Added the penalized model from the penalized package. In caret, it is regression only although the package allows for classification via glm models. However, it does not allow the user to pass the classes in (just an indicator matrix). Because of this, it doesn't really work with the rest of the classification tools in the package.
Changes in version 3.42
Added a little more formatting to print.train.
Changes in version 3.41
For gbm, let the user over-ride the default value of the distribution argument (brought to us by Peter Tait via R-Help).
Changes in version 3.40
Changed predict.preProcess so that it doesn't crash if newdata does not have all of the variables used to originally pre-process *unless* PCA processing was requested.
Changes in version 3.39
Fixed a bug in varImp.rpart when the model had only primary splits.
Minor changes to the Affy normalization code.
Fixed a typo in the predictors man page.
Changes in version 3.38
Added a new class called predictors that returns the names of the predictors that were used in the final model.
Also added ppr from the stats package.
Minor update to the project web page to deal with IE issues.
Changes in version 3.37
Added the ability of train to use custom-made performance functions so that the tuning parameters can be chosen on the basis of things other than RMSE/R-squared and Accuracy/Kappa.
A new argument was added to trainControl called "summaryFunction" that is used to specify the function used to compute performance metrics. The default function preserves the functionality prior to this new version.
A new argument to train is "maximize", which is a logical for whether the performance measure specified in the "metric" argument to train should be maximized or minimized.
The selection function specified in trainControl carries the maximize argument with it so that customized performance metrics can be used.
A bug was fixed in confusionMatrix (thanks to Gabor Grothendieck).
Another bug was fixed related to predictions from least squares SVMs.
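A custom summary function can be sketched as follows (the MAE metric and its name are illustrative):

```r
library(caret)
# summaryFunction receives a data frame with obs/pred columns.
maeSummary <- function(data, lev = NULL, model = NULL) {
  c(MAE = mean(abs(data$obs - data$pred)))
}
ctrl <- trainControl(method = "cv", number = 5, summaryFunction = maeSummary)
# then: train(..., trControl = ctrl, metric = "MAE", maximize = FALSE)
```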
Changes in version 3.36
Added superpc from the superpc package. One note: the data argument that is passed to superpc is saved in the object that results from superpc.train. This is used later in the prediction function.
Changes in version 3.35
Added slda from ipred.
Changes in version 3.34
Fixed a few bugs related to the lattice plots from version 3.33.
Also added the ripper (aka JRip) and logistic model trees from RWeka.
Changes in version 3.33
Added xyplot.train, densityplot.train, histogram.train and stripplot.train. These are all functions to plot the resampling points. There is some overlap between these functions, plot.train and resampleHist. plot.train gives the average metrics only while these plot all of the resampled performance metrics. resampleHist could plot all of the points, but only for the final optimal set of predictors.
To use these functions, there is a new argument in trainControl called returnResamp which should have values "none", "final" and "all". The default is "final" to be consistent with previous versions, but "all" should be specified to use these new functions to their fullest.
Changes in version 3.32
The functions predict.train and predict.list were added to use as alternatives to the extractPrediction and extractProbs functions.
Added C4.5 (aka J48) and rules-based models (M5 prime) from RWeka.
Also added logitBoost from the caTools package. This package doesn't have a namespace and RWeka has a function with the same name. It was suggested to use the "::" prefix to differentiate them (but we'll see how this works).