Fixes issue with running predictions for Decision Trees in Spark (#1309)
Updates to some boosting tuning parameter information: (#1306)
max_leaves and l2_leaf_reg.
Enable generalized random forest (grf) models for classification, regression, and quantile regression modes. (#1288)
surv_reg() is now defunct and will error if called. Please use survival_reg() instead (#1206).
Enable parsnip to work with xgboost version > 2.0.0.0. (#1227)
Bug fix in how tunable parameters were configured for brulee neural networks.
A change to make linear SVM models quieter.
A few default parameter ranges were changed for brulee neural network models.
Switch to base R pipe
Requires changes for CRAN’s “No Suggests” check.
Avoid issues with reading from package files. (#1271)
A new model mode ("quantile regression") was added. Including:
linear_reg() engine for "quantreg".
?set_mode.
Updates for sparse data formats:
fit_xy() can now take dgCMatrix input for x argument (#1121).
fit_xy() can now take sparse tibbles as data values (#1165).
predict() can now take dgCMatrix and sparse tibble input for new_data argument, and error informatively when model doesn’t support it (#1167).
New extract_fit_time() method has been added that returns the time it took to train the model (#853).
mlp() with the keras engine now works for all activation functions currently supported by keras (#1127).
mlp() now has a brulee_two_layer engine.
Transitioned package errors and warnings to use cli (#1147 and #1148 by @shum461, #1153 by @RobLBaker and @wright13, #1154 by @JamesHWade, #1160, #1161, #1081).
fit_xy() currently raises an error for gen_additive_mod() model specifications as the default engine ("mgcv") specifies smoothing terms in model formulas. However, some engines specify smooths via additional arguments, in which case the restriction on fit_xy() is excessive. parsnip will now only raise an error when fitting a gen_additive_mod() with fit_xy() when using the "mgcv" engine (#775).
Aligned null_model() with other model types; the model type now has an engine argument that defaults to "parsnip" and is checked with the same machinery that checks other model types in the package (#1083).
If linear regression is requested with a Poisson family, an error will occur and refer the user to poisson_reg() (#1219).
The deprecated function rpart_train() was removed after its deprecation period (#1044).
Make sure that parsnip does not convert ordered factor predictions to be unordered.
Ensure that knit_engine_docs() has the required packages installed (#1156).
Fixed bug where some models fit using fit_xy() couldn’t predict (#1166).
Fixed bug related to using local (non-package) models (#1229)
tunable() now references a dials object for the mixture parameter (#1236)
For quantile prediction, the quantile argument to predict() has been deprecated in favor of quantile_levels. This does not affect models with mode "quantile regression".
The quantile regression prediction type was disabled for the deprecated surv_reg() model.
NULL is no longer accepted as an engine (#1242).
Added a missing tidy() method for survival analysis glmnet models (#1086).
A few changes were made to achieve more speed-ups (#1075) (#1073) (#1072)
Tightened logic for outcome checking. This resolves issues (some errors and some silent failures) when atomic outcome variables have an attribute (#1060, #1061).
Fixed bug in fitting some model types with the "spark" engine (#1045).
Fixed issues in metadata for the "brulee" engine where several arguments were mistakenly protected. (#1050, #1054)
Fixed documentation for mlp(engine = "brulee"): the default values for learn_rate and epochs were swapped (#1018).
Fixed a bug in the integration with workflows where using a model formula with a formula preprocessor could result in a double intercept (#1033).
We no longer add eval_time arguments to the prediction specification for the engine (#1039).
parsnip now lets the engines for mlp() check for acceptable values of the activation function (#1019)
rpart_train() has been deprecated in favor of using decision_tree() with the "rpart" engine or rpart::rpart() directly (#1044).
.filter_eval_time() was moved to the survival standalone file.
Improved errors and documentation related to special terms in formulas. See ?model_formula to learn more. (#770, #1014)
Improved errors in cases where the outcome column is mis-specified. (#1003)
The new_data argument for the predict() method for censoring_model_reverse_km objects has been deprecated (#965).
When computing censoring weights, the resulting vectors are no longer named (#1023).
The predict() method for censoring_model_reverse_km objects now checks that ... are empty (#1029).
Fixed bug where prediction on rank deficient lm() models produced .pred_res instead of .pred. (#985)
Fixed bug where sparse data was being coerced to non-sparse format during predict().
For BART models with the dbarts engine, predict() can now also return the standard error for confidence and prediction intervals (#976).
augment() now works for censored regression models.
A few censored regression helper functions were exported: .extract_surv_status() and .extract_surv_time() (#973, #980).
Fixed bug where boost_tree() models couldn’t be fit with 1 predictor if validation argument was used. (#994)
This release of parsnip contains a number of new features and bug fixes, accompanied by several optimizations that substantially decrease the time to fit() and predict() with the package.
"glmnet" engine interfaces:
glmnet models fitted with base-R family objects are now supported for linear_reg(), logistic_reg(), and multinom_reg() (#890).
multi_predict() methods for linear_reg(), logistic_reg(), and multinom_reg() models fitted with the "glmnet" engine now check the type better and error accordingly (#900).
.organize_glmnet_pred() now expects predictions for a single penalty value (#876).
The time argument to predict_survival() and predict_hazard() is deprecated in favor of the new eval_time argument (#936).
Added several internal functions (to help work with Surv objects) as a standalone file that can be used in other packages via usethis::use_standalone("tidymodels/parsnip"). These changes provide tooling for downstream packages to handle inverse probability censoring weights (#893, #897, #937).
An internal method for generating inverse probability of censoring weights (IPCW) of Graf et al. (1999) is available via .censoring_weights_graf().
Made fit() behave consistently with respect to missingness in the classification setting. Previously, fit() erroneously raised an error about the class of the outcome when there were no complete cases, and now always passes along complete cases to be handled by the modeling function (#888).
Fixed bug where model fits with engine = "earth" would fail when the package’s namespace hadn’t been attached (#251).
Fixed bug where model fits with factor predictors and engine = "kknn" would fail when the package’s namespace hadn’t been attached (#264).
Fixed bug with prediction from a boosted tree model fitted with "xgboost" using a custom objective function (#875).
Implemented a number of optimizations in parsnip’s backend that substantially decrease evaluation time to fit() and predict() (#901, #902, #910, #921, #929, #923, #931, #932, #933).
logistic_reg() will now warn at fit() when the outcome has more than two levels (#545).
Rather than being implemented in each method, the check for the new_data argument being mistakenly passed as newdata to multi_predict() now happens in the generic. Packages re-exporting the multi_predict() generic and implementing now-duplicate checks may see new failures and can remove their own analogous checks. This check already existed in all predict() methods (via predict.model_fit()) and all parsnip multi_predict() methods (#525).
Functions now indicate what class the outcome was if the outcome is the wrong class (#887).
The minimum version for R is now 3.5 (#926).
Moved forward with the deprecation of req_pkgs() in favor of required_pkgs(). The function will now error (#871).
Transitioned all soft-deprecations that were at least a year old to warn-deprecations. These changes apply to fit_control(), surv_reg(), varying(), varying_args(), and the "liquidSVM" engine.
Various bug fixes and improvements to documentation.
For censored regression models, a “reverse Kaplan-Meier” curve is computed for the censoring distribution. This can be used when evaluating this type of model (#855).
The model specification methods for generics::tune_args() and generics::tunable() are now registered unconditionally (tidymodels/workflows#192).
Adds documentation and tuning infrastructure for the new flexsurvspline engine for the survival_reg() model specification from the censored package.
The matrix interface for fitting fit_xy() now works for the "censored regression" mode (#829).
The num_leaves argument of boost_tree()’s lightgbm engine (via the bonsai package) is now tunable.
A change in our data checking code resulted in about a 3-fold speed-up in parsnip (#835)
A bagged neural network model was added (bag_mlp()). Engine implementations will live in the baguette package.
Fixed installation failures due to undocumented knitr installation dependency (#785).
fit_xy() now fails when the model mode is unknown.
brulee engine-specific tuning parameters were updated. These changes can be used with dials version > 1.0.0.
fit() and fit_xy() no longer error if the control argument isn’t a control_parsnip() object. They will work as long as the object passed to control includes the same elements as control_parsnip().
Improved prompts related to missing (or not loaded) extension packages as well as better handling of model mode conflicts.
boost_tree() engine. To supply engine-specific arguments that are documented in xgboost::xgb.train() as arguments to be passed via params, supply the list elements directly as named arguments to set_engine(). Read more in ?details_boost_tree_xgboost (#787).
Enable the use of case weights for models that support them.
show_model_info() now indicates which models can utilize case weights.
Model type functions will now message informatively if a needed parsnip extension package is not loaded (#731).
Refactored internals of model specification printing functions. These changes are non-breaking for extension packages, but the new print_model_spec() helper is exported for use in extensions if desired (#739).
Fixed bug where previously set engine arguments would propagate through update() methods despite fresh = TRUE (#704).
Fixed a bug where an error would be thrown if arguments to model functions were namespaced (#745).
predict(type = "prob") will now provide an error if the outcome variable has a level called "class" (#720).
An inconsistency for probability type predictions for two-class GAM models was fixed (#708)
Fixed translated printing for null_model() (#752)
Added a glm_grouped() function to convert long data to the grouped format required by glm() for logistic regression.
xgb_train() now allows for case weights
Added ctree_train() and cforest_train() wrappers for the functions in the partykit package. Engines for these will be added to other parsnip extension packages.
Exported xgb_predict() which wraps xgboost’s predict() method for use with parsnip extension packages (#688).
Added a developer function, .model_param_name_key, that translates names of tuning parameters.
Fixed a major bug in spark models induced in the previous version (#671).
Updated the parsnip add-in with new models and engines.
Updated parameter ranges for some tunable() methods and added a missing engine argument for brulee models.
Added information about how to install the mixOmics package for PLS models (#680)
Bayesian additive regression trees (BART) were added via the bart() function.
Added the "glm" engine for linear_reg() for numeric outcomes (#624).
Added brulee engines for linear_reg(), logistic_reg(), multinom_reg() and mlp().
A bug for class predictions of two-class GAM models was fixed (#541)
Fixed a bug for logistic_reg() with the LiblineaR engine (#552).
The list column produced when creating survival probability predictions is now always called .pred (with .pred_survival being used inside of the list column).
Fixed outcome type checking affecting a subset of regression models (#625).
Prediction using multinom_reg() with the nnet engine with a single row no longer fails (#612).
When the xy interface is used and the underlying model expects to use a matrix, a better warning is issued when predictors contain non-numeric columns (including dates).
The fit time is only calculated when the verbosity argument of control_parsnip() is 2L or greater. Also, the call to system.time() now uses gcFirst = FALSE. (#611)
fit_control() is soft-deprecated in favor of control_parsnip().
New extract_parameter_set_dials() method to extract parameter sets from model specs.
New extract_parameter_dials() method to extract a single parameter from model specs.
Argument interval was added for prediction: For types "survival" and "quantile", estimates for the confidence or prediction interval can be added if available (#615).
set_dependency() now allows developers to create package requirements that are specific to the model’s mode (#604).
varying() is soft-deprecated in favor of tune().
varying_args() is soft-deprecated in favor of tune_args().
An autoplot() method was added for glmnet objects, showing the coefficient paths versus the penalty values (#642).
parsnip is now more robust working with keras and tensorflow for a larger range of versions (#596).
xgboost engines now use the new iterationrange parameter instead of the deprecated ntreelimit (#656).
devtools::load_all() (#653).
A model function (gen_additive_mod()) was added for generalized additive models.
Each model now has a default engine that is used when the model is defined. The default for each model is listed in the help documents. This also adds functionality to declare an engine in the model specification function. set_engine() is still required if engine-specific arguments need to be added. (#513)
parsnip now checks for a valid combination of engine and mode (#529)
The default engine for multinom_reg() was changed to nnet.
The helper functions .convert_form_to_xy_fit(), .convert_form_to_xy_new(), .convert_xy_to_form_fit(), and .convert_xy_to_form_new() for converting between formula and matrix interface are now exported for developer use (#508).
Fix bug in augment() when non-predictor, non-outcome variables are included in data (#510).
New article “Fitting and Predicting with parsnip” which contains examples for various combinations of model type and engine. (#527)
A new linear SVM model svm_linear() is now available with the LiblineaR engine (#424) and the kernlab engine (#438), and the LiblineaR engine is available for logistic_reg() as well (#429). These models can use sparse matrices via fit_xy() (#447) and have a tidy method (#474).
For models with glmnet engines:
penalty (either a single numeric value or a value of tune()) (#481).
path_values can be used to set the lambda path as a specific set of numbers (independent of the value of penalty). A pure ridge regression model (i.e., mixture = 0) will generate incorrect values if the path does not include zero. See issue #431 for discussion (#486).
The liquidSVM engine for svm_rbf() was deprecated due to that package’s removal from CRAN. (#425)
The xgboost engine for boosted trees was translating mtry to xgboost’s colsample_bytree. We now map mtry to colsample_bynode since that is more consistent with how random forest works. colsample_bytree can still be optimized by passing it in as an engine argument. colsample_bynode was added to xgboost after the parsnip package code was written. (#495)
For xgboost, mtry and colsample_bytree can be passed as integer counts or proportions, while subsample and validation should always be proportions. xgb_train() now has a new option counts (TRUE or FALSE) that states which scale for mtry and colsample_bytree is being used. (#461)
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
set_mode() now checks if mode is compatible with the model class, similar to new_model_spec(). set_mode() and set_engine() now error for NULL or missing arguments (#503).
Re-organized model documentation:
update methods were moved out of the model help files (#479).
generics::required_pkgs() was extended for parsnip objects.
Prediction functions now give a consistent error when a user uses an unavailable value of type (#489)
The augment() method was changed to avoid failing if the model does not enable class probabilities. The method now returns tibbles despite the input data class (#487) (#478)
xgboost engines now respect the event_level option for predictions (#460).
An RStudio add-in is available that makes it easier to write multiple parsnip model specifications to the source window. It can be accessed via the IDE addin menus or by calling parsnip_addin().
For xgboost models, users can now pass objective to set_engine("xgboost"). (#403)
Changes to test for cases when CRAN cannot get xgboost to work on their Solaris configuration.
There is now an augment() method for fitted models. See augment.model_fit. (#401)
Column names for x are now required when fit_xy() is used. (#398)
There is now an event_level argument for the xgboost engine. (#420)
New mode “censored regression” and new prediction types “linear_pred”, “time”, “survival”, “hazard”. (#396)
Censored regression models cannot use fit_xy() (use fit()). (#442)
show_engines() will provide information on the current set of engines for a model.
For three models (glmnet, xgboost, and ranger), enable sparse matrix use via fit_xy() (#373).
Some protections were added for function arguments that are dependent on the data dimensions (e.g., mtry, neighbors, min_n, etc.). (#184)
Infrastructure was improved for running parsnip models in parallel using PSOCK clusters on Windows.
A glance() method for model_fit objects was added (#325)
Specific tidy() methods for glmnet models fit via parsnip were created so that the coefficients for the specific fitted parsnip model are returned.
glmnet models were fitting two intercepts (#349)
The various update() methods now work with engine-specific parameters.
parsnip now has options to set specific types of predictor encodings for different models. For example, ranger models run using parsnip and workflows do the same thing by not creating indicator variables. These encodings can be overridden using the blueprint options in workflows. As a consequence, it is possible to get a different model fit than previous versions of parsnip. More details about specific encoding changes are below. (#326)
tidyr >= 1.0.0 is now required.
SVM models produced by kernlab now use the formula method (see breaking change notice above). This change was due to how ksvm() made indicator variables for factor predictors (with one-hot encodings). Since the ordinary formula method did not do this, the data are passed as-is to ksvm() so that the results are closer to what one would get if ksvm() were called directly.
MARS models produced by earth now use the formula method.
For xgboost, a one-hot encoding is used when indicator variables are created.
Under-the-hood changes were made so that non-standard data arguments in the modeling packages can be accommodated. (#315)
A new main argument was added to boost_tree() called stop_iter for early stopping. The xgb_train() function gained arguments for early stopping and a percentage of data to leave out for a validation set.
If fit() is used and the underlying model uses a formula, the actual formula is passed to the model (instead of a placeholder). This makes the model call better.
A function named repair_call() was added. This can help change the underlying model’s call object to better reflect what they would have obtained if the model function had been used directly (instead of via parsnip). This is only useful when the user chooses a formula interface and the model uses a formula interface. It will also be of limited use when a recipe is used to construct the feature set in workflows or tune.
The predict() function now checks to see if required modeling packages are installed. The packages are loaded (but not attached). (#249) (#308) (tidymodels/workflows#45)
The function req_pkgs() is a user interface to determining the required packages. (#308)
liquidSVM was added as an engine for svm_rbf() (#300)
tidy() was broken on R 4.0.
glmnet was removed as a dependency since the new version depends on R 3.6.0 or greater. Keeping it would constrain parsnip to that same requirement. All glmnet tests are run locally.
A set of internal functions are now exported. These are helpful when creating a new package that registers new model specifications.
nnet was added as an engine to multinom_reg() (#209)
parsnip and the underlying model function) for spark boosted trees and some keras models. See 897c927.
The time elapsed during model fitting is stored in the $elapsed slot of the parsnip model object, and is printed when the model object is printed.
Some default parameter ranges were updated for SVM, KNN, and MARS models.
The model update() methods gained a parameters argument for cases when the parameters are contained in a tibble or list.
fit_control() is soft-deprecated in favor of control_parsnip().
A bug was fixed standardizing the output column types of multi_predict and predict for multinom_reg.
A bug was fixed related to using data descriptors and fit_xy().
A bug was fixed related to the column names generated by multi_predict(). The top-level tibble will always have a column named .pred and this list column contains tibbles across sub-models. The column names for these sub-model tibbles will have names consistent with predict() (which was previously incorrect). See 43c15db.
A bug was fixed standardizing the column names of nnet class probability predictions.
Test case update due to CRAN running extra tests (#202)
Unplanned release based on CRAN requirements for Solaris.
The way that parsnip stores the model information has changed. Any custom models from previous versions will need to use the new method for registering models. The methods are detailed in ?get_model_env and the package vignette for adding models.
The mode needs to be declared for models that can be used for more than one mode prior to fitting and/or translation.
For surv_reg(), the engine that uses the survival package is now called survival instead of survreg.
For glmnet models, the full regularization path is always fit regardless of the value given to penalty. Previously, the model was fit by passing penalty to glmnet’s lambda argument and the model could only make predictions at those specific values. (#195)
add_rowindex() can add a column called .row to a data frame.
If a computational engine is not explicitly set, a default will be used. Each default is documented on the corresponding model page. A warning is issued at fit time unless verbosity is zero.
nearest_neighbor() gained a multi_predict method. The multi_predict() documentation is a little better organized.
A suite of internal functions were added to help with upcoming model tuning features.
A parsnip object always saved the name(s) of the outcome variable(s) for proper naming of the predicted values.
Small release driven by changes in sample() in the current r-devel.
A “null model” is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
fit_xy() can take a single column data frame or matrix for y without error
varying_args() now has a full argument to control whether the full set of possible varying arguments is returned (as opposed to only the arguments that are actually varying).
fit_control() now returns an S3 method.
For classification models, an error occurs if the outcome data are not encoded as factors (#115).
The prediction modules (e.g. predict_class, predict_numeric, etc) were de-exported. These were internal functions that were not to be used by the users and the users were using them.
An event time data set (check_times) was included that is the time (in seconds) to run R CMD check using the “r-devel-windows-ix86+x86_64” flavor. Packages that errored are censored.
varying_args() now uses the version from the generics package. This means that the first argument, x, has been renamed to object to align with generics.
For the recipes step method of varying_args(), there is now error checking to catch if a user tries to specify an argument that cannot be varying as varying (for example, the id) (#132).
find_varying(), the internal function for detecting varying arguments, now returns correct results when a size 0 argument is provided. It can also now detect varying arguments nested deeply into a call (#131, #134).
For multinomial regression, the .pred_ prefix is now only added to prediction column names once (#107).
For multinomial regression using glmnet, multi_predict() now pulls the correct default penalty (#108).
Confidence and prediction intervals for logistic regression were only computed for a single level. Both are now computed. (#156)
First CRAN release
set_engine(). There is no engine argument
others has been replaced by ...
regularization was changed to penalty in a few models to be consistent with this change.
predict methods, the earth package will need to be attached to be fully operational.
snake_case, newdata was changed to new_data.
predict_raw method was added.
fit interface was previously used to cover both the x/y interface as well as the formula interface. Now, fit() is the formula interface and fit_xy() is for the x/y interface.
NEWS.md file to track changes to the package.
predict methods were overhauled to be consistent.
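As an illustration of the fit() and fit_xy() split noted above, here is a minimal sketch. It assumes a current parsnip installation and R 4.1 or later (for the base pipe); the choice of the "lm" engine, the mtcars data, and the object names are illustrative assumptions, not part of the original notes.

```r
library(parsnip)

# Model specification: linear regression with the "lm" engine (illustrative choice).
spec <- linear_reg() |> set_engine("lm")

# Formula interface: fit() takes a formula and a data frame.
fit_formula <- fit(spec, mpg ~ wt + hp, data = mtcars)

# x/y interface: fit_xy() takes predictors and the outcome separately.
# Column names for x are required.
fit_matrix <- fit_xy(spec, x = mtcars[, c("wt", "hp")], y = mtcars$mpg)

# Predictions use the snake_case new_data argument and return a tibble
# with a .pred column.
preds <- predict(fit_formula, new_data = head(mtcars))
```

Both calls fit the same underlying lm() model here, so the estimated coefficients should agree; the two interfaces differ only in how the data are supplied.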