`int_pctl()` wouldn’t work on `last_fit()` outcomes when future parallelism was enabled (#1099).
A major rewrite/refactor of the underlying code that runs `tune_grid()`. This was an upgrade to add postprocessing and to modernize our parallel processing infrastructure.
The pattern of `.config` values has changed from `Preprocessor{num}_Model{num}` to `pre{num}_mod{num}_post{num}`. The numbers include a zero when that element was static. For example, a value of `pre0_mod3_post4` means no preprocessors were tuned and the model and postprocessor(s) had at least three and four candidates, respectively.
Iterative search configurations are now labeled `iter{num}` instead of `Iter{num}`, and the numbers are zero-padded to sort better. For example, if there are between 10 and 99 iterations, the first `.config` value is now `iter01` instead of `Iter1`.
The package will now log a backtrace for errors and warnings that occur during tuning. When a tuning process encounters issues, see the new `trace` column in the `collect_notes(.Last.tune.result)` output to find precisely where the error occurred (#873).
Postprocessors can now be tuned. Currently, we support the tailor package.
Introduced support for parallel processing with mirai in addition to the currently supported framework future. See `?parallelism` to learn more (#1028).
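As a minimal sketch (the worker count of 4 is an arbitrary assumption), either framework is registered before calling `tune_grid()` and friends:

```r
# With the future framework:
library(future)
plan(multisession, workers = 4)

# Or with the mirai framework:
library(mirai)
daemons(4)
```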
Sequential and parallel processing all use the same L’Ecuyer-CMRG seeds (conditional on `parallel_over`) (#1033).
The foreach package is no longer supported. Instead, use the future or mirai packages.
The parallel backend(s) and the methods of constructing seeds for workers have changed. There will be a lack of reproducibility between objects created in this version of tune and previous versions.
`int_pctl()` now includes an option (`keep_replicates`) to retain the individual bootstrap estimates. It also processes the resamples more efficiently (#1000).
A `min_grid()` method was added for `proportional_hazards` models so that their submodels are processed appropriately.
Post-processing: new `schedule_grid()` for scheduling a grid including post-processing (#988).
Removed functions deprecated since tune version 0.1.6 (circa 2021-07-21).
The package will now warn when parallel processing has been enabled with foreach but not with future. See `?parallelism` to learn more about transitioning your code to future (#878, #866). The next version of tune will move to a pure future implementation.
When automatic grids are used, `dials::grid_space_filling()` is now used (instead of `dials::grid_latin_hypercube()`). Overall, the new function produces optimized designs (not depending on random numbers). When using Bayesian models, we will use a Latin hypercube since we produce 5,000 candidates, which is too slow to do with pre-optimized designs.
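For illustration, a small sketch of requesting a space-filling design directly from dials:

```r
library(dials)
# A pre-optimized space-filling design over two tuning parameters:
grid_space_filling(penalty(), mixture(), size = 10)
```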
Addressed issue in `int_pctl()` where the function would error when parallelized using `makePSOCKcluster()` (#885).
Addressed issue where tuning functions would raise the error `object 'iteration' not found` with `plan(multisession)` and the control option `parallel_over = "everything"` (#888).
tune now fully supports models in the “censored regression” mode. These models can be fit, tuned, and evaluated like the regression and classification modes. tidymodels.org has more information and tutorials on how to work with survival analysis models.
Introduced support for parallel processing using the future framework. The tune package previously supported parallelism with foreach, and users can use either framework for now. In a future release, tune will begin the deprecation cycle for parallelism with foreach, so we encourage users to begin migrating their code now. See the Parallel Processing section in the “Optimizations” article to learn more (#866).
Added a `type` argument to `collect_metrics()` to indicate the desired output format. The default, `type = "long"`, returns output as before, while `type = "wide"` pivots the output such that each metric has its own column (#839).
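A minimal sketch, assuming `res` is an existing tuning result:

```r
collect_metrics(res)                 # type = "long" (the default)
collect_metrics(res, type = "wide")  # one column per metric
```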
Added a new function, `compute_metrics()`, that allows for computing new metrics after evaluating against resamples. The arguments and output formats are closely related to those from `collect_metrics()`, but this function requires that the input be generated with the control option `save_pred = TRUE` and additionally takes a `metrics` argument with a metric set for new metrics to compute. This allows for computing new performance metrics without requiring users to re-fit and re-predict from each model (#663).
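A minimal sketch, assuming `res` is a classification tuning result created with `save_pred = TRUE` in the control:

```r
library(yardstick)
# Compute metrics that were not in the original metric set:
compute_metrics(res, metrics = metric_set(brier_class, roc_auc))
```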
A method for rsample’s `int_pctl()` function that will compute percentile confidence intervals on performance metrics for objects produced by `fit_resamples()`, `tune_*()`, and `last_fit()`.
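A minimal sketch, assuming `res` was created with the control option `save_pred = TRUE`:

```r
library(rsample)
# Percentile bootstrap intervals for the performance metrics:
int_pctl(res, times = 1001, alpha = 0.05)
```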
The Brier score is now part of the default metric set for classification models.
`last_fit()` will now error when supplied a fitted workflow (#678).
Fixes bug where `.notes` entries were sorted in the wrong order in tuning results for resampling schemes with IDs that aren’t already in alphabetical order (#728).
Fixes bug where `.config` entries in the `.extracts` column in `tune_bayes()` output didn’t align with the entries they ought to in the `.metrics` and `.predictions` columns (#715).
Metrics from apparent resamples are no longer included when estimating performance with `estimate_tune_results()` (and thus with `collect_metrics(..., summarize = TRUE)` and `compute_metrics(..., summarize = TRUE)`, #714).
Handles edge cases for `tune_bayes()`’s `iter` argument more soundly. For `iter = 0`, the output of `tune_bayes()` should match `tune_grid()`, and `tune_bayes()` will now error when `iter < 0`. `tune_bayes()` will now alter the state of RNG slightly differently, resulting in changed Bayesian optimization search output (#720).
`augment()` methods for `tune_results`, `resample_results`, and `last_fit` objects now always return tibbles (#759).
Improved error message when needed packages aren’t installed (#727).
Improves documentation related to the hyperparameters associated with extracted objects that are generated from submodels. See the “Extracting with submodels” section of `?collect_extracts` to learn more.
`eval_time` and `eval_time_target` attributes were added to tune objects. There are also `.get_tune_eval_times()` and `.get_tune_eval_time_target()` functions.
`collect_predictions()` now reorders the columns so that all prediction columns come first (#798).
`augment()` methods for `tune_results`, `resample_results`, and `last_fit` objects now return prediction results in the first columns (#761).
`autoplot()` will now meaningfully error if only one grid point is present, rather than producing a plot (#775).
Added notes on case weight usage to several functions (#805).
For iterative optimization routines, `autoplot()` will use integer breaks when `type = "performance"` or `type = "parameters"`.
Several functions gained an `eval_time` argument for the evaluation time of dynamic metrics for censored regression. The placement of the argument breaks passing-by-position for one or more other arguments to `autoplot.tune_results()` and the developer-focused `check_initial()` (#857).
Ellipses (…) are now used consistently in the package to require optional arguments to be named. For functions that previously had ellipses at the end of the function signature, they have been moved to follow the last argument without a default value: this applies to `augment.tune_results()`, `collect_predictions.tune_results()`, `collect_metrics.tune_results()`, `select_best.tune_results()`, `show_best.tune_results()`, and the developer-focused `estimate_tune_results()`, `load_pkgs()`, and `encode_set()`. Several other functions that previously did not have ellipses in their signatures gained them: this applies to `conf_mat_resampled()` and the developer-focused `check_workflow()`. Optional arguments previously passed by position will now error informatively, prompting them to be named. These changes don’t apply in cases when the ellipses are currently in use to forward arguments to other functions (#863).
`last_fit()` now works with the 3-way validation split objects from `rsample::initial_validation_split()`. `last_fit()` and `fit_best()` now have a new argument `add_validation_set` to include or exclude the validation set in the dataset used to fit the model (#701).
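A minimal sketch, assuming `wf` is a finalized workflow:

```r
library(rsample)
set.seed(1)
# A 3-way split: 60% training, 20% validation, 20% testing
split <- initial_validation_split(mtcars, prop = c(0.6, 0.2))
# Exclude the validation set from the final training data:
last_fit(wf, split, add_validation_set = FALSE)
```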
Disambiguates the `verbose` and `verbose_iter` control options to better align with documented functionality. The former controls logging for general progress updates, while the latter only does so for the Bayesian search process (#682).
Fixed a bug in `collect_*()` functions where the `.iter` column was dropped.
tune 1.1.0 introduces a number of new features and bug fixes, accompanied by various optimizations that substantially decrease the total evaluation time to tune hyperparameters in the tidymodels.
Introduced a new function, `fit_best()`, that provides a shorthand interface to fit a final model after parameter tuning (#586).
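A minimal sketch, assuming `res` is a tuning result created with `save_workflow = TRUE` in the control:

```r
# Finalize and fit the numerically best configuration on the training set:
final_fit <- fit_best(res)
```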
Refined machinery for logging issues during tuning. Rather than printing out warnings and errors as they appear, the package will now only print unique tuning issues, updating a dynamic summary message that maintains counts of each unique issue. This feature is only enabled for tuning sequentially and can be manually toggled with the `verbose` option (#588).
Introduced `collect_extracts()`, a function for collecting extracted objects from tuning results. The format of results closely mirrors `collect_notes()`, where the extracted objects are contained in a list-column alongside the resample ID and workflow `.config` (#579).
Fixed bug in `select_by_pct_loss()` where the model with the greatest loss within the limit was returned rather than the simplest model whose loss was within the limit (#543).
Fixed bug in `tune_bayes()` where `.Last.tune.result` would return intermediate tuning results (#613).
Extended `show_best()`, `select_best()`, `select_by_one_std_error()`, and `select_by_pct_loss()` to accommodate metrics with a target value of zero (notably, `yardstick::mpe()` and `yardstick::msd()`) (#243).
Implemented various optimizations in tune’s backend that substantially decrease the total evaluation time to tune hyperparameters with the tidymodels (#634, #635, #636, #637, #640, #641, #642, #648, #649, #653, #656, #657).
Allowed users to supply list-columns in `grid` arguments. This change allows for manually specifying grid values that must be contained in list-columns, like functions or lists (#625).
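For illustration, a sketch of such a grid (the parameter name `harmonics` is purely hypothetical):

```r
library(tibble)
# Each grid value is itself a vector, so the column must be a list-column:
grid <- tibble(harmonics = list(1:2, 1:3, 1:4))
```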
Clarified error messages in `select_by_*` functions. Error messages now only note entries in `...` that are likely candidates for failure to `arrange()`, and those error messages are no longer duplicated for each entry in `...`.
Improved condition handling for errors that occur during extraction from workflows. While messages and warnings were appropriately handled, errors occurring due to misspecified `extract()` functions being supplied to `control_*()` functions were silently caught. As with warnings, errors are now surfaced both during execution and at `print()` (#575).
Moved forward with the deprecation of `parameters()` methods for `workflows`, `model_specs`, and `recipes`. Each of these methods will now warn on every usage and will be defunct in a later release of the package (#650).
Various bug fixes and improvements to documentation.
`last_fit()`, `fit_resamples()`, `tune_grid()`, and `tune_bayes()` do not automatically error if the wrong type of `control` object is passed. If the passed control object is not a superset of the one that is needed, the function will still error. As an example, passing `control_grid()` to `tune_bayes()` will fail, but passing `control_bayes()` to `tune_grid()` will not (#449).
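A sketch, with assumed `wf` (a workflow) and `folds` (resamples):

```r
# control_bayes() is a superset of control_grid(), so this runs:
tune_grid(wf, resamples = folds, control = control_bayes())
# ...whereas tune_bayes(wf, resamples = folds, control = control_grid())
# still errors.
```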
The `collect_metrics()` method for racing objects was removed (and is now in the finetune package).
Improved prompts related to parameter tuning. When tuning parameters are supplied that are not compatible with the given engine, `tune_*()` functions will now error (#549).
`control_bayes()` got a new argument `verbose_iter` that is used to control the verbosity of the Bayesian calculations. This change means that the `verbose` argument is being passed to `tune_grid()` to control its verbosity.
The `control_last_fit()` function gained an argument `allow_par` that defaults to `FALSE`. This change addresses failures after `last_fit()` using modeling engines that require native serialization, and we anticipate little to no increase in time-to-fit resulting from this change (#539, tidymodels/bonsai#52).
`show_notes()` does a better job of… showing notes (#558).
`show_notes()` is a new function that can better help understand warnings and errors.
Logging that occurs using the tuning and resampling functions now shows multi-line error messages and warnings in multiple lines.
When `fit_resamples()`, `last_fit()`, `tune_grid()`, or `tune_bayes()` complete without error (even if models fail), the results are also available via `.Last.tune.result`.
`last_fit()` now accepts a `control` argument to allow users to control aspects of the last fitting process via `control_last_fit()` (#399).
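A sketch, with assumed `wf` (a workflow) and `split` (an rsample initial split):

```r
last_fit(wf, split, control = control_last_fit(verbose = TRUE))
```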
Case weights are enabled for models that can use them.
Some internal functions were exported for use by other packages.
A check was added to `fit_resamples()` and `last_fit()` to give a more informative error message when a preprocessor or model has parameters marked for tuning.
`outcome_names()` works correctly when a recipe has NA roles (#518).
The `.notes` column now contains information on the type of note (error or warning), the location where it occurred, and the note. Printing a tune result has different output describing the notes.
`collect_notes()` can be used to gather any notes to a tibble (#363).
Parallel processing with PSOCK clusters is now more efficient, due to carefully avoiding sending extraneous information to each worker (#384, #396).
The xgboost engine arguments `alpha`, `lambda`, and `scale_pos_weight` are now tunable.
When the Bayesian optimization data contain missing values, these are removed before fitting the GP model. If all metrics are missing, no GP is fit and the current results are returned (#432).
Moved `tune()` from tune to hardhat (#442).
The `parameters()` methods for `recipe`, `model_spec`, and `workflow` objects have been soft-deprecated in favor of `extract_parameter_set_dials()` methods (#428).
When using `load_pkgs()`, packages that use random numbers on start-up do not affect the state of the RNG. We also added more control of the RNGkind to make it consistent with the user’s previous value (#389).
New `extract_*()` functions have been added that supersede many of the existing `pull_*()` functions. This is part of a larger move across the tidymodels packages towards a family of generic `extract_*()` functions. Many `pull_*()` functions have been soft-deprecated, and will eventually be removed (#378).
Fixed a bug where the resampled confusion matrix is transposed when `conf_mat_resampled(tidy = FALSE)` (#372).
False positive warnings no longer occur when using the doFuture package for parallel processing (#377).
Fixed an issue in `finalize_recipe()`, which failed during tuning of recipe steps that contain multiple `tune()` parameters in a single step.
Changed `conf_mat_resampled()` to return the same type of object as `yardstick::conf_mat()` when `tidy = FALSE` (#370).
The automatic parameter machinery for `sample_size` with the C5.0 engine was changed to use `dials::sample_prop()`.
The `rsample::pretty()` methods were extended to `tune_results` objects.
Added pillar methods for formatting tune objects in list columns.
A method for `.get_fingerprint()` was added. This helps determine if tune objects used the same resamples.
`collect_predictions()` was made generic.
The default tuning parameter for the SVM polynomial degree was switched from `dials::degree()` to `dials::prod_degree()` since it must be an integer.
`last_fit()` and `workflows::fit()` will now give identical results for the same workflow when the underlying model uses random number generation (#300).
Fixed an issue where recipe tuning parameters could be randomly matched to the tuning grid incorrectly (#316).
`last_fit()` no longer accidentally adjusts the random seed (#264).
Fixed two bugs in the acquisition function calculations.
New `parallel_over` control argument to adjust the parallel processing method that tune uses.
The `.config` column that appears in the returned tibble from tuning and fitting resamples has changed slightly. It is now always of the form `"Preprocessor<i>_Model<j>"`.
`predict()` can now be called on the workflow returned from `last_fit()` (#294, #295, #296).
tune now supports setting the `event_level` option from yardstick through the control objects (i.e. `control_grid(event_level = "second")`) (#240, #249).
tune now supports workflows created with the new `workflows::add_variables()` preprocessor.
Better control of the random number streams in parallel for `tune_grid()` and `fit_resamples()` (#11).
Allow `...` to pass options from `tune_bayes()` to `GPfit::GP_fit()`.
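A sketch, with assumed `wf` and `folds`, forwarding a correlation structure through the dots:

```r
tune_bayes(wf, resamples = folds, iter = 10,
           corr = list(type = "matern", nu = 5 / 2))
```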
Additional checks are done for the initial grid that is given to `tune_bayes()`. If the initial grid is small relative to the number of model terms, a warning is issued. If the grid is a single point, an error occurs (#269).
Formatting of some messages created by `tune_bayes()` now respects the width and wraps lines using the new `message_wrap()` function.
tune functions (`tune_grid()`, `tune_bayes()`, etc.) will now error if a model specification or model workflow is given as the first argument (the soft deprecation period is over).
An `augment()` method was added for objects generated by `tune_*()`, `fit_resamples()`, and `last_fit()`.
`autoplot.tune_results()` now requires objects made by version 0.1.0 or higher of tune.
tune objects no longer keep the `rset` class that they have from the `resamples` argument.
`autoplot.tune_results()` now produces a different plot when the tuning grid is a regular grid (i.e. factorial or nearly factorial in nature). If there are 5+ parameters, the standard plot is produced. Non-regular grids are plotted in the same way (although see next bullet point). See `?autoplot.tune_results` for more information.
`autoplot.tune_results()` now transforms the parameter values for the plot. For example, if the `penalty` parameter was used for a regularized regression, the points are plotted on the log-10 scale (its default transformation). For non-regular grids, the facet labels show the transformation type (e.g. `"penalty (log-10)"` or `"cost (log-2)"`). For regular grids, the x-axis is scaled using `scale_x_continuous()`.
Finally, `autoplot.tune_results()` now shows the parameter labels in a plot. For example, if a k-nearest neighbors model was used with `neighbors = tune()`, the parameter will be labeled as `"# Nearest Neighbors"`. When an ID was used, such as `neighbors = tune("K")`, this is used to identify the parameter.
In other plotting news, `coord_obs_pred()` has been included for regression models. When plotting the observed and predicted values from a model, this forces the x- and y-axes to have the same range and uses an aspect ratio of 1.
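A sketch, assuming `preds` holds observed `mpg` values and predictions in `.pred`:

```r
library(ggplot2)
ggplot(preds, aes(mpg, .pred)) +
  geom_point() +
  # Same range on both axes, aspect ratio of 1:
  coord_obs_pred()
```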
The outcome names are saved in an attribute called `outcomes` to objects with class `tune_results`. Also, several accessor functions (named `.get_tune_*()`) were added to more easily access such attributes.
`conf_mat_resampled()` computes the average confusion matrix across resampling statistics for a single model.
`show_best()` and the `select_*()` functions will now use the first metric in the metric set if no metric is supplied.
`filter_parameters()` can trim the `.metrics` column of unwanted results (as well as columns `.predictions` and `.extracts`) from `tune_*` objects.
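A sketch, assuming `res` tuned a parameter named `penalty`:

```r
# Keep only the results for one candidate value:
filter_parameters(res, penalty == 0.01)
```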
In concert with dials > 0.0.7, tuning engine-specific arguments is possible. Many known engine-specific tuning parameters are handled automatically.
If a grid is given, parameters do not need to be finalized to be used in the `tune_*()` functions.
Added a `save_workflow` argument to `control_*` functions that will result in the workflow object used to carry out tuning/fitting (regardless of whether a formula or recipe was given as input to the function) being appended to the resulting `tune_results` object in a `workflow` attribute. The new `.get_tune_workflow()` function can be used to access the workflow.
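A sketch, with assumed `wf` and `folds`:

```r
res <- tune_grid(wf, resamples = folds,
                 control = control_grid(save_workflow = TRUE))
# Retrieve the workflow that was used for tuning:
.get_tune_workflow(res)
```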
Many of the output columns in a `tune_results` object have an additional column called `.config`. This is meant to be a unique, qualitative value that is used for sorting and merging. These values also correspond to the messages in the logging produced when `verbose = TRUE`.
The arguments to the main tuning/fitting functions (`tune_grid()`, `tune_bayes()`, etc.) have been reordered to better align with parsnip’s `fit()`. The first argument to all these functions is now a model specification or model workflow. The previous versions are soft-deprecated as of 0.1.0 and will be deprecated as of 0.1.2.
Added more packages to be fully loaded in the workers when run in parallel using doParallel (#157), (#159), and (#160).
`collect_predictions()` gains two new arguments. `parameters` allows for pre-filtering of the hold-out predictions by tuning parameter values. If you are only interested in one sub-model, this makes things much faster. The other option is `summarize` and is used when the resampling method has training set rows that are predicted in multiple holdout sets.
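A sketch, assuming `res` tuned a parameter named `penalty` and saved its predictions:

```r
library(tibble)
# One sub-model's holdout predictions, averaged over replicate
# assessment rows:
collect_predictions(res, summarize = TRUE,
                    parameters = tibble(penalty = 0.01))
```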
`select_best()`, `select_by_one_std_err()`, and `select_by_pct_loss()` no longer have a redundant `maximize` argument (#176). Each metric set in yardstick now has a direction (maximize vs. minimize) built in.
`tune_bayes()` no longer errors with a recipe, which has tuning parameters, in combination with a parameter set, where the defaults contain unknown values (#168).
CRAN release.
Changed license to MIT
The `...` arguments of `tune_grid()` and `tune_bayes()` have been moved forward to force optional arguments to be named.
New `fit_resamples()` for fitting a set of resamples that don’t require any tuning.
Changed `summarise.tune_results()` back to `estimate.tune_results()`.
Added a `NEWS.md` file to track changes to the package.