- retrieve_rds() is a documentation utility function that either retrieves a serialized RDS file from a URL or, if the URL retrieval fails, runs a code block instead. It is used in vignettes and other documentation to create or retrieve large objects that are too slow to create on the spot.
- ALEPlots by appending ggplot layers with the customize() function.
- invert_probs() inverts probabilities (subtracts them from 1) for ALE and ALEpDist objects.

We have deeply rethought the vision of this package and have completely rewritten it to support existing, new, and future planned functionality. The changes are so radical that there is no continuity with the previous version 0.3.1. Thus, we have skipped a version number and are now at version 0.5.0.
Honestly, we can’t keep track of all the changes; experienced users are advised to rerun the vignettes to get up to speed with the new version. We apologize for the discontinuity, but we trust that the latest version is easier to use and much more functional. What follows is a list of some of the most notable changes we’ve kept track of.
- ale package objects have been completely rewritten. The latest objects are not compatible with earlier versions. However, the new structure supports the roadmap of future functionality, so we hope that few future changes will need to break backward compatibility.
- {S7} classes now represent the different kinds of ale package objects:
  - ALE: the core ale package object that holds ALE data for a model (replaces the former ale() and ale_ixn() functions).
  - ModelBoot: results of full-model bootstrapping (replaces the former model_bootstrap() function).
  - ALEPlots: stores ALE plots generated from either ALE or ModelBoot objects, with convenient print() and plot() methods.
  - ALEpDist: p-value distribution information (replaces the former create_p_dist() function).
- We have rewritten the {ALEPlot} package code and so now claim full authorship of the code. One of the most significant implications is that we have changed the package license from GPL-2 to MIT, which permits maximum dissemination of our algorithms.
- ale_ixn() has been eliminated; both 1D and 2D ALE are now calculated with the ALE() constructor.
- The ALE object constructor no longer produces plots directly. ALE plots are now created as ALEPlots objects using the newly added plot() methods, which create all possible plots from the ALE data in ALE or ModelBoot objects. Thus, serializing ALE objects now avoids the environment bloat previously caused by the embedded ggplot objects.
- Renamed the rug_sample_size argument of the ALE constructor to sample_size. It now reflects the size of data that should be sampled in the ale object, which can be used not only for rug plots but also for other purposes.
- The x_cols argument in ALE() now supports a rich syntax for specifying which individual columns (for 1D ALE) or pairs of columns (for 2D interactions) are desired.
  It also supports specification using standard R formula syntax.
- get() methods now provide convenient access to ALE, ModelBoot, and ALEPlots objects.
- plot() methods, eliminated the compact_plots to ale().
- print() and plot() methods have been added to the ale_plots object.
- A print() method has been added to the ALE object.
- model_bootstrap() now reports various model performance measures that are validated using bootstrap validation with the .632 correction.
- p_funs has been completely changed: it has been converted to an object named ale_p, and its functions have been separated from the object as internal functions. The function create_p_funs() has been renamed create_p_dist().
- ALEpDist() now produces three types of p-values: “exact” (very slow), with at least 1000 random iterations on the original model; “approx”, with 100 to 999 iterations on the original model; and “surrogate”, for much faster but less reliable p-values based on a surrogate linear model. See ALEpDist() for details.

We have fixed innumerable bugs during our development journey but, fortunately, very few were publicly reported. Only fixes for publicly reported bugs are listed here.
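The .632 correction used for the model_bootstrap() performance measures comes from the bootstrap literature (Efron's .632 estimator): the optimistic in-sample error is blended with the pessimistic out-of-bag error. The general textbook estimator is sketched below; how ale applies it to each individual performance measure is not spelled out here and should be checked in the package documentation.

```latex
\widehat{\mathrm{Err}}^{(.632)} = 0.368\,\overline{\mathrm{err}} + 0.632\,\widehat{\mathrm{Err}}^{(1)},
\qquad
\widehat{\mathrm{Err}}^{(1)} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{|C^{-i}|}\sum_{b \in C^{-i}} L\!\left(y_i, \hat{f}^{*b}(x_i)\right)
```

Here \overline{err} is the apparent error of the model fitted and evaluated on the full sample, C^{-i} is the set of bootstrap samples that do not contain observation i, \hat{f}^{*b} is the model fitted on bootstrap sample b, and L is the loss. The weight 0.632 is approximately 1 - e^{-1}, the limiting probability that a given observation appears in a bootstrap sample of size n.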
One of the most fundamental changes is not directly visible but affects how some ALE values are calculated. In certain very specific cases, the ALE values are now slightly different from those of the reference {ALEPlot} package. These cases involve only non-numeric variables with prediction types that are not on the scale of the response variable (e.g., a binary or categorical variable with a logarithmic prediction that is not rescaled to the response scale). We made this change for two reasons:
1. {ALEPlot} implementation. These cases are not covered at all in the original ALE scientific article, and they are poorly documented in the {ALEPlot} code. We cannot help users interpret results that we do not understand ourselves.
2. The {ALEPlot} reference implementation is not scalable: custom code must be written for each type and each degree of interaction.

Other than in these edge cases, our implementation continues to give results identical to those of the reference {ALEPlot} package.
Other notable changes that might not be readily visible to users:
- {staccuracy}.
- {rlang} and {cli} packages. Reduced the imported functions to a minimum.
- {cli}.
- {assertthat} with custom validation functions that adapt some {assertthat} code.
- helper.R test files so that some testing objects are available to the loaded package.
- {future} parallelization code to restore original values on exit.
- ale_p objects.
- {ggplot2} 3.5.
- {covr}.

The most significant updates are the addition of p-values for the ALE statistics, the launch of a pkgdown website that will henceforth host the development version of the package, and the parallelization of core functions, with a resulting performance boost.
One of the key goals for the ale package is that it be truly model-agnostic: it should support any R object that can be considered a model, where a model is defined as an object that makes a prediction for each input row of data that it is provided. Toward this goal, we had to adjust the custom predict function to make it more flexible for various kinds of model objects. We are happy that our changes now enable support for tidymodels objects and various survival models (but for now, only those that return single-vector predictions). So, in addition to the required object and newdata arguments, the custom predict function pred_fun in the ale() function now also requires a type argument to specify the prediction type, whether it is used or not. This change breaks previous code that used custom predict functions, but it allows ale to analyze many more model types than before. Code that did not require custom predict functions should not be affected by this change. See the updated documentation of the ale() function for details.
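Under this convention, a custom prediction function must accept object, newdata, and type, even if it ignores type. The base-R sketch below shows that shape with a linear model on mtcars. custom_pred_fun is a hypothetical name, and the exact way to pass it to ale() (through pred_fun, as described above) should be confirmed in help(ale) for the installed version.

```r
# Hypothetical custom prediction function following the required signature:
# it takes `object` (the model), `newdata` (rows to predict), and `type`
# (the prediction type), even though this particular model ignores `type`.
custom_pred_fun <- function(object, newdata, type = "response") {
  # lm predictions are already on the response scale, so `type` is unused;
  # return a plain numeric vector with one prediction per row of newdata
  as.numeric(stats::predict(object, newdata = newdata))
}

# Fit a simple linear model on a built-in dataset
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)

# One prediction per input row, as the model definition above requires
preds <- custom_pred_fun(fit, newdata = mtcars[1:5, ], type = "response")
print(preds)
```

Giving type a default and ignoring it keeps the function compatible with callers that always pass a prediction type, while still returning one numeric prediction per row.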
Another change that breaks former code is that the arguments for model_bootstrap() have been modified. Instead of a cumbersome model_call_string, model_bootstrap() now uses the {insight} package to automatically detect many R models and directly manipulate the model object as needed. So, the second argument is now the model object. However, for non-standard models that {insight} cannot automatically parse, a modified model_call_string is still available to ensure model-agnostic functionality. Although this change breaks former code that ran model_bootstrap(), we believe that the new function interface is much more user-friendly.
A slight change that might break some existing code is that the conf_regions output associated with ALE statistics has been restructured. The new structure provides more useful information. See help(ale) for details.
- pkgdown website located at https://tripartio.github.io/ale/. This is where the most recent development features will be documented.
- create_p_funs() function for details and an example.
- vignette('ale-statistics') for details. The vignette has been expanded with more details on how to properly interpret normalized ALE statistics.
- vignette('ale-statistics') for details.
- {furrr} library. In our tests, we typically found speed-ups of n – 2, where n is the number of physical cores (machine learning generally cannot make use of logical cores). For example, a computer with 4 physical cores should see at least a ×2 speed-up and a computer with 6 physical cores should see at least a ×4 speed-up. However, parallelization is tricky with our model-agnostic design. When users work with models that follow standard R conventions, the ale package should be able to configure the system for parallelization automatically. But for some non-standard models, users may have to explicitly list the model’s packages in the new model_packages argument so that each parallel thread can find all necessary functions. This is only a concern if you get unexpected errors. See help(ale) for details.
- ale() function. See help(ale) for details.
- The median_band_pct argument to ale() now takes a vector of two numbers, one for the inner band and one for the outer.
- {gridExtra} with {patchwork} for examples and vignettes for printing plots.
- ale() function documentation from ale-package documentation.
- alt tags to describe plots for accessibility.
- {insight} package to automatically detect y_col and model call objects when possible; this increases the range of automatic model detection of the ale package in general.
- {progressr} package for progress bars. With the cli progression handler, this enables accurate estimated times of arrival (ETA) for long procedures, even with parallel computing. A message is displayed once per session informing users how to customize their progress bars.
  For details, see help(ale), particularly the documentation on progress bars and the silent argument.
- {ggplot2} from a dependency to an import. So, it is no longer automatically loaded with the package.
- var_summary() function. In particular, it encodes whether the user is using p-values (ALER band) or not (median band).
- validation.R file.
- compact_plots to plotting functions to strip plot environments to reduce the size of returned objects. See help(ale) for details.
- package_scope environment.
- ale_ixn()).
- ale_ixn()).
- ale() does not yet support multi-output model prediction types (e.g., multi-class classification and multi-time survival probabilities).

This version introduces various ALE-based statistics that let ALE be used for statistical inference, not just interpretable machine learning. A dedicated vignette introduces this functionality (see “ALE-based statistics for statistical inference and effect sizes” from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale). We introduce these statistics in detail in a working paper: Okoli, Chitu. 2023. “Statistical Inference Using Machine Learning and Classical Techniques Based on Accumulated Local Effects (ALE).” arXiv. https://doi.org/10.48550/arXiv.2310.09877. Please note that they might be further refined after peer review.
- ale() and model_bootstrap() now output these statistics. (ale_ixn() will come later.)
- ale package with the reference {ALEPlot} package: “Comparison between {ALEPlot} and ale packages” (available from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale).
- var_cars is a modified version of mtcars that features many different types of variables.
- census is a polished version of the adult income dataset used for a vignette in the {ALEPlot} package.
- silent = TRUE to ale(), ale_ixn(), or model_bootstrap().
- seed argument to ale(), ale_ixn(), or model_bootstrap().

By far the most extensive changes have been to ensure the accuracy and stability of the package from a software engineering perspective. Even though these are not visible to users, they make the package more robust, with hopefully fewer bugs. Indeed, the extensive data validation may help users debug their own errors.
- {assertthat} package; if not, the function fails quickly with an appropriate error message.
- The {testthat} package is now used for testing the outputs of each user-facing function. This should help the code base remain robust as development continues.
- {ALEPlot} package. These tests should ensure that any future code that breaks the accuracy of ALE calculations will be caught quickly.
- ale_ixn()).
- ale_ixn()).

This is the first CRAN release of the ale package. Here is its official description with the initial release:
Accumulated Local Effects (ALE) were initially developed as a model-agnostic approach for global explanations of the results of black-box machine learning algorithms. (Apley, Daniel W., and Jingyu Zhu. “Visualizing the effects of predictor variables in black box supervised learning models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82.4 (2020): 1059-1086 doi:10.1111/rssb.12377.)

ALE has two primary advantages over other approaches like partial dependency plots (PDP) and SHapley Additive exPlanations (SHAP): its values are not affected by the presence of interactions among variables in a model and its computation is relatively rapid. This package rewrites the original code from the ‘ALEPlot’ package for calculating ALE data and it completely reimplements the plotting of ALE values.
(This package uses the same GPL-2 license as the {ALEPlot} package.)
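To make the ALE idea in this description concrete, here is a from-scratch sketch of first-order ALE for one numeric feature in base R. ale_1d() is a hypothetical helper, not a function of the ale package, and it only roughly follows Apley and Zhu's definition: split the feature at quantiles, average each row's prediction change across its own interval (keeping that row's other feature values), accumulate the averages, and center the result. The centering convention used here (count-weighted interval midpoints) is one common choice and is an assumption.

```r
# Hypothetical from-scratch sketch of first-order (1D) ALE for one numeric
# feature; not the ale package's code, which handles many more cases.
ale_1d <- function(model, data, feature, n_bins = 10) {
  x <- data[[feature]]
  # Interval edges at data quantiles (type = 1 returns observed values)
  edges <- unique(stats::quantile(
    x, probs = seq(0, 1, length.out = n_bins + 1), type = 1, names = FALSE
  ))
  K <- length(edges) - 1
  # Assign each row to its interval; the lowest edge joins the first interval
  bin <- cut(x, breaks = edges, include.lowest = TRUE, labels = FALSE)

  # Move each row to the lower and upper edge of its own interval,
  # keeping all other features at their observed values
  lo <- data
  hi <- data
  lo[[feature]] <- edges[bin]
  hi[[feature]] <- edges[bin + 1]

  # Local effect of crossing the interval, computed row by row
  diffs <- stats::predict(model, newdata = hi) -
    stats::predict(model, newdata = lo)

  # Average local effect per interval (empty intervals contribute 0)
  mean_diff <- as.numeric(tapply(diffs, factor(bin, levels = seq_len(K)), mean))
  mean_diff[is.na(mean_diff)] <- 0

  # Accumulate the averaged effects, starting at 0 on the lowest edge
  ale_raw <- c(0, cumsum(mean_diff))

  # Center so the count-weighted mean effect is zero, using interval midpoints
  n_k <- tabulate(bin, nbins = K)
  mid <- (ale_raw[-1] + ale_raw[-(K + 1)]) / 2
  ale_centered <- ale_raw - sum(mid * n_k) / sum(n_k)

  data.frame(x = edges, ale = ale_centered)
}

fit <- stats::lm(mpg ~ wt + hp, data = mtcars)
res <- ale_1d(fit, mtcars, "wt", n_bins = 5)
print(res)
```

For a linear model, each accumulated step equals the coefficient times the interval width, so the resulting ALE curve is a straight line with the model's slope for that feature, which makes a quick sanity check.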
This initial release replicates the full functionality of the {ALEPlot} package and a lot more. It currently presents three functions:
- ale(): create data for and plot one-way ALE (single variables). ALE values may be bootstrapped.
- ale_ixn(): create data for and plot two-way ALE interactions. Bootstrapping of the interaction ALE values has not yet been implemented.
- model_bootstrap(): bootstrap an entire model, not just the ALE values. This function returns the bootstrapped model statistics and coefficients as well as the bootstrapped ALE values. This is the appropriate approach for small samples.

This release provides more details in the following vignettes (they are all available from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale):
- ale package
- ale() function handling of various data types for x