
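Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits:

1. Heavy usage of formulas and closures that capture the enclosing environment in model training
2. Lack of selectivity in the construction of the model object itself
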
As a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. The butcher package provides tooling to “axe” parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object.
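
One way a fitted object picks up redundant content is that an R formula records the environment in which it was created. The sketch below uses only base R to illustrate that mechanism; `make_formula()` and `junk` are hypothetical names made up for this illustration, and the serialized length is used only as a rough proxy for object size.

```r
# Hypothetical helper: builds a formula inside a function whose
# environment also holds a large, unrelated vector.
make_formula <- function() {
  junk <- runif(1e6) # about 8 MB of doubles the formula never uses
  y ~ x
}

f <- make_formula()

# The formula keeps a reference to the function's environment,
# so `junk` stays reachable through it.
captured <- ls(environment(f))
captured

# Serialized length (in bytes) with the captured environment attached.
size_with_env <- length(serialize(f, NULL))

# Swap in the empty environment and measure again.
environment(f) <- emptyenv()
size_without_env <- length(serialize(f, NULL))

c(with_env = size_with_env, without_env = size_without_env)
```

The first size should be dominated by the one million doubles in `junk`, while the second covers little more than the formula itself.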
Install the released version from CRAN:

    install.packages("butcher")

Or install the development version from GitHub:

    # install.packages("pak")
    pak::pak("tidymodels/butcher")

As an example, let’s wrap an lm model so it contains a lot of unnecessary stuff:

    library(butcher)
    our_model <- function() {
      some_junk_in_the_environment <- runif(1e6) # we didn't know about
      lm(mpg ~ ., data = mtcars)
    }

This object is unnecessarily large:

    library(lobstr)
    obj_size(our_model())
    #> 8.02 MB

When, in fact, it should only be:

    small_lm <- lm(mpg ~ ., data = mtcars)
    obj_size(small_lm)
    #> 22.22 kB

To understand which part of our original model object is taking up the most memory, we leverage the weigh() function:

    big_lm <- our_model()
    weigh(big_lm)
    #> # A tibble: 25 × 2
    #>    object            size
    #>    <chr>            <dbl>
    #>  1 terms         8.01
    #>  2 qr.qr         0.00666
    #>  3 residuals     0.00286
    #>  4 fitted.values 0.00286
    #>  5 effects       0.0014
    #>  6 coefficients  0.00109
    #>  7 call          0.000728
    #>  8 model.mpg     0.000304
    #>  9 model.cyl     0.000304
    #> 10 model.disp    0.000304
    #> # ℹ 15 more rows

The problem here is in the terms component of our big_lm. Because of how lm() is implemented in the stats package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use butcher():

    cleaned_lm <- butcher(big_lm, verbose = TRUE)
    #> ✔ Memory released: 8.00 MB
    #> ✖ Disabled: `print()`, `summary()`, and `fitted()`

Comparing it against our small_lm, we find:

    weigh(cleaned_lm)
    #> # A tibble: 25 × 2
    #>    object           size
    #>    <chr>           <dbl>
    #>  1 terms        0.00771
    #>  2 qr.qr        0.00666
    #>  3 residuals    0.00286
    #>  4 effects      0.0014
    #>  5 coefficients 0.00109
    #>  6 model.mpg    0.000304
    #>  7 model.cyl    0.000304
    #>  8 model.disp   0.000304
    #>  9 model.hp     0.000304
    #> 10 model.drat   0.000304
    #> # ℹ 15 more rows

And now it will take up about the same memory on disk as small_lm:

    weigh(small_lm)
    #> # A tibble: 25 × 2
    #>    object            size
    #>    <chr>            <dbl>
    #>  1 terms         0.00763
    #>  2 qr.qr         0.00666
    #>  3 residuals     0.00286
    #>  4 fitted.values 0.00286
    #>  5 effects       0.0014
    #>  6 coefficients  0.00109
    #>  7 call          0.000728
    #>  8 model.mpg     0.000304
    #>  9 model.cyl     0.000304
    #> 10 model.disp    0.000304
    #> # ℹ 15 more rows

To make the most of your available memory, this package provides five S3 generics for you to remove parts of a model object:

- axe_call(): To remove the call object.
- axe_ctrl(): To remove controls associated with training.
- axe_data(): To remove the original training data.
- axe_env(): To remove environments.
- axe_fitted(): To remove fitted values.

When you run butcher(), you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named butcher_disabled that lists any post-fit estimation functions that are disabled as a result.
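
As a minimal sketch of the difference between a single axe function and butcher(), the example below axes a small lm fit both ways and then inspects what was recorded on the result. It assumes butcher is installed; the object names (`fit`, `no_env`, `axed`, `new_cars`) are made up for this illustration, and no particular printed output is implied.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Apply a single axe function: only environments are targeted.
no_env <- axe_env(fit)

# Apply all of the axe functions at once.
axed <- butcher(fit)

# Inspect the class vector and the attribute that lists
# post-fit functions disabled by axing.
class(axed)
attr(axed, "butcher_disabled")

# Predictions on new data should be unchanged by either kind of axing.
new_cars <- mtcars[1:5, ]
same_after_env <- all.equal(predict(no_env, new_cars), predict(fit, new_cars))
same_after_all <- all.equal(predict(axed, new_cars), predict(fit, new_cars))
c(same_after_env, same_after_all)
```

Calling one axe function at a time can be useful when only one component is a problem and you would rather keep the rest of the object intact.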

Check out the vignette("available-axe-methods") to see butcher’s current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the vignette("adding-models-to-butcher") for more guidelines, but in short, to contribute a set of axe methods:

1. Run new_model_butcher(model_class = "your_object", package_name = "your_package")
2. Use weigh() and locate() to decide what to axe
3. Edit R/your_object.R and tests/testthat/test-your_object.R

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
For questions and discussions about tidymodels packages, modeling, and machine learning, please post on RStudio Community.
If you think you have encountered a bug, please submit an issue.
Either way, learn how to create and share a reprex (a minimal, reproducible example), to clearly communicate about your code.
Check out further details on contributing guidelines for tidymodels packages and how to get help.