butcher

Overview

Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits:

Heavy usage of formulas and closures that capture the enclosing environment in model training
Lack of selectivity in the construction of the model object itself

As a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. The butcher package provides tooling to “axe” parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object.

Installation

Install the released version from CRAN:

install.packages("butcher")

Or install the development version from GitHub:

# install.packages("pak")
pak::pak("tidymodels/butcher")

Butchering

As an example, let’s wrap an lm model so it contains a lot of unnecessary stuff:

library(butcher)
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6) # we didn't know about
  lm(mpg ~ ., data = mtcars)
}

This object is unnecessarily large:

library(lobstr)
obj_size(our_model())
#> 8.02 MB

When, in fact, it should only be:

small_lm <- lm(mpg ~ ., data = mtcars)
obj_size(small_lm)
#> 22.22 kB

To understand which part of our original model object is taking up the most memory, we leverage the weigh() function:

big_lm <- our_model()
weigh(big_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         8.01
#>  2 qr.qr         0.00666
#>  3 residuals     0.00286
#>  4 fitted.values 0.00286
#>  5 effects       0.0014
#>  6 coefficients  0.00109
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows

The problem here is in the terms component of our big_lm. Because of how lm() is implemented in the stats package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use butcher():

cleaned_lm <- butcher(big_lm, verbose = TRUE)
#> ✔ Memory released: 8.00 MB
#> ✖ Disabled: `print()`, `summary()`, and `fitted()`
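The environment capture described above can be inspected with base R alone. The following is a minimal sketch, not part of the package documentation; it rebuilds the wrapped model and looks inside the environment attached to the terms component. The variable name captured_env is introduced here purely for illustration.

```r
# Rebuild the wrapped model from the example above.
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()

# The terms object keeps a reference to the environment in which the
# formula was created, here the execution environment of our_model().
captured_env <- attr(big_lm$terms, ".Environment")

# Everything defined in that environment travels with the fitted model.
ls(captured_env)
#> [1] "some_junk_in_the_environment"

# Size in bytes of the vector that was carried along unintentionally.
object.size(get("some_junk_in_the_environment", envir = captured_env))
```

Because the environment is only referenced by the model, the junk vector cannot be garbage collected while big_lm exists, and it is written out whenever the model is serialized.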

Comparing it against our small_lm, we find:

weigh(cleaned_lm)
#> # A tibble: 25 × 2
#>    object           size
#>    <chr>           <dbl>
#>  1 terms        0.00771
#>  2 qr.qr        0.00666
#>  3 residuals    0.00286
#>  4 effects      0.0014
#>  5 coefficients 0.00109
#>  6 model.mpg    0.000304
#>  7 model.cyl    0.000304
#>  8 model.disp   0.000304
#>  9 model.hp     0.000304
#> 10 model.drat   0.000304
#> # ℹ 15 more rows

And now it will take up about the same memory on disk as small_lm:

weigh(small_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         0.00763
#>  2 qr.qr         0.00666
#>  3 residuals     0.00286
#>  4 fitted.values 0.00286
#>  5 effects       0.0014
#>  6 coefficients  0.00109
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows
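The Overview states that axing does not sacrifice prediction. One way to check this for the lm example is to compare predictions before and after butchering. This is a sketch under the assumption that a handful of mtcars rows is an acceptable stand-in for new data; new_cars is a name made up for this illustration.

```r
library(butcher)

# Same wrapped model as in the example above.
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- our_model()
cleaned_lm <- butcher(big_lm)

# Stand-in for new data: the first five rows of the training set.
new_cars <- mtcars[1:5, ]

# Compare predictions from the original and the butchered model.
preds_match <- all.equal(
  predict(big_lm, newdata = new_cars),
  predict(cleaned_lm, newdata = new_cars)
)
preds_match
```

If prediction is preserved as described, preds_match should be TRUE. Functions reported as disabled, such as summary(), are not expected to work on cleaned_lm.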

To make the most of your available memory, this package provides five S3 generics for you to remove parts of a model object:

axe_call(): To remove the call object.
axe_ctrl(): To remove controls associated with training.
axe_data(): To remove the original training data.
axe_env(): To remove environments.
axe_fitted(): To remove fitted values.

When you run butcher(), you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named butcher_disabled that lists any post-fit estimation functions that are disabled as a result.
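The generics can also be applied one at a time when only some components should go. The sketch below axes just the call and the fitted values of an lm fit and then inspects the class and the butcher_disabled attribute mentioned above. The object names fit and trimmed are illustrative, and the exact class name and attribute contents are left for the reader to inspect rather than asserted here.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Remove only the call and the fitted values, leaving everything else.
trimmed <- axe_call(fit)
trimmed <- axe_fitted(trimmed)

# The class vector should now include a butchered class ahead of "lm".
class(trimmed)

# Post-fit functions that no longer work are recorded here.
attr(trimmed, "butcher_disabled")
```

Axing selectively is useful when, for example, the call is large but fitted() is still needed downstream.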

Model Object Coverage

Check out the vignette("available-axe-methods") to see butcher’s current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the vignette("adding-models-to-butcher") for more guidelines, but in short, to contribute a set of axe methods:

Run new_model_butcher(model_class = "your_object", package_name = "your_package")
Use butcher helper functions weigh() and locate() to decide what to axe
Finalize edits to R/your_object.R and tests/testthat/test-your_object.R
Make a pull request!
Make a pull request!
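The second step above might look roughly like the following for an lm fit. This is a hedged sketch: the threshold and units arguments to weigh() and the name argument to locate() are used as I understand them from the package reference, so check ?weigh and ?locate before relying on them. new_model_butcher() is not run here because it generates files in a package source tree.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)

# Rank the components of the fitted object by size (assumed arguments:
# threshold filters out small components, units sets the size unit).
sizes <- weigh(fit, threshold = 0, units = "MB")
head(sizes)

# Find where components whose names match "call" live in the object
# (assumed argument: name is the pattern to search for).
locate(fit, name = "call")
```

The largest entries reported by weigh() are the natural candidates for a new axe method, and locate() helps pin down where they sit in a nested object.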

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

For questions and discussions about tidymodels packages, modeling, and machine learning, please post on RStudio Community.
If you think you have encountered a bug, please submit an issue.
Either way, learn how to create and share a reprex (a minimal, reproducible example), to clearly communicate about your code.
Check out further details on contributing guidelines for tidymodels packages and how to get help.
