{lvmisc} has a set of tools to work with statistical model objects. Generally, the functions behind these tools are S3 generics, and the methods currently implemented are for the classes `lm`, from the {stats} package, and `lmerMod`, from the {lme4} package. If you would like to have methods implemented for other model classes, please file an issue at https://github.com/verasls/lvmisc/issues.
The functions to work with models can be divided into three categories:

- Model diagnostics
- Model cross-validation
- Model accuracy assessment and comparison
Before starting, let’s load the required packages:
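To follow along, we need {lvmisc} itself and, since the examples include a mixed model fit with `lmer()`, the {lme4} package (other packages may be attached in your own workflow, but these two cover everything shown here):

```r
library(lvmisc)
library(lme4)
```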
{lvmisc} has a plotting function for some common model diagnostic plots. To explore it, let’s first create some example models:
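These are the same two models reused later in the accuracy section: a linear model (`m1`) and a linear mixed model (`m2`), both fit to the built-in `mtcars` data:

```r
m1 <- lm(disp ~ mpg + hp + cyl + mpg:cyl, mtcars)
m2 <- lmer(disp ~ mpg + (1 | cyl), mtcars)
```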
The main function to generate the diagnostic plots is `plot_model()` and its only argument is the model object:
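For example, to show the diagnostic grid for the linear model created above:

```r
plot_model(m1)
```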
By default it prints a grid of five (or four) diagnostic plots. Each individual plot can also be generated by using a separate function. The default plots shown are, respectively:
- Residuals versus fitted values: `plot_model_residual_fitted()` generates this plot.
- Scale-location: `plot_model_scale_location()` generates this plot.
- Normal Q-Q plot of the residuals: `plot_model_qq()` generates this plot.
- Cook’s distance: `plot_model_cooks_distance()` generates this plot.
- Variance inflation factor (VIF; only shown when the model has more than one predictor term, so it appears for `m1` but not for `m2`). The VIF can be used to check the model for multicollinearity: the bar is green for low multicollinearity, orange for moderate and red for high. `plot_model_multicollinearity()` generates this plot.

The cross-validation of a model is a way to assess how the results of a statistical model will generalize to an independent data set, but without the burden of collecting more data. At the moment, the only cross-validation method available in the {lvmisc} package is the leave-one-out. Also, the only model classes supported are `lm` and `lmerMod`, but more cross-validation methods and classes will be implemented in the future.
The leave-one-out cross-validation method works by splitting each subject’s data into a testing dataset (one participant at a time), with the remaining data being the training dataset. New models with exactly the same specifications as the original model are built using the training dataset and then used to predict the outcome in the testing dataset. This process is repeated once for every subject in the original data.
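For intuition, here is a minimal base R sketch of what this procedure does, using the same simple model specification as the example that follows; `loo_cv()` automates all of these steps:

```r
# Leave-one-out by hand: hold out one row (car) at a time, refit the model
# on the remaining rows and predict the held-out observation
predicted <- numeric(nrow(mtcars))
for (i in seq_len(nrow(mtcars))) {
  training <- mtcars[-i, ]               # training dataset: all other cars
  testing  <- mtcars[i, , drop = FALSE]  # testing dataset: the i-th car
  fit <- lm(disp ~ mpg, data = training) # same specification as the original model
  predicted[i] <- predict(fit, newdata = testing)
}
```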
We will now build a linear model to demonstrate the cross-validation with the `loo_cv()` function.
```r
# Put the rownames into a column to treat each row as a "subject" (car) in
# the leave-one-out cross-validation
mtcars$car <- row.names(mtcars)

m <- lm(disp ~ mpg, mtcars)
loo_cv(model = m, data = mtcars, id = car, keep = "used")
#> # A tibble: 32 × 3
#>    car            .actual .predicted
#>    <chr>            <dbl>      <dbl>
#>  1 Honda Civic       75.7       47.5
#>  2 Toyota Corona    120.       209.
#>  3 Merc 450SLC      276.       318.
#>  4 Ferrari Dino     145        241.
#>  5 Merc 450SE       276.       296.
#>  6 Porsche 914-2    120.       128.
#>  7 Merc 280         168.       249.
#>  8 AMC Javelin      304        317.
#>  9 Ford Pantera L   351        303.
#> 10 Valiant          225        267.
#> # … with 22 more rows
```

To use the `loo_cv()` function you need to specify the model you will cross-validate, the original dataset used to build it and the name of the column which identifies the subjects. You will also notice the use of the `keep` argument: it controls the columns to be shown in the output object (more details in the `loo_cv()` documentation).
The output of the cross-validation function is an object of class `lvmisc_cv`. It is simply a `tibble` with at least two columns: one containing the actual values of the dependent variable (`.actual`), as in the original dataset, and the other with the values predicted by the cross-validation procedure (`.predicted`). These two columns contain the data necessary to evaluate the model performance, as we will see in the next section.
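Because the output is a regular `tibble`, error metrics can be computed directly from these two columns. A small sketch, using the cross-validation result from above (the `accuracy()` function, covered next, already computes these indices for you):

```r
cv <- loo_cv(model = m, data = mtcars, id = car, keep = "used")

# Root mean square error and mean absolute error computed by hand
# from the actual and cross-validation predicted values
sqrt(mean((cv$.actual - cv$.predicted)^2))
mean(abs(cv$.actual - cv$.predicted))
```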
{lvmisc} also has an easy way to compute the most common model accuracy indices (the `accuracy()` function) and to compare the accuracy indices of different models (the `compare_accuracy()` function). Similarly to the other {lvmisc} functions described in this vignette, they have methods for the `lm` and `lmerMod` classes and also for the `lvmisc_cv` class.
To explore the `accuracy()` function, let’s use the same models built in the first example of this vignette.
```r
m1 <- lm(disp ~ mpg + hp + cyl + mpg:cyl, mtcars)
m2 <- lmer(disp ~ mpg + (1 | cyl), mtcars)

# Let's also cross-validate both models
cv_m1 <- loo_cv(m1, mtcars, "car", keep = "used")
cv_m2 <- loo_cv(m2, mtcars, "car", keep = "used")
```

And then use the `accuracy()` function on each of the models to compute their accuracy indices.
```r
accuracy(m1)
#>      AIC    BIC   R2 R2_adj  MAE   MAPE  RMSE
#> 1 344.64 353.43 0.87   0.85 34.9 15.73% 43.75

accuracy(m2)
#>      AIC    BIC R2_marg R2_cond   MAE   MAPE  RMSE
#> 1 340.76 346.62    0.19    0.81 35.75 16.43% 44.31

accuracy(cv_m1)
#>      AIC    BIC   R2 R2_adj   MAE   MAPE  RMSE
#> 1 344.64 353.43 0.87   0.85 41.82 18.59% 52.58

accuracy(cv_m2)
#>      AIC    BIC R2_marg R2_cond   MAE   MAPE  RMSE
#> 1 340.76 346.62    0.19    0.81 40.77 19.06% 50.28
```

As can be seen, the `accuracy()` function returns the following accuracy indices: Akaike information criterion (AIC), Bayesian information criterion (BIC), R2 values appropriate to the model assessed, mean absolute error (MAE), mean absolute percent error (MAPE) and root mean square error (RMSE).
Examining the accuracy results above, we can see that the AIC, BIC and R2 values do not change between the cross-validated and the original models (e.g., `m1` vs. `cv_m1` and `m2` vs. `cv_m2`). This happens because the AIC, BIC and R2 for an object of class `lvmisc_cv` are computed based on the original model. It can also be observed that the MAE, MAPE and RMSE values are worse for the cross-validated models, which is expected, as they reflect “out-of-sample” performance.
Furthermore, we can also compare the accuracy indices of different models using the `compare_accuracy()` function.
```r
compare_accuracy(m1, cv_m1, m2, cv_m2)
#> Warning: Not all models are of the same class.
#> Warning: Some models were refit using maximum likelihood.
#>   Model                   Class   R2 R2_adj R2_marg R2_cond    AIC    BIC   MAE
#> 1    m1                      lm 0.87   0.85      NA      NA 344.64 353.43 34.90
#> 2 cv_m1      lvmisc_cv_model/lm 0.87   0.85      NA      NA 344.64 353.43 41.82
#> 3    m2                 lmerMod   NA     NA    0.28    0.76 353.90 359.77 36.03
#> 4 cv_m2 lvmisc_cv_model/lmerMod   NA     NA    0.19    0.81 340.76 346.62 40.77
#>     MAPE  RMSE
#> 1 15.73% 43.75
#> 2 18.59% 52.58
#> 3 16.77% 44.47
#> 4 19.06% 50.28
```

Some of the benefits of using this function, instead of manually comparing models with several `accuracy()` calls, are that the `compare_accuracy()` output shows each model’s information (object name and class) and it throws warnings to aid the comparison.