

Type: Package
Title: Partial Dependence Plots
Version: 0.8.2
Description: A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
URL: https://github.com/bgreenwell/pdp, http://bgreenwell.github.io/pdp/
BugReports: https://github.com/bgreenwell/pdp/issues
Depends: R (≥ 3.6.0)
Suggests: adabag, AmesHousing, C50, caret, covr, Cubist, doParallel, dplyr, e1071, earth, gbm, gridExtra, ICEbox, ipred, keras, kernlab, magrittr, MASS, Matrix, mda, mlbench, nnet, party, partykit, randomForest, ranger, reticulate, rpart, tinytest, xgboost (≥ 0.6-0), knitr, rmarkdown, vip
Imports: foreach, ggplot2 (≥ 3.0.0), grDevices, lattice, methods, rlang (≥ 0.3.0), stats, utils
LazyData: TRUE
RoxygenNote: 7.3.2
Encoding: UTF-8
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2024-10-28 17:22:10 UTC; bgreenwell
Author: Brandon M. Greenwell [aut, cre]
Maintainer: Brandon M. Greenwell <greenwell.brandon@gmail.com>
Repository: CRAN
Date/Publication: 2024-10-28 17:50:02 UTC

pdp: A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.

Description

Partial dependence plots (PDPs) help visualize the relationship between a subset of the features (typically 1-3) and the response while accounting for the average effect of the other predictors in the model. They are particularly effective with black box models like random forests and support vector machines.
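The averaging described above can be sketched in a few lines of base R. This is a minimal brute-force sketch that assumes a linear model fit to the built-in mtcars data. The helper brute_force_pd() is hypothetical and is not pdp's implementation; partial() supports many more model classes and options.

```r
# Hypothetical helper: brute-force partial dependence for one predictor.
# For each grid value v, set `var` to v in every row of `data`, predict,
# and average the predictions over all rows.
brute_force_pd <- function(object, data, var, grid) {
  yhat <- vapply(grid, function(v) {
    newdata <- data
    newdata[[var]] <- v
    mean(predict(object, newdata = newdata))
  }, numeric(1))
  out <- data.frame(grid, yhat)
  names(out)[1] <- var
  out
}

# Example model (assumption): a linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
pd <- brute_force_pd(fit, mtcars, "wt", wt_grid)
print(pd)
```

For a linear model without interactions, the resulting curve is a straight line whose slope equals the fitted coefficient of wt. This makes a convenient sanity check.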

Details

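The development version can be found on GitHub: https://github.com/bgreenwell/pdp. As of right now, pdp exports four functions:

partial: compute partial dependence functions and individual conditional expectation (ICE) curves from various fitted model objects;

plotPartial: plot partial dependence functions and ICE curves using lattice graphics;

autoplot: plot partial dependence functions and ICE curves using ggplot2 graphics;

topPredictors: extract the most "important" predictors from various types of fitted models (experimental).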

Author(s)

Maintainer: Brandon M. Greenwell <greenwell.brandon@gmail.com> (ORCID)

See Also

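Useful links:

https://github.com/bgreenwell/pdp
http://bgreenwell.github.io/pdp/
Report bugs at https://github.com/bgreenwell/pdp/issues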


Plotting Partial Dependence Functions

Description

Plots partial dependence functions (i.e., marginal effects) using ggplot2 graphics.

Usage

## S3 method for class 'partial'
autoplot(
  object,
  center = FALSE,
  plot.pdp = TRUE,
  pdp.color = "red",
  pdp.size = 1,
  pdp.linetype = 1,
  rug = FALSE,
  smooth = FALSE,
  smooth.method = "auto",
  smooth.formula = y ~ x,
  smooth.span = 0.75,
  smooth.method.args = list(),
  contour = FALSE,
  contour.color = "white",
  train = NULL,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  legend.title = "yhat",
  ...
)

## S3 method for class 'ice'
autoplot(
  object,
  center = FALSE,
  plot.pdp = TRUE,
  pdp.color = "red",
  pdp.size = 1,
  pdp.linetype = 1,
  rug = FALSE,
  train = NULL,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

## S3 method for class 'cice'
autoplot(
  object,
  plot.pdp = TRUE,
  pdp.color = "red",
  pdp.size = 1,
  pdp.linetype = 1,
  rug = FALSE,
  train = NULL,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)

Arguments

object

An object that inherits from the "partial" class.

center

Logical indicating whether or not to produce centered ICE curves (c-ICE curves). Only useful when object represents a set of ICE curves; see partial for details. Default is FALSE.

plot.pdp

Logical indicating whether or not to plot the partial dependence function on top of the ICE curves. Default is TRUE.

pdp.color

Character string specifying the color to use for the partial dependence function when plot.pdp = TRUE. Default is "red".

pdp.size

Positive number specifying the line width to use for the partial dependence function when plot.pdp = TRUE. Default is 1.

pdp.linetype

Positive number specifying the line type to use for the partial dependence function when plot.pdp = TRUE. Default is 1.

rug

Logical indicating whether or not to include rug marks on the predictor axes. Default is FALSE.

smooth

Logical indicating whether or not to overlay a LOESS smooth. Default is FALSE.

smooth.method

Character string specifying the smoothing method (function) to use (e.g., "auto", "lm", "glm", "gam", "loess", or "rlm"). Default is "auto". See geom_smooth for details.

smooth.formula

Formula to use in smoothing function (e.g., y ~ x, y ~ poly(x, 2), or y ~ log(x)).

smooth.span

Controls the amount of smoothing for the default loess smoother. Smaller numbers produce wigglier lines, larger numbers produce smoother lines. Default is 0.75.

smooth.method.args

List containing additional arguments to be passed on to the modeling function defined by smooth.method.

contour

Logical indicating whether or not to add contour lines to the level plot. Default is FALSE.

contour.color

Character string specifying the color to use for the contour lines when contour = TRUE. Default is "white".

train

Data frame containing the original training data. Only required if rug = TRUE.

xlab

Character string specifying the text for the x-axis label.

ylab

Character string specifying the text for the y-axis label.

main

Character string specifying the text for the main title of the plot.

legend.title

Character string specifying the text for the legend title. Default is "yhat".

...

Additional (optional) arguments to be passed on to geom_line, geom_point, or scale_fill_viridis_c.

Value

A "ggplot" object.

Examples

## Not run:
## Regression example (requires randomForest package to run)

## Load required packages
library(ggplot2)      # for autoplot() generic
library(gridExtra)    # for `grid.arrange()`
library(magrittr)     # for forward pipe operator `%>%`
library(randomForest)

# Fit a random forest to the Boston housing data
data(boston)   # load the boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Partial dependence of cmedv on lstat
boston.rf %>%
  partial(pred.var = "lstat") %>%
  autoplot(rug = TRUE, train = boston) + theme_bw()

# Partial dependence of cmedv on lstat and rm
boston.rf %>%
  partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>%
  autoplot(contour = TRUE, legend.title = "cmedv",
           option = "B", direction = -1) + theme_bw()

# ICE curves and c-ICE curves
lstat.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE)
grid.arrange(
  autoplot(lstat.ice, alpha = 0.1),                 # ICE curves
  autoplot(lstat.ice, center = TRUE, alpha = 0.1),  # c-ICE curves
  ncol = 2
)
## End(Not run)

Boston Housing Data

Description

Data on median housing values from 506 census tracts in the suburbs of Boston from the 1970 census. This data frame is a corrected version of the original data by Harrison and Rubinfeld (1978) with additional spatial information. The data were taken directly from BostonHousing2 and unneeded columns (i.e., name of town, census tract, and the uncorrected median home value) were removed.

Usage

data(boston)

Format

A data frame with 506 rows and 16 variables.

References

Harrison, D. and Rubinfeld, D.L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81-102.

Gilley, O.W., and R. Kelley Pace (1996). On the Harrison and Rubinfeld Data. Journal of Environmental Economics and Management, 31, 403-405.

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Pace, R. Kelley, and O.W. Gilley (1997). Using the Spatial Configuration of the Data to Improve Estimation. Journal of the Real Estate Finance and Economics, 14, 333-340.

Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.

Examples

head(boston)

Exemplar observation

Description

Construct a single "exemplar" record from a data frame. For now, all numeric columns (including "Date" objects) are replaced with their corresponding median value and non-numeric columns are replaced with their most frequent value.
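The rule above (median for numeric columns, most frequent value otherwise) can be sketched in base R. This is a simplified, hypothetical helper called exemplar_sketch(), not pdp's exemplar(). Unlike the description above, it treats "Date" columns as non-numeric, and it does not handle matrices or "dgCMatrix" objects.

```r
# Hypothetical helper: build a one-row "exemplar" from a data frame.
# Numeric columns -> median; other columns -> most frequent value.
# Simplification: "Date" columns fall into the non-numeric branch here.
exemplar_sketch <- function(df) {
  out <- lapply(df, function(x) {
    if (is.numeric(x)) {
      median(x, na.rm = TRUE)
    } else {
      tab <- table(x)                         # counts of each distinct value
      mode_val <- names(tab)[which.max(tab)]  # first value with the top count
      if (is.factor(x)) factor(mode_val, levels = levels(x)) else mode_val
    }
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

train <- data.frame(
  x = c(1, 2, 3, 10),
  y = c("a", "c", "c", "b"),
  stringsAsFactors = FALSE
)
ex <- exemplar_sketch(train)
print(ex)  # x is the median (2.5), y is the most frequent value ("c")
```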

Usage

exemplar(object)

## S3 method for class 'data.frame'
exemplar(object)

## S3 method for class 'matrix'
exemplar(object)

## S3 method for class 'dgCMatrix'
exemplar(object)

Arguments

object

A data frame, matrix, or dgCMatrix (the latter two are supported by xgboost).

Value

A data frame with the same number of columns as object and a single row.

Examples

set.seed(1554)  # for reproducibility
train <- data.frame(
  x = rnorm(100),
  y = sample(letters[1L:3L], size = 100, replace = TRUE,
             prob = c(0.1, 0.1, 0.8))
)
exemplar(train)

Partial Dependence Functions

Description

Compute partial dependence functions (i.e., marginal effects) for various model fitting objects.

Usage

partial(object, ...)

## Default S3 method:
partial(
  object,
  pred.var,
  pred.grid,
  pred.fun = NULL,
  grid.resolution = NULL,
  ice = FALSE,
  center = FALSE,
  approx = FALSE,
  quantiles = FALSE,
  probs = 1:9/10,
  trim.outliers = FALSE,
  type = c("auto", "regression", "classification"),
  inv.link = NULL,
  which.class = 1L,
  prob = FALSE,
  recursive = TRUE,
  plot = FALSE,
  plot.engine = c("lattice", "ggplot2"),
  smooth = FALSE,
  rug = FALSE,
  chull = FALSE,
  levelplot = TRUE,
  contour = FALSE,
  contour.color = "white",
  alpha = 1,
  train,
  cats = NULL,
  check.class = TRUE,
  progress = FALSE,
  parallel = FALSE,
  paropts = NULL,
  ...
)

## S3 method for class 'model_fit'
partial(object, ...)

Arguments

object

A fitted model object of appropriate class (e.g., "gbm", "lm", "randomForest", "train", etc.).

...

Additional optional arguments to be passed on to predict.

pred.var

Character vector giving the names of the predictor variables of interest. For reasons of computation and interpretation, this should include no more than three variables.

pred.grid

Data frame containing the joint values of interest for the variables listed in pred.var.

pred.fun

Optional prediction function that requires two arguments: object and newdata. If specified, then the function must return a single prediction or a vector of predictions (i.e., not a matrix or data frame). Default is NULL.
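To illustrate the contract described above, the two functions below take object and newdata. One returns a single number and the other returns one prediction per row. They are hypothetical examples built on a linear model fit to mtcars. Under the stated contract, either could be supplied as, e.g., partial(fit, pred.var = "wt", pred.fun = pfun_avg, train = mtcars). The per-row version leads to one curve per observation (see the Note below).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Returns a single prediction: the average over all rows of newdata.
pfun_avg <- function(object, newdata) {
  mean(predict(object, newdata = newdata))
}

# Returns a vector with one prediction per row of newdata.
pfun_ice <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

pfun_avg(fit, mtcars)          # a single number
length(pfun_ice(fit, mtcars))  # one value per row of mtcars
```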

grid.resolution

Integer giving the number of equally spaced points to use for the continuous variables listed in pred.var when pred.grid is not supplied. If left NULL, it will default to the minimum between 51 and the number of unique data points for each of the continuous independent variables listed in pred.var.
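The default grid described above can be sketched as follows. This assumes an equally spaced grid running from the minimum to the maximum of the predictor. default_grid() is a hypothetical helper, and pdp's internal code may differ in detail.

```r
# Hypothetical helper mirroring the documented default for grid.resolution.
default_grid <- function(x, grid.resolution = NULL) {
  if (is.null(grid.resolution)) {
    # At most 51 points, and never more than the number of unique values
    grid.resolution <- min(51L, length(unique(x)))
  }
  seq(min(x), max(x), length.out = grid.resolution)
}

default_grid(mtcars$cyl)          # only 3 unique values, so 3 grid points
length(default_grid(mtcars$wt))   # min(51, number of unique wt values)
length(default_grid(mtcars$wt, grid.resolution = 5))
```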

ice

Logical indicating whether or not to compute individual conditional expectation (ICE) curves. Default is FALSE. See Goldstein et al. (2015) for details.

center

Logical indicating whether or not to produce centered ICE curves (c-ICE curves). Only used when ice = TRUE. Default is FALSE. See Goldstein et al. (2015) for details.

approx

Logical indicating whether or not to compute a faster, but approximate, marginal effect plot (similar in spirit to the plotmo package). If TRUE, then partial() will compute predictions across the predictors specified in pred.var while holding the other predictors constant (a "poor man's partial dependence" function, as Stephen Milborrow, the author of plotmo, puts it). Default is FALSE. Note that this works with ice = TRUE as well. WARNING: This option is currently experimental. Use at your own risk. It is possible (and arguably safer) to do this manually by passing a specific "exemplar" observation to the train argument and specifying pred.grid manually.
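The manual alternative mentioned above can be sketched in base R: hold the other predictors at a single exemplar row and vary only the predictor of interest. The model, the median-based exemplar, and the variable names are assumptions for illustration. This is not the code path used by partial(approx = TRUE).

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Exemplar row: every column held at its median (assumption for this sketch)
ex <- as.data.frame(lapply(mtcars, median))

# Replicate the exemplar once per grid value, then vary only wt
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
newdata <- ex[rep(1L, length(wt_grid)), ]
newdata$wt <- wt_grid

approx_pd <- data.frame(wt = wt_grid,
                        yhat = unname(predict(fit, newdata = newdata)))
print(approx_pd)
```

Only one prediction is needed per grid value, rather than one per training row, which is where the speed-up comes from. For a linear model without interactions, the approximate curve is parallel to the brute-force partial dependence curve (the two differ only by a constant). With interactions or other non-additive structure, the two can differ in shape.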

quantiles

Logical indicating whether or not to use the sample quantiles of the continuous predictors listed in pred.var. If quantiles = TRUE and grid.resolution = NULL, the sample quantiles will be used to generate the grid of joint values for which the partial dependence is computed.

probs

Numeric vector of probabilities with values in [0,1]. (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.) Default is 1:9/10, which corresponds to the deciles of the predictor variables. These specify which quantiles to use for the continuous predictors listed in pred.var when quantiles = TRUE.
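The quantile-based grid described by quantiles and probs can be sketched with stats::quantile(). This sketch also drops duplicate quantiles, which can occur for heavily tied predictors. That step is an assumption of the sketch, not documented pdp behavior.

```r
x <- mtcars$hp
probs <- 1:9/10   # the documented default: deciles

# Grid of sample quantiles (duplicates removed in case of ties)
q_grid <- unique(quantile(x, probs = probs, names = FALSE))
print(q_grid)
```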

trim.outliers

Logical indicating whether or not to trim off outliers from the continuous predictors listed in pred.var (using the simple boxplot method) before generating the grid of joint values for which the partial dependence is computed. Default is FALSE.
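Below is a minimal sketch of the "simple boxplot method" for trimming. It uses grDevices::boxplot.stats() to flag points beyond the whiskers (1.5 times the box length, by default). The artificial outlier and the 10-point grid are assumptions for illustration, and pdp's own trimming code may differ in detail.

```r
x <- c(mtcars$hp, 1000)          # append an artificial outlier

# Values outside the boxplot whiskers (coef = 1.5 by default)
out <- boxplot.stats(x)$out
x_trimmed <- x[!x %in% out]

# Build the grid from the trimmed values only
grid <- seq(min(x_trimmed), max(x_trimmed), length.out = 10)
print(out)
print(range(grid))
```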

type

Character string specifying the type of supervised learning. Current options are "auto", "regression", or "classification". If type = "auto", then partial will try to extract the necessary information from object.

inv.link

Function specifying the transformation to be applied to the predictions before the partial dependence function is computed (experimental). Default is NULL (i.e., no transformation). This option is intended to be used for models that allow for non-Gaussian response variables (e.g., counts). For these models, predictions are not typically returned on the original response scale by default. For example, Poisson GBMs typically return predictions on the log scale. In this case, setting inv.link = exp will return the partial dependence function on the response (i.e., raw count) scale.
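The effect of transforming predictions before averaging can be sketched with a Poisson GLM in base R. The simulated data and coefficients are assumptions for illustration. Averaging exp() of the log-scale predictions corresponds to the inv.link = exp behavior described above. This is generally not the same as exponentiating the averaged log-scale curve, and by Jensen's inequality the former is never smaller.

```r
set.seed(42)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- rpois(n, lambda = exp(0.5 + 1.2 * d$x1 - 0.8 * d$x2))
fit <- glm(y ~ x1 + x2, family = poisson, data = d)

grid <- seq(0, 1, length.out = 11)
pd_link <- pd_resp <- numeric(length(grid))
for (i in seq_along(grid)) {
  nd <- d
  nd$x1 <- grid[i]
  eta <- predict(fit, newdata = nd)  # predictions on the log (link) scale
  pd_link[i] <- mean(eta)            # partial dependence on the log scale
  pd_resp[i] <- mean(exp(eta))       # transform first, then average
}

cbind(x1 = grid, exp_of_mean = exp(pd_link), mean_of_exp = pd_resp)
```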

which.class

Integer specifying which column of the matrix of predicted probabilities to use as the "focus" class. Default is to use the first class. Only used for classification problems (i.e., when type = "classification").

prob

Logical indicating whether or not partial dependence for classification problems should be returned on the probability scale, rather than the centered logit. If FALSE, the partial dependence function is on a scale similar to the logit. Default is FALSE.
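For reference, the centered logit for class k among K classes is commonly defined (as in Friedman, 2001) as log p_k(x) minus the average of log p_j(x) over all K classes. The sketch below assumes that definition. centered_logit() is a hypothetical helper, not an exported pdp function. In the two-class case it reduces to half the ordinary logit, which is why the scale is described as similar to the logit.

```r
# p: matrix of class probabilities (rows = observations, columns = classes)
centered_logit <- function(p, eps = .Machine$double.eps) {
  logp <- log(pmax(p, eps))   # guard against log(0)
  logp - rowMeans(logp)       # subtract each row's mean log-probability
}

p <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.1, 0.8), nrow = 2, byrow = TRUE)
centered_logit(p)

# Two classes: the centered logit of class 1 is half the usual logit
p2 <- matrix(c(0.8, 0.2), nrow = 1)
centered_logit(p2)[1, 1]  # equals 0.5 * log(0.8 / 0.2)
```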

recursive

Logical indicating whether or not to use the weighted tree traversal method described in Friedman (2001). This only applies to objects that inherit from class "gbm". Default is TRUE, which is much faster than the exact brute force approach used for all other models. (Based on the C++ code behind plot.gbm.)

plot

Logical indicating whether to return a data frame containing the partial dependence values (FALSE) or plot the partial dependence function directly (TRUE). Default is FALSE. See plotPartial for plotting details.

plot.engine

Character string specifying which plotting engine to use whenever plot = TRUE. Options include "lattice" (default) or "ggplot2".

smooth

Logical indicating whether or not to overlay a LOESS smooth. Default is FALSE.

rug

Logical indicating whether or not to include a rug display on the predictor axes. The tick marks indicate the min/max and deciles of the predictor distributions. This helps reduce the risk of interpreting the partial dependence plot outside the region of the data (i.e., extrapolating). Only used when plot = TRUE. Default is FALSE.

chull

Logical indicating whether or not to restrict the values of the first two variables in pred.var to lie within the convex hull of their training values; this affects pred.grid. This helps reduce the risk of interpreting the partial dependence plot outside the region of the data (i.e., extrapolating). Default is FALSE.
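The convex-hull restriction can be sketched in base R with grDevices::chull(). The point-in-hull test below uses in_hull(), a hypothetical helper that checks whether a point lies on the same side of every hull edge. pdp may use a different test internally. The mtcars variables and the 20 x 20 grid are assumptions for illustration.

```r
# Hypothetical helper: is (px, py) inside the convex polygon (hx, hy)?
# (hx, hy) must list the hull vertices in order, as returned by chull().
in_hull <- function(px, py, hx, hy) {
  nx <- c(hx[-1], hx[1])  # next vertex of each edge
  ny <- c(hy[-1], hy[1])
  # z-component of the cross product (edge vector) x (vertex-to-point vector)
  cross <- (nx - hx) * (py - hy) - (ny - hy) * (px - hx)
  all(cross <= 0) || all(cross >= 0)  # same side of every edge
}

x <- mtcars$wt
y <- mtcars$hp
h <- chull(x, y)  # indices of the hull vertices

grid <- expand.grid(wt = seq(min(x), max(x), length.out = 20),
                    hp = seq(min(y), max(y), length.out = 20))
keep <- mapply(in_hull, grid$wt, grid$hp,
               MoreArgs = list(hx = x[h], hy = y[h]))
grid_hull <- grid[keep, ]
nrow(grid_hull)  # fewer than the 400 points of the full grid
```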

levelplot

Logical indicating whether or not to use a false color level plot (TRUE) or a 3-D surface (FALSE). Default is TRUE.

contour

Logical indicating whether or not to add contour lines to the level plot. Only used when levelplot = TRUE. Default is FALSE.

contour.color

Character string specifying the color to use for the contour lines when contour = TRUE. Default is "white".

alpha

Numeric value in [0, 1] specifying the opacity alpha (most useful when plotting ICE/c-ICE curves). Default is 1 (i.e., no transparency). In fact, this option only affects ICE/c-ICE curves and level plots.

train

An optional data frame, matrix, or sparse matrix containing the original training data. This may be required depending on the class of object. For objects that do not store a copy of the original training data, this argument is required. For reasons discussed below, it is good practice to always specify this argument.

cats

Character string indicating which columns of train should be treated as categorical variables. Only used when train inherits from class "matrix" or "dgCMatrix".

check.class

Logical indicating whether or not to make sure each column in pred.grid has the correct class, levels, etc. Default is TRUE.

progress

Logical indicating whether or not to display a text-based progress bar. Default is FALSE.

parallel

Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.

paropts

List containing additional options to be passed on to foreach when parallel = TRUE.

Value

By default, partial returns an object of class c("data.frame", "partial"). If ice = TRUE and center = FALSE, then an object of class c("data.frame", "ice") is returned. If ice = TRUE and center = TRUE, then an object of class c("data.frame", "cice") is returned. These three classes determine the behavior of the plotPartial function, which is automatically called whenever plot = TRUE. Specifically, when plot = TRUE, a "trellis" object is returned (see lattice for details); the "trellis" object will also include an additional attribute, "partial.data", containing the data displayed in the plot.

Note

In some cases it is difficult for partial to extract the original training data from object. In these cases an error message is displayed requesting the user to supply the training data via the train argument in the call to partial. In most cases where partial can extract the required training data from object, it is taken from the same environment in which partial is called. Therefore, it is important to not change the training data used to construct object before calling partial. This problem is completely avoided when the training data are passed to the train argument in the call to partial.

It is recommended to call partial with plot = FALSE and store the results. This allows for more flexible plotting, and the user will not have to waste time calling partial again if the default plot is not sufficient.

It is possible to retrieve the last printed "trellis" object, such as those produced by plotPartial, using trellis.last.object().

If ice = TRUE or the prediction function given to pred.fun returns a prediction for each observation in newdata, then the result will be a curve for each observation. These are called individual conditional expectation (ICE) curves; see Goldstein et al. (2015) and ice for details.
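ICE and c-ICE curves can be sketched in base R in three steps. First, predict once per observation and grid value, keeping the resulting matrix (one row per observation). Second, center each row by subtracting its value at a reference point to get c-ICE curves. Third, average the ICE curves over observations to recover the partial dependence curve. The model with an interaction term, the 20-point grid, and centering at the smallest grid value are assumptions of this sketch.

```r
# Interaction term, so the ICE curves are not all parallel
fit <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# ICE: rows = observations, columns = grid values of wt
ice <- sapply(grid, function(v) {
  newdata <- mtcars
  newdata$wt <- v
  predict(fit, newdata = newdata)
})

# c-ICE: subtract each curve's value at the first grid point
cice <- ice - ice[, 1]

# The partial dependence curve is the column-wise average of the ICE curves
pd <- colMeans(ice)
```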

References

J. H. Friedman (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29: 1189-1232.

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1): 44-65.

Examples

## Not run:
## Regression example (requires randomForest package to run)

## Fit a random forest to the boston housing data
library(randomForest)
data(boston)   # load the boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Using randomForest's partialPlot function
partialPlot(boston.rf, pred.data = boston, x.var = "lstat")

# Using pdp's partial function
head(partial(boston.rf, pred.var = "lstat"))  # returns a data frame
partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE)

# The partial function allows for multiple predictors
partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40,
        plot = TRUE, chull = TRUE, progress = TRUE)

# The plotPartial function offers more flexible plotting
pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40)
plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE,
            colorkey = FALSE, screen = list(z = -20, x = -60))

# The autoplot function can be used to produce graphics based on ggplot2
library(ggplot2)
autoplot(pd, contour = TRUE, legend.title = "Partial\ndependence")

## Individual conditional expectation (ICE) curves

# Use partial to obtain ICE/c-ICE curves
rm.ice <- partial(boston.rf, pred.var = "rm", ice = TRUE)
plotPartial(rm.ice, rug = TRUE, train = boston, alpha = 0.2)
autoplot(rm.ice, center = TRUE, alpha = 0.2, rug = TRUE, train = boston)

## Classification example (requires randomForest package to run)

## Fit a random forest to the Pima Indians diabetes data
data(pima)     # load the Pima Indians diabetes data
set.seed(102)  # for reproducibility
pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit)

# Partial dependence of positive test result on glucose (default logit scale)
partial(pima.rf, pred.var = "glucose", plot = TRUE, chull = TRUE,
        progress = TRUE)

# Partial dependence of positive test result on glucose (probability scale)
partial(pima.rf, pred.var = "glucose", prob = TRUE, plot = TRUE,
        chull = TRUE, progress = TRUE)
## End(Not run)

Pima Indians Diabetes Data

Description

Diabetes test results collected by the US National Institute of Diabetes and Digestive and Kidney Diseases from a population of women who were at least 21 years old, of Pima Indian heritage, and living near Phoenix, Arizona. The data were taken directly from PimaIndiansDiabetes2.

Usage

data(pima)

Format

A data frame with 768 observations on 9 variables.

References

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Brian D. Ripley (1996), Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge.

Grace Wahba, Chong Gu, Yuedong Wang, and Richard Chappell (1995), Soft Classification a.k.a. Risk Estimation via Penalized Log Likelihood and Smoothing Spline Analysis of Variance, in D. H. Wolpert (1995), The Mathematics of Generalization, 331-359, Addison-Wesley, Reading, MA.

Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.

Examples

head(pima)

Plotting Partial Dependence Functions

Description

Plots partial dependence functions (i.e., marginal effects) using lattice graphics.

Usage

plotPartial(object, ...)

## S3 method for class 'ice'
plotPartial(
  object,
  center = FALSE,
  plot.pdp = TRUE,
  pdp.col = "red2",
  pdp.lwd = 2,
  pdp.lty = 1,
  rug = FALSE,
  train = NULL,
  ...
)

## S3 method for class 'cice'
plotPartial(
  object,
  plot.pdp = TRUE,
  pdp.col = "red2",
  pdp.lwd = 2,
  pdp.lty = 1,
  rug = FALSE,
  train = NULL,
  ...
)

## S3 method for class 'partial'
plotPartial(
  object,
  center = FALSE,
  plot.pdp = TRUE,
  pdp.col = "red2",
  pdp.lwd = 2,
  pdp.lty = 1,
  smooth = FALSE,
  rug = FALSE,
  chull = FALSE,
  levelplot = TRUE,
  contour = FALSE,
  contour.color = "white",
  col.regions = NULL,
  number = 4,
  overlap = 0.1,
  train = NULL,
  ...
)

Arguments

object

An object that inherits from the "partial" class.

...

Additional optional arguments to be passed on to dotplot, levelplot, xyplot, or wireframe.

center

Logical indicating whether or not to produce centered ICE curves (c-ICE curves). Only useful when object represents a set of ICE curves; see partial for details. Default is FALSE.

plot.pdp

Logical indicating whether or not to plot the partial dependence function on top of the ICE curves. Default is TRUE.

pdp.col

Character string specifying the color to use for the partial dependence function when plot.pdp = TRUE. Default is "red2".

pdp.lwd

Integer specifying the line width to use for the partial dependence function when plot.pdp = TRUE. Default is 2. See par for more details.

pdp.lty

Integer or character string specifying the line type to use for the partial dependence function when plot.pdp = TRUE. Default is 1. See par for more details.

rug

Logical indicating whether or not to include rug marks on the predictor axes. Default is FALSE.

train

Data frame containing the original training data. Only required if rug = TRUE or chull = TRUE.

smooth

Logical indicating whether or not to overlay a LOESS smooth. Default is FALSE.

chull

Logical indicating whether or not to restrict the first two variables in pred.var to lie within the convex hull of their training values; this affects pred.grid. Default is FALSE.

levelplot

Logical indicating whether or not to use a false color level plot (TRUE) or a 3-D surface (FALSE). Default is TRUE.

contour

Logical indicating whether or not to add contour lines to the level plot. Only used when levelplot = TRUE. Default is FALSE.

contour.color

Character string specifying the color to use for the contour lines when contour = TRUE. Default is "white".

col.regions

Vector of colors to be passed on to levelplot's col.regions argument. Defaults to grDevices::hcl.colors(100) (which is the same viridis color palette used in the past).

number

Integer specifying the number of conditional intervals to use for the continuous panel variables. See co.intervals and equal.count for further details.

overlap

The fraction of overlap of the conditioning variables. See co.intervals and equal.count for further details.
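The number and overlap arguments follow the conditioning-interval convention of co.intervals() and equal.count(). The sketch below applies graphics::co.intervals() to an mtcars column, an arbitrary choice for illustration. It uses the plotPartial defaults number = 4 and overlap = 0.1 to show what the resulting intervals look like.

```r
x <- mtcars$hp

# Four overlapping conditioning intervals (the plotPartial defaults)
ints <- co.intervals(x, number = 4, overlap = 0.1)
print(ints)  # one row per interval: lower and upper limits
```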

Examples

## Not run:
## Regression example (requires randomForest package to run)

## Load required packages
library(gridExtra)  # for `grid.arrange()`
library(magrittr)   # for forward pipe operator `%>%`
library(randomForest)

# Fit a random forest to the Boston housing data
data(boston)   # load the boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Partial dependence of cmedv on lstat
boston.rf %>%
  partial(pred.var = "lstat") %>%
  plotPartial(rug = TRUE, train = boston)

# Partial dependence of cmedv on lstat and rm
boston.rf %>%
  partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>%
  plotPartial(contour = TRUE)

# ICE curves and c-ICE curves
lstat.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE)
p1 <- plotPartial(lstat.ice, alpha = 0.1)
p2 <- plotPartial(lstat.ice, center = TRUE, alpha = 0.1)
grid.arrange(p1, p2, ncol = 2)
## End(Not run)

Extract Most "Important" Predictors (Experimental)

Description

Extract the most "important" predictors for regression and classification models.

Usage

topPredictors(object, n = 1L, ...)

## Default S3 method:
topPredictors(object, n = 1L, ...)

## S3 method for class 'train'
topPredictors(object, n = 1L, ...)

Arguments

object

A fitted model object of appropriate class (e.g., "gbm", "lm", "randomForest", etc.).

n

Integer specifying the number of predictors to return. Default is 1, meaning return the single most important predictor.

...

Additional optional arguments to be passed on to varImp.

Details

This function uses the generic function varImp to calculate variable importance scores for each predictor. The scores are then sorted, and the names of the n highest-scoring predictors are returned.
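The "sort the scores and keep the top n names" step can be sketched in base R. Here the scores are absolute t-statistics from a linear model fit to mtcars, a choice similar in spirit to caret's varImp() for linear models. top_n_names() is a hypothetical helper rather than topPredictors() itself.

```r
# Hypothetical helper: names of the n highest-scoring predictors
top_n_names <- function(scores, n = 1L) {
  ranked <- sort(scores, decreasing = TRUE)
  names(ranked)[seq_len(min(n, length(ranked)))]
}

fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
coefs <- summary(fit)$coefficients
imp <- abs(coefs[-1, "t value"])  # drop the intercept row

top_n_names(imp, n = 2)
```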

Examples

## Not run:
## Regression example (requires randomForest package to run)

# Load required packages
library(ggplot2)
library(randomForest)

# Fit a random forest to the mtcars dataset
data(mtcars, package = "datasets")
set.seed(101)
mtcars.rf <- randomForest(mpg ~ ., data = mtcars, mtry = 5, importance = TRUE)

# Top four predictors
top4 <- topPredictors(mtcars.rf, n = 4)

# Construct partial dependence functions for top four predictors
pd <- NULL
for (i in top4) {
  tmp <- partial(mtcars.rf, pred.var = i)
  names(tmp) <- c("x", "y")
  pd <- rbind(pd, cbind(tmp, predictor = i))
}

# Display partial dependence functions
ggplot(pd, aes(x, y)) +
  geom_line() +
  facet_wrap(~ predictor, scales = "free") +
  theme_bw() +
  ylab("mpg")
## End(Not run)

Retrieve the last trellis object

Description

See trellis.last.object for more details.

Usage

trellis.last.object(..., prefix)
