This tutorial on using the `FFTrees` package follows the examples presented in Phillips et al. (2017) (freely available in html | PDF).
In the following, we explain how to use `FFTrees` to create, evaluate and visualize FFTs in four simple steps.
We can install `FFTrees` from CRAN using `install.packages()`. (We only need to do this once.)
To use the package, we first need to load it into our current R session. We load the package using `library()`.
The `FFTrees` package contains several vignettes that guide you through the package's functionality (like this one). To open the main guide, run `FFTrees.guide()`.
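The three setup steps above can be sketched as follows. This is a minimal sketch that assumes a working CRAN mirror. The installation and guide calls are commented out so that the block can be run repeatedly without side effects, and the final version check is our own addition.

```r
# Install FFTrees from CRAN (needed only once; uncomment to run):
# install.packages("FFTrees")

# Load the package into the current R session:
library(FFTrees)

# Open the main guide in the help viewer (uncomment to run):
# FFTrees.guide()

# Our addition: check which package version is loaded
packageVersion("FFTrees")
```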
In this example, we will create FFTs from a heart disease data set. The training data are in an object called `heart.train`, and the testing data are in an object called `heart.test`. For these data, we will predict `diagnosis`, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high-risk or low-risk).
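Before building any trees, it can help to look at the data. The following sketch uses only base R functions on the two data sets named above. Which summaries to inspect is our own choice, not a required step.

```r
library(FFTrees)

# Size of the training and testing data (rows = patients, columns = variables):
dim(heart.train)
dim(heart.test)

# Distribution of the binary criterion in the training data:
table(heart.train$diagnosis)

# First rows of the training data (criterion and all cues):
head(heart.train)
```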
To create an `FFTrees` object, we use the function `FFTrees()` with two main arguments:
- `formula` expects a formula indicating a binary criterion variable as a function of one or more predictor variable(s) to be considered for the tree. The shorthand `formula = diagnosis ~ .` means to include all predictor variables.
- `data` specifies the training data used to construct the FFTs (which must include the criterion variable).
Here is how we can construct our first FFTs:

    # Create an FFTrees object:
    heart.fft <- FFTrees(formula = diagnosis ~ .,                       # Criterion and (all) predictors
                         data = heart.train,                            # Training data
                         data.test = heart.test,                        # Testing data
                         main = "Heart Disease",                        # General label
                         decision.labels = c("Low-Risk", "High-Risk")   # Decision labels (False/True)
                         )

Evaluating this expression runs code that examines the data, optimizes thresholds based on our current goals for each cue, and creates and evaluates 7 FFTs. The resulting `FFTrees` object, which contains the tree definitions, their decisions, and their performance statistics, is assigned to the `heart.fft` object.
- `algorithm`: Two algorithms are available for building FFTs: `"ifan"` (Phillips et al., 2017) and `"dfan"` (Phillips et al., 2017). (`"max"` (Martignon et al., 2008) and `"zigzag"` (Martignon et al., 2008) are no longer supported.)
- `max.levels`: Changes the maximum number of levels that are allowed in the tree.

The following arguments apply when using the `"ifan"` or `"dfan"` algorithms for creating new FFTs:

- `goal.chase`: Changes which statistic is maximized during tree construction. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`. The default is `"wacc"` with a sensitivity weight of 0.50 (which renders it identical to `"bacc"`).
- `goal`: Changes which statistic is maximized when selecting trees after construction. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`.
- `my.tree` or `tree.definitions`: We can define a new tree from a verbal description (as a set of sentences), or manually specify sets of FFTs as a data frame (in appropriate format). See the Manually specifying FFTs vignette for details.
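As an illustration of these optional arguments, the following sketch builds a second set of FFTs with non-default settings. The object name `heart.fft.alt` and the particular values chosen are assumptions made for this example only; they are not recommendations.

```r
library(FFTrees)

# Assumption: all argument values below are chosen for illustration only.
heart.fft.alt <- FFTrees(formula = diagnosis ~ .,
                         data = heart.train,
                         data.test = heart.test,
                         algorithm = "dfan",    # alternative tree-building algorithm
                         max.levels = 3,        # allow at most 3 levels per tree
                         goal.chase = "bacc",   # statistic maximized during construction
                         goal = "acc")          # statistic used to select the best tree

# Print the result to compare it with the default settings:
heart.fft.alt
```

Comparing the printed definitions and accuracy statistics of such an object with those of the default call shows how much the chosen goals affect the resulting trees.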
Now we can inspect and summarize the generated decision trees. We will start by printing the `FFTrees` object to return basic information to the console:

    #> Heart Disease
    #> FFTrees 
    #> - Trees: 7 fast-and-frugal trees predicting diagnosis
    #> - Cost of outcomes:  hi = 0,  fa = 1,  mi = 1,  cr = 0
    #> - Cost of cues: 
    #>      age      sex       cp trestbps     chol      fbs  restecg  thalach 
    #>        1        1        1        1        1        1        1        1 
    #>    exang  oldpeak    slope       ca     thal 
    #>        1        1        1        1        1 
    #> 
    #> FFT #1: Definition
    #> [1] If thal = {rd,fd}, decide High-Risk.
    #> [2] If cp != {a}, decide Low-Risk.
    #> [3] If ca > 0, decide High-Risk, otherwise, decide Low-Risk.
    #> 
    #> FFT #1: Training Accuracy
    #> Training data: N = 150, Pos (+) = 66 (44%) 
    #> 
    #> |          | True + | True - | Totals:
    #> |----------|--------|--------|
    #> | Decide + | hi  54 | fa  18 |      72
    #> | Decide - | mi  12 | cr  66 |      78
    #> |----------|--------|--------|
    #>   Totals:        66       84   N = 150
    #> 
    #> acc  = 80.0%   ppv  = 75.0%   npv  = 84.6%
    #> bacc = 80.2%   sens = 81.8%   spec = 78.6%
    #> 
    #> FFT #1: Training Speed, Frugality, and Cost
    #> mcu = 1.74,  pci = 0.87
    #> cost_dec = 0.200,  cost_cue = 1.740,  cost = 1.940

The output tells us several pieces of information:
- The tree with the highest weighted accuracy (`wacc`, here with a sensitivity weight of 0.5) is selected as the best tree.
- Here, the best tree, FFT #1, uses three cues: `thal`, `cp`, and `ca`.
- Several statistics summarize this tree's performance on the training and test data.
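Console output like the above can be produced by printing the object. The sketch below assumes that the `print()` method for `FFTrees` objects accepts a `data` argument for switching between training and test statistics, mirroring the `data` argument of `plot()`. If the installed version differs, consult `?print.FFTrees`.

```r
library(FFTrees)

# Re-create the FFTrees object from the tutorial:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

# Print the best tree with its training (fitting) statistics:
print(heart.fft)

# Assumption: data = "test" switches to testing (prediction) statistics.
print(heart.fft, data = "test")
```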
All statistics to evaluate each tree can be derived from a 2 x 2 confusion table:
Table 1: A 2x2 confusion table illustrating the types of frequency counts for 4 possible outcomes.
For definitions of all accuracy statistics, see the accuracy statistics vignette.
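To make the link between the confusion table and the reported statistics concrete, the following base R sketch recomputes the training statistics of FFT #1 from the four frequency counts printed above (hi = 54, fa = 18, mi = 12, cr = 66). The variable names are our own. The formulas are the standard definitions, with `wacc` taken as the sensitivity-weighted average of sensitivity and specificity.

```r
# Frequency counts from the training confusion table of FFT #1:
hi <- 54  # hits (true positives)
fa <- 18  # false alarms (false positives)
mi <- 12  # misses (false negatives)
cr <- 66  # correct rejections (true negatives)

n <- hi + fa + mi + cr          # total number of cases

sens <- hi / (hi + mi)          # sensitivity
spec <- cr / (cr + fa)          # specificity
acc  <- (hi + cr) / n           # overall accuracy
ppv  <- hi / (hi + fa)          # positive predictive value
npv  <- cr / (cr + mi)          # negative predictive value
bacc <- (sens + spec) / 2       # balanced accuracy

sens.w <- 0.5                                   # sensitivity weight
wacc <- sens.w * sens + (1 - sens.w) * spec     # weighted accuracy

# Show all statistics as percentages (rounded to 1 decimal):
stats <- round(100 * c(acc = acc, ppv = ppv, npv = npv,
                       bacc = bacc, sens = sens, spec = spec, wacc = wacc), 1)
stats
```

With a sensitivity weight of 0.5, `wacc` and `bacc` coincide, which is why the default goal is described as identical to `"bacc"`.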
We use `plot(x)` to visualize an FFT (from an `FFTrees` object `x`). Using `data = "train"` evaluates an FFT for training data (fitting), whereas `data = "test"` predicts the performance of an FFT for a different dataset:

    # Plot predictions of the best FFT when applied to test data:
    plot(heart.fft,      # An FFTrees object
         data = "test")  # data to use (i.e., either "train" or "test")?

The `plot()` function for `FFTrees` objects accepts the following arguments:
- `tree`: Which tree in the object should be plotted? To plot a tree other than the best fitting tree (FFT #1), just specify another tree as an integer (e.g., `plot(heart.fft, tree = 2)`).
- `data`: For which dataset should statistics be shown? Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance).
- `stats`: Should accuracy statistics be shown with the tree? To show only the tree, without any performance statistics, include the argument `stats = FALSE`.
- `comp`: Should statistics from competing algorithms be shown in the ROC curve? To remove the performance statistics of competing algorithms (e.g., regression, random forests), include the argument `comp = FALSE`.
- `what`: Which parts of an `FFTrees` object should be visualized (e.g., `"all"`, `"icontree"`, and `"tree"`)? Using `what = "roc"` plots tree performance as an ROC curve. To show individual cue accuracies (in ROC space), specify `what = "cues"`:

    #> Plotting cue training statistics:
    #> — Cue accuracies ranked by bacc

See the Plotting FFTrees vignette for details on plotting FFTs.
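The plotting arguments described above can be combined as in the following sketch. It re-creates the `heart.fft` object so that the block is self-contained and uses only the arguments named in this tutorial. The particular combinations shown are our own choice.

```r
library(FFTrees)

# Re-create the FFTrees object from the tutorial:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

# Plot the second tree (rather than the best one) for test data:
plot(heart.fft, data = "test", tree = 2)

# Plot only the tree, without performance statistics:
plot(heart.fft, data = "test", stats = FALSE)

# Omit the statistics of competing algorithms from the ROC curve:
plot(heart.fft, data = "test", comp = FALSE)

# Plot individual cue accuracies in ROC space:
plot(heart.fft, what = "cues")
```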
Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of `FFTrees`. However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs.
An `FFTrees` object contains many different outputs. Basic performance information on the current data and set of FFTs is available via the `summary()` function. To see and access parts of an `FFTrees` object, use `str()` or `names()`:

    #> [1] "criterion_name" "cue_names"      "formula"        "trees"         
    #> [5] "data"           "params"         "competition"    "cues"

Key elements of an `FFTrees` object are explained in the vignette on Creating FFTs with `FFTrees()`.
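A minimal sketch of these accessor calls follows. The element accessed at the end (`cue_names`) is taken from the list of names shown above. Limiting `str()` to one level is our own choice to keep the output short.

```r
library(FFTrees)

# Re-create the FFTrees object from the tutorial:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

# Basic performance information on the current data and FFTs:
summary(heart.fft)

# Names of the top-level elements of the object:
names(heart.fft)

# Compact overview of the object's structure (top level only):
str(heart.fft, max.level = 1)

# Access a single element by name:
heart.fft$cue_names
```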
To predict classification outcomes for new data, use the standard `predict()` function. For example, we can predict the classifications for the data in the `heartdisease` object (which is actually just a combination of `heart.train` and `heart.test`).
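Such a prediction could look like the following sketch. It assumes that `predict()` takes the new data via its `newdata` argument and, by default, returns one classification per row for the best tree. The object name `heart.pred` is our own.

```r
library(FFTrees)

# Re-create the FFTrees object from the tutorial:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

# Predict classifications for all cases in heartdisease:
heart.pred <- predict(heart.fft, newdata = heartdisease)

# Inspect the first predictions and their overall distribution:
head(heart.pred)
table(heart.pred)
```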
To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`. As we often start from an existing set of FFTs, `FFTrees` provides a set of functions for extracting, converting, and modifying tree definitions.
See the vignette on Manually specifying FFTs for defining FFTs from descriptions and modifying tree definitions.
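As a sketch of the `my.tree` route, the following call defines a single FFT from a verbal description and evaluates it on the heart disease data. The description itself (its cues, thresholds, and wording) and the object name `my.fft` are assumptions made for illustration. The exact phrasing that `my.tree` accepts is documented in the Manually specifying FFTs vignette.

```r
library(FFTrees)

# Assumption: the verbal description below is a hypothetical tree for illustration.
my.fft <- FFTrees(formula = diagnosis ~ .,
                  data = heart.train,
                  data.test = heart.test,
                  main = "My custom FFT",
                  my.tree = "If chol > 350, predict True.
                             If cp != {a}, predict False.
                             If age <= 35, predict False.
                             Otherwise, predict True")

# Print the manually defined tree and its training performance:
my.fft
```

Because the tree is specified by hand, no optimization takes place. The printed statistics show how this particular description performs, not how well the cues could perform under other thresholds.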
Here is a complete list of the vignettes available in the `FFTrees` package:
| | Vignette | Description |
|---|---|---|
| | Main guide: FFTrees overview | An overview of the FFTrees package |
| 1 | Tutorial: FFTs for heart disease | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | Accuracy statistics | Definitions of accuracy statistics used throughout the package |
| 3 | Creating FFTs with `FFTrees()` | Details on the main `FFTrees()` function |
| 4 | Manually specifying FFTs | How to directly create FFTs without using the built-in algorithms |
| 5 | Visualizing FFTs | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | Examples of FFTs | Examples of FFTs from different datasets contained in the package |