Tutorial: Creating FFTs for heart disease

Nathaniel Phillips and Hansjörg Neth

2025-09-03

This tutorial on using the FFTrees package follows the examples presented in Phillips et al. (2017) (freely available in HTML and PDF).

In the following, we explain how to use FFTrees to create, evaluate and visualize FFTs in four simple steps.

Step 1: Install and load the FFTrees package

We can install FFTrees from CRAN using install.packages(). (We only need to do this once.)

# Install the package from CRAN:
install.packages("FFTrees")

To use the package, we first need to load it into our current R session. We load the package using library():

# Load the package:
library(FFTrees)

The FFTrees package contains several vignettes that guide through the package’s functionality (like this one). To open the main guide, run FFTrees.guide():

# Open the main package guide:
FFTrees.guide()

Step 2: Create FFTs from training data (and test on testing data)

In this example, we will create FFTs from a heart disease data set. The training data are in an object called heart.train, and the testing data are in an object called heart.test. For these data, we will predict diagnosis, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high risk or low risk).

To create an FFTrees object, we use the function FFTrees() with two main arguments:

  1. formula expects a formula indicating a binary criterion variable as a function of one or more predictor variable(s) to be considered for the tree. The shorthand formula = diagnosis ~ . means to include all predictor variables.

  2. data specifies the training data used to construct the FFTs (which must include the criterion variable).
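To see what the . shorthand stands for, base R's terms() can expand a formula against a data frame. The sketch below uses a tiny made-up data frame (toy, with hypothetical values) rather than the real heart disease data. It only illustrates R's general formula mechanics and makes no claim about how FFTrees() processes the formula internally.

```r
# A tiny, made-up data frame (hypothetical values, for illustration only):
toy <- data.frame(diagnosis = c(TRUE, FALSE, TRUE),
                  age       = c(63, 41, 57),
                  chol      = c(233, 204, 354),
                  thal      = c("fd", "normal", "rd"))

# Expand the "." against the columns of toy:
f_all <- diagnosis ~ .
predictors_all <- attr(terms(f_all, data = toy), "term.labels")
predictors_all

# Naming predictors explicitly restricts the set of candidate cues:
f_two <- diagnosis ~ age + thal
predictors_two <- attr(terms(f_two, data = toy), "term.labels")
predictors_two
```

With the dot, every column other than the criterion becomes a candidate predictor; listing variables by name narrows the set.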

Here is how we can construct our first FFTs:

# Create an FFTrees object:
heart.fft <- FFTrees(formula = diagnosis ~ .,                       # Criterion and (all) predictors
                     data = heart.train,                            # Training data
                     data.test = heart.test,                        # Testing data
                     main = "Heart Disease",                        # General label
                     decision.labels = c("Low-Risk", "High-Risk")   # Decision labels (False/True)
                     )

Evaluating this expression runs code that examines the data, optimizes thresholds based on our current goals for each cue, and creates and evaluates 7 FFTs. The resulting FFTrees object, which contains the tree definitions, their decisions, and their performance statistics, is assigned to heart.fft.

Other arguments

  • algorithm: There are two different algorithms available to build FFTs: "ifan" (Phillips et al., 2017) and "dfan" (Phillips et al., 2017). ("max" (Martignon et al., 2008) and "zigzag" (Martignon et al., 2008) are no longer supported.)

  • max.levels: Changes the maximum number of levels that are allowed in the tree.

The following arguments apply when using the "ifan" or "dfan" algorithms for creating new FFTs:

  • goal.chase: The goal.chase argument changes which statistic is maximized during tree construction (for the "ifan" and "dfan" algorithms). Possible values include "acc", "bacc", "wacc", "dprime", and "cost". The default is "wacc" with a sensitivity weight of 0.50 (which renders it identical to "bacc").

  • goal: The goal argument changes which statistic is maximized when selecting trees after construction (for the "ifan" and "dfan" algorithms). Possible values include "acc", "bacc", "wacc", "dprime", and "cost".

  • my.tree or tree.definitions: We can define a new tree from a verbal description (as a set of sentences), or manually specify sets of FFTs as a data frame (in appropriate format). See the Manually specifying FFTs vignette for details.
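The remark that "wacc" with a sensitivity weight of 0.50 is identical to "bacc" can be checked in a few lines of base R. This is a minimal sketch that assumes weighted accuracy is the weighted mean of sensitivity and specificity, w * sens + (1 - w) * spec. The helper wacc() and its argument sens_w are hypothetical names (not package functions), and the sensitivity and specificity values are made up.

```r
# Hypothetical helper (not part of FFTrees): weighted accuracy as a
# weighted mean of sensitivity and specificity (assumed definition).
wacc <- function(sens, spec, sens_w = 0.50) {
  sens_w * sens + (1 - sens_w) * spec
}

sens <- 0.80   # made-up sensitivity
spec <- 0.60   # made-up specificity

bacc         <- (sens + spec) / 2                 # balanced accuracy
wacc_default <- wacc(sens, spec)                  # sensitivity weight of 0.50
wacc_sens    <- wacc(sens, spec, sens_w = 0.75)   # weight favoring sensitivity

c(bacc = bacc, wacc_default = wacc_default, wacc_sens = wacc_sens)
```

Under this assumption, raising the sensitivity weight above 0.50 rewards trees that miss fewer true cases, at the expense of more false alarms.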

Step 3: Inspect and summarize FFTs

Now we can inspect and summarize the generated decision trees. We will start by printing the FFTrees object to return basic information to the console:

# Print an FFTrees object:
heart.fft
#> Heart Disease
#> FFTrees
#> - Trees: 7 fast-and-frugal trees predicting diagnosis
#> - Cost of outcomes:  hi = 0,  fa = 1,  mi = 1,  cr = 0
#> - Cost of cues:
#>      age      sex       cp trestbps     chol      fbs  restecg  thalach
#>        1        1        1        1        1        1        1        1
#>    exang  oldpeak    slope       ca     thal
#>        1        1        1        1        1
#>
#> FFT #1: Definition
#> [1] If thal = {rd,fd}, decide High-Risk.
#> [2] If cp != {a}, decide Low-Risk.
#> [3] If ca > 0, decide High-Risk, otherwise, decide Low-Risk.
#>
#> FFT #1: Training Accuracy
#> Training data: N = 150, Pos (+) = 66 (44%)
#>
#> |          | True + | True - | Totals:
#> |----------|--------|--------|
#> | Decide + | hi  54 | fa  18 |      72
#> | Decide - | mi  12 | cr  66 |      78
#> |----------|--------|--------|
#>   Totals:        66       84   N = 150
#>
#> acc  = 80.0%   ppv  = 75.0%   npv  = 84.6%
#> bacc = 80.2%   sens = 81.8%   spec = 78.6%
#>
#> FFT #1: Training Speed, Frugality, and Cost
#> mcu = 1.74,  pci = 0.87
#> cost_dec = 0.200,  cost_cue = 1.740,  cost = 1.940
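The definition of FFT #1 printed above can be read as a short sequence of checks, each with an exit. The following sketch writes those three rules out by hand in base R. The function apply_fft1() is a hypothetical helper (not part of the package), and the three patients are made-up cases, so this only illustrates the logic of the printed tree and not how FFTrees applies trees internally.

```r
# Hypothetical helper (not part of FFTrees): apply the three printed rules
# of FFT #1 to one case and report the decision and the number of cues used.
apply_fft1 <- function(thal, cp, ca) {
  if (thal %in% c("rd", "fd")) {
    return(list(decision = "High-Risk", cues_used = 1))  # exit at node 1
  }
  if (cp != "a") {
    return(list(decision = "Low-Risk", cues_used = 2))   # exit at node 2
  }
  if (ca > 0) {
    list(decision = "High-Risk", cues_used = 3)          # final node, first exit
  } else {
    list(decision = "Low-Risk", cues_used = 3)           # final node, second exit
  }
}

# Three made-up patients (hypothetical values, not taken from the data):
p1 <- apply_fft1(thal = "rd",     cp = "np", ca = 0)
p2 <- apply_fft1(thal = "normal", cp = "np", ca = 2)
p3 <- apply_fft1(thal = "normal", cp = "a",  ca = 1)

decisions <- c(p1$decision, p2$decision, p3$decision)
cues_used <- c(p1$cues_used, p2$cues_used, p3$cues_used)

decisions
mean(cues_used)  # average number of cues looked up for these three cases
```

Cases that exit at an early node never consult the later cues, which is the idea behind the mcu (mean cues used) value in the printed output.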

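The output tells us several pieces of information:

  • The FFTrees object contains 7 FFTs predicting diagnosis. Each error (false alarm or miss) has a cost of 1, and each cue has a cost of 1.

  • FFT #1 uses three cues: thal, cp, and ca.

  • On the training data (N = 150), FFT #1 reaches an accuracy of 80.0% (sensitivity 81.8%, specificity 78.6%) while using 1.74 cues per case on average (mcu).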

All statistics to evaluate each tree can be derived from a 2 x 2 confusion table:

Table 1: A 2x2 confusion table illustrating the types of frequency counts for 4 possible outcomes.

           | True +               | True -
  Decide + | hi (hit)             | fa (false alarm)
  Decide - | mi (miss)            | cr (correct rejection)

For definitions of all accuracy statistics, see the accuracy statistics vignette.
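As a worked check, the statistics in the FFT #1 training output above can be recomputed from its four frequency counts. The sketch uses the common textbook definitions of each measure. They reproduce the printed values here, but the accuracy statistics vignette remains the reference for how the package defines them.

```r
# Frequency counts from the FFT #1 training output above:
hi <- 54  # hits
fa <- 18  # false alarms
mi <- 12  # misses
cr <- 66  # correct rejections
n  <- hi + fa + mi + cr

sens <- hi / (hi + mi)      # sensitivity
spec <- cr / (cr + fa)      # specificity
ppv  <- hi / (hi + fa)      # positive predictive value
npv  <- cr / (cr + mi)      # negative predictive value
acc  <- (hi + cr) / n       # overall accuracy
bacc <- (sens + spec) / 2   # balanced accuracy

# Mean decision cost per case, using the printed outcome costs
# (hi = 0, fa = 1, mi = 1, cr = 0):
cost_dec <- (0 * hi + 1 * fa + 1 * mi + 0 * cr) / n

stats <- round(c(acc = acc, bacc = bacc, sens = sens, spec = spec,
                 ppv = ppv, npv = npv, cost_dec = cost_dec), 3)
stats
```

Rounded to three decimals, these match the printed output: acc = 0.800, bacc = 0.802, sens = 0.818, spec = 0.786, ppv = 0.750, npv = 0.846, and cost_dec = 0.200.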

Step 4: Visualise the final FFT

We use plot(x) to visualize an FFT (from an FFTrees object x). Using data = "train" evaluates an FFT for training data (fitting), whereas data = "test" predicts the performance of an FFT for a different dataset:

# Plot predictions of the best FFT when applied to test data:
plot(heart.fft,       # An FFTrees object
     data = "test")   # Data to use (either "train" or "test")

Other arguments

The plot() function for FFTrees objects accepts several other arguments:

  • tree: Which tree in the object should be plotted? To plot a tree other than the best fitting tree (FFT #1), just specify another tree as an integer (e.g., plot(heart.fft, tree = 2)).

  • data: For which dataset should statistics be shown? Either data = "train" (showing fitting or “Training” performance by default), or data = "test" (showing prediction or “Testing” performance).

  • stats: Should accuracy statistics be shown with the tree? This argument has been deprecated. To show only the tree, without any performance statistics, use what = "tree" instead:

# Plot only the tree, without accuracy statistics:
plot(heart.fft, what = "tree")

# plot(heart.fft, stats = FALSE)  # The 'stats' argument has been deprecated.

  • comp: Should statistics from competing algorithms be shown in the ROC curve? To remove the performance statistics of competing algorithms (e.g., regression, random forests), include the argument comp = FALSE.

  • what: Which parts of an FFTrees object should be visualized (e.g., "all", "icontree", or "tree")? Using what = "roc" plots tree performance as an ROC curve. To show individual cue accuracies (in ROC space), specify what = "cues":

# Plot cue accuracies (for training data) in ROC space:
plot(heart.fft, what = "cues")
#> Plotting cue training statistics:
#> — Cue accuracies ranked by bacc

See the Plotting FFTrees vignette for details on plotting FFTs.

Advanced functions

Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of FFTrees. However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs.

Accessing outputs

An FFTrees object contains many different outputs. Basic performance information on the current data and set of FFTs is available via the summary() function. To see and access parts of an FFTrees object, use str() or names():

# Show the names of all outputs in heart.fft:
names(heart.fft)
#> [1] "criterion_name" "cue_names"      "formula"        "trees"
#> [5] "data"           "params"         "competition"    "cues"

Key elements of an FFTrees object are explained in the vignette on Creating FFTs with FFTrees().

Predicting for new data

To predict classification outcomes for new data, use the standard predict() function. For example, here’s how to predict the classifications for data in the heartdisease object (which actually is just a combination of heart.train and heart.test):

# Predict classifications for a new dataset:
predict(heart.fft, newdata = heartdisease)

Directly defining FFTs

To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the my.tree argument. Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the tree.definitions argument of FFTrees(). As we often start from an existing set of FFTs, FFTrees provides a set of functions for extracting, converting, and modifying tree definitions.

See the vignette on Manually specifying FFTs for defining FFTs from descriptions and modifying tree definitions.

Vignettes

Here is a complete list of the vignettes available in the FFTrees package:

|   | Vignette                         | Description                                                      |
|---|----------------------------------|------------------------------------------------------------------|
|   | Main guide: FFTrees overview     | An overview of the FFTrees package                               |
| 1 | Tutorial: FFTs for heart disease | An example of using FFTrees() to model heart disease diagnosis   |
| 2 | Accuracy statistics              | Definitions of accuracy statistics used throughout the package   |
| 3 | Creating FFTs with FFTrees()     | Details on the main FFTrees() function                           |
| 4 | Manually specifying FFTs         | How to directly create FFTs without using the built-in algorithms |
| 5 | Visualizing FFTs                 | Plotting FFTrees objects, from full trees to icon arrays         |
| 6 | Examples of FFTs                 | Examples of FFTs from different datasets contained in the package |

References

Martignon, L., Katsikopoulos, K. V., & Woike, J. K. (2008). Categorization with limited resources: A family of simple heuristics. Journal of Mathematical Psychology, 52(6), 352–361. https://doi.org/10.1016/j.jmp.2008.04.003

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368. https://doi.org/10.1017/S1930297500006239
