Visualizing FFTs

Nathaniel Phillips and Hansjörg Neth

2025-09-03

Visualizing FFTrees

The FFTrees package makes it very easy to visualize and evaluate fast-and-frugal trees (FFTs).

The two key arguments for plotting are what and tree: Whereas the tree argument allows selecting between different trees in x (using tree = 1 by default), the what argument distinguishes between five main types of plots:

  1. plot(x, what = 'all') visualizes a tree and corresponding performance statistics. This is also the default when evaluating plot(x).

  2. plot(x, what = 'tree') visualizes only the tree diagram of the selected tree (without performance statistics).

  3. plot(x, what = 'icontree') visualizes the tree diagram of the selected tree with icon arrays on exit nodes (with additional options for show.iconguide and n.per.icon).

  4. plot(x, what = 'cues') visualizes the current cue accuracies in ROC space (by calling the showcues() function).

  5. plot(x, what = 'roc') visualizes a performance comparison of FFTs and competing algorithms in ROC space.

The other arguments of the plot.FFTrees() function allow further customization of the plot (e.g., by defining labels and parameters, or selectively hiding or showing elements).
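For a quick overview, here is a minimal sketch cycling through these plot types for a generic FFTrees object x (a placeholder name; any object created by FFTrees(), such as the titanic.fft object built below, would work):

# Assuming x is an existing FFTrees object:
plot(x, what = "all")       # tree diagram plus performance statistics (the default)
plot(x, what = "tree")      # tree diagram only
plot(x, what = "icontree")  # tree diagram with icon arrays on exit nodes
plot(x, what = "cues")      # cue accuracies in ROC space (calls showcues() internally)
plot(x, what = "roc")       # FFTs and competing algorithms in ROC space

plot(x, tree = 2, what = "tree")  # use the tree argument to select a different FFT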

In the following, we illustrate these plotting options by creating FFTs based on the titanic data (included in the FFTrees package).

The Titanic data

The titanic dataset contains basic survival statistics of Titanic passengers. For each passenger, we know in which class s/he traveled, as well as binary categories specifying age, sex, and survival information. To get a first impression, we inspect a random sample of cases:

set.seed(12)  # reproducible randomness

rcases <- sort(sample(1:nrow(titanic), 10))

# Sample of data:
knitr::kable(titanic[rcases, ],
             caption = "A sample of 10 observations from the `titanic` data.")
A sample of 10 observations from the `titanic` data.

|      | class  | age   | sex    | survived |
|------|--------|-------|--------|----------|
| 82   | first  | adult | male   | FALSE    |
| 91   | first  | adult | male   | FALSE    |
| 336  | second | adult | male   | TRUE     |
| 346  | second | adult | male   | FALSE    |
| 450  | second | adult | male   | FALSE    |
| 546  | second | adult | female | TRUE     |
| 1093 | third  | adult | female | TRUE     |
| 1160 | third  | adult | female | FALSE    |
| 1271 | third  | child | male   | FALSE    |
| 1500 | crew   | adult | male   | TRUE     |

Our current goal is to fit FFTs to this dataset. This essentially asks: How well do the available cues (class, age, and sex) predict whether a passenger survived?

First, let's create an FFTrees object (called titanic.fft) from the titanic dataset:

# Create FFTs for the titanic data:
titanic.fft <- FFTrees(formula = survived ~ .,
                       data = titanic,
                       main = "Surviving the Titanic",
                       decision.labels = c("Died", "Survived"))

Note that we used the entire titanic data (i.e., all 2201 cases) to train titanic.fft, rather than specifying train.p to set aside some proportion of it or specifying a dedicated data.test set for predictive purposes. This implies that our present goal is to fit FFTs to the historic data, rather than to create and use FFTs to predict new cases.
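For contrast, here is a minimal sketch of the predictive setup, assuming a hypothetical 50/50 split of the titanic data into data frames titanic_train and titanic_test (these names and the split below are illustrative, not part of the package):

# Hypothetical training/test split of the titanic data:
set.seed(1)
train_ids     <- sample(nrow(titanic), size = floor(nrow(titanic) / 2))
titanic_train <- titanic[train_ids, ]
titanic_test  <- titanic[-train_ids, ]

# Fit FFTs on the training cases and evaluate them on the held-out test cases:
titanic_fft_pred <- FFTrees(formula = survived ~ .,
                            data = titanic_train,
                            data.test = titanic_test,
                            main = "Surviving the Titanic",
                            decision.labels = c("Died", "Survived"))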

Visualizing cue accuracies

We can visualize individual cue accuracies (specifically their sensitivities and specificities) by including the what = 'cues' argument within the plot() function. Let's apply the function to the titanic.fft object to see how accurate each of the cues was on its own in predicting survival:

plot(titanic.fft,
     what = "cues",
     main = "Cues predicting Titanic survival")
#> Plotting cue training statistics:
#> — Cue accuracies ranked by bacc

Figure 1: Cue accuracies of FFTs predicting survival in the titanic dataset.

Given the axes of this plot, well-performing cues should lie near the top-left corner of the graph (i.e., exhibit both a low false alarm rate and a high hit rate). For the titanic data, this implies that none of the cues predicts very well on its own. The best individual cue appears to be sex (indicated as 1), followed by class (2). By contrast, age (3) seems a pretty poor cue for predicting survival on its own (despite its specificity of 97%).

Inspecting cue accuracies can provide valuable information for constructing FFTs. While they provide lower bounds on the performance of trees (as combining cues is only worthwhile when this yields a benefit), even poor individual cues can shine in combination with other predictors.

Visualizing FFTs and their performance

To visualize the tree from an FFTrees object, use plot(). Let's plot one of the trees (Tree #1, i.e., the best one, given our current goal):

plot(titanic.fft, tree = 1)

Figure 2: Plotting the best FFT of an FFTrees object.

The resulting plot visualizes one of the 4 FFTs contained in the titanic.fft object. As tree = 1 corresponds to the best tree given our current goal for selecting FFTs, we could have plotted the same tree by specifying tree = "best.train".

As Figure 2 contains a lot of information in three distinct panels, let's briefly consider their contents:

  1. Basic dataset information: The top row of the plot shows basic information on the current dataset: its population size (N) and the baseline frequencies of the two categories of the criterion variable.

  2. FFT and classification performance: The middle row shows the tree (in the center) as well as how many cases (here: persons) were classified at each level of the tree (on either side). For example, the current tree (Tree #1 of 4) can be understood as follows (see also the code sketch after this list):

    • If a person is female, decide that they survived.
    • Otherwise, if a person is neither in first nor in second class, decide that they died.
    • Finally, if the person is a child, predict that they survived; otherwise decide that they died.
  3. Accuracy and performance information: The bottom row shows general performance statistics of the FFT.
    As our models in titanic.fft were trained on the entire titanic dataset, we fitted FFTs to its 2201 cases, rather than setting aside some data for predictive purposes. The panel label reflects this important distinction: it reads "Accuracy (Training)" here, rather than "Accuracy (Testing)".

The bottom panel provides performance information and is structured into three subpanels:

  1. The classification table (on the left) shows the relationship between the true criterion states (as columns) and predicted decisions (as rows). The abbreviations hi (hits) and cr (correct rejections) denote correct decisions; mi (misses) and fa (false alarms) denote incorrect decisions.

  2. A range of vertical levels (in the middle) shows the tree's cumulative performance in terms of two frugality measures (mcu and pci) and various accuracy measures (sensitivity, specificity, accuracy, and balanced accuracy; see the Accuracy statistics vignette for details).

  3. Finally, the plot (on the right) shows an ROC curve comparing the performance of all trees in the FFTrees object. Additionally, the performance of logistic regression (blue) and CART (red) is shown. The tree plotted in the middle panel is highlighted in a solid green color (i.e., Figure 2 shows Tree #1).
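Two pieces of this information can also be obtained programmatically. Here is a sketch using the package's inwords() and predict() functions together with base R's table(); note that the row/column order of the resulting 2 x 2 table may differ from the plot's layout:

# Verbal description of Tree #1 (mirrors the bullet points above):
inwords(titanic.fft, tree = 1)

# Reconstruct the classification table: compare the tree's decisions
# with the true criterion values of the (training) data:
decisions <- predict(titanic.fft, newdata = titanic, tree = 1)
table(decisions, truth = titanic$survived)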

Additional arguments

Specifying additional arguments of plot() changes which elements are displayed and how.

The following examples illustrate the wide range of corresponding plots:

# Plot tree diagram with icon arrays:
plot(titanic.fft,
     what = "icontree",
     n.per.icon = 50,
     show.iconguide = TRUE)

Figure 3: An FFT diagram with icon arrays on exit nodes.

# Plot only the performance comparison in ROC space:
plot(titanic.fft, what = "roc")

Figure 4: Performance comparison of FFTs in ROC space.

# Hide some elements of the FFT plot:
plot(titanic.fft,
     show.icons = FALSE,      # hide icons
     show.iconguide = FALSE,  # hide icon guide
     show.header = FALSE      # hide header
     )

Figure 5: Plotting selected elements.

As the data and tree arguments can both refer to datasets used for training or testing (i.e., the "train" or "test" sets), they should be specified consistently. For instance, the following command would visualize the best training tree in titanic.fft:

plot(titanic.fft, tree = "best.train")

as data = "train" by default. However, the following analogous expression would fail:

plot(titanic.fft, tree = "best.test")

for two distinct reasons:

  1. When data remains unspecified, its default is data = "train". Thus, asking for tree = "best.test" would require switching to data = "test".

  2. More crucially, titanic.fft was created without any test data. Hence, asking for the best test tree does not make sense, which is why plot() will show the best training tree (with a warning).

Plotting performance for new data

Shifting our emphasis from fitting to prediction, we primarily need to specify some test data that was not used to train the FFTrees object. When predicting performance for a new dataset (e.g., data = test.data), the plotting and printing functions will automatically apply an existing FFTrees object to the new data and compute corresponding performance statistics (using the fftrees_apply() function). However, when applying existing FFTs to new data, the changes to the FFTrees object are not stored in the input object, unless the (invisible) output of plot.FFTrees() or print.FFTrees() is re-assigned to that object. The best way to fit FFTs to training data and evaluate them on test data is to explicitly include both datasets in the original FFTrees() command by using either its data.test or its train.p argument.
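As a sketch of this workflow, assuming a hypothetical data frame new_data with the same columns as titanic (here simply a random subset, used for illustration only):

# Hypothetical new data with the same structure as the titanic data:
new_data <- titanic[sample(nrow(titanic), 100), ]

# Apply the existing FFTs to the new data; re-assigning the (invisible)
# return value stores the corresponding statistics in the object:
titanic.fft <- plot(titanic.fft, data = new_data, tree = 1)

# Alternatively, predict() returns only the decisions for the new cases:
predict(titanic.fft, newdata = new_data, tree = 1)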

For example, we can repeat the previous analysis, but now let's create separate training and test datasets by including the train.p = .50 argument. This will split the dataset into a 50% training set and a distinct 50% testing set. (Alternatively, we could specify a dedicated test dataset by using the data.test argument.)

set.seed(100)  # for replicability of the training/test split

titanic.pred.fft <- FFTrees(formula = survived ~ .,
                            data = titanic,
                            train.p = .50,  # use 50% to train, 50% to test
                            main = "Titanic",
                            decision.labels = c("Died", "Survived"))

Here is the best training tree applied to the training data:

# print(titanic.pred.fft, tree = 1)
plot(titanic.pred.fft, tree = 1)

Figure 6: Plotting the best FFT on training data.

Tree #1 is the best training tree and could also be visualized by plot(titanic.pred.fft, tree = "best.train"). This tree has a high specificity of 92%, but a much lower sensitivity of just 51%. The overall accuracy of the tree's classifications is 79%, which exceeds the baseline, but is far from perfect. However, as we can see in the ROC panel, logistic regression (LR) would not perform much better, and CART performed even worse than Tree #1.

Now let's inspect the performance of the same tree on the test data:

# print(titanic.pred.fft, data = "test", tree = 1)
plot(titanic.pred.fft, data = "test", tree = 1)

Figure 7: Plotting the best FFT on test data.

We could have visualized the same tree by asking for plot(titanic.pred.fft, data = "test", tree = "best.test"). Note that the label of the bottom panel has now switched from "Accuracy (Training)" to "Accuracy (Testing)". Both the sensitivity and specificity values have decreased somewhat, which is typical when using a model (fitted on training data) for predicting new (test) data.

Let's visualize the prediction performance of Tree #2, the most liberal tree (i.e., the one with the highest sensitivity):

plot(titanic.pred.fft, data = "test", tree = 2)

Figure 8: Plotting Tree #2.

This alternative tree has a better sensitivity (of 63%), but its overall accuracy decreased to about baseline level (of 67%).

Whereas comparing training with test performance illustrates the trade-offs between mere fitting and genuine predictive modeling, comparing the performance details of various FFTs illustrates the typical trade-offs that any model for solving binary classification problems engages in. Importantly, both types of trade-offs are rendered transparent when using FFTrees.

Vignettes

Here is a complete list of the vignettes available in the FFTrees package:

| Vignette                             | Description                                                        |
|--------------------------------------|--------------------------------------------------------------------|
| Main guide: FFTrees overview         | An overview of the FFTrees package                                 |
| 1: Tutorial: FFTs for heart disease  | An example of using FFTrees() to model heart disease diagnosis     |
| 2: Accuracy statistics               | Definitions of accuracy statistics used throughout the package     |
| 3: Creating FFTs with FFTrees()      | Details on the main FFTrees() function                             |
| 4: Manually specifying FFTs          | How to directly create FFTs without using the built-in algorithms  |
| 5: Visualizing FFTs                  | Plotting FFTrees objects, from full trees to icon arrays           |
| 6: Examples of FFTs                  | Examples of FFTs from different datasets contained in the package  |
