
The R package **FFTrees** creates, visualizes, and evaluates fast-and-frugal decision trees (FFTs) for solving binary classification tasks, using the algorithms and methods described in Phillips, Neth, Woike & Gaissmaier (2017, https://doi.org/10.1017/S1930297500006239).
Fast-and-frugal trees (FFTs) are simple and transparent decision algorithms for solving binary classification problems. The key feature making FFTs faster and more frugal than other decision trees is that every node allows making a decision. When predicting novel cases, the performance of FFTs competes with more complex algorithms and machine learning techniques, such as logistic regression (LR), support-vector machines (SVM), and random forests (RF). Apart from being faster and requiring less information, FFTs tend to be robust against overfitting, and are easy to interpret, use, and communicate.
The latest release of **FFTrees** is available from CRAN at https://CRAN.R-project.org/package=FFTrees:
```r
install.packages("FFTrees")
```

The current development version can be installed from its GitHub repository at https://github.com/ndphillips/FFTrees:
```r
# install.packages("devtools")
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)
```

As an example, let's create an FFT predicting patients' heart disease status (*Healthy* vs. *Disease*) based on the `heartdisease` dataset included in **FFTrees**:
```r
library(FFTrees)  # load package
```

The `heartdisease` data provides medical information for 303 patients that were examined for heart disease. The full data contains a binary criterion variable describing the true state of each patient and was split into two subsets: a `heart.train` set for fitting decision trees, and a `heart.test` set for testing these trees. Here are the first rows and columns of both subsets of the `heartdisease` data:
`heart.train` (the training / fitting data) describes 150 patients:

| diagnosis | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | 44 | 0 | np | 108 | 141 | 0 | normal | 175 | 0 | 0.6 | flat | 0 | normal |
| FALSE | 51 | 0 | np | 140 | 308 | 0 | hypertrophy | 142 | 0 | 1.5 | up | 1 | normal |
| FALSE | 52 | 1 | np | 138 | 223 | 0 | normal | 169 | 0 | 0.0 | up | 1 | normal |
| TRUE | 48 | 1 | aa | 110 | 229 | 0 | normal | 168 | 0 | 1.0 | down | 0 | rd |
| FALSE | 59 | 1 | aa | 140 | 221 | 0 | normal | 164 | 1 | 0.0 | up | 0 | normal |
| FALSE | 58 | 1 | np | 105 | 240 | 0 | hypertrophy | 154 | 1 | 0.6 | flat | 0 | rd |
Table 1: Beginning of the `heart.train` subset (using the data of 150 patients for fitting/training FFTs).
`heart.test` (the testing / prediction data) describes 153 different patients on the same variables:

| diagnosis | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | 51 | 0 | np | 120 | 295 | 0 | hypertrophy | 157 | 0 | 0.6 | up | 0 | normal |
| TRUE | 45 | 1 | ta | 110 | 264 | 0 | normal | 132 | 0 | 1.2 | flat | 0 | rd |
| TRUE | 53 | 1 | a | 123 | 282 | 0 | normal | 95 | 1 | 2.0 | flat | 2 | rd |
| TRUE | 45 | 1 | a | 142 | 309 | 0 | hypertrophy | 147 | 1 | 0.0 | flat | 3 | rd |
| FALSE | 66 | 1 | a | 120 | 302 | 0 | hypertrophy | 151 | 0 | 0.4 | flat | 0 | normal |
| TRUE | 48 | 1 | a | 130 | 256 | 1 | hypertrophy | 150 | 1 | 0.0 | up | 2 | rd |
Table 2: Beginning of the `heart.test` subset (used to predict `diagnosis` for 153 new patients).
Our challenge is to predict each patient's `diagnosis`, a column of logical values indicating the true state of each patient (i.e., `TRUE` or `FALSE`, based on the patient suffering or not suffering from heart disease), from the values of potential predictors.
To solve binary classification problems by FFTs, we must answer two key questions: How can we create good FFTs, and which of them should we select and use?
Once we have created some FFTs, additional questions concern evaluating and comparing their performance.
TheFFTrees package answers these questions bycreating, evaluating, and visualizing FFTs.
We use the main `FFTrees()` function to create FFTs for the `heart.train` data and evaluate their predictive performance on the `heart.test` data:

```r
# Create an FFTrees object from the heartdisease data:
heart_fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))
```

Evaluating `FFTrees()` analyzes the training data, creates several FFTs, and applies them to the test data. The results are stored in an object `heart_fft`, which can be printed, plotted, and summarized (with options for selecting specific data or trees).
We can plot an `FFTrees` object to visualize a tree and its predictive performance (on the *test* data):

```r
# Plot the best tree applied to the test data:
plot(heart_fft,
     data = "test",
     main = "Heart Disease")
```
Figure 1: A fast-and-frugal tree (FFT) predicting heart disease for *test* data and its performance characteristics.
All trees in an `FFTrees` object and their key performance statistics can be obtained by `summary(heart_fft)`. FFTs are so simple that we can even create them 'from words' and then apply them to data.
For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data:
1. If `sex = 1`, predict *Disease*.
2. If `age < 45`, predict *Healthy*.
3. If `thal = {fd, normal}`, predict *Healthy*; otherwise, predict *Disease*.

These conditions can directly be supplied to the `my.tree` argument of `FFTrees()`:
```r
# Create custom FFT 'in words' and apply it to test data:

# 1. Create my own FFT (from verbal description):
my_fft <- FFTrees(formula = diagnosis ~ .,
                  data = heart.train,
                  data.test = heart.test,
                  decision.labels = c("Healthy", "Disease"),
                  my.tree = "If sex = 1, predict Disease.
                             If age < 45, predict Healthy.
                             If thal = {fd, normal}, predict Healthy,
                             Otherwise, predict Disease.")

# 2. Plot and evaluate my custom FFT (for test data):
plot(my_fft,
     data = "test",
     main = "My custom FFT")
```
Figure 2: An FFT predicting heart disease created from a verbal description.
The performance measures (in the bottom panel of Figure 2) show that this particular tree is somewhat biased: It has nearly perfect *sensitivity* (i.e., is good at identifying cases of *Disease*) but suffers from low *specificity* (i.e., performs poorly in identifying *Healthy* cases). Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms. Although the *accuracy* of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance.
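These measures follow directly from the four classification outcomes (hits, misses, false alarms, and correct rejections). A minimal base-R sketch, using hypothetical outcome counts rather than the actual values from Figure 2:

```r
# Hypothetical counts of the four classification outcomes:
hi <- 70   # hits: Disease cases correctly predicted as Disease
mi <- 3    # misses: Disease cases wrongly predicted as Healthy
fa <- 40   # false alarms: Healthy cases wrongly predicted as Disease
cr <- 40   # correct rejections: Healthy cases correctly predicted as Healthy

sens <- hi / (hi + mi)                   # sensitivity: high (few misses)
spec <- cr / (cr + fa)                   # specificity: low (many false alarms)
acc  <- (hi + cr) / (hi + mi + fa + cr)  # overall accuracy

round(c(sensitivity = sens, specificity = spec, accuracy = acc), 2)
```

With counts like these, a tree can score high overall accuracy while still misclassifying half of the *Healthy* cases, which is exactly the bias described above.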
Overall, what counts as the "best" tree for a particular problem depends on many factors (e.g., the goal of fitting vs. predicting data and the trade-offs between maximizing accuracy vs. incorporating the costs of cues or errors). To explore this range of options, the **FFTrees** package enables us to design and evaluate a range of FFTs.
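For instance, when misses are considered more costly than false alarms, candidate trees can be compared by their total error cost rather than by accuracy alone. A base-R sketch in which the error counts and cost values are illustrative assumptions, not package output:

```r
# Hypothetical error counts of two candidate FFTs on the same test data:
errors <- data.frame(tree         = c("my_fft", "heart_fft"),
                     misses       = c(3, 10),
                     false_alarms = c(40, 12))

# Assumed costs: a miss is 5 times as costly as a false alarm.
cost_mi <- 5
cost_fa <- 1
errors$total_cost <- errors$misses * cost_mi + errors$false_alarms * cost_fa

errors  # under these costs, the miss-avoiding tree wins despite more errors overall
```

Within **FFTrees** itself, such trade-offs can be expressed via the `cost.outcomes` and `goal` arguments of `FFTrees()`; see the package documentation for details.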
The following versions of **FFTrees** and corresponding resources are available:
| Type | Version | URL |
|---|---|---|
| A. **FFTrees** (R package) | Release version | https://CRAN.R-project.org/package=FFTrees |
| | Development version | https://github.com/ndphillips/FFTrees |
| B. Other resources | Online documentation | https://www.nathanieldphillips.co/FFTrees/ |
| | Online demo (running v1.3.3) | https://econpsychbasel.shinyapps.io/shinyfftrees/ |
We had fun creating the **FFTrees** package and hope you like it too! As a comprehensive, yet accessible introduction to FFTs, we recommend our article in the journal *Judgment and Decision Making* (2017), entitled *FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees* (available in html | PDF).
Citation (in APA format):

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. *Judgment and Decision Making*, *12*(4), 344–368. https://doi.org/10.1017/S1930297500006239
We encourage you to read the article to learn more about the history of FFTs and how the **FFTrees** package creates, visualizes, and evaluates them. When using **FFTrees** in your own work, please cite us and share your experiences (e.g., on GitHub) so we can continue developing the package.
By 2025, over 150 scientific publications have used or cited **FFTrees** (see Google Scholar for the full list). Examples include:
Lötsch, J., Haehner, A., & Hummel, T. (2020). Machine-learning-derived rules set excludes risk of Parkinson's disease in patients with olfactory or gustatory symptoms with high accuracy. *Journal of Neurology*, *267*(2), 469–478. https://doi.org/10.1007/s00415-019-09604-6

Kagan, R., Parlee, L., Beckett, B., Hayden, J. B., Gundle, K. R., & Doung, Y. C. (2020). Radiographic parameter-driven decision tree reliably predicts aseptic mechanical failure of compressive osseointegration fixation. *Acta Orthopaedica*, *91*(2), 171–176. https://doi.org/10.1080/17453674.2020.1716295

Klement, R. J., Sonke, J. J., Allgäuer, M., Andratschke, N., Appold, S., Belderbos, J., … & Mantel, F. (2020). Correlating dose variables with local tumor control in stereotactic body radiotherapy for early stage non-small cell lung cancer: A modeling study on 1500 individual treatments. *International Journal of Radiation Oncology · Biology · Physics*. https://doi.org/10.1016/j.ijrobp.2020.03.005

Nobre, G. G., Hunink, J. E., Baruth, B., Aerts, J. C., & Ward, P. J. (2019). Translating large-scale climate variability into crop production forecast in Europe. *Scientific Reports*, *9*(1), 1–13. https://doi.org/10.1038/s41598-018-38091-4

Buchinsky, F. J., Valentino, W. L., Ruszkay, N., Powell, E., Derkay, C. S., Seedat, R. Y., … & Mortelliti, A. J. (2019). Age at diagnosis, but not HPV type, is strongly associated with clinical course in recurrent respiratory papillomatosis. *PloS One*, *14*(6). https://doi.org/10.1371/journal.pone.0216697
[File README.Rmd last updated on 2025-09-03.]