Type: Package
Title: Hypothesis Testing Tree
Version: 0.1.2
Date: 2023-03-05
Author: Jiaqi Hu [cre, aut], Zhe Gao [aut], Bo Zhang [aut], Xueqin Wang [aut]
Maintainer: Jiaqi Hu <hujiaqi@mail.ustc.edu.cn>
Description: A novel decision tree algorithm in the hypothesis testing framework. The algorithm examines the distribution difference between two child nodes over all possible binary partitions. The test statistic of the hypothesis test is equivalent to the generalized energy distance, which makes the algorithm more powerful at detecting complex structure, not only mean differences. It is applicable to numeric, nominal and ordinal explanatory variables, and to responses in a general metric space of strong negative type. The algorithm has superior performance compared to other tree models in type I error, power, prediction accuracy and complexity.
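To make the energy-distance idea concrete, here is a minimal base-R sketch of the standard two-sample energy statistic, computed from a matrix D of pairwise response distances and a logical vector marking the left child. It is an illustration under assumptions (V-statistic averages and the n1*n2/(n1+n2) scaling) and is not necessarily how teststat = "energy0" or "energy1" is computed inside HTT; the function name energy_stat is hypothetical.

```r
# Hypothetical helper: two-sample (generalized) energy statistic from a distance matrix.
# Sketch only; HTT's "energy0"/"energy1" may use a different normalization.
energy_stat <- function(D, idx) {
  # D:   n x n matrix of pairwise response distances d(y_i, y_j)
  # idx: logical vector of length n, TRUE = left child, FALSE = right child
  n1 <- sum(idx)
  n2 <- sum(!idx)
  between <- mean(D[idx, !idx, drop = FALSE])   # mean left-right distance
  within1 <- mean(D[idx, idx, drop = FALSE])    # mean left-left distance (diagonal included)
  within2 <- mean(D[!idx, !idx, drop = FALSE])  # mean right-right distance (diagonal included)
  (n1 * n2 / (n1 + n2)) * (2 * between - within1 - within2)
}

# Toy check with alpha = 1: separated groups vs. mixed groups
y <- c(0, 0, 1, 1)
D <- as.matrix(dist(y))
energy_stat(D, c(TRUE, TRUE, FALSE, FALSE))  # 2
energy_stat(D, c(TRUE, FALSE, TRUE, FALSE))  # 0
```

Because only D enters the computation, the same function applies to any response for which a distance matrix is available, for example as.matrix(dist(Y))^alpha for a numeric response matrix Y.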
License: GPL-3
VignetteBuilder: knitr
Depends: R (≥ 3.5.0)
Suggests: knitr, rmarkdown, MASS
Imports: Rcpp (≥ 1.0.6), ggraph, igraph, ggplot2
LinkingTo: Rcpp
RoxygenNote: 7.2.3
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2023-03-12 14:10:03 UTC; hujq
Repository: CRAN
Date/Publication: 2023-03-12 14:30:02 UTC

Energy efficiency dataset

Description

The data concern the energy performance of buildings and contain eight input variables (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution) and two output variables, the heating load (HL) and cooling load (CL) of residential buildings. The goal is to predict the two real-valued responses from the eight input variables. The data can also be used as a multi-class classification problem if the responses are rounded to the nearest integer.

Usage

data("ENB")

Format

A data frame with 768 observations on the following 10 variables.

X1

Relative Compactness

X2

Surface Area

X3

Wall Area

X4

Roof Area

X5

Overall Height

X6

Orientation

X7

Glazing Area

X8

Glazing Area Distribution

Y1

Heating Load

Y2

Cooling Load

Source

UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Energy+efficiency.

References

A. Tsanas, A. Xifara: 'Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools', Energy and Buildings, Vol. 49, pp. 560-567, 2012

Examples

library(HTT)
data(ENB)
set.seed(1)
# 80/20 train/test split
idx = sample(1:nrow(ENB), floor(nrow(ENB)*0.8))
train = ENB[idx, ]
test = ENB[-idx, ]
htt_enb = HTT(cbind(Y1, Y2) ~ . , data = train, controls = htt_control(pt = 0.05, R = 99))
# prediction
pred = predict(htt_enb, newdata = test)
test_y = test[, 9:10]
# MAE
colMeans(abs(pred - test_y))
# MSE
colMeans((pred - test_y)^2)

Hypothesis Testing Tree

Description

Fit a hypothesis testing tree.

Usage

HTT(formula, data, method, distance, controls = htt_control(...), ...)

Arguments

formula

a symbolic description of the model to be fit.

data

a data frame containing the variables in the model.

method

"regression" or "classification". If method is missing, the routine tries to make an intelligent guess: if Y is a factor, then method = "classification"; if Y is a numeric vector or numeric matrix, then method = "regression".

distance

If distance is missing, the Euclidean distance raised to the power alpha is used for regression trees and the 0-1 distance is used for classification trees. Otherwise, distance is used as the distance matrix of Y.
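The two documented default distances can be sketched in base R as follows. The helper default_distance is hypothetical and only mirrors the description above: |y_i - y_j|^alpha with the Euclidean norm (so a numeric matrix response is handled row-wise) for numeric responses, and the 0-1 distance for a factor response.

```r
# Hypothetical helper mirroring the documented defaults (not part of HTT):
# Euclidean distance^alpha for numeric responses, 0-1 distance for factors.
default_distance <- function(y, alpha = 1) {
  if (is.factor(y)) {
    # 0-1 distance: 0 if two observations share a class, 1 otherwise
    D <- outer(as.integer(y), as.integer(y), FUN = "!=") * 1
  } else {
    # numeric vector or matrix (rows = observations): Euclidean distance^alpha
    D <- as.matrix(dist(as.matrix(y)))^alpha
  }
  D
}

D_iris <- default_distance(iris$Species)      # 150 x 150 matrix of 0s and 1s
D_num <- default_distance(c(0, 3), alpha = 2) # off-diagonal entries equal 9
```

A matrix built this way is the kind of object the distance argument describes; whether HTT expects particular dimnames or a specific storage type is not documented here, so treat passing it directly as untested.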

controls

a list of options that control details of the HTT algorithm. See htt_control.

...

arguments passed to htt_control.

Details

Hypothesis testing trees examine the distribution difference between the two child nodes of a binary partition within a hypothesis testing framework. At each split, the algorithm finds the maximum distribution difference over all possible binary partitions; the test statistic is based on the generalized energy distance. A permutation test is used to estimate the p-value of the test.
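The split search and permutation test described above can be sketched for a single numeric covariate as below. This is a simplified illustration, not the package's implementation: it assumes splits of the form x <= c, a V-statistic energy distance, and the common permutation p-value (1 + #{T* >= T}) / (R + 1). The functions energy_stat, best_split and perm_test_split are hypothetical. Each permutation re-maximizes over all cut points, so the p-value accounts for the search over partitions.

```r
# Hypothetical sketch of a maximally selected energy statistic with a permutation p-value.
energy_stat <- function(D, idx) {
  n1 <- sum(idx); n2 <- sum(!idx)
  between <- mean(D[idx, !idx, drop = FALSE])
  within1 <- mean(D[idx, idx, drop = FALSE])
  within2 <- mean(D[!idx, !idx, drop = FALSE])
  (n1 * n2 / (n1 + n2)) * (2 * between - within1 - within2)
}

# Maximize the statistic over cut points leaving >= minbucket observations per side
best_split <- function(x, D, minbucket = 1) {
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]  # cutting at max(x) would leave the right child empty
  best <- list(stat = -Inf, cut = NA)
  for (cc in cuts) {
    idx <- x <= cc
    if (sum(idx) < minbucket || sum(!idx) < minbucket) next
    s <- energy_stat(D, idx)
    if (s > best$stat) best <- list(stat = s, cut = cc)
  }
  best
}

# Compare the observed maximum with maxima obtained after permuting the responses
perm_test_split <- function(x, D, R = 199, minbucket = 1) {
  obs <- best_split(x, D, minbucket)
  perm_stats <- replicate(R, {
    p <- sample(nrow(D))                    # reassign responses to covariate values at random
    best_split(x, D[p, p], minbucket)$stat  # re-maximize over all cut points
  })
  list(cut = obs$cut, stat = obs$stat,
       pval = (1 + sum(perm_stats >= obs$stat)) / (R + 1))
}

# Toy data: the mean of the response shifts by 3 units after x = 20
set.seed(1)
x <- 1:40
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 3))
res <- perm_test_split(x, as.matrix(dist(y)), R = 99, minbucket = 10)
res$cut
res$pval
```

Under this p-value convention the smallest attainable p-value is 1/(R + 1), that is 0.01 for R = 99 and 0.005 for R = 199, so R should not be too small relative to the threshold pt.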

Value

An object of class htt. See htt.object.

Author(s)

Jiaqi Hu

See Also

htt_control, print.htt, plot.htt, predict.htt

Examples

library(HTT)
## regression
data("Boston", package = "MASS")
Bostonhtt <- HTT(medv ~ . , data = Boston, controls = htt_control(R = 99))
plot(Bostonhtt)
mean((Boston$medv - predict(Bostonhtt))^2)
## classification
irishtt <- HTT(Species ~., data = iris)
plot(irishtt)
mean(iris$Species == predict(irishtt))

Hypothesis Testing Tree Object

Description

A class for representing a hypothesis testing tree.

Value

frame

a data frame describing the splits. It contains the following columns:

node: the node numbers in a split order.

parent: the parent node number.

leftChild: the left daughter node number; NA represents a leaf.

rightChild: the right daughter node number; NA represents a leaf.

statistic: the maximum test statistic of all possible splits within the node.

split: the rule of the split. It is numeric for a split on a continuous covariate and a character string for a split on a non-numeric covariate, in which case the levels assigned to the two child nodes are stored in two braces.

pval: approximate p-values estimated from the permutation test.

isleaf: 1 denotes terminal node and 0 denotes internal node.

n: the number of observations reaching the node.

var: the names of the variables used in the split at each node (leaf nodes are denoted by the label "<leaf>").

yval: the fitted value of the response at the node. If the response has more than one dimension, the fitted values are presented as yval1, yval2, ... .

prob: the probability of each class at the node, present only for classification trees.
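For a multivariate response such as cbind(Y1, Y2), the per-node fitted values yval1, yval2, ... can be illustrated with a small base-R sketch. It assumes, following the description of type = "response" in predict.htt, that the fitted value of a numeric response is its mean within the node; leaf_means is a hypothetical helper, not part of HTT.

```r
# Hypothetical helper: column means of the response within each leaf.
leaf_means <- function(Y, leaf) {
  Y <- as.matrix(Y)                 # n x q response (a vector becomes n x 1)
  sums <- rowsum(Y, leaf)           # per-leaf column sums, rows sorted by leaf id
  counts <- as.vector(table(leaf))  # per-leaf sizes, in the same sorted order
  sums / counts                     # divide each row by its leaf size
}

Y <- cbind(Y1 = c(1, 3, 10), Y2 = c(2, 4, 20))
m <- leaf_means(Y, leaf = c(1, 1, 2))
m
```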

where

an integer vector with one entry per observation in the root node, giving the row number of frame that corresponds to the leaf node into which each observation falls.

method

the method used to grow the hypothesis testing tree, "regression" or "classification".

control

a list of options that control the HTT algorithm. See htt_control.

X

a copy of the input X as a data frame.

var.type

a vector recording the type of each variable: 0 represents continuous, 1 represents ordinal and 2 represents nominal variables.
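How predictors map onto these codes is not spelled out here. A plausible base-R sketch, assuming numeric columns are continuous, ordered factors are ordinal and other factors or character columns are nominal, is shown below; var_type_codes is a hypothetical helper, and HTT's own rules may differ.

```r
# Hypothetical helper: code each column as 0 (continuous), 1 (ordinal), 2 (nominal).
# The mapping from R classes to codes is an assumption, not taken from HTT.
var_type_codes <- function(X) {
  vapply(X, function(v) {
    if (is.ordered(v)) 1L                              # ordered factor -> ordinal
    else if (is.factor(v) || is.character(v)) 2L       # unordered factor/character -> nominal
    else if (is.numeric(v)) 0L                          # numeric -> continuous
    else NA_integer_                                    # anything else: undecided
  }, integer(1))
}

var_type_codes(iris)  # four numeric columns (0) and Species (2)
```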

See Also

HTT, plot.htt, print.htt, predict.htt


Control for Hypothesis Testing Tree

Description

Various parameters that control aspects of the HTT function.

Usage

htt_control(teststat = c("energy0", "energy1"),
            testtype = c("permutation", "fastpermutation"),
            alpha = 1, pt = 0.05, minsplit = 30,
            minbucket = round(minsplit/3),
            R = 199, nmin = 1000)

Arguments

teststat

a character specifying the type of the test statistic to be applied. It can be teststat = "energy0" or teststat = "energy1". Default is teststat = "energy0".

testtype

a character specifying how to compute the distribution of the test statistic. It can be testtype = "permutation" or testtype = "fastpermutation". For testtype = "fastpermutation", the permutation test is not performed on nodes with more than nmin observations. Default is testtype = "permutation".

alpha

the exponent on the Euclidean distance, in (0, 2] (regression trees only). Default is alpha = 1.

pt

the p-value threshold: the p-value of the permutation test must be less than pt for a split to be made. If pt = 1, the hypothesis testing tree splits fully without performing the permutation tests. Default is pt = 0.05.

minsplit

the minimum number of observations in a node in order for it to be considered for splitting. Default is minsplit = 30.

minbucket

the minimum number of observations in a terminal node. Default is minbucket = round(minsplit/3).

R

the number of permutation replications used to simulate the distribution of the test statistic. Default is R = 199.

nmin

the minimum number of observations a node must have for the permutation test to be skipped (for testtype = "fastpermutation"). Default is nmin = 1000.

Details

The arguments teststat, testtype and pt determine the hypothesis test performed at each split. The argument R is the number of permutations to be used. For datasets with more than 2000 observations, testtype = "fastpermutation" can save considerable time.
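The interplay of pt, minsplit, minbucket, testtype and nmin can be summarized as a single decision rule. The sketch below is hypothetical: the function should_split and the order of its checks are assumptions based only on the argument descriptions, in particular that a large node under testtype = "fastpermutation" is split without a test.

```r
# Hypothetical decision rule assembled from the documented htt_control() options.
# Not HTT's code; the order of the checks is an assumption.
should_split <- function(n, n_left, n_right, pval,
                         pt = 0.05, minsplit = 30,
                         minbucket = round(minsplit / 3),
                         testtype = c("permutation", "fastpermutation"),
                         nmin = 1000) {
  testtype <- match.arg(testtype)
  if (n < minsplit) return(FALSE)                      # node too small to be considered
  if (min(n_left, n_right) < minbucket) return(FALSE)  # a child would be too small
  if (pt == 1) return(TRUE)                            # pt = 1: split fully, no test
  if (testtype == "fastpermutation" && n > nmin) {
    return(TRUE)                                       # large node: permutation test skipped
  }
  pval < pt                                            # otherwise the split must be significant
}

should_split(100, 50, 50, pval = 0.01)                                  # significant split
should_split(5000, 2500, 2500, pval = NA, testtype = "fastpermutation") # test skipped
```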

Value

A list containing the options.

See Also

HTT, htt.object

Examples

library(HTT)
## choose the teststat as "energy1"
htt_control(teststat = "energy1")
## set the p-value threshold to 0.01
htt_control(pt = 0.01)
## set alpha to 0.5
htt_control(alpha = 0.5)
## change the minimum number of observations in a terminal node
htt_control(minbucket = 7)
## reduce the number of permutation replications to save time
htt_control(R = 99)

Plot an htt Object

Description

Visualize an htt object. Several arguments can be passed to control the color and shape.

Usage

## S3 method for class 'htt'
plot(x, digits = 3,
    line.color = "blue",
    node.color = "black",
    line.type = c("straight", "curved"),
    layout = c("tree", "dendrogram"), ...)

Arguments

x

fitted model object of class htt returned by the HTT function.

digits

the number of significant digits in displayed numbers. Default is digits = 3.

line.color

a character specifying the edge color. Default is line.color = "blue".

node.color

a character specifying the node color. Default is node.color = "black".

line.type

a character specifying the type of edge, line.type = "straight" or line.type = "curved". Default is line.type = "straight".

layout

a character specifying the layout, layout = "tree" or layout = "dendrogram". Default is layout = "tree".

...

additional print arguments.

Details

This function is a method for the generic function plot, for objects of class htt.

Value

Visualize the hypothesis testing tree.

See Also

print.htt, printsplit, predict.htt

Examples

library(HTT)
irishtt = HTT(Species ~., data = iris)
plot(irishtt)
# change the line color and node color
plot(irishtt, line.color = "black", node.color = "blue")
# change the line type
plot(irishtt, line.type = "curved")
# change the layout
plot(irishtt, layout = "dendrogram")

Predictions from a Fitted htt Object

Description

Compute predictions from an htt object.

Usage

## S3 method for class 'htt'
predict(object, newdata,
        type = c("response", "prob", "node"),
        ...)

Arguments

object

fitted model object of class htt returned by the HTT function.

newdata

an optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

type

a character string denoting the type of predicted value returned. For type = "response", the mean is returned for a numeric response and the predicted class for a categorical response. For type = "prob", the matrix of class probabilities is returned for a categorical response. type = "node" returns an integer vector of terminal node identifiers.

...

additional print arguments.

Details

This function is a method for the generic function predict for class htt. It can be invoked by calling predict for an object of the appropriate class, or directly by calling predict.htt regardless of the class of the object.

Value

A list of predictions, possibly simplified to a numeric vector, numeric matrix or factor.

If type = "response":
the mean is returned for a numeric response and the predicted class for a categorical response.

If type = "prob":
the matrix of class probabilities is returned for a categorical response.

If type = "node":
an integer vector of terminal node identifiers is returned.
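For a classification tree, the relationship between type = "node", type = "prob" and type = "response" can be illustrated by summarizing the training classes within each terminal node. This base-R sketch assumes the probabilities are class proportions in the leaf and the predicted class is the most frequent one, with ties broken by factor-level order; leaf_summaries is hypothetical and leaf stands in for the node identifiers.

```r
# Hypothetical helper: class proportions and majority class per terminal node.
leaf_summaries <- function(y, leaf) {
  # y: factor of observed classes; leaf: terminal-node id for each observation
  prob <- unclass(prop.table(table(leaf, y), margin = 1))  # rows = leaves, cols = classes
  top <- max.col(prob, ties.method = "first")              # column of the largest proportion
  response <- factor(colnames(prob)[top], levels = levels(y))
  names(response) <- rownames(prob)
  list(prob = prob, response = response)
}

# Toy example: leaf 2 holds (a, a, b) and leaf 3 holds (b, b)
y <- factor(c("a", "a", "b", "b", "b"))
leaf <- c(2, 2, 3, 3, 2)
res <- leaf_summaries(y, leaf)
res$prob
res$response
```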

See Also

predict, htt.object

Examples

library(HTT)
irishtt <- HTT(Species ~., data = iris)
## the predicted class
predict(irishtt, type = "response")
## class probabilities
predict(irishtt, type = "prob")
## terminal node identifiers
predict(irishtt, type = "node")

Print a Fitted htt Object

Description

This function prints an htt.object. It is a method for the generic function print of class htt. It can be invoked by calling print for an object of the appropriate class, or directly by calling print.htt regardless of the class of the object.

Usage

## S3 method for class 'htt'
print(x, ...)

Arguments

x

fitted model object of class htt returned by the HTT function.

...

additional print arguments.

Details

A semi-graphical layout of the contents of x$frame is printed. Indentation is used to convey the tree topology. Information for each node includes the node number, split rule, size and p-value. For classification trees, the class probabilities are also printed.

Value

Visualize the hypothesis testing tree in a semi-graphical layout.

See Also

htt.object, printsplit

Examples

library(HTT)
irishtt = HTT(Species ~., data = iris)
print(irishtt)

Display the Split Table for a Fitted htt Object

Description

Display the split table for a fitted htt object.

Usage

printsplit(object)

Arguments

object

fitted model object of class htt returned by the HTT function.

Value

Display the split table.

See Also

HTT, htt.object

Examples

library(HTT)
irishtt = HTT(Species ~., data = iris)
printsplit(irishtt)
