| Type: | Package |
| Title: | Class Cover Catch Digraph Classification |
| Version: | 0.3.2 |
| Description: | Fit Class Cover Catch Digraph Classification models that can be used in machine learning. Pure and proper and random walk approaches are available. Methods are explained in Priebe et al. (2001) <doi:10.1016/S0167-7152(01)00129-8>, Priebe et al. (2003) <doi:10.1007/s00357-003-0003-7>, and Manukyan and Ceyhan (2016) <doi:10.48550/arXiv.1904.04564>. |
| Depends: | R (≥ 4.2) |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.2.1 |
| LinkingTo: | Rcpp, RcppArmadillo |
| Imports: | Rcpp, RANN, Rfast, proxy |
| NeedsCompilation: | yes |
| Packaged: | 2023-04-22 15:24:40 UTC; Fatih |
| Author: | Fatih Saglam |
| Maintainer: | Fatih Saglam <saglamf89@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2023-04-24 09:50:02 UTC |
Pure and Proper Class Cover Catch Digraph Classifier
Description
pcccd_classifier fits a Pure and Proper Class Cover Catch Digraph (PCCCD) classification model.
Usage
pcccd_classifier(x, y, proportion = 1)
Arguments
x | feature matrix or dataframe. |
y | class factor variable. |
proportion | proportion of covered samples. A real number between 0 and 1. |
Details
Multiclass framework for PCCCD. PCCCD determines the target class dominant point set S and its circular cover area by determining balls B(x^{\text{target}}, r_i) with radii r, using the minimum number of dominant points that satisfies X^{\text{non-target}} \cap \bigcup_{i} B_i = \varnothing (pure) and X^{\text{target}} \subset \bigcup_{i} B_i (proper).
This guarantees that the balls of the target class never cover any non-target samples (pure) and that the balls cover all target samples (proper).
For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
Note: Much faster than the cccd package.
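As an illustration of the pure and proper conditions, the covering balls returned by pcccd_classifier can be checked directly. This is a minimal sketch, not package internals: it assumes Euclidean distance (via proxy::dist) and that boundary handling may differ from the package, so both a strict and a non-strict coverage check are shown.

# Sketch: verify the pure and proper conditions for class "A" using the
# documented return values x_dominant_list and radii_dominant_list.
set.seed(1)
n  <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x  <- cbind(x1, x2)
y  <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

m <- pcccd_classifier(x = x, y = y)

i_class <- which(m$class_names == "A")
centers <- m$x_dominant_list[[i_class]]
radii   <- m$radii_dominant_list[[i_class]]

# distance of every sample to every dominant point of class "A"
d <- as.matrix(proxy::dist(x, centers))

covered_leq <- rowSums(sweep(d, 2, radii, "<=")) > 0  # inside or on a ball
covered_lt  <- rowSums(sweep(d, 2, radii, "<"))  > 0  # strictly inside a ball

all(covered_leq[y == "A"])  # proper: every target sample is covered
any(covered_lt[y == "B"])   # pure: expected FALSE, no non-target sample strictly inside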
Value
an object of "cccd_classifier" which includes:
i_dominant_list | dominant sample indexes. |
x_dominant_list | dominant samples from the feature matrix x. |
radii_dominant_list | radii of the balls for the dominant samples. |
class_names | class names. |
k_class | number of classes. |
proportions | proportion of each class covered. |
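For instance, the components listed above can be inspected on a fitted object. This is a brief sketch using only the documented component names; the simulated data below is purely illustrative.

set.seed(1)
x <- matrix(runif(100), ncol = 2)
y <- factor(rep(c("A", "B"), each = 25))
m <- pcccd_classifier(x = x, y = y)

m$class_names                         # class labels
m$k_class                             # number of classes
lengths(m$i_dominant_list)            # number of dominant points per class
sapply(m$radii_dominant_list, range)  # range of ball radii per class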
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
Examples
n <- 1000x1 <- runif(n, 1, 10)x2 <- runif(n, 1, 10)x <- cbind(x1, x2)y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))m_pcccd <- pcccd_classifier(x = x, y = y)# datasetplot(x, col = y, asp = 1)# dominant samples of first classx_center <- m_pcccd$x_dominant_list[[1]]# radii of balls for first classradii <- m_pcccd$radii_dominant_list[[1]]# ballsfor (i in 1:nrow(x_center)) {xx <- x_center[i, 1]yy <- x_center[i, 2]r <- radii[i]theta <- seq(0, 2*pi, length.out = 100)xx <- xx + r*cos(theta)yy <- yy + r*sin(theta)lines(xx, yy, type = "l", col = "green")}# testing the performancei_train <- sample(1:n, round(n*0.8))x_train <- x[i_train,]y_train <- y[i_train]x_test <- x[-i_train,]y_test <- y[-i_train]m_pcccd <- pcccd_classifier(x = x_train, y = y_train)pred <- predict(object = m_pcccd, newdata = x_test)# confusion matrixtable(y_test, pred)# test accuracysum(y_test == pred)/nrow(x_test)Pure and Proper Class Cover Catch Digraph Prediction
Description
predict.pcccd_classifier makes predictions using a pcccd_classifier object.
Usage
## S3 method for class 'pcccd_classifier'
predict(object, newdata, type = "pred", ...)
Arguments
object | a pcccd_classifier object. |
newdata | newdata as matrix or dataframe. |
type | "pred" or "prob". Default is "pred". "pred" is class estimations,"prob" is |
... | not used. |
Details
Estimations are based on the nearest dominant neighbor, measured in radius units.
For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
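To make the radius-unit idea concrete, here is a minimal sketch of such a scoring rule. It is my own illustration, not the package's internal code: score_point is a hypothetical helper, Euclidean distance is assumed, and the package's exact rule may differ.

# Hypothetical helper: for each class, the smallest distance from x_new to a
# dominant point, measured in units of that point's ball radius.
score_point <- function(model, x_new) {
  sapply(seq_along(model$class_names), function(k) {
    centers <- model$x_dominant_list[[k]]
    radii   <- model$radii_dominant_list[[k]]
    d <- sqrt(colSums((t(centers) - x_new)^2))  # Euclidean distances to centers
    min(d / radii)                              # distance in radius units
  })
}

# Predicted class: the one whose nearest ball is closest in radius units, e.g.
# model$class_names[which.min(score_point(m_pcccd, c(5, 5)))]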
Value
a vector of class predictions (if type is "pred") or an n × p matrix of class probabilities (if type is "prob").
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
Examples
n <- 1000x1 <- runif(n, 1, 10)x2 <- runif(n, 1, 10)x <- cbind(x1, x2)y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))# testing the performancei_train <- sample(1:n, round(n*0.8))x_train <- x[i_train,]y_train <- y[i_train]x_test <- x[-i_train,]y_test <- y[-i_train]m_pcccd <- pcccd_classifier(x = x_train, y = y_train)pred <- predict(object = m_pcccd, newdata = x_test)# confusion matrixtable(y_test, pred)# test accuracysum(y_test == pred)/nrow(x_test)Random Walk Class Cover Catch Digraph Prediction
Description
predict.rwcccd_classifier makes predictions using an rwcccd_classifier object.
Usage
## S3 method for class 'rwcccd_classifier'
predict(object, newdata, type = "pred", e = 0, ...)
Arguments
object | an rwcccd_classifier object. |
newdata | newdata as matrix or dataframe. |
type | "pred" or "prob". Default is "pred". "pred" is class estimations,"prob" is |
e | 0 or 1. Default is 0. Penalty based on T scores in the rwcccd_classifier object. |
... | not used. |
Details
Estimations are based on the nearest dominant neighbor, measured in radius units. The e argument is used to penalize estimations based on T scores in the rwcccd_classifier object.
For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
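For example, the two type values and the T-score penalty can be compared side by side. This is a short sketch using only the documented arguments; the simulated data mirrors the Examples below.

n  <- 200
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x  <- cbind(x1, x2)
y  <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))
m  <- rwcccd_classifier(x = x, y = y)

head(predict(m, newdata = x, type = "pred"))         # class estimations
head(predict(m, newdata = x, type = "prob"))         # class probabilities
head(predict(m, newdata = x, type = "pred", e = 1))  # penalized by T scores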
Value
a vector of class predictions (if type is "pred") or an n × p matrix of class probabilities (if type is "prob").
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
Examples
n <- 1000x1 <- runif(n, 1, 10)x2 <- runif(n, 1, 10)x <- cbind(x1, x2)y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))# testing the performancei_train <- sample(1:n, round(n*0.8))x_train <- x[i_train,]y_train <- y[i_train]x_test <- x[-i_train,]y_test <- y[-i_train]m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train)pred <- predict(object = m_rwcccd, newdata = x_test, e = 0)# confusion matrixtable(y_test, pred)# test accuracysum(y_test == pred)/nrow(x_test)Random Walk Class Cover Catch Digraph Classifier
Description
rwcccd_classifier and rwcccd_classifier_2 fit a Random Walk Class Cover Catch Digraph (RWCCCD) classification model. rwcccd_classifier uses C++ for speed and rwcccd_classifier_2 uses the R language to determine balls.
Usage
rwcccd_classifier(x, y, method = "default", m = 1, proportion = 0.99)
rwcccd_classifier_2(x, y, method = "default", m = 1, proportion = 0.99, partial_ordering = FALSE)
Arguments
x | feature matrix or dataframe. |
y | class factor variable. |
method | "default" or "balanced". |
m | penalization parameter. Takes value in [0, ∞). |
proportion | proportion of covered samples. A real number between 0 and 1. |
partial_ordering | TRUE or FALSE. Default is FALSE. |
Details
Random Walk Class Cover Catch Digraphs (RWCCCD) are determined by calculating a T_{\text{target}} score for each class, taken in turn as the target class, as
T_{\text{target}} = R_{\text{target}}(r_{\text{target}}) - \frac{r_{\text{target}} n_u}{2 d_m(x)}.
Here, r_{\text{target}} is the radius, determined by the maximum of R_{\text{target}}(r) - P_{\text{target}}(r) calculated for each target sample. R_{\text{target}}(r) is
R_{\text{target}}(r) := w_{\text{target}} |\{z \in X^{\text{target}}_{n_{\text{target}}} : d(x^{\text{target}}, z) \leq r\}| - w_{\text{non-target}} |\{z \in X^{\text{non-target}}_{n_{\text{non-target}}} : d(x^{\text{target}}, z) \leq r\}|
and P_{\text{target}}(r) is
P_{\text{target}}(r) = m \times d(x^{\text{target}}, z)^p.
m = 0 removes the penalty. w_{\text{target}} = 1 for the default method and w_{\text{target}} = n_{\text{target}} / n_{\text{non-target}} for the balanced method. n_u is the number of uncovered samples in the current iteration and d_m(x) is \max\{d(x^{\text{target}}, x^{\text{uncovered}})\}.
This method is more robust to noise compared to PCCCD. However, balls may cover classes improperly and r = 0 can be selected.
For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
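The following is a minimal sketch of the scoring idea above, not the package's C++ or R implementation. It assumes Euclidean distance, unit weights w_target = w_nontarget = 1 as in the default method, and takes the penalty P_target(r) as m * r^p purely for illustration; r_score and best_radius are hypothetical helpers.

# R_target(r) for a single candidate target point x0: covered target samples
# minus covered non-target samples within radius r (unit weights).
r_score <- function(x0, x_target, x_nontarget, r, w_t = 1, w_nt = 1) {
  d_t  <- sqrt(colSums((t(x_target) - x0)^2))     # distances to target samples
  d_nt <- sqrt(colSums((t(x_nontarget) - x0)^2))  # distances to non-target samples
  w_t * sum(d_t <= r) - w_nt * sum(d_nt <= r)
}

# Choose a radius for x0 by maximizing R_target(r) - P_target(r) over candidate
# radii taken at the observed distances (penalty taken as m * r^p here).
best_radius <- function(x0, x_target, x_nontarget, m = 1, p = 1) {
  d_all <- sqrt(colSums((t(rbind(x_target, x_nontarget)) - x0)^2))
  cand  <- sort(unique(d_all))
  obj   <- sapply(cand, function(r) r_score(x0, x_target, x_nontarget, r) - m * r^p)
  cand[which.max(obj)]
}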
Value
an rwcccd_classifier object which includes:
i_dominant_list | dominant sample indexes. |
x_dominant_list | dominant samples from the feature matrix x. |
radii_dominant_list | radii of the balls for the dominant samples. |
class_names | class names. |
k_class | number of classes. |
proportions | proportion of each class covered. |
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
Examples
n <- 500x1 <- runif(n, 1, 10)x2 <- runif(n, 1, 10)x <- cbind(x1, x2)y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))# datasetm_rwcccd_1 <- rwcccd_classifier(x = x, y = y, method = "default", m = 1)plot(x, col = y, asp = 1, main = "default")# dominant samples of second classx_center <- m_rwcccd_1$x_dominant_list[[2]]# radii of balls for second classradii <- m_rwcccd_1$radii_dominant_list[[2]]# ballsfor (i in 1:nrow(x_center)) { xx <- x_center[i, 1] yy <- x_center[i, 2] r <- radii[i] theta <- seq(0, 2*pi, length.out = 100) xx <- xx + r*cos(theta) yy <- yy + r*sin(theta) lines(xx, yy, type = "l", col = "green")}# datasetm_rwcccd_2 <- rwcccd_classifier_2(x = x, y = y, method = "default", m = 1, partial_ordering = TRUE)plot(x, col = y, asp = 1, main = "default, prartial_ordering = TRUE")# dominant samples of second classx_center <- m_rwcccd_2$x_dominant_list[[2]]# radii of balls for second classradii <- m_rwcccd_2$radii_dominant_list[[2]]# ballsfor (i in 1:nrow(x_center)) { xx <- x_center[i, 1] yy <- x_center[i, 2] r <- radii[i] theta <- seq(0, 2*pi, length.out = 100) xx <- xx + r*cos(theta) yy <- yy + r*sin(theta) lines(xx, yy, type = "l", col = "green")}# datasetm_rwcccd_3 <- rwcccd_classifier(x = x, y = y, method = "balanced", m = 1, proportion = 0.5)plot(x, col = y, asp = 1, main = "balanced, proportion = 0.5")# dominant samples of second classx_center <- m_rwcccd_3$x_dominant_list[[2]]# radii of balls for second classradii <- m_rwcccd_3$radii_dominant_list[[2]]# ballsfor (i in 1:nrow(x_center)) { xx <- x_center[i, 1] yy <- x_center[i, 2] r <- radii[i] theta <- seq(0, 2*pi, length.out = 100) xx <- xx + r*cos(theta) yy <- yy + r*sin(theta) lines(xx, yy, type = "l", col = "green")}# testing the performancei_train <- sample(1:n, round(n*0.8))x_train <- x[i_train,]y_train <- y[i_train]x_test <- x[-i_train,]y_test <- y[-i_train]m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train, method = "balanced")pred <- predict(object = m_rwcccd, newdata = x_test)# confusion matrixtable(y_test, pred)# accuracysum(y_test == pred)/nrow(x_test)