| Title: | ROC-Optimizing Binary Classifiers |
| Version: | 0.1.4 |
| Description: | Implements ROC (Receiver Operating Characteristic)–Optimizing Binary Classifiers, supporting both linear and kernel models. Both model types provide a variety of surrogate loss functions; in addition, linear models offer multiple regularization penalties, while kernel models support a range of kernel functions. Scalability to large datasets is achieved through approximation-based options that accelerate training. Utilities are provided for model training, prediction, and cross-validation. The implementation builds on ROC-optimizing support vector machines; for more information, see Hernández-Orallo, José, et al. (2004) <doi:10.1145/1046456.1046489>, presented at the ROC Analysis in AI Workshop (ROCAI-2004). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | stats, graphics, utils, ggplot2, fastDummies, kernlab, pracma, rsample, dplyr, caret, pROC |
| RoxygenNote: | 7.3.2 |
| Suggests: | mlbench, knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://github.com/gimunBae/roclab |
| BugReports: | https://github.com/gimunBae/roclab/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-11-04 06:35:47 UTC; bgd55 |
| Author: | Gimun Bae [aut, cre], Seung Jun Shin [aut] |
| Maintainer: | Gimun Bae <gimunbae0201@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-04 07:10:02 UTC |
utils-internal.R - Internal utilities for ROC-SVM
Description
These helper functions are used only inside the package (not exported). They handle pairwise difference construction, Adamax-based optimization, proximal updates for penalties, intercept estimation, and lambda-max computation.
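For orientation, the pairwise difference construction can be sketched as follows (a minimal illustration only; pairwise_differences is a hypothetical stand-in, not the package's internal code):

pairwise_differences <- function(X.plus, X.minus) {
  # All (positive, negative) index pairs
  idx <- expand.grid(i = seq_len(nrow(X.plus)), j = seq_len(nrow(X.minus)))
  # One row per pair: the difference x_i(+) - x_j(-)
  X.plus[idx$i, , drop = FALSE] - X.minus[idx$j, , drop = FALSE]
}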
Usage
X.func.complete(X.plus, X.minus)

Generic function for AUC
Description
Compute AUC (Area Under the ROC Curve) for a fitted model. Dispatches to class-specific methods such as auc.roclearn.
Usage
auc(object, ...)

Arguments
object | A fitted model object. |
... | Additional arguments passed to methods. |
Value
Numeric scalar: estimated AUC.
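For context, the empirical AUC is conventionally the pairwise (Mann-Whitney) statistic over decision scores f (standard definition, stated here for orientation rather than taken from the package internals):

\hat{AUC} = \frac{1}{n_+ n_-} \sum_{i: y_i = 1} \sum_{j: y_j = -1} I\{ f(x_i) > f(x_j) \}

where n_+ and n_- are the numbers of positive and negative labels.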
Compute AUC for a fitted kernel model
Description
Estimate the AUC (Area Under the ROC Curve) for a fitted kernel model on new data.
Usage
## S3 method for class 'kroclearn'
auc(object, newdata, y, ...)

Arguments
object | A fitted model object of class "kroclearn". |
newdata | A matrix or data.frame of test predictors. Must have the same structure as the training data (categorical variables are dummy-aligned automatically). |
y | Response vector of test labels ({-1, 1} or convertible). |
... | Not used. |
Value
A numeric scalar giving the estimated AUC.
Examples
set.seed(123)
n_train <- 100
r_train <- sqrt(runif(n_train, 0.05, 1))
theta_train <- runif(n_train, 0, 2*pi)
X_train <- cbind(r_train * cos(theta_train), r_train * sin(theta_train))
y_train <- ifelse(r_train < 0.5, 1, -1)
n_test <- 10
r_test <- sqrt(runif(n_test, 0.05, 1))
theta_test <- runif(n_test, 0, 2*pi)
X_test <- cbind(r_test * cos(theta_test), r_test * sin(theta_test))
y_test <- ifelse(r_test < 0.5, 1, -1)
fit <- kroclearn(X_train, y_train, lambda = 0.1, kernel = "radial", approx = TRUE)
auc(fit, X_test, y_test)

Compute AUC for a fitted linear model
Description
Estimate the AUC (Area Under the ROC Curve) for a fitted linear model on new data.
Usage
## S3 method for class 'roclearn'
auc(object, newdata, y, ...)

Arguments
object | A fitted model object of class "roclearn". |
newdata | A matrix or data.frame of test predictors. Must have the same structure as the training data (categorical variables are dummy-aligned automatically). |
y | Response vector of test labels ({-1, 1} or convertible). |
... | Not used. |
Value
A numeric scalar giving the estimated AUC.
Examples
set.seed(123)
n_train <- 100
n_pos <- round(0.2 * n_train)
n_neg <- n_train - n_pos
X_train <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y_train <- c(rep(-1, n_neg), rep(1, n_pos))
n_test <- 10
n_pos_test <- round(0.2 * n_test)
n_neg_test <- n_test - n_pos_test
X_test <- rbind(
  matrix(rnorm(2 * n_neg_test, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos_test, mean = 1), ncol = 2)
)
y_test <- c(rep(-1, n_neg_test), rep(1, n_pos_test))
fit <- roclearn(X_train, y_train, lambda = 0.1, approx = TRUE)
auc(fit, X_test, y_test)

Cross-validation for kernel models
Description
Perform k-fold cross-validation over a sequence of \lambda values and select the optimal model based on AUC.
Usage
cv.kroclearn(
  X,
  y,
  lambda.vec = NULL,
  lambda.length = 30,
  kernel = "radial",
  param.kernel = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  nfolds = 10,
  target.perf = list(),
  param.convergence = list()
)

Arguments
X | Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y | Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda.vec | Optional numeric vector of regularization parameters (lambda values). If NULL, a grid of lambda.length values is generated automatically. |
lambda.length | Number of \lambda values in the automatically generated grid, used when lambda.vec is NULL (default 30). |
kernel | Kernel type; the default is "radial". |
param.kernel | Kernel-specific parameter (default NULL). |
loss | Surrogate loss function type; the default is "hinge". |
approx | Logical; enables a scalable approximation to accelerate training. The default is NULL. |
intercept | Logical; include an intercept in the model (default TRUE). |
nfolds | Number of cross-validation folds (default 10). |
target.perf | List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence | List of convergence controls. |
Value
An object of class "cv.kroclearn" with:
optimal.lambda — selected \lambda.
optimal.fit — model trained at optimal.lambda.
lambda.vec — grid of penalty values considered.
auc.mean, auc.sd — mean and sd of cross-validated AUC.
auc.result — fold-by-lambda AUC matrix.
time.mean, time.sd — mean and sd of training time.
time.result — fold-by-lambda training time matrix.
nfolds, loss, kernel — settings.
See Also
kroclearn, cv.roclearn, plot.cv.kroclearn, summary.cv.kroclearn
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  kernel = "radial", approx = TRUE, nfolds = 2
)
cvfit$optimal.lambda

Cross-validation for linear models
Description
Perform k-fold cross-validation over a sequence of \lambda values and select the optimal model based on AUC.
Usage
cv.roclearn(
  X,
  y,
  lambda.vec = NULL,
  lambda.length = 30,
  penalty = "ridge",
  param.penalty = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  nfolds = 10,
  target.perf = list(),
  param.convergence = list()
)

Arguments
X | Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y | Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda.vec | Optional numeric vector of regularization parameters (lambda values). If NULL, a grid of lambda.length values is generated automatically. |
lambda.length | Number of \lambda values in the automatically generated grid, used when lambda.vec is NULL (default 30). |
penalty | Regularization penalty type; the default is "ridge". |
param.penalty | Penalty-specific parameter (default NULL). |
loss | Surrogate loss function type; the default is "hinge". |
approx | Logical; enables a scalable approximation to accelerate training. The default is NULL. |
intercept | Logical; include an intercept in the model (default TRUE). |
nfolds | Number of cross-validation folds (default 10). |
target.perf | List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence | List of convergence controls. |
Value
An object of class "cv.roclearn" with:
optimal.lambda — selected \lambda.
optimal.fit — model refit on the full data at optimal.lambda.
lambda.vec — grid of penalty values considered.
auc.mean, auc.sd — mean and sd of cross-validated AUC.
auc.result — fold-by-lambda AUC matrix.
time.mean, time.sd — mean and sd of training time.
time.result — fold-by-lambda training time matrix.
nfolds, loss, penalty — settings.
See Also
roclearn, cv.kroclearn, plot.cv.roclearn, summary.cv.roclearn
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  approx = TRUE, nfolds = 2
)
cvfit$optimal.lambda

Fit a kernel model
Description
Fit a kernel ROC-optimizing binary classifier.
Usage
kroclearn(
  X,
  y,
  lambda,
  kernel = "radial",
  param.kernel = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  target.perf = list(),
  param.convergence = list()
)

Arguments
X | Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y | Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda | Positive scalar regularization parameter. |
kernel | Kernel type; the default is "radial". |
param.kernel | Kernel-specific parameter (default NULL). |
loss | Surrogate loss function type; the default is "hinge". |
approx | Logical; enables a scalable approximation to accelerate training. The default is NULL. |
intercept | Logical; include an intercept in the model (default TRUE). |
target.perf | List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence | List of convergence controls. |
Details
For large-scale data, the model is computationally prohibitive because its loss is a U-statistic involving a double summation. To reduce this burden, the package adopts an efficient algorithm based on an incomplete U-statistic, which approximates the loss with a single summation. In kernel models, a Nyström low-rank approximation is further applied to efficiently compute the kernel matrix. These approximations substantially reduce computational cost and accelerate training, while maintaining accuracy, making the model feasible for large-scale datasets. This option is available when approx = TRUE.
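To sketch the idea in standard notation (illustrative only, not the package's exact objective): the complete pairwise loss for a surrogate \phi is

L(f) = \frac{1}{n_+ n_-} \sum_{i: y_i = 1} \sum_{j: y_j = -1} \phi( f(x_i) - f(x_j) )

while the incomplete U-statistic replaces the double sum with B randomly sampled positive-negative pairs (i_b, j_b):

\tilde{L}(f) = \frac{1}{B} \sum_{b=1}^{B} \phi( f(x_{i_b}) - f(x_{j_b}) )

reducing the per-iteration cost from O(n_+ n_-) to O(B) surrogate evaluations.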
Value
An object of class "kroclearn", a list containing:
theta.hat — estimated dual coefficient vector.
intercept — fitted intercept (if applicable).
lambda, kernel, param.kernel, loss.
approx, B (number of sampled pairs if approximation used).
time — training time (seconds).
nobs, p — number of observations and predictors.
converged, n.iter — convergence information.
kfunc — kernel function object.
nystrom — low-rank kernel approximation details (if used).
X — training data (post-preprocessing).
preprocessing — details on categorical variables, removed columns, and column names.
call — the function call.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx = TRUE)

Visualize Cross-Validation results for kernel models
Description
Produce a visualization of cross-validation results from a fitted cv.kroclearn object. The plot shows the mean AUC across regularization parameters \lambda, with error bars reflecting the cross-validation standard deviation. Optionally, the selected optimal \lambda is highlighted with a dashed line and marker.
Usage
## S3 method for class 'cv.kroclearn'
plot(x, highlight = TRUE, ...)

Arguments
x | A cross-validation object of class "cv.kroclearn". |
highlight | Logical; if TRUE (the default), the selected optimal \lambda is highlighted with a dashed line and marker. |
... | Additional arguments passed to underlying ggplot2 functions. |
Details
This function is a method for the generic plot() function, designed specifically for cross-validation objects from cv.kroclearn. The x-axis is displayed on a log scale for \lambda, and the y-axis represents AUC values. Error bars show variability across folds. This is the kernel counterpart of plot.cv.roclearn.
Value
A ggplot2 object is returned and drawn to the current device.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  kernel = "radial", approx = TRUE, nfolds = 2
)
plot(cvfit)

Visualize Cross-Validation results for linear models
Description
Produce a visualization of cross-validation results from a fitted cv.roclearn object. The plot shows the mean AUC across regularization parameters \lambda, with error bars reflecting the cross-validation standard deviation. Optionally, the selected optimal \lambda is highlighted with a dashed line and marker.
Usage
## S3 method for class 'cv.roclearn'
plot(x, highlight = TRUE, ...)

Arguments
x | A cross-validation object of class "cv.roclearn". |
highlight | Logical; if TRUE (the default), the selected optimal \lambda is highlighted with a dashed line and marker. |
... | Additional arguments passed to underlying ggplot2 functions. |
Details
This function is a method for the generic plot() function, designed specifically for cross-validation objects from cv.roclearn. The x-axis is displayed on a log scale for \lambda, and the y-axis represents AUC values. Error bars show variability across folds.
Value
A ggplot2 object is returned and drawn to the current device.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  approx = TRUE, nfolds = 2
)
plot(cvfit)

Plot Receiver Operating Characteristic (ROC) curve using ggroc
Description
Draws an ROC curve based on decision values. There is an option to display the AUC in the plot title and to print the ROC summary object.
Usage
plot_roc(
  y_true,
  y_score,
  col = "blue",
  size = 1.2,
  title = TRUE,
  summary = FALSE,
  ...
)

Arguments
y_true | Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
y_score | Numeric vector of predicted scores or decision values. |
col | Line color. |
size | Line width. |
title | Logical; if TRUE, displays AUC in the plot title. |
summary | Logical; if TRUE, prints the ROC object summary. |
... | Additional arguments passed to pROC::ggroc(). |
Value
A ggplot object of the ROC curve.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, approx = TRUE)
y_score <- predict(fit, X, type = "response")
plot_roc(y, y_score)

Predictions from a fitted kernel model
Description
Generate predictions from a fitted kernel model.
Usage
## S3 method for class 'kroclearn'
predict(object, newdata, type = c("class", "response"), ...)

Arguments
object | A fitted model object of class "kroclearn". |
newdata | A data frame or matrix of predictors for which predictions are desired. Categorical variables are automatically dummy-encoded and aligned to the training structure. |
type | Prediction type: "class" (the default) returns predicted labels in {-1, 1}; "response" returns raw decision scores. |
... | Not used. |
Value
A numeric vector of predictions ({-1, 1}) if type = "class", or raw decision scores if type = "response".
See Also
kroclearn, auc.kroclearn
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx = TRUE)
# Predict classes {-1, 1}
predict(fit, X, type = "class")
# Predict decision scores
predict(fit, X, type = "response")

Predictions from a fitted linear model
Description
Generate predictions from a fitted linear model.
Usage
## S3 method for class 'roclearn'
predict(object, newdata, type = c("class", "response"), ...)

Arguments
object | A fitted model object of class "roclearn". |
newdata | A data frame or matrix of predictors for which predictions are desired. Categorical variables are automatically dummy-encoded and aligned to the training structure. |
type | Prediction type: "class" (the default) returns predicted labels in {-1, 1}; "response" returns raw decision scores. |
... | Not used. |
Value
A numeric vector of predictions ({-1, 1}) if type = "class", or raw decision scores if type = "response".
See Also
roclearn, auc.roclearn
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, approx = TRUE)
# Predict classes {-1, 1}
predict(fit, X, type = "class")
# Predict decision scores
predict(fit, X, type = "response")

Fit a linear model
Description
Fit a linear ROC-optimizing binary classifier.
Usage
roclearn(
  X,
  y,
  lambda,
  penalty = "ridge",
  param.penalty = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  target.perf = list(),
  param.convergence = list()
)

Arguments
X | Predictor matrix or data.frame (categorical variables are automatically one-hot encoded). |
y | Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format. |
lambda | Positive scalar regularization parameter. |
penalty | Regularization penalty type; the default is "ridge". |
param.penalty | Penalty-specific parameter (default NULL). |
loss | Surrogate loss function type; the default is "hinge". |
approx | Logical; enables a scalable approximation to accelerate training. The default is NULL. |
intercept | Logical; include an intercept in the model (default TRUE). |
target.perf | List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each). |
param.convergence | List of convergence controls. |
Details
For large-scale data, the model is computationally prohibitive because its loss is a U-statistic involving a double summation. To reduce this burden, the package adopts an efficient algorithm based on an incomplete U-statistic, which approximates the loss with a single summation. This approximation substantially reduces computational cost and accelerates training, while maintaining accuracy, making the model feasible for large-scale datasets. This option is available when approx = TRUE.
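To make the approximation concrete, here is a minimal R sketch of an incomplete U-statistic hinge loss for a linear score (illustrative only; incomplete_pairwise_hinge is hypothetical and not part of the package API):

incomplete_pairwise_hinge <- function(beta, X, y, B = 1000) {
  scores <- as.vector(X %*% beta)
  pos <- which(y == 1)
  neg <- which(y == -1)
  # Sample B positive-negative pairs instead of enumerating all n_+ * n_- pairs
  i <- sample(pos, B, replace = TRUE)
  j <- sample(neg, B, replace = TRUE)
  # Hinge surrogate applied to pairwise score differences f(x_i) - f(x_j)
  mean(pmax(0, 1 - (scores[i] - scores[j])))
}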
Value
An object of class "roclearn", a list containing:
beta.hat — estimated coefficient vector.
intercept — fitted intercept (if applicable).
lambda, penalty, param.penalty, loss.
approx, B (number of sampled pairs if approximation used).
time — training time (seconds).
nobs, p — number of observations and predictors.
converged, n.iter — convergence information.
preprocessing — details on categorical variables, removed columns, and column names.
call — the function call.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, penalty = "ridge", approx = TRUE)

Summarize Cross-Validation results for kernel models
Description
Print a concise summary of cross-validation results for a kernel model.
Usage
## S3 method for class 'cv.kroclearn'
summary(object, ...)

Arguments
object | A fitted cross-validation object of class "cv.kroclearn". |
... | Not used. |
Details
This is a method for the generic summary() function, applied to objects of class "cv.kroclearn". It prints training settings (loss, kernel type, number of folds, the set of candidate \lambda), the selected optimal \lambda, the corresponding mean and standard deviation of cross-validated AUC, and a truncated table of AUC results across candidate \lambda values.
Value
Invisibly returns the input object, after printing a summary to the console.
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
cvfit <- cv.kroclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  kernel = "radial", approx = TRUE, nfolds = 2
)
summary(cvfit)

Summarize Cross-Validation results for linear models
Description
Print a concise summary of cross-validation results for a linear model.
Usage
## S3 method for class 'cv.roclearn'
summary(object, ...)

Arguments
object | A fitted cross-validation object of class "cv.roclearn". |
... | Not used. |
Details
This is a method for the generic summary() function, applied to objects of class "cv.roclearn". It prints training settings (loss, penalty, number of folds, the set of candidate \lambda), the selected optimal \lambda, the corresponding mean and standard deviation of cross-validated AUC, and a truncated table of AUC results across candidate \lambda values.
Value
Invisibly returns the input object, after printing a summary to the console.
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
cvfit <- cv.roclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  approx = TRUE, nfolds = 2
)
summary(cvfit)

Summarize a fitted kernel model
Description
Display key information from a fitted "kroclearn" object, including data dimensions, kernel specification, convergence status, training time, and leading coefficient estimates.
Usage
## S3 method for class 'kroclearn'
summary(object, ...)

Arguments
object | A fitted model of class "kroclearn". |
... | Unused. |
Value
Invisibly returns object after printing a formatted summary.
See Also
kroclearn, summary.roclearn, cv.kroclearn, cv.roclearn
Examples
set.seed(123)
n <- 100
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2*pi)
X <- cbind(r * cos(theta), r * sin(theta))
y <- ifelse(r < 0.5, 1, -1)
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx = TRUE)
summary(fit)

Summarize a fitted linear model
Description
Display key information from a fitted "roclearn" object, including data dimensions, model specification, convergence status, training time, and leading coefficient estimates.
Usage
## S3 method for class 'roclearn'
summary(object, ...)

Arguments
object | A fitted model of class "roclearn". |
... | Unused. |
Value
Invisibly returns object after printing a formatted summary.
See Also
roclearn, summary.kroclearn, cv.roclearn, cv.kroclearn
Examples
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean = 1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))
fit <- roclearn(X, y, lambda = 0.1, approx = TRUE)
summary(fit)