Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification


Thie1e/cutpointr


cutpointr

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

cutpointr is an R package for tidy calculation of “optimal” cutpoints. It supports several methods for calculating cutpoints and includes several metrics that can be maximized or minimized by selecting a cutpoint. Some of these methods are designed to be more robust than the simple empirical optimization of a metric. Additionally, cutpointr can automatically bootstrap the variability of the optimal cutpoints and return out-of-bag estimates of various performance metrics.

Installation

You can install cutpointr from CRAN using the menu in RStudio or simply:

install.packages("cutpointr")

Example

For example, the optimal cutpoint for the included data set is 2 whenmaximizing the sum of sensitivity and specificity.

library(cutpointr)
#> Warning: package 'cutpointr' was built under R version 4.4.1
data(suicide)
head(suicide)
#>   age gender dsi suicide
#> 1  29 female   1      no
#> 2  26   male   0      no
#> 3  26 female   0      no
#> 4  27 female   0      no
#> 5  28 female   0      no
#> 6  53   male   2      no
cp <- cutpointr(suicide, dsi, suicide,
                method = maximize_metric, metric = sum_sens_spec)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
summary(cp)
#> Method: maximize_metric
#> Predictor: dsi
#> Outcome: suicide
#> Direction: >=
#>
#>     AUC   n n_pos n_neg
#>  0.9238 532    36   496
#>
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity tp fn fp  tn
#>                 2        1.7518 0.8647      0.8889      0.8629 32  4 68 428
#>
#> Predictor summary:
#>     Data Min.   5% 1st Qu. Median      Mean 3rd Qu.  95% Max.       SD NAs
#>  Overall    0 0.00       0      0 0.9210526       1 5.00   11 1.852714   0
#>       no    0 0.00       0      0 0.6330645       0 4.00   10 1.412225   0
#>      yes    0 0.75       4      5 4.8888889       6 9.25   11 2.549821   0
plot(cp)

When considering the optimality of a cutpoint, we can only make a judgement based on the sample at hand. Thus, the estimated cutpoint may not be optimal within the population or on unseen data, which is why we sometimes put the “optimal” in quotation marks.

cutpointr makes assumptions about the direction of the dependency between class and x if direction and/or pos_class or neg_class are not specified. The same result as above can be achieved by manually defining direction and the positive/negative classes, which is slightly faster, since the classes and direction don’t have to be determined. Here the metric is youden, which equals sum_sens_spec minus 1 and therefore leads to the same optimal cutpoint:

opt_cut <- cutpointr(suicide, dsi, suicide, direction = ">=", pos_class = "yes",
                     neg_class = "no", method = maximize_metric, metric = youden)

opt_cut is a data frame that returns the input data and the ROC curve (and optionally the bootstrap results) in a nested tibble. Methods for summarizing and plotting the data and results are included (e.g. summary, plot, plot_roc, plot_metric).

To inspect the optimization, the function of metric values per cutpoint can be plotted using plot_metric, if an optimization function was used that returns a metric column in the roc_curve column. For example, the maximize_metric and minimize_metric functions do so:

plot_metric(opt_cut)
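The idea behind maximize_metric and plot_metric can be sketched in a few lines of base R. This is an illustration, not cutpointr's internal code: it assumes direction = ">=" (values at or above the cutpoint are classified as positive), uses a small hypothetical data set, and the helper metric_by_cutpoint is made up for this example. It evaluates the sum of sensitivity and specificity at every unique predictor value and picks the maximum.

```r
# Hypothetical helper: sensitivity + specificity at every unique predictor value,
# assuming direction ">=" (x >= cutpoint is predicted positive).
metric_by_cutpoint <- function(x, is_pos) {
  cutpoints <- sort(unique(x))
  sens <- vapply(cutpoints, function(c) mean(x[is_pos] >= c), numeric(1))
  spec <- vapply(cutpoints, function(c) mean(x[!is_pos] < c), numeric(1))
  data.frame(cutpoint = cutpoints, sum_sens_spec = sens + spec)
}

# Small hypothetical data set: 5 negatives followed by 5 positives
x      <- c(0, 0, 1, 2, 3, 1, 4, 5, 6, 2)
is_pos <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

res  <- metric_by_cutpoint(x, is_pos)
best <- res$cutpoint[which.max(res$sum_sens_spec)]  # empirical "optimal" cutpoint
print(res)
print(best)
```

For these data the maximum of 1.6 is reached at a cutpoint of 4; a metric-by-cutpoint table like res is the kind of curve that plot_metric draws.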

Predictions for new data can be made using predict:

predict(opt_cut, newdata = data.frame(dsi = 0:5))
#> [1] "no"  "no"  "yes" "yes" "yes" "yes"

Features

  • Calculation of optimal cutpoints in binary classification tasks
  • Tidy output, integrates well with functions from the tidyverse
  • Functions for plotting ROC curves, metric distributions and more
  • Bootstrapping for simulating the cutpoint variability and for obtaining out-of-bag estimates of various metrics (as a form of internal validation) with optional parallelisation
  • Multiple methods for calculating cutpoints
  • Multiple metrics can be chosen for maximization / minimization
  • Tidyeval

Calculating cutpoints

Method functions for cutpoint estimation

The included methods for calculating cutpoints are:

  • maximize_metric: Maximize the metric function
  • minimize_metric: Minimize the metric function
  • maximize_loess_metric: Maximize the metric function after LOESS smoothing
  • minimize_loess_metric: Minimize the metric function after LOESS smoothing
  • maximize_gam_metric: Maximize the metric function after smoothing via Generalized Additive Models
  • minimize_gam_metric: Minimize the metric function after smoothing via Generalized Additive Models
  • maximize_boot_metric: Bootstrap the optimal cutpoint when maximizing a metric
  • minimize_boot_metric: Bootstrap the optimal cutpoint when minimizing a metric
  • oc_manual: Specify the cutoff value manually
  • oc_mean: Use the sample mean as the “optimal” cutpoint
  • oc_median: Use the sample median as the “optimal” cutpoint
  • oc_youden_kernel: Maximize the Youden-Index after kernel smoothing the distributions of the two classes
  • oc_youden_normal: Maximize the Youden-Index parametrically assuming normally distributed data in both classes

Metric functions

The included metrics to be used with the minimization and maximization methods are:

  • accuracy: Fraction correctly classified
  • abs_d_sens_spec: The absolute difference of sensitivity and specificity
  • abs_d_ppv_npv: The absolute difference between positive predictive value (PPV) and negative predictive value (NPV)
  • roc01: Distance to the point (0,1) in ROC space
  • cohens_kappa: Cohen’s Kappa
  • sum_sens_spec: sensitivity + specificity
  • sum_ppv_npv: The sum of positive predictive value (PPV) and negative predictive value (NPV)
  • prod_sens_spec: sensitivity * specificity
  • prod_ppv_npv: The product of positive predictive value (PPV) and negative predictive value (NPV)
  • youden: Youden- or J-Index = sensitivity + specificity - 1
  • odds_ratio: (Diagnostic) odds ratio
  • risk_ratio: risk ratio (relative risk)
  • p_chisquared: The p-value of a chi-squared test on the confusion matrix
  • misclassification_cost: The sum of the misclassification cost of false positives and false negatives. Additional arguments: cost_fp, cost_fn
  • total_utility: The total utility of true / false positives / negatives. Additional arguments: utility_tp, utility_tn, cost_fp, cost_fn
  • F1_score: The F1-score (2 * TP) / (2 * TP + FP + FN)
  • metric_constrain: Maximize a selected metric given a minimal value of another selected metric
  • sens_constrain: Maximize sensitivity given a minimal value of specificity
  • spec_constrain: Maximize specificity given a minimal value of sensitivity
  • acc_constrain: Maximize accuracy given a minimal value of sensitivity

Furthermore, the following functions are included which can be used as metric functions but are more useful for plotting purposes, for example in plot_cutpointr, or for defining new metric functions: tp, fp, tn, fn, tpr, fpr, tnr, fnr, false_omission_rate, false_discovery_rate, ppv, npv, precision, recall, sensitivity, and specificity.

The inputs to the arguments method and metric are functions so that user-defined functions can easily be supplied instead of the built-in ones.
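All of the metrics listed above are functions of the four confusion-matrix counts. The sketch below writes three of them as plain R functions and checks them against the counts from the summary(cp) output above (tp = 32, fn = 4, fp = 68, tn = 428). The function names are hypothetical. The (tp, fp, tn, fn, ...) signature and the named one-column matrix return value are our assumption about the interface cutpointr expects from user-defined metrics; check ?cutpointr before relying on it.

```r
# Hypothetical user-defined metrics computed from confusion-matrix counts.
# Assumed interface: arguments (tp, fp, tn, fn, ...), returning a one-column
# matrix whose column name is the metric name. Inputs may be vectors.
my_sum_sens_spec <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)   # true positive rate
  spec <- tn / (tn + fp)   # true negative rate
  res <- matrix(sens + spec, ncol = 1)
  colnames(res) <- "my_sum_sens_spec"
  res
}

my_accuracy <- function(tp, fp, tn, fn, ...) {
  res <- matrix((tp + tn) / (tp + fp + tn + fn), ncol = 1)
  colnames(res) <- "my_accuracy"
  res
}

my_misclassification_cost <- function(tp, fp, tn, fn, cost_fp = 1, cost_fn = 1, ...) {
  res <- matrix(cost_fp * fp + cost_fn * fn, ncol = 1)
  colnames(res) <- "my_misclassification_cost"
  res
}

# Counts from summary(cp) for the suicide data at the cutpoint 2
tp <- 32; fn <- 4; fp <- 68; tn <- 428
my_sum_sens_spec(tp, fp, tn, fn)
my_accuracy(tp, fp, tn, fn)
my_misclassification_cost(tp, fp, tn, fn, cost_fp = 1, cost_fn = 10)
```

After rounding, this reproduces sum_sens_spec = 1.7518 and acc = 0.8647 from the summary. The cost with cost_fn = 10 is 68 × 1 + 4 × 10 = 108. If the assumed interface holds, such a function could be passed as, e.g., metric = my_sum_sens_spec.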

Separate subgroups and bootstrapping

Cutpoints can be separately estimated on subgroups that are defined by a third variable, gender in this case. Additionally, if boot_runs is larger than zero, cutpointr will carry out the usual cutpoint calculation on the full sample, just as before, and additionally on boot_runs bootstrap samples. This offers a way of gauging the out-of-sample performance of the cutpoint estimation method. If a subgroup is given, the bootstrapping is carried out separately for every subgroup, which is also reflected in the plots and output.

set.seed(12)
opt_cut <- cutpointr(suicide, dsi, suicide, boot_runs = 1000)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...
opt_cut
#> # A tibble: 1 × 16
#>   direction optimal_cutpoint method          sum_sens_spec      acc sensitivity
#>   <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>
#> 1 >=                       2 maximize_metric       1.75179 0.864662    0.888889
#>   specificity      AUC pos_class neg_class prevalence outcome predictor
#>         <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>   <chr>
#> 1    0.862903 0.923779 yes       no         0.0676692 suicide dsi
#>   data               roc_curve            boot
#>   <list>             <list>               <list>
#> 1 <tibble [532 × 2]> <rc_ctpnt [13 × 10]> <tibble [1,000 × 23]>

The returned object has the additional column boot, which is a nested tibble that includes the cutpoints per bootstrap sample along with the metric calculated using the function in metric and various default metrics. The metrics are suffixed by _b to indicate in-bag results or _oob to indicate out-of-bag results:

opt_cut$boot
#> [[1]]
#> # A tibble: 1,000 × 23
#>    optimal_cutpoint AUC_b AUC_oob sum_sens_spec_b sum_sens_spec_oob acc_b
#>               <dbl> <dbl>   <dbl>           <dbl>             <dbl> <dbl>
#>  1                2 0.957   0.884            1.80              1.71 0.874
#>  2                1 0.918   0.935            1.70              1.70 0.752
#>  3                2 0.920   0.946            1.79              1.73 0.874
#>  4                2 0.940   0.962            1.82              1.76 0.893
#>  5                2 0.849   0.96             1.66              1.76 0.848
#>  6                4 0.926   0.927            1.80              1.51 0.925
#>  7                2 0.927   0.919            1.74              1.78 0.885
#>  8                2 0.958   0.882            1.82              1.67 0.863
#>  9                4 0.911   0.923            1.80              1.53 0.914
#> 10                1 0.871   0.975            1.62              1.80 0.737
#> # ℹ 990 more rows
#> # ℹ 17 more variables: acc_oob <dbl>, sensitivity_b <dbl>,
#> #   sensitivity_oob <dbl>, specificity_b <dbl>, specificity_oob <dbl>,
#> #   cohens_kappa_b <dbl>, cohens_kappa_oob <dbl>, TP_b <dbl>, FP_b <dbl>,
#> #   TN_b <int>, FN_b <int>, TP_oob <dbl>, FP_oob <dbl>, TN_oob <int>,
#> #   FN_oob <int>, roc_curve_b <list>, roc_curve_oob <list>
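The in-bag / out-of-bag logic behind the _b and _oob columns can be sketched in base R. For each bootstrap resample, the optimal cutpoint is selected on the drawn (in-bag) observations. It is then evaluated both on those observations and on the observations that were never drawn (out-of-bag). This is an illustration under simplifying assumptions (direction ">=", sum of sensitivity and specificity, simulated data), not cutpointr's implementation. The helpers sum_sens_spec_at, best_cutpoint and boot_once are hypothetical.

```r
# Sum of sensitivity and specificity at a given cutpoint (direction ">=")
sum_sens_spec_at <- function(x, is_pos, cutpoint) {
  pred_pos <- x >= cutpoint
  mean(pred_pos[is_pos]) + mean(!pred_pos[!is_pos])
}

# Empirically optimal cutpoint among the observed values
best_cutpoint <- function(x, is_pos) {
  cands  <- sort(unique(x))
  values <- vapply(cands, function(c) sum_sens_spec_at(x, is_pos, c), numeric(1))
  cands[which.max(values)]
}

# Simulated data: 40 positives with higher predictor values, 160 negatives
set.seed(12)
n      <- 200
is_pos <- rep(c(TRUE, FALSE), times = c(40, 160))
x      <- round(ifelse(is_pos, rnorm(n, 5, 2), rnorm(n, 2, 2)))

boot_once <- function(x, is_pos) {
  idx <- sample.int(length(x), replace = TRUE)  # in-bag indices
  oob <- setdiff(seq_along(x), idx)             # observations never drawn
  cp  <- best_cutpoint(x[idx], is_pos[idx])     # optimise in-bag only
  c(optimal_cutpoint  = cp,
    sum_sens_spec_b   = sum_sens_spec_at(x[idx], is_pos[idx], cp),
    sum_sens_spec_oob = sum_sens_spec_at(x[oob], is_pos[oob], cp))
}

boot <- t(replicate(100, boot_once(x, is_pos)))  # 100 rows, one per resample
summary(boot[, "sum_sens_spec_b"])
summary(boot[, "sum_sens_spec_oob"])
```

Because the cutpoint is tuned on the in-bag data, the in-bag metric tends to be optimistic. The out-of-bag values give the more honest estimate, which matches the pattern of sum_sens_spec_b versus sum_sens_spec_oob in the bootstrap summary below.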

The summary and plots include additional elements that summarize or display the bootstrap results:

summary(opt_cut)
#> Method: maximize_metric
#> Predictor: dsi
#> Outcome: suicide
#> Direction: >=
#> Nr. of bootstraps: 1000
#>
#>     AUC   n n_pos n_neg
#>  0.9238 532    36   496
#>
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity tp fn fp  tn
#>                 2        1.7518 0.8647      0.8889      0.8629 32  4 68 428
#>
#> Predictor summary:
#>     Data Min.   5% 1st Qu. Median      Mean 3rd Qu.  95% Max.       SD NAs
#>  Overall    0 0.00       0      0 0.9210526       1 5.00   11 1.852714   0
#>       no    0 0.00       0      0 0.6330645       0 4.00   10 1.412225   0
#>      yes    0 0.75       4      5 4.8888889       6 9.25   11 2.549821   0
#>
#> Bootstrap summary:
#>           Variable Min.   5% 1st Qu. Median Mean 3rd Qu.  95% Max.   SD NAs
#>   optimal_cutpoint 1.00 1.00    2.00   2.00 2.12    2.00 4.00 4.00 0.72   0
#>              AUC_b 0.83 0.88    0.91   0.93 0.92    0.94 0.96 0.98 0.02   0
#>            AUC_oob 0.82 0.86    0.90   0.92 0.92    0.95 0.97 1.00 0.03   0
#>    sum_sens_spec_b 1.57 1.67    1.72   1.76 1.76    1.80 1.84 1.89 0.05   0
#>  sum_sens_spec_oob 1.37 1.56    1.66   1.72 1.71    1.78 1.86 1.90 0.09   0
#>              acc_b 0.73 0.77    0.85   0.87 0.86    0.88 0.91 0.94 0.04   0
#>            acc_oob 0.72 0.77    0.85   0.86 0.86    0.88 0.90 0.93 0.04   0
#>      sensitivity_b 0.72 0.81    0.86   0.90 0.90    0.94 0.98 1.00 0.05   0
#>    sensitivity_oob 0.44 0.67    0.80   0.87 0.86    0.93 1.00 1.00 0.10   0
#>      specificity_b 0.72 0.76    0.85   0.86 0.86    0.88 0.91 0.94 0.04   0
#>    specificity_oob 0.69 0.76    0.84   0.86 0.86    0.88 0.91 0.94 0.04   0
#>     cohens_kappa_b 0.16 0.27    0.37   0.42 0.41    0.46 0.52 0.66 0.07   0
#>   cohens_kappa_oob 0.15 0.25    0.34   0.39 0.39    0.44 0.51 0.62 0.08   0
plot(opt_cut)

Parallelized bootstrapping

Using foreach and doRNG, the bootstrapping can be parallelized easily. The doRNG package is used to make the bootstrap sampling reproducible.

if (suppressPackageStartupMessages(require(doParallel) & require(doRNG))) {
  cl <- makeCluster(2) # 2 cores
  registerDoParallel(cl)
  registerDoRNG(12) # Reproducible parallel loops using doRNG
  opt_cut <- cutpointr(suicide, dsi, suicide, gender, pos_class = "yes",
                       direction = ">=", boot_runs = 1000, allowParallel = TRUE)
  stopCluster(cl)
  opt_cut
}
#> Warning: package 'doParallel' was built under R version 4.4.2
#> Warning: package 'doRNG' was built under R version 4.4.2
#> Warning: package 'rngtools' was built under R version 4.4.2
#> Running bootstrap...
#> # A tibble: 2 × 18
#>   subgroup direction optimal_cutpoint method          sum_sens_spec      acc
#>   <chr>    <chr>                <dbl> <chr>                   <dbl>    <dbl>
#> 1 female   >=                       2 maximize_metric       1.80812 0.885204
#> 2 male     >=                       3 maximize_metric       1.62511 0.842857
#>   sensitivity specificity      AUC pos_class neg_class prevalence outcome
#>         <dbl>       <dbl>    <dbl> <chr>     <fct>          <dbl> <chr>
#> 1    0.925926    0.882192 0.944647 yes       no         0.0688776 suicide
#> 2    0.777778    0.847328 0.861747 yes       no         0.0642857 suicide
#>   predictor grouping data               roc_curve
#>   <chr>     <chr>    <list>             <list>
#> 1 dsi       gender   <tibble [392 × 2]> <rc_ctpnt [11 × 10]>
#> 2 dsi       gender   <tibble [140 × 2]> <rc_ctpnt [11 × 10]>
#>   boot
#>   <list>
#> 1 <tibble [1,000 × 23]>
#> 2 <tibble [1,000 × 23]>

More robust cutpoint estimation methods

Bootstrapped cutpoints

It has been shown that bagging can substantially improve the performance of a wide range of types of models in regression as well as in classification tasks. This method is available for cutpoint estimation via the maximize_boot_metric and minimize_boot_metric functions. If one of these functions is used as method, boot_cut bootstrap samples are drawn, the cutpoint optimization is carried out in each one, and a summary (e.g. the mean) of the resulting optimal cutpoints on the bootstrap samples is returned as the optimal cutpoint in cutpointr. Note that if bootstrap validation is run, i.e. if boot_runs is larger than zero, an outer bootstrap will be executed. In the bootstrap validation routine, boot_runs bootstrap samples are generated and each one is again bootstrapped boot_cut times; for example, boot_runs = 1000 and boot_cut = 200 amount to roughly 200,000 cutpoint optimizations per subgroup. This may lead to long run times, so activating the built-in parallelization may be advisable.

The advantages of bootstrapping the optimal cutpoint are that the procedure doesn’t possess parameters that have to be tuned, unlike the LOESS smoothing; that it doesn’t rely on distributional assumptions, unlike the Normal method; and that it is applicable to any metric that can be used with minimize_metric or maximize_metric, unlike the Kernel method. Furthermore, just as Random Forests cannot be overfit by increasing the number of trees, the bootstrapped cutpoints cannot be overfit by running an excessive number of boot_cut repetitions.

set.seed(100)
cutpointr(suicide, dsi, suicide, gender, method = maximize_boot_metric,
          boot_cut = 200, summary_func = mean,
          metric = accuracy, silent = TRUE)
#> # A tibble: 2 × 18
#>   subgroup direction optimal_cutpoint method               accuracy      acc
#>   <chr>    <chr>                <dbl> <chr>                   <dbl>    <dbl>
#> 1 female   >=                 5.73246 maximize_boot_metric 0.956633 0.956633
#> 2 male     >=                 8.41026 maximize_boot_metric 0.95     0.95
#>   sensitivity specificity      AUC pos_class neg_class prevalence outcome
#>         <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
#> 1    0.444444    0.994521 0.944647 yes       no         0.0688776 suicide
#> 2    0.222222    1        0.861747 yes       no         0.0642857 suicide
#>   predictor grouping data               roc_curve           boot
#>   <chr>     <chr>    <list>             <list>              <lgl>
#> 1 dsi       gender   <tibble [392 × 2]> <rc_ctpnt [11 × 9]> NA
#> 2 dsi       gender   <tibble [140 × 2]> <rc_ctpnt [11 × 9]> NA
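The bagging idea behind maximize_boot_metric (draw boot_cut resamples, optimize in each, summarize the per-resample optima) can be sketched in base R. This is a simplified illustration, not the package function. It assumes direction ">=", accuracy as the metric and simulated data, and the helpers accuracy_at, optimal_by_accuracy and boot_cutpoint are hypothetical.

```r
# Accuracy at a given cutpoint, direction ">="
accuracy_at <- function(x, is_pos, cutpoint) mean((x >= cutpoint) == is_pos)

# Empirically accuracy-optimal cutpoint among the observed values
optimal_by_accuracy <- function(x, is_pos) {
  cands <- sort(unique(x))
  acc   <- vapply(cands, function(c) accuracy_at(x, is_pos, c), numeric(1))
  cands[which.max(acc)]
}

# Bagged cutpoint: optimise in each of boot_cut resamples, then summarise
boot_cutpoint <- function(x, is_pos, boot_cut = 200, summary_func = mean) {
  cps <- replicate(boot_cut, {
    idx <- sample.int(length(x), replace = TRUE)
    optimal_by_accuracy(x[idx], is_pos[idx])
  })
  summary_func(cps)
}

set.seed(100)
is_pos <- rep(c(TRUE, FALSE), times = c(30, 270))
x      <- round(ifelse(is_pos, rnorm(300, 6, 2), rnorm(300, 1, 1.5)))
bagged <- boot_cutpoint(x, is_pos, boot_cut = 200)
bagged
```

Because the result is an average of many per-resample optima, it is generally not one of the observed predictor values. This is why the output above reports cutpoints such as 5.73246 for the integer-valued dsi.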

LOESS smoothing for selecting a cutpoint

When using maximize_metric and minimize_metric, the optimal cutpoint is selected by searching the maximum or minimum of the metric function. For example, we may want to minimize the misclassification cost. Since false negatives (a suicide attempt was not anticipated) can be regarded as much more severe than false positives, we can set the cost of a false negative, cost_fn, for example to ten times the cost of a false positive.

opt_cut <- cutpointr(suicide, dsi, suicide, gender, method = minimize_metric,
                     metric = misclassification_cost, cost_fp = 1, cost_fn = 10)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
plot_metric(opt_cut)
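For a cutpoint c with direction ">=", the misclassification cost is cost_fp * FP(c) + cost_fn * FN(c). The base-R sketch below computes it for every candidate cutpoint and picks the minimum, which is conceptually what minimize_metric with misclassification_cost does. The toy data and the helper cost_by_cutpoint are hypothetical.

```r
# Misclassification cost per candidate cutpoint (direction ">=")
cost_by_cutpoint <- function(x, is_pos, cost_fp = 1, cost_fn = 10) {
  cands <- sort(unique(x))
  cost <- vapply(cands, function(c) {
    fp <- sum(x >= c & !is_pos)   # negatives predicted positive
    fn <- sum(x <  c &  is_pos)   # positives predicted negative
    cost_fp * fp + cost_fn * fn
  }, numeric(1))
  data.frame(cutpoint = cands, misclassification_cost = cost)
}

# Hypothetical data: 7 negatives followed by 4 positives
x      <- c(0, 0, 0, 1, 1, 2, 3, 1, 3, 4, 5)
is_pos <- c(rep(FALSE, 7), rep(TRUE, 4))

res  <- cost_by_cutpoint(x, is_pos, cost_fp = 1, cost_fn = 10)
best <- res$cutpoint[which.min(res$misclassification_cost)]

# Same data with equal costs, for comparison
res_equal  <- cost_by_cutpoint(x, is_pos, cost_fp = 1, cost_fn = 1)
best_equal <- res_equal$cutpoint[which.min(res_equal$misclassification_cost)]
c(cost_fn_10 = best, equal_costs = best_equal)
```

With equal costs the minimum lies at a cutpoint of 3. Making false negatives ten times as expensive moves it down to 1, because a lower cutpoint trades extra false positives for fewer missed positives.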

As this “optimal” cutpoint may depend on minor differences between the possible cutoffs, smoothing the function of metric values by cutpoint value might be desirable, especially in small samples. The minimize_loess_metric and maximize_loess_metric functions can be used to smooth the function so that the optimal cutpoint is selected based on the smoothed metric values. Options to modify the smoothing, which is implemented using loess.as from the fANCOVA package, include:

  • criterion: the criterion for automatic smoothing parameter selection: “aicc” denotes bias-corrected AIC criterion, “gcv” denotes generalized cross-validation.
  • degree: the degree of the local polynomials to be used. It can be 0, 1 or 2.
  • family: if “gaussian”, fitting is by least-squares, and if “symmetric”, a re-descending M estimator is used with Tukey’s biweight function.
  • user.span: the user-defined parameter which controls the degree of smoothing.

Using the LOESS smoothing parameters criterion = "aicc", degree = 2, family = "symmetric", and user.span = 0.7, we get the following smoothed versions of the above metrics:

opt_cut <- cutpointr(suicide, dsi, suicide, gender,
                     method = minimize_loess_metric,
                     criterion = "aicc", family = "symmetric", degree = 2, user.span = 0.7,
                     metric = misclassification_cost, cost_fp = 1, cost_fn = 10)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
plot_metric(opt_cut)

The optimal cutpoint for the female subgroup changes to 3. Note, though, that there are no reliable rules for selecting the “best” smoothing parameters. Notably, the LOESS smoothing is sensitive to the number of unique cutpoints. A large number of unique cutpoints generally leads to a more volatile curve of metric values by cutpoint value, even after smoothing. Thus, the curve tends to be undersmoothed in that scenario. The unsmoothed metric values are returned in opt_cut$roc_curve in the column m_unsmoothed.
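The principle of minimize_loess_metric (smooth the metric curve, then take the optimum of the smoothed values) can be illustrated with base R's stats::loess. This swaps in a different function from the fANCOVA::loess.as that cutpointr uses, and the fixed span stands in for the automatic smoothing-parameter selection. Treat it as a simplified sketch on hypothetical, noisy metric values rather than a reproduction of the package method.

```r
# Hypothetical noisy metric values (e.g. a misclassification cost) per cutpoint
cutpoints <- 0:20
set.seed(1)
metric <- (cutpoints - 8)^2 / 4 + rnorm(length(cutpoints), sd = 3)

# Local quadratic fit; family = "symmetric" requests the robust
# (Tukey biweight) iterations, analogous to the option listed above.
fit      <- loess(metric ~ cutpoints, span = 0.7, degree = 2, family = "symmetric")
smoothed <- predict(fit)

data.frame(cutpoint = cutpoints,
           m_unsmoothed = round(metric, 2),
           m = round(smoothed, 2))

opt_raw    <- cutpoints[which.min(metric)]    # optimum of the raw, noisy curve
opt_smooth <- cutpoints[which.min(smoothed)]  # optimum of the smoothed curve
c(raw = opt_raw, smoothed = opt_smooth)
```

The smoothed optimum depends less on individual noisy metric values than the raw one does. As noted above, it is still sensitive to the chosen span.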

Smoothing via Generalized Additive Models for selecting a cutpoint

In a similar fashion, the function of metric values per cutpoint can be smoothed using Generalized Additive Models with smooth terms. Internally, mgcv::gam carries out the smoothing, which can be customized via the arguments formula and optimizer; see help("gam", package = "mgcv"). Most importantly, the GAM can be specified by altering the default formula. For example, the smoothing function could be configured to apply cubic regression splines ("cr") as the smooth term. As the suicide data has only very few unique cutpoints, it is not very suitable for showcasing the GAM smoothing, so we will use two classes of the iris data here. In this case, the purely empirical method and the GAM smoothing lead to identical cutpoints, but in practice the GAM smoothing tends to be more robust, especially with larger data. An attractive feature of the GAM smoothing is that the default values tend to work quite well and usually require no tuning, reducing researcher degrees of freedom.

library(ggplot2)
exdat <- iris
exdat <- exdat[exdat$Species != "setosa", ]
opt_cut <- cutpointr(exdat, Petal.Length, Species,
                     method = minimize_gam_metric,
                     formula = m ~ s(x.sorted, bs = "cr"),
                     metric = abs_d_sens_spec)
#> Assuming the positive class is virginica
#> Assuming the positive class has higher x values
plot_metric(opt_cut)

Parametric method assuming normality

The Normal method in oc_youden_normal is a parametric method for maximizing the Youden-Index or, equivalently, the sum of $Se$ and $Sp$. It relies on the assumption that the predictor for both the negative and positive observations is normally distributed. In that case it can be shown that

$$c^* = \frac{(\mu_P \sigma_N^2 - \mu_N \sigma_P^2) - \sigma_N \sigma_P \sqrt{(\mu_N - \mu_P)^2 + (\sigma_N^2 - \sigma_P^2) \log(\sigma_N^2 / \sigma_P^2)}}{\sigma_N^2 - \sigma_P^2}$$

is the cutpoint that maximizes the Youden-Index, where the predictor follows $N(\mu_N, \sigma_N^2)$ in the negative class and, independently, $N(\mu_P, \sigma_P^2)$ in the positive class. If $\sigma_N$ and $\sigma_P$ are equal, the optimal cutpoint simplifies to $c^* = \frac{\mu_N + \mu_P}{2}$. However, the oc_youden_normal method in cutpointr always assumes unequal standard deviations. Since this method does not select a cutpoint from the observed predictor values, it is questionable which values for $Se$ and $Sp$ should be reported. Here, the Youden-Index can be calculated as

$$J = \Phi(\frac{c^* - \mu_N}{\sigma_N}) - \Phi(\frac{c^* - \mu_P}{\sigma_P})$$

if the assumption of normality holds. However, since there exist several methods that do not select cutpoints from the available observations, and to unify the reporting of metrics for these methods, cutpointr reports all metrics, e.g. $Se$ and $Sp$, based on the empirical observations.

cutpointr(suicide, dsi, suicide, gender, method = oc_youden_normal)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> # A tibble: 2 × 18
#>   subgroup direction optimal_cutpoint method           sum_sens_spec      acc
#>   <chr>    <chr>                <dbl> <chr>                    <dbl>    <dbl>
#> 1 female   >=                 2.47775 oc_youden_normal       1.71618 0.895408
#> 2 male     >=                 3.17226 oc_youden_normal       1.54453 0.864286
#>   sensitivity specificity      AUC pos_class neg_class prevalence outcome
#>         <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
#> 1    0.814815    0.901370 0.944647 yes       no         0.0688776 suicide
#> 2    0.666667    0.877863 0.861747 yes       no         0.0642857 suicide
#>   predictor grouping data               roc_curve           boot
#>   <chr>     <chr>    <list>             <list>              <lgl>
#> 1 dsi       gender   <tibble [392 × 2]> <rc_ctpnt [11 × 9]> NA
#> 2 dsi       gender   <tibble [140 × 2]> <rc_ctpnt [11 × 9]> NA
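The closed-form cutpoint of the Normal method can be transcribed directly into R and checked numerically. The sketch below maximizes the Youden-Index $J(c)$ with stats::optimize and compares the two results. It assumes $\mu_P > \mu_N$ (direction ">="), uses made-up population parameters, and the function names are hypothetical. It also handles the equal-variance case separately, which, as noted above, cutpointr itself does not do.

```r
# Closed-form Youden-optimal cutpoint under the binormal model,
# transcribed from the expression above; assumes mu_p > mu_n.
youden_normal_cutpoint <- function(mu_n, sd_n, mu_p, sd_p) {
  a <- sd_n^2 - sd_p^2
  if (abs(a) < 1e-12) return((mu_n + mu_p) / 2)  # equal-variance special case
  num <- (mu_p * sd_n^2 - mu_n * sd_p^2) -
    sd_n * sd_p * sqrt((mu_n - mu_p)^2 + a * log(sd_n^2 / sd_p^2))
  num / a
}

# Population Youden-Index J(c) under the same model
youden_J <- function(cutpoint, mu_n, sd_n, mu_p, sd_p) {
  pnorm((cutpoint - mu_n) / sd_n) - pnorm((cutpoint - mu_p) / sd_p)
}

c_star  <- youden_normal_cutpoint(mu_n = 0, sd_n = 1, mu_p = 2, sd_p = 2)
num_opt <- optimize(youden_J, interval = c(0, 2), maximum = TRUE,
                    mu_n = 0, sd_n = 1, mu_p = 2, sd_p = 2)$maximum
c(closed_form = c_star, numerical = num_opt)

# With data, the parameters would be estimated per class, e.g.
# youden_normal_cutpoint(mean(x_neg), sd(x_neg), mean(x_pos), sd(x_pos))
```

The minus sign in front of the square root selects the root that maximizes $J$. The other root of the underlying quadratic lies outside the interval between the two means, where $J$ is not maximal in these examples.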

Nonparametric kernel method

A nonparametric alternative is the Kernel method [@fluss_estimation_2005]. Here, the empirical distribution functions are smoothed using the Gaussian kernel functions $\hat{F}_N(t) = \frac{1}{n} \sum_{i=1}^{n} \Phi(\frac{t - y_i}{h_y})$ and $\hat{G}_P(t) = \frac{1}{m} \sum_{i=1}^{m} \Phi(\frac{t - x_i}{h_x})$ for the negative and positive classes, respectively. Following Silverman’s plug-in “rule of thumb”, the bandwidths are selected as $h_y = 0.9 \cdot \min\{s_y, iqr_y/1.34\} \cdot n^{-0.2}$ and $h_x = 0.9 \cdot \min\{s_x, iqr_x/1.34\} \cdot m^{-0.2}$, where $s$ is the sample standard deviation and $iqr$ is the interquartile range. It has been demonstrated that AUC estimation is rather insensitive to the choice of the bandwidth procedure [@faraggi_estimation_2002], and thus the plug-in bandwidth estimator has also been recommended for cutpoint estimation. The oc_youden_kernel function in cutpointr uses a Gaussian kernel and the direct plug-in method for selecting the bandwidths. The kernel smoothing is done via the bkde function from the KernSmooth package [@wand_kernsmooth:_2013].

Again, there is a way to calculate the Youden-Index from the results of this method [@fluss_estimation_2005], which is

$$\hat{J} = \max_c \{\hat{F}_N(c) - \hat{G}_P(c)\}$$

but, as before, we prefer to report all metrics based on applying the cutpoint that was estimated using the Kernel method to the empirical observations.

cutpointr(suicide, dsi, suicide, gender, method = oc_youden_kernel)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> # A tibble: 2 × 18
#>   subgroup direction optimal_cutpoint method           sum_sens_spec      acc
#>   <chr>    <chr>                <dbl> <chr>                    <dbl>    <dbl>
#> 1 female   >=                 1.18128 oc_youden_kernel       1.80812 0.885204
#> 2 male     >=                 1.31636 oc_youden_kernel       1.58694 0.807143
#>   sensitivity specificity      AUC pos_class neg_class prevalence outcome
#>         <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
#> 1    0.925926    0.882192 0.944647 yes       no         0.0688776 suicide
#> 2    0.777778    0.809160 0.861747 yes       no         0.0642857 suicide
#>   predictor grouping data               roc_curve           boot
#>   <chr>     <chr>    <list>             <list>              <lgl>
#> 1 dsi       gender   <tibble [392 × 2]> <rc_ctpnt [11 × 9]> NA
#> 2 dsi       gender   <tibble [140 × 2]> <rc_ctpnt [11 × 9]> NA
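The kernel approach can be sketched in base R. The class-wise distribution functions are smoothed with Gaussian kernels, the bandwidths follow the rule of thumb quoted above, and the cutpoint maximizing $\hat{F}_N(c) - \hat{G}_P(c)$ is searched on a grid. Unlike cutpointr, which uses KernSmooth::bkde and a direct plug-in bandwidth, this sketch evaluates the smoothed distribution functions directly with pnorm. The data are simulated, and the helper names are hypothetical.

```r
# Rule-of-thumb bandwidth: 0.9 * min(s, iqr / 1.34) * n^(-0.2)
silverman_bw <- function(v) {
  spread <- min(sd(v), IQR(v) / 1.34)
  if (spread <= 0) spread <- sd(v)  # guard against an IQR of 0 in heavily tied data
  0.9 * spread * length(v)^(-0.2)
}

# Gaussian-kernel-smoothed distribution function evaluated at the points t
smooth_cdf <- function(t, v, h) {
  vapply(t, function(ti) mean(pnorm((ti - v) / h)), numeric(1))
}

# Maximise F_N(c) - G_P(c) over an equidistant grid of candidate cutpoints
kernel_youden_cutpoint <- function(x_neg, x_pos, n_grid = 2000) {
  h_y  <- silverman_bw(x_neg)   # bandwidth for the negative class
  h_x  <- silverman_bw(x_pos)   # bandwidth for the positive class
  grid <- seq(min(x_neg, x_pos), max(x_neg, x_pos), length.out = n_grid)
  J    <- smooth_cdf(grid, x_neg, h_y) - smooth_cdf(grid, x_pos, h_x)
  list(cutpoint = grid[which.max(J)], J = max(J))
}

set.seed(42)
x_neg <- rnorm(300, mean = 0, sd = 1)
x_pos <- rnorm(100, mean = 2, sd = 1)
kc <- kernel_youden_cutpoint(x_neg, x_pos)
kc
```

For these simulated classes the population optimum is 1, the midpoint of the two means, so the estimate should land in its vicinity.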

Additional features

Calculating only the ROC curve

When running cutpointr, a ROC curve is by default returned in the column roc_curve. This ROC curve can be plotted using plot_roc. Alternatively, if only the ROC curve is desired and no cutpoint needs to be calculated, the ROC curve can be created using roc() and plotted using plot_cutpointr. The roc function, unlike cutpointr, does not determine direction, pos_class or neg_class automatically.

roc_curve <- roc(data = suicide, x = dsi, class = suicide,
                 pos_class = "yes", neg_class = "no", direction = ">=")
auc(roc_curve)
#> [1] 0.9237791
head(roc_curve)
#> # A tibble: 6 × 9
#>   x.sorted    tp    fp    tn    fn    tpr   tnr     fpr   fnr
#>      <dbl> <dbl> <dbl> <int> <int>  <dbl> <dbl>   <dbl> <dbl>
#> 1      Inf     0     0   496    36 0      1     0       1
#> 2       11     1     0   496    35 0.0278 1     0       0.972
#> 3       10     2     1   495    34 0.0556 0.998 0.00202 0.944
#> 4        9     3     1   495    33 0.0833 0.998 0.00202 0.917
#> 5        8     4     1   495    32 0.111  0.998 0.00202 0.889
#> 6        7     7     1   495    29 0.194  0.998 0.00202 0.806
plot_roc(roc_curve)
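The columns returned by roc() can be reproduced for a toy data set. For every candidate cutpoint, starting at Inf where nothing is classified as positive, the sketch counts tp, fp, tn and fn under direction ">=", then integrates the ROC curve with the trapezoidal rule to obtain the AUC. It is a sketch for intuition, not cutpointr's implementation, and roc_table and trapezoid_auc are hypothetical helpers.

```r
# ROC table with one row per candidate cutpoint, sorted decreasingly
roc_table <- function(x, is_pos) {
  cands <- c(Inf, sort(unique(x), decreasing = TRUE))
  tp <- vapply(cands, function(c) sum(x >= c &  is_pos), numeric(1))
  fp <- vapply(cands, function(c) sum(x >= c & !is_pos), numeric(1))
  n_pos <- sum(is_pos); n_neg <- sum(!is_pos)
  data.frame(x.sorted = cands, tp = tp, fp = fp,
             tn = n_neg - fp, fn = n_pos - tp,
             tpr = tp / n_pos, fpr = fp / n_neg)
}

# Trapezoidal AUC; fpr and tpr are non-decreasing along the table
trapezoid_auc <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Perfectly separated toy data
x      <- c(1, 2, 3, 8, 11, 11, 12)
is_pos <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
rt <- roc_table(x, is_pos)
rt
trapezoid_auc(rt$fpr, rt$tpr)

# Overlapping toy data with ties
x2      <- c(1, 2, 3, 2, 3, 4)
is_pos2 <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
rt2 <- roc_table(x2, is_pos2)
trapezoid_auc(rt2$fpr, rt2$tpr)
```

The first data set is perfectly separated and gives an AUC of 1. The second gives 7/9, the proportion of positive/negative pairs in which the positive has the higher value, with ties counted as one half.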

Midpoints

So far, as is the default in cutpointr, we have considered all unique values of the predictor as possible cutpoints. An alternative could be to use a sequence of equidistant values instead, for example all integers in $[0, 10]$ in the case of the suicide data. However, with very sparse data and small intervals between the candidate cutpoints (i.e. a ‘dense’ sequence like seq(0, 10, by = 0.01)), this leads to the uninformative evaluation of large ranges of cutpoints that all result in the same metric value. A more elegant alternative, not only for the case of sparse data, that is supported by cutpointr is the use of the mean of the optimal cutpoint and the next lowest (if direction = ">=") or the next highest (if direction = "<=") predictor value in the data. The result is an optimal cutpoint that is equal to the cutpoint that would be obtained using an infinitely dense sequence of candidate cutpoints, and it is usually more efficient computationally. This behavior can be activated by setting use_midpoints = TRUE; the default is use_midpoints = FALSE, which is why the earlier examples return observed predictor values. If we use this setting, we obtain an optimal cutpoint of 1.5 for the complete sample on the suicide data instead of 2 when maximizing the sum of sensitivity and specificity.

Assume the following small data set:

dat <- data.frame(outcome = c("neg", "neg", "neg", "pos", "pos", "pos", "pos"),
                  pred    = c(1, 2, 3, 8, 11, 11, 12))

Since the distance of the optimal cutpoint (8) to the next lowest observation (3) is rather large, we arrive at a range of possible cutpoints that all maximize the metric. In the case of this kind of sparseness, it might for example be desirable to classify a new observation with a predictor value of 4 as belonging to the negative class. If use_midpoints is set to TRUE, the mean of the optimal cutpoint and the next lowest observation is returned as the optimal cutpoint if direction = ">=", here (8 + 3) / 2 = 5.5. If direction = "<=", the mean of the optimal cutpoint and the next highest observation is returned instead.

opt_cut <- cutpointr(dat, x = pred, class = outcome, use_midpoints = TRUE)
#> Assuming the positive class is pos
#> Assuming the positive class has higher x values
plot_x(opt_cut)
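For direction ">=", the midpoint rule amounts to averaging the optimal cutpoint with the largest observation below it. For direction "<=", it averages with the smallest observation above it. A minimal base-R sketch on the small dat example follows. The helper midpoint_cutpoint is hypothetical, and cutpointr's own handling of edge cases may differ in detail.

```r
dat <- data.frame(outcome = c("neg", "neg", "neg", "pos", "pos", "pos", "pos"),
                  pred    = c(1, 2, 3, 8, 11, 11, 12))

# Hypothetical helper: midpoint between the optimal cutpoint and the
# neighbouring observation on the "negative" side of the cutpoint.
midpoint_cutpoint <- function(x, optimal, direction = ">=") {
  if (direction == ">=") {
    lower <- x[x < optimal]            # observations below the cutpoint
    if (length(lower) == 0) return(optimal)
    (optimal + max(lower)) / 2
  } else {
    higher <- x[x > optimal]           # observations above the cutpoint
    if (length(higher) == 0) return(optimal)
    (optimal + min(higher)) / 2
  }
}

# With direction ">=", 8 is the optimal observed cutpoint for these data:
# every value >= 8 is "pos" and every value below 8 is "neg".
midpoint_cutpoint(dat$pred, optimal = 8)

# Integer-valued predictor with optimal cutpoint 2, as for dsi in the suicide data
midpoint_cutpoint(0:11, optimal = 2)
```

The first call returns 5.5, so a new observation with a value of 4 is classified as negative. The second returns 1.5, matching the midpoint cutpoint mentioned above for the suicide data.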

A simulation demonstrates more clearly that setting use_midpoints = TRUE avoids biasing the cutpoints. To simulate the bias of the metric functions, the predictor values of both classes were drawn from normal distributions with constant standard deviations of 10, a constant mean of the negative class of 100, and higher mean values of the positive class that were chosen such that optimal Youden-Index values of 0.2, 0.4, 0.6, and 0.8 result in the population. Samples of 9 different sizes were drawn, and the cutpoints that maximize the Youden-Index were estimated. The simulation was repeated 10000 times. As can be seen by the mean error, use_midpoints = TRUE eliminates the bias that is introduced by otherwise selecting the value of an observation as the optimal cutpoint. If direction = ">=", as in this case, the observation that represents the optimal cutpoint is the highest possible cutpoint that leads to the optimal metric value, and thus the biases are positive. The methods oc_youden_normal and oc_youden_kernel do not exhibit this bias, as they don’t select a cutpoint based on the ROC curve or the function of metric values per cutpoint.

Finding all cutpoints with acceptable performance

By default, most packages only return the “best” cutpoint and disregard other cutpoints with quite similar performance, even if the performance differences are minuscule. cutpointr makes this process more explicit via the tol_metric argument. For example, if all cutpoints are of interest that achieve an accuracy within 0.05 of the optimally achievable accuracy, tol_metric can be set to 0.05 and those cutpoints will also be returned.

In the case of the suicide data and when maximizing the sum of sensitivity and specificity, the cutpoints 2 and 1 empirically lead to quite similar performances. If tol_metric is set to 0.05, both will be returned.

opt_cut <- cutpointr(suicide, dsi, suicide, metric = sum_sens_spec,
                     tol_metric = 0.05, break_ties = c)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Multiple optimal cutpoints found, applying break_ties.
library(tidyr)
opt_cut %>%
    dplyr::select(optimal_cutpoint, sum_sens_spec) %>%
    unnest(cols = c(optimal_cutpoint, sum_sens_spec))
#> # A tibble: 2 × 2
#>   optimal_cutpoint sum_sens_spec
#>              <dbl>         <dbl>
#> 1                2          1.75
#> 2                1          1.70
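Conceptually, tol_metric widens the selection from the single best value to every candidate whose metric lies within the tolerance of the optimum. A base-R sketch with hypothetical metric values; within_tolerance is a made-up helper, not part of cutpointr:

```r
# Hypothetical metric values per candidate cutpoint
res <- data.frame(cutpoint      = 0:5,
                  sum_sens_spec = c(1.00, 1.71, 1.75, 1.64, 1.52, 1.40))

# Keep every cutpoint whose metric is within tol_metric of the best value
within_tolerance <- function(cutpoints, values, tol_metric) {
  best <- max(values)
  cutpoints[values >= best - tol_metric]
}

within_tolerance(res$cutpoint, res$sum_sens_spec, tol_metric = 0.05)
within_tolerance(res$cutpoint, res$sum_sens_spec, tol_metric = 0)
```

With a tolerance of 0.05 both 1 and 2 are kept, while a tolerance of 0 keeps only the single best cutpoint 2. When several cutpoints remain, cutpointr applies break_ties to them; break_ties = c keeps all of them, as in the example above.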

Manual and mean / median cutpoints

Using the oc_manual function, the optimal cutpoint will not be determined based on, for example, a metric but is instead set manually using the cutpoint argument. This is useful for supplying and evaluating cutpoints that were found in the literature or in other external sources.

The oc_manual function could also be used to set the cutpoint to the sample mean using cutpoint = mean(data$x). However, this may introduce a bias into the bootstrap validation procedure, since the actual mean of the population is not known, and thus the mean to be used as the cutpoint should be automatically determined in every resample. To do so, the oc_mean and oc_median functions can be used.

set.seed(100)
opt_cut_manual <- cutpointr(suicide, dsi, suicide, method = oc_manual,
                            cutpoint = mean(suicide$dsi), boot_runs = 30)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...
set.seed(100)
opt_cut_mean <- cutpointr(suicide, dsi, suicide, method = oc_mean, boot_runs = 30)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...

Nonstandard evaluation via tidyeval

The arguments to cutpointr do not need to be enclosed in quotes. This is possible thanks to nonstandard evaluation of the arguments, which are evaluated on data.

Functions that use nonstandard evaluation are often not suitable for programming with. Nonstandard evaluation may lead to scoping problems and, in turn, to obvious as well as subtle errors. cutpointr uses tidyeval internally, so the same rules as for programming with dplyr apply. Arguments can be unquoted with !!:

```r
myvar <- "dsi"
cutpointr(suicide, !!myvar, suicide)
```

ROC curve and optimal cutpoint for multiple variables

Alternatively, we can map cutpointr over the column names, using standard evaluation. If direction and / or pos_class and neg_class are unspecified, these parameters will automatically be determined by cutpointr so that the AUC values for all variables will be > 0.5.

We could do this manually, e.g. using purrr::map, but to make this task more convenient, multi_cutpointr can be used to achieve the same result. It maps multiple predictor columns to cutpointr, by default all numeric columns except for the class column.

```r
mcp <- multi_cutpointr(suicide, class = suicide, pos_class = "yes",
                       use_midpoints = TRUE, silent = TRUE)
summary(mcp)
#> Method: maximize_metric
#> Predictor: age, dsi
#> Outcome: suicide
#>
#> Predictor: age
#> --------------------------------------------------------------------------------
#>  direction    AUC   n n_pos n_neg
#>         <= 0.5257 532    36   496
#>
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity tp fn  fp tn
#>              55.5        1.1154 0.1992      0.9722      0.1431 35  1 425 71
#>
#> Predictor summary:
#>     Data Min. 5% 1st Qu. Median    Mean 3rd Qu.   95% Max.      SD NAs
#>  Overall   18 19      24   28.0 34.1259   41.25 65.00   83 15.0542   0
#>       no   18 19      24   28.0 34.2218   41.25 65.50   83 15.1857   0
#>      yes   18 18      22   27.5 32.8056   41.25 54.25   69 13.2273   0
#>
#> Predictor: dsi
#> --------------------------------------------------------------------------------
#>  direction    AUC   n n_pos n_neg
#>         >= 0.9238 532    36   496
#>
#>  optimal_cutpoint sum_sens_spec    acc sensitivity specificity tp fn fp  tn
#>               1.5        1.7518 0.8647      0.8889      0.8629 32  4 68 428
#>
#> Predictor summary:
#>     Data Min.   5% 1st Qu. Median   Mean 3rd Qu.  95% Max.     SD NAs
#>  Overall    0 0.00       0      0 0.9211       1 5.00   11 1.8527   0
#>       no    0 0.00       0      0 0.6331       0 4.00   10 1.4122   0
#>      yes    0 0.75       4      5 4.8889       6 9.25   11 2.5498   0
```
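The automatic choice of direction can be approximated as follows: compute the AUC of each numeric column with the rank-based (Mann-Whitney) formula and flip the direction whenever that AUC falls below 0.5, as happened for age in the summary above. This base R sketch, with hypothetical helper names (auc_mw, choose_directions) and toy data, is an assumption about the logic, not multi_cutpointr's actual code.

```r
# Hypothetical helper: AUC as the probability that a random positive has a
# higher x value than a random negative (ties count one half), from ranks.
auc_mw <- function(x, is_pos) {
  r <- rank(x)                        # average ranks handle ties
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: pick ">=" or "<=" per numeric column so that the
# reported AUC is at least 0.5
choose_directions <- function(data, class_col, pos_class) {
  is_pos <- data[[class_col]] == pos_class
  num_cols <- setdiff(names(data)[vapply(data, is.numeric, logical(1))], class_col)
  do.call(rbind, lapply(num_cols, function(col) {
    auc <- auc_mw(data[[col]], is_pos)
    data.frame(predictor = col,
               direction = if (auc >= 0.5) ">=" else "<=",
               AUC = max(auc, 1 - auc),
               stringsAsFactors = FALSE)
  }))
}

toy <- data.frame(
  age   = c(60, 55, 50, 30, 25, 20),
  dsi   = c(0, 1, 0, 4, 6, 5),
  group = c("no", "no", "no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)
dirs <- choose_directions(toy, "group", "yes")
dirs  # age gets "<=", dsi gets ">=", both with AUC 1
```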

Accessing data, roc_curve, and boot

The object returned by cutpointr is of the classes cutpointr, tbl_df, tbl, and data.frame. Thus, it can be handled like a usual data frame. The columns data, roc_curve, and boot consist of nested data frames, which means that these are list columns whose elements are data frames. They can be accessed either with base R subsetting (e.g. $ and [[) or with functions from the tidyverse. If subgroups were given, the output contains one row per subgroup, and the function that accesses the data should be mapped to every row, or the data should be grouped by subgroup.

```r
# Extracting the bootstrap results
set.seed(123)
opt_cut <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...

# Using base R to summarise the result of the bootstrap
summary(opt_cut$boot[[1]]$optimal_cutpoint)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   1.000   2.000   2.000   2.172   2.000   5.000
summary(opt_cut$boot[[2]]$optimal_cutpoint)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   1.000   1.000   3.000   2.921   4.000  11.000

# Using dplyr and tidyr
library(dplyr)
library(tidyr)
opt_cut %>%
  group_by(subgroup) %>%
  select(boot) %>%
  unnest(boot) %>%
  summarise(sd_oc_boot = sd(optimal_cutpoint),
            m_oc_boot  = mean(optimal_cutpoint),
            m_acc_oob  = mean(acc_oob))
#> Adding missing grouping variables: `subgroup`
#> # A tibble: 2 × 4
#>   subgroup sd_oc_boot m_oc_boot m_acc_oob
#>   <chr>         <dbl>     <dbl>     <dbl>
#> 1 female        0.766      2.17     0.880
#> 2 male          1.51       2.92     0.806
```
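For readers less familiar with list columns, here is a minimal base R sketch of the same kind of structure with made-up values: one row per subgroup and a boot column whose elements are data frames. The object names are hypothetical and the values are not cutpointr output.

```r
# Toy stand-in for a nested `boot` column: one row per subgroup,
# each cell of the list column holds a data frame.
res <- data.frame(subgroup = c("female", "male"), stringsAsFactors = FALSE)
res$boot <- list(
  data.frame(optimal_cutpoint = c(2, 2, 1, 3)),
  data.frame(optimal_cutpoint = c(3, 1, 4, 2))
)

# `[[` pulls out the data frame stored in one row of the list column
res$boot[[2]]$optimal_cutpoint

# Map a summary over every row, giving one value per subgroup
boot_means <- data.frame(
  subgroup  = res$subgroup,
  m_oc_boot = vapply(res$boot, function(b) mean(b$optimal_cutpoint), numeric(1))
)
boot_means  # female 2.0, male 2.5
```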

Adding metrics to the result of cutpointr() or roc()

By default, the output of cutpointr includes the optimized metric and several other metrics. The add_metric function adds further metrics. Here, we’re adding the negative predictive value (NPV) and the positive predictive value (PPV) at the optimal cutpoint per subgroup:

```r
cutpointr(suicide, dsi, suicide, gender, metric = youden, silent = TRUE) %>%
    add_metric(list(ppv, npv)) %>%
    select(subgroup, optimal_cutpoint, youden, ppv, npv)
#> # A tibble: 2 × 5
#>   subgroup optimal_cutpoint   youden      ppv      npv
#>   <chr>               <dbl>    <dbl>    <dbl>    <dbl>
#> 1 female                  2 0.808118 0.367647 0.993827
#> 2 male                    3 0.625106 0.259259 0.982301
```

In the same fashion, additional metric columns can be added to a roc_cutpointr object:

```r
roc(data = suicide, x = dsi, class = suicide, pos_class = "yes",
    neg_class = "no", direction = ">=") %>%
  add_metric(list(cohens_kappa, F1_score)) %>%
  select(x.sorted, tp, fp, tn, fn, cohens_kappa, F1_score) %>%
  head()
#> # A tibble: 6 × 7
#>   x.sorted    tp    fp    tn    fn cohens_kappa F1_score
#>      <dbl> <dbl> <dbl> <int> <int>        <dbl>    <dbl>
#> 1      Inf     0     0   496    36       0        0
#> 2       11     1     0   496    35       0.0506   0.0541
#> 3       10     2     1   495    34       0.0931   0.103
#> 4        9     3     1   495    33       0.138    0.15
#> 5        8     4     1   495    32       0.182    0.195
#> 6        7     7     1   495    29       0.301    0.318
```
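The added metrics follow standard confusion-matrix definitions. The sketch below re-implements PPV, NPV, the F1 score and Cohen's kappa from those textbook formulas (the function names are hypothetical; this is not cutpointr's code) and reproduces two values from the second row of the ROC table above (tp = 1, fp = 0, tn = 496, fn = 35). Like the metric functions described later, they accept vectors of tp, fp, tn and fn.

```r
# Textbook confusion-matrix metrics, vectorized over tp, fp, tn, fn
ppv_manual <- function(tp, fp, tn, fn, ...) tp / (tp + fp)
npv_manual <- function(tp, fp, tn, fn, ...) tn / (tn + fn)
f1_manual  <- function(tp, fp, tn, fn, ...) 2 * tp / (2 * tp + fp + fn)
kappa_manual <- function(tp, fp, tn, fn, ...) {
  n   <- tp + fp + tn + fn
  p_o <- (tp + tn) / n                                          # observed agreement
  p_e <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2  # chance agreement
  (p_o - p_e) / (1 - p_e)
}

# Second row of the ROC table: tp = 1, fp = 0, tn = 496, fn = 35
round(kappa_manual(1, 0, 496, 35), 4)  # 0.0506
round(f1_manual(1, 0, 496, 35), 4)     # 0.0541
```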

User-defined functions

method

User-defined functions can be supplied to method, which is the function that is responsible for returning the optimal cutpoint. To define a new method function, create a function that may take as input(s):

  • data: A data.frame or tbl_df
  • x: (character) The name of the predictor variable
  • class: (character) The name of the class variable
  • metric_func: A function for calculating a metric, e.g. accuracy. Note that the method function does not necessarily have to accept this argument
  • pos_class: The positive class
  • neg_class: The negative class
  • direction: ">=" if the positive class has higher x values, "<=" otherwise
  • tol_metric: (numeric) In the built-in methods, all cutpoints will be returned that lead to a metric value in the interval [m_max - tol_metric, m_max + tol_metric], where m_max is the maximum achievable metric value. This can be used to return multiple decent cutpoints and to avoid floating-point problems.
  • use_midpoints: (logical) In the built-in methods, if TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next highest observation (for direction = “>”) or the next lowest observation (for direction = “<”), which avoids biasing the optimal cutpoint.
  • ...: Further arguments that are passed to metric or that can be captured inside of method

The function should return a data frame or tibble with one row, the column optimal_cutpoint, and an optional column with an arbitrary name with the metric value at the optimal cutpoint.

For example, a function for choosing the cutpoint as the mean of the independent variable could look like this:

```r
mean_cut <- function(data, x, ...) {
    oc <- mean(data[[x]])
    return(data.frame(optimal_cutpoint = oc))
}
```

If a method function does not return a metric column, the default sum_sens_spec, the sum of sensitivity and specificity, is returned as the extra metric column in addition to accuracy, sensitivity and specificity.
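A method function that uses more of the arguments listed above might look like the following sketch. max_sens_spec_method is hypothetical: it maximizes the sum of sensitivity and specificity over the observed x values, absorbs unused arguments such as metric_func and tol_metric via ..., and returns a named metric column alongside optimal_cutpoint. With use_midpoints = TRUE and direction ">=", it averages the optimal cutpoint with the next lower observed value; that convention is an assumption chosen to match the multi_cutpointr summary above, where dsi gets a cutpoint of 1.5 instead of 2.

```r
# Hypothetical method function following the interface described above
max_sens_spec_method <- function(data, x, class, pos_class, neg_class,
                                 direction = ">=", use_midpoints = FALSE, ...) {
  xs <- data[[x]]
  is_pos <- data[[class]] == pos_class
  cands <- sort(unique(xs))
  # Sum of sensitivity and specificity at every observed value
  metric <- vapply(cands, function(cp) {
    pred_pos <- if (direction == ">=") xs >= cp else xs <= cp
    sum(pred_pos & is_pos) / sum(is_pos) + sum(!pred_pos & !is_pos) / sum(!is_pos)
  }, numeric(1))
  best <- which.max(metric)   # first maximum if there are ties
  oc <- cands[best]
  if (use_midpoints) {
    # Neighbouring observed value on the negative side of the cutpoint
    neighbour <- if (direction == ">=") cands[best - 1] else cands[best + 1]
    if (length(neighbour) == 1 && !is.na(neighbour)) oc <- (oc + neighbour) / 2
  }
  data.frame(optimal_cutpoint = oc, sum_sens_spec = metric[best])
}

toy <- data.frame(
  x   = c(0, 0, 1, 1, 2, 2, 3, 4, 5, 6),
  cls = c("no", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)
max_sens_spec_method(toy, "x", "cls", "yes", "no")                        # cutpoint 2
max_sens_spec_method(toy, "x", "cls", "yes", "no", use_midpoints = TRUE)  # cutpoint 1.5
```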

Some method functions that make use of the additional arguments (that are captured by ...) are already included in cutpointr, see the list at the top. Since these are regular R functions, their code can be viewed by simply typing their name, see for example oc_youden_normal.

metric

User-defined metric functions can be used as well. They are mainly useful in conjunction with method = maximize_metric, method = minimize_metric, or one of the other minimization and maximization functions. With any other method function, metric is only used as the main out-of-bag metric when plotting the result. The metric function should accept the following inputs as vectors:

  • tp: Vector of true positives
  • fp: Vector of false positives
  • tn: Vector of true negatives
  • fn: Vector of false negatives
  • ...: Further arguments

The function should return a numeric vector, a matrix, or a data.frame with one column. If the column is named, the name will be included in the output and plots. Avoid using names that are identical to the column names that are by default returned by cutpointr, as such names will be prefixed by metric_ in the output. The code of the included metric functions can be accessed by simply typing their name.

For example, this is themisclassification_cost metric function:

```r
misclassification_cost
#> function (tp, fp, tn, fn, cost_fp = 1, cost_fn = 1, ...)
#> {
#>     misclassification_cost <- cost_fp * fp + cost_fn * fn
#>     misclassification_cost <- matrix(misclassification_cost,
#>         ncol = 1)
#>     colnames(misclassification_cost) <- "misclassification_cost"
#>     return(misclassification_cost)
#> }
#> <bytecode: 0x000001faa6f0ac88>
#> <environment: namespace:cutpointr>
```
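Following the same pattern, a user-defined metric could compute balanced accuracy, the mean of sensitivity and specificity. balanced_acc is a hypothetical name used only for illustration; it is vectorized over tp, fp, tn and fn and returns a named one-column matrix like misclassification_cost. The first row of the example uses the confusion matrix of the dsi cutpoint 2 from the summary earlier in this README (tp = 32, fn = 4, fp = 68, tn = 428); the second row is made up.

```r
# Hypothetical user-defined metric: balanced accuracy
balanced_acc <- function(tp, fp, tn, fn, ...) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  res <- matrix((sens + spec) / 2, ncol = 1)
  colnames(res) <- "balanced_acc"   # column name shown in output and plots
  res
}

balanced_acc(tp = c(32, 30), fp = c(68, 40), tn = c(428, 456), fn = c(4, 6))
```

Based on the description above, passing metric = balanced_acc together with method = maximize_metric should then yield an output column named balanced_acc.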

Plotting

cutpointr includes several convenience functions for plotting data from a cutpointr object. These include:

  • plot_cutpointr: General purpose plotting function for cutpointr or roc_cutpointr objects
  • plot_cut_boot: Plot the bootstrapped distribution of optimal cutpoints
  • plot_metric: If maximize_metric or minimize_metric was used, this function plots all possible cutoffs on the x-axis vs. the respective metric values on the y-axis. If bootstrapping was run, a confidence interval based on the bootstrapped distribution of metric values at each cutpoint can be displayed. To display no confidence interval, set conf_lvl = 0.
  • plot_metric_boot: Plot the distribution of out-of-bag metric values
  • plot_precision_recall: Plot the precision recall curve
  • plot_sensitivity_specificity: Plot all cutpoints vs. sensitivity and specificity
  • plot_roc: Plot the ROC curve
  • plot_x: Plot the distribution of the predictor variable
```r
set.seed(102)
opt_cut <- cutpointr(suicide, dsi, suicide, gender, method = minimize_metric,
                     metric = abs_d_sens_spec, boot_runs = 200, silent = TRUE)
opt_cut
#> # A tibble: 2 × 18
#>   subgroup direction optimal_cutpoint method          abs_d_sens_spec      acc
#>   <chr>    <chr>                <dbl> <chr>                     <dbl>    <dbl>
#> 1 female   >=                       2 minimize_metric       0.0437341 0.885204
#> 2 male     >=                       2 minimize_metric       0.0313825 0.807143
#>   sensitivity specificity      AUC pos_class neg_class prevalence outcome
#>         <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
#> 1    0.925926    0.882192 0.944647 yes       no         0.0688776 suicide
#> 2    0.777778    0.809160 0.861747 yes       no         0.0642857 suicide
#>   predictor grouping data               roc_curve            boot
#>   <chr>     <chr>    <list>             <list>               <list>
#> 1 dsi       gender   <tibble [392 × 2]> <rc_ctpnt [11 × 10]> <tibble [200 × 23]>
#> 2 dsi       gender   <tibble [140 × 2]> <rc_ctpnt [11 × 10]> <tibble [200 × 23]>
plot_cut_boot(opt_cut)
```

```r
plot_metric(opt_cut, conf_lvl = 0.9)
```

```r
plot_metric_boot(opt_cut)
#> Warning: Removed 2 rows containing non-finite outside the scale range
#> (`stat_density()`).
```

```r
plot_precision_recall(opt_cut)
```

```r
plot_sensitivity_specificity(opt_cut)
```

```r
plot_roc(opt_cut)
```

All plot functions, except for the standard plot method that returns a composed plot, return ggplot objects that can be further modified. For example, changing labels, title, and the theme can be achieved this way:

```r
library(ggplot2)  # for ggtitle(), theme_minimal(), xlab()
p <- plot_x(opt_cut)
p + ggtitle("Distribution of dsi") + theme_minimal() + xlab("Depression score")
```

Flexible plotting function

Using plot_cutpointr, any metric can be chosen to be plotted on the x- or y-axis, and results of cutpointr() as well as roc() can be plotted. Any metric that can be calculated based on the ROC curve can be plotted, as only the true / false positives / negatives over all cutpoints are needed. Thus, if a cutpointr object is to be plotted, it is irrelevant which metric function was chosen for cutpoint estimation. That way, not only the above plots can be produced, but also any combination of two metrics (or metric functions) and / or cutpoints. The built-in metric functions as well as user-defined functions or anonymous functions can be supplied to xvar and yvar.

If bootstrapping was run, confidence intervals can be plotted around the y-variable. This is especially useful if the cutpoints, available in the cutpoints function, are placed on the x-axis. Note that confidence intervals can only be correctly plotted if the values of xvar are constant across bootstrap samples. For example, confidence intervals for TPR by FPR (a ROC curve) cannot be plotted easily, as the values of the false positive rate vary per bootstrap sample.

```r
set.seed(500)
oc <- cutpointr(suicide, dsi, suicide, boot_runs = 1000,
                metric = sum_ppv_npv) # metric irrelevant for plot_cutpointr
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...
plot_cutpointr(oc, xvar = cutpoints, yvar = sum_sens_spec, conf_lvl = 0.9)
```

```r
plot_cutpointr(oc, xvar = fpr, yvar = tpr, aspect_ratio = 1, conf_lvl = 0)
```

```r
library(ggplot2)  # for geom_point()
plot_cutpointr(oc, xvar = cutpoint, yvar = tp, conf_lvl = 0.9) + geom_point()
```
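The requirement that xvar be constant across bootstrap samples becomes clearer in a sketch of how such a band can be computed: with a fixed grid of cutpoints on the x-axis, the metric is recomputed in every resample and quantiles are taken per grid value. boot_metric_band is hypothetical, uses simple percentile intervals and direction ">=", and is not plot_cutpointr's implementation.

```r
# Hypothetical sketch: percentile band for sum_sens_spec on a fixed cutpoint grid
boot_metric_band <- function(x, is_pos, grid, boot_runs = 200, conf_lvl = 0.9) {
  n <- length(x)
  # One row per bootstrap sample, one column per grid cutpoint
  vals <- matrix(NA_real_, nrow = boot_runs, ncol = length(grid))
  for (b in seq_len(boot_runs)) {
    idx <- sample.int(n, n, replace = TRUE)
    xb <- x[idx]
    pb <- is_pos[idx]
    vals[b, ] <- vapply(grid, function(cp) {
      pred <- xb >= cp
      sum(pred & pb) / sum(pb) + sum(!pred & !pb) / sum(!pb)
    }, numeric(1))
  }
  alpha <- (1 - conf_lvl) / 2
  data.frame(
    cutpoint = grid,
    lower = apply(vals, 2, quantile, probs = alpha, na.rm = TRUE, names = FALSE),
    upper = apply(vals, 2, quantile, probs = 1 - alpha, na.rm = TRUE, names = FALSE)
  )
}

set.seed(500)
x <- c(rnorm(150), rnorm(50, mean = 1.5))
is_pos <- rep(c(FALSE, TRUE), c(150, 50))
band <- boot_metric_band(x, is_pos, grid = seq(-1, 3, by = 0.5))
band
```

With fpr on the x-axis there is no such shared grid, because each resample produces its own set of fpr values.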

Manual plotting

Since cutpointr returns a data.frame with the original data, bootstrap results, and the ROC curve in nested tibbles, these data can be conveniently extracted and plotted manually. The relevant nested tibbles are in the columns data, roc_curve and boot. The following is an example of accessing and plotting the grouped data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(123)
opt_cut <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 1000)
#> Assuming the positive class is yes
#> Assuming the positive class has higher x values
#> Running bootstrap...
opt_cut %>%
    select(data, subgroup) %>%
    unnest(cols = c(data)) %>%
    ggplot(aes(x = suicide, y = dsi)) +
    geom_boxplot(alpha = 0.3) +
    facet_grid(~subgroup)
```

Benchmarks

To offer a comparison to established solutions, cutpointr 1.0.0 will be benchmarked against optimal.cutpoints from OptimalCutpoints 1.1-4, ThresholdROC 2.7 and custom functions based on ROCR 1.0-7 and pROC 1.15.0. By generating data of different sizes, the benchmarks will offer a comparison of the scalability of the different solutions.

Using prediction and performance from the ROCR package and roc from the pROC package, we can write functions for computing the cutpoint that maximizes the sum of sensitivity and specificity. pROC has a built-in function to optimize a few metrics:

```r
# Return cutpoint that maximizes the sum of sensitivity and specificity
# ROCR package
rocr_sensspec <- function(x, class) {
    pred <- ROCR::prediction(x, class)
    perf <- ROCR::performance(pred, "sens", "spec")
    sens <- slot(perf, "y.values")[[1]]
    spec <- slot(perf, "x.values")[[1]]
    cut <- slot(perf, "alpha.values")[[1]]
    cut[which.max(sens + spec)]
}

# pROC package
proc_sensspec <- function(x, class) {
    r <- pROC::roc(class, x, algorithm = 2, levels = c(0, 1), direction = "<")
    pROC::coords(r, "best", ret = "threshold", transpose = FALSE)[1]
}
```

The benchmarking will be carried out using the microbenchmark package and randomly generated data. The values of the x predictor variable are drawn from a normal distribution, which leads to a lot more unique values than were encountered before in the suicide data. Accordingly, the search for an optimal cutpoint is much more demanding if all possible cutpoints are evaluated.
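To see why exhaustive evaluation is demanding and how it can still scale, the following sketch evaluates the sum of sensitivity and specificity at every unique cutpoint with a single sort and cumulative sums. That costs O(n log n), instead of the O(n^2) of recounting the confusion matrix for each cutpoint. fast_sum_sens_spec is hypothetical, assumes direction ">=", and is not claimed to be how cutpointr, ROCR or pROC compute their ROC curves.

```r
# Hypothetical sketch: sensitivity + specificity at all unique cutpoints
# (x >= cutpoint classified as positive) via one sort and cumulative sums
fast_sum_sens_spec <- function(x, is_pos) {
  ord <- order(x, decreasing = TRUE)
  xs <- x[ord]
  tp <- cumsum(is_pos[ord])     # positives with x >= xs[i]
  fp <- cumsum(!is_pos[ord])    # negatives with x >= xs[i]
  # Keep only the last position of each run of tied values, so that every
  # observation equal to the cutpoint counts as positive
  last <- c(xs[-1] != xs[-length(xs)], TRUE)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  data.frame(cutpoint = xs[last],
             sum_sens_spec = tp[last] / n_pos + (n_neg - fp[last]) / n_neg)
}

x <- c(0, 0, 1, 1, 2, 2, 3, 4, 5, 6)
is_pos <- c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
fast_sum_sens_spec(x, is_pos)  # maximum of 1.8 at the cutpoints 3 and 2
```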

Benchmarks are run for sample sizes of 100, 1000, 1e4, 1e5, 1e6, and 1e7. For low sample sizes, cutpointr is slower than the other solutions. While this should be of low practical importance, cutpointr scales more favorably with increasing sample size. The speed disadvantage in small samples, which puts a lower limit of a few milliseconds on its run time, is mainly due to the nesting of the original data and the results that makes the compact output of cutpointr possible. This observation is supported by the fact that cutpointr::roc is quite fast also in small samples. For sample sizes > 1e5, cutpointr is a little faster than the functions based on ROCR and pROC. Both of these solutions are generally faster than OptimalCutpoints and ThresholdROC, with the exception of small samples. OptimalCutpoints and ThresholdROC had to be excluded from benchmarks with more than 1e4 observations due to high memory requirements and/or excessive run times, rendering the use of these packages in larger samples impractical.

```r
# ROCR package
rocr_roc <- function(x, class) {
    pred <- ROCR::prediction(x, class)
    perf <- ROCR::performance(pred, "sens", "spec")
    return(NULL)
}

# pROC package
proc_roc <- function(x, class) {
    r <- pROC::roc(class, x, algorithm = 2, levels = c(0, 1), direction = "<")
    return(NULL)
}
```

| n | task | cutpointr | OptimalCutpoints | pROC | ROCR | ThresholdROC |
|---|---|---|---|---|---|---|
| 1e+02 | Cutpoint Estimation | 4.5018015 | 2.288702 | 0.662101 | 1.812802 | 1.194301 |
| 1e+03 | Cutpoint Estimation | 4.8394010 | 45.056801 | 0.981001 | 2.176401 | 36.239852 |
| 1e+04 | Cutpoint Estimation | 8.5662515 | 2538.612001 | 4.031701 | 5.667101 | 2503.801251 |
| 1e+05 | Cutpoint Estimation | 45.3845010 | NA | 37.150151 | 43.118751 | NA |
| 1e+06 | Cutpoint Estimation | 465.0032010 | NA | 583.095000 | 607.023851 | NA |
| 1e+07 | Cutpoint Estimation | 5467.3328010 | NA | 7339.356101 | 7850.258700 | NA |
| 1e+02 | ROC curve calculation | 0.7973505 | NA | 0.447701 | 1.732651 | NA |
| 1e+03 | ROC curve calculation | 0.8593010 | NA | 0.694802 | 2.035852 | NA |
| 1e+04 | ROC curve calculation | 1.8781510 | NA | 3.658050 | 5.662151 | NA |
| 1e+05 | ROC curve calculation | 11.0992510 | NA | 35.329301 | 42.820852 | NA |
| 1e+06 | ROC curve calculation | 159.8100505 | NA | 610.433700 | 612.471901 | NA |
| 1e+07 | ROC curve calculation | 2032.6935510 | NA | 7081.897251 | 7806.385452 | NA |
