There are several graphics generated within qwraps2. The namingconvention for the “quick” plots was inspired by the (deprecated)ggplot2functionqplot. The development, flexibility, androbustness of these functions vary. Some “tips and tricks” areprovided.
Generate an example data set.
set.seed(42)n<-250x1<- x2<- x3<- x4<-vector('numeric',length = n)x1[1]<-runif(1)x2[1]<-runif(1)x3[1]<-runif(1)x4[1]<-runif(1)# white noiseZ.1<-rnorm(n,0,1)Z.2<-rnorm(n,0,2)Z.3<-rnorm(n,0,5)for(iin2:n){ x1[i]<- x1[i-1]+ Z.1[i]- Z.1[i-1]+ x4[i-1]- x2[i-1] x2[i]<- x2[i-1]-2* Z.2[i]+ Z.2[i-1]- x4[i-1] x3[i]<- x3[i-1]+ x2[i-1]+0.2* Z.3[i]+ Z.3[i-1] x4[i]<- x4[i-1]+runif(1,0.5,1.5)* x4[i-1]}testdf<-data.frame(x1, x2, x3, x4)# Base acf plot for one variableacf(testdf$x1)The implementation of qacf is based on the use ofstats::acf to produce the statistics needed for the plot.If you want to get at the data itself to build your own acf plot you canextract the data frame from the qacf return:
Introduced in[@altman1983measurement]and[@bland1986statistical], theqblandaltman method builds ggplot2 style Bland Altman plots. Forexamples we use the provided pefr data set which was transcribed from[@bland1986statistical]. Seevignette("qwraps2-data-sets", package = "qwraps2") For moredetails on that data set.
The following replicates the figures in[@bland1986statistical].
Using the first measurement only:
pefr_m1<-cbind("Large"= pefr[pefr$measurement==1& pefr$meter=="Wright peak flow meter","pefr"],"Mini"= pefr[pefr$measurement==1& pefr$meter=="Mini Wright peak flow meter","pefr"])A standard x-y style plot and a correlation coefficient suggests thatthe two meters provide reasonably similar results.
cor(pefr_m1)## Large Mini## Large 1.0000000 0.9432794## Mini 0.9432794 1.0000000ggplot2::ggplot(data =as.data.frame(pefr_m1))+ ggplot2::aes(x = Large,y = Mini)+ ggplot2::geom_point()+ ggplot2::xlab("Large Meter")+ ggplot2::ylab("Mini Meter")+ ggplot2::xlim(0,800)+ ggplot2::ylim(0,800)+ ggplot2::geom_abline(slope =1)However, for many reasons, the above is misleading. One simple note:correlation is not a metric for agreement, i.e., perfect agreement wouldbe shown if all the data points fell on the line of equality whereasperfect correlation occurs when the data points are simplyco-linear.
The Bland Altman plot plots the average value on the x-axis and thedifference in the measurements on the y-axis:
# modified plotggplot2::last_plot()+ ggplot2::xlim(0,800)+ ggplot2::ylim(-100,100)+ ggplot2::xlab("Average of two meters")+ ggplot2::ylab("Difference in the measurements")There is no distinct relationship between the differences and theaverage, but the difference in the measurements between the two meterswas observed to range between -73 and 81 liters per minute. Such adiscrepancy between the meters is not observable from the simple x-yplot.
Reliability, or repeatability, of measurements can also beinvestigated with a Bland Altman plot.
pefr_mini<-cbind(m1 = pefr[pefr$measurement==1& pefr$meter=="Mini Wright peak flow meter","pefr"],m2 = pefr[pefr$measurement==2& pefr$meter=="Mini Wright peak flow meter","pefr"])qblandaltman(pefr_mini)# create a survfit objectrequire(survival)## Loading required package: survivalleukemia.surv<- survival::survfit(survival::Surv(time, status)~ x,data = survival::aml)# base R km plotsurvival:::plot.survfit(leukemia.surv,conf.int =TRUE,lty =2:3,col =1:2)# qkmplotqkmplot(leukemia.surv,conf_int =TRUE)## Warning: Removed 1 row containing non-finite outside the scale range## (`stat_step_ribbon()`).The functionqkmplot_bulid_data_frame can be used togenerate a data.frame needed for building a KM plot. This could behelpful for creating bespoke plots.
leukemia_km_data<-qkmplot_bulid_data_frame(leukemia.surv)head(leukemia_km_data,3)## time n.risk n.event n.censor surv upper lower strata## 2 0 11 0 0 1.0000000 1 1.0000000 x=Maintained## 3 9 11 1 0 0.9090909 1 0.7541338 x=Maintained## 4 13 10 1 1 0.8181818 1 0.6192490 x=MaintainedIntercept only models are easy to plot too.
intonly_fit<- survival::survfit(survival::Surv(time, status)~1,data = survival::aml)survival:::plot.survfit(intonly_fit,conf.int =TRUE)Starting inqwraps2 version 0.6.0, the methods for buildingthese graphics have been fundamentally changed as part of a majorrefactor of theconfusion_matrix method which replaces alot of the code thatqroc andqprc were builton.
For this work we will consider a couple models for categorizing emailand spam or not based on the Spambase[@spambase] data. More details on this data setcan be found in thevignette("qwraps2-data-sets", package = "qwraps2")
Start by defining a training and validation splits of the spambasedata
set.seed(42)tidx<-runif(nrow(spambase))<=0.80xidx<-which(names(spambase)!="spam")yidx<-which(names(spambase)=="spam")training_set<- spambase[tidx, ]validating_set<- spambase[!tidx, ]Train a few models:
logistic_model<-glm( spam~ . ,data = training_set ,family =binomial() )## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurredridge_model<- glmnet::cv.glmnet(y = training_set[, yidx] ,x =as.matrix(training_set[, xidx]) ,family =binomial() ,alpha =0 )lasso_model<- glmnet::cv.glmnet(y = training_set[, yidx] ,x =as.matrix(training_set[, xidx]) ,family =binomial() ,alpha =1 )Generate the predicted values on the validation set:
validating_set$logistic_model_prediction<-predict( logistic_model ,newdata = validating_set ,type ="response" )validating_set$ridge_model_prediction<-as.numeric(predict( ridge_model ,newx =as.matrix(validating_set[, xidx]) ,type ="response" ,s ="lambda.1se" ) )validating_set$lasso_model_prediction<-as.numeric(predict( lasso_model ,newx =as.matrix(validating_set[, xidx]) ,type ="response" ,s ="lambda.1se" ) )To build ROC and/or PRC plots start by building the confusion matrixfor each model. The qwraps2 functionconfusion_matrix makesthis easy.
cm1<-confusion_matrix(spam~ logistic_model_prediction,data = validating_set)cm2<-confusion_matrix(spam~ ridge_model_prediction,data = validating_set)cm3<-confusion_matrix(spam~ lasso_model_prediction,data = validating_set)The ROC and PRC plots are ggplot objects and can be modified as youwould any other ggplot object.
Graphing all three curves in one image with AUROC in the legend:
roc_plot_data<-rbind(cbind(Model =paste("Logisitic; AUROC =",frmt(cm1$auroc,3)), cm1$cm_stats) ,cbind(Model =paste("Ridge; AUROC =",frmt(cm2$auroc,3)), cm2$cm_stats) ,cbind(Model =paste("LASSO; AUROC =",frmt(cm3$auroc,3)), cm3$cm_stats) )qroc(roc_plot_data)+ ggplot2::aes(color = Model)+ ggplot2::theme(legend.position ="bottom")Similar for PRC:
prc_plot_data<-rbind(cbind(Model =paste("Logisitic; AUPRC =",frmt(cm1$auprc,3)), cm1$cm_stats) ,cbind(Model =paste("Ridge; AUPRC =",frmt(cm2$auprc,3)), cm2$cm_stats) ,cbind(Model =paste("LASSO; AUPRC =",frmt(cm3$auprc,3)), cm3$cm_stats) )qprc(prc_plot_data)+ ggplot2::aes(color = Model)+ ggplot2::geom_hline(yintercept = cm1$prevalence)+ ggplot2::theme(legend.position ="bottom")sessionInfo()## R version 4.4.1 (2024-06-14)## Platform: x86_64-apple-darwin20## Running under: macOS Sonoma 14.6#### Matrix products: default## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0#### locale:## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#### time zone: America/Denver## tzcode source: internal#### attached base packages:## [1] stats graphics grDevices utils datasets methods base#### other attached packages:## [1] survival_3.7-0 qwraps2_0.6.1#### loaded via a namespace (and not attached):## [1] glmnet_4.1-8 Matrix_1.7-0 gtable_0.3.5 jsonlite_1.8.9## [5] dplyr_1.1.4 compiler_4.4.1 highr_0.11 tidyselect_1.2.1## [9] Rcpp_1.0.13 jquerylib_0.1.4 splines_4.4.1 scales_1.3.0## [13] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-6 ggplot2_3.5.1## [17] R6_2.5.1 labeling_0.4.3 generics_0.1.3 shape_1.4.6.1## [21] knitr_1.48 iterators_1.0.14 tibble_3.2.1 munsell_0.5.1## [25] bslib_0.8.0 pillar_1.9.0 rlang_1.1.4 utf8_1.2.4## [29] cachem_1.1.0 xfun_0.48 sass_0.4.9 cli_3.6.3## [33] withr_3.0.1 magrittr_2.0.3 foreach_1.5.2 digest_0.6.37## [37] grid_4.4.1 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1## [41] glue_1.8.0 farver_2.1.2 codetools_0.2-20 fansi_1.0.6## [45] colorspace_2.1-1 rmarkdown_2.28 tools_4.4.1 pkgconfig_2.0.3## [49] htmltools_0.5.8.1