`ePCR` is an R package intended for the survival analysis of advanced prostate cancer. This document is a basic introduction to the functionality of `ePCR` and a general overview of the possible analysis workflows for clinical trial or hospital registry cohorts. The approach builds an ensemble out of single penalized Cox regression models; this ensemble approach, named `ePCR`, was the top-performing approach in the DREAM 9.5 Prostate Cancer Challenge (Guinney et al., 2017).
The latest version of `ePCR` is available in the Comprehensive R Archive Network (CRAN). CRAN mirrors are available by default in an R installation, and the `ePCR` package can be installed using the R terminal command `install.packages("ePCR")`. This should prompt the user to select a nearby CRAN mirror, after which `ePCR` and its dependencies are installed automatically. After the `install.packages` call, the `ePCR` package can be loaded with the command `library("ePCR")`.
The following notation is used in the document: R commands, package names and function names are written in `typewriter font`. The notation `pckgName::funcName` indicates that the function `funcName` is called from the package `pckgName`; this form is used prominently in the underlying R code due to package namespaces. This document, as well as other useful PDFs, can be inspected using the `browseVignettes` function for any package in R.
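As a small illustration of the `pckgName::funcName` notation, the sketch below calls two functions through their namespaces without attaching the packages first. The toy values are made up for illustration and are not part of the `ePCR` example data.

```r
# Call median() explicitly from the stats package
m <- stats::median(c(1, 5, 3))

# Build a toy survival response through the survival namespace
# (three made-up patients: follow-up time and event indicator)
y <- survival::Surv(time = c(5, 8, 12), event = c(1, 0, 1))

# Vignettes of an installed package can be browsed in the same way;
# left commented out because it opens a browser window
# utils::browseVignettes("ePCR")
```

The explicit prefix makes it unambiguous which package a function comes from, which matters when two attached packages export the same name.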
The `ePCR` package is provided with two example hospital registry datasets. These datasets represent confidential hospital registry cohorts, to which kernel density estimates were fitted. Illustrative virtual patients were then generated from the kernel estimates and are provided here in the example datasets. Please see the accompanying `ePCR` publication for further details on the two Turku University Hospital cohorts (Laajala et al., 2018), and the Synapse site for DREAM 9.5 PCC for accessing the original DREAM data (Guinney, Wang, Laajala et al. 2017). The example datasets can be loaded into an R session using:
```r
library(ePCR)
## 
## Attaching package: 'ePCR'
## The following object is masked from 'package:graphics':
## 
##     plot
## The following object is masked from 'package:base':
## 
##     plot
# Kernel density simulated patients from Turku University Hospital (TYKS)
# Data consists of TEXT cohort (text-search found patients)
# and MEDI (patients identified using medication and few keywords)
data(TYKSSIMU)
# The following data matrices x and survival responses y become available
head(xTEXTSIMU); head(yTEXTSIMU)
##                BMI HEIGHTBL WEIGHTBL      ALP      ALT      AST    CA    CREAT
## TEXTSIMU1 27.16556      172     83.0 4.852030 3.044522 3.401197 2.305 3.951244
## TEXTSIMU2 27.16556      176     83.0 4.442651 3.258097 3.401197 2.310 4.644391
## TEXTSIMU3 29.35235      168     91.2 4.304065 2.708050 3.401197 2.305 4.394449
## TEXTSIMU4 24.80000      176     83.0 4.442651 2.944439 3.218876 2.330 4.465908
## TEXTSIMU5 27.20000      176     83.0 5.129899 2.944439 3.401197 2.310 3.891820
## TEXTSIMU6 27.16556      176     83.0 4.564348 1.609438 3.401197 2.305 4.204693
##             HB      LDH      NEU PLT       PSA    TBILI      TESTO      WBC
## TEXTSIMU1 11.3 5.265247 1.128171 323 3.4657359 2.197225 -0.1743534 2.001480
## TEXTSIMU2 12.6 5.265247 1.329710 216 4.6051702 2.197225 -0.1743534 2.332144
## TEXTSIMU3 13.5 5.265247 2.187174  83 3.8712010 2.197225 -0.1743534 1.856298
## TEXTSIMU4 12.7 5.273000 2.551006 189 0.3364722 3.135494  0.3364722 2.186051
## TEXTSIMU5 12.3 5.265247 1.329710 298 6.6720329 2.197225 -0.1743534 2.041220
## TEXTSIMU6 15.4 5.265247 1.329710 237 3.6505739 2.197225 -0.1743534 1.435085
##             CREACL NA.        MG      PHOS  ALB TPRO   RBC       LYM      BUN
## TEXTSIMU1 3.549617 137 -0.210721 0.1397619 34.8   67 4.830 0.3364722 2.475973
## TEXTSIMU2 3.549617 141 -0.210721 0.1397619 34.8   67 4.830 0.3364722 2.475973
## TEXTSIMU3 3.549617 135 -0.210721 0.1397619 29.5   67 4.185 0.3364722 2.397895
## TEXTSIMU4 3.549617 140 -0.210721 0.1397619 34.8   67 3.620 0.3364722 2.475973
## TEXTSIMU5 3.549617 140 -0.210721 0.1397619 34.8   67 4.120 0.3364722 2.475973
## TEXTSIMU6 3.549617 142 -0.210721 0.1397619 34.8   67 3.780 0.3364722 2.475973
##               CCRC      GLU SYSTOLICBP DIASTOLICBP PULSE HEMAT SPEGRA LYMperLEU
## TEXTSIMU1 3.703478 1.824549        136          76    72  0.43      0        24
## TEXTSIMU2 3.703478 1.840550        142          64    72  0.45      0        22
## TEXTSIMU3 3.703478 1.856298        111          76    72  0.38      0        22
## TEXTSIMU4 3.703478 1.856298        128          76    72  0.38      0        22
## TEXTSIMU5 3.703478 1.757858        142          76    69  0.38      0        22
## TEXTSIMU6 3.703478 1.856298        151          76    72  0.38      0        22
##           MONO MONOperLEU NEUperLEU POT BASOperLEU  EOS EOSperLEU TARGET
## TEXTSIMU1 0.62          9        63 4.1          1 0.17         0      0
## TEXTSIMU2 0.62          9        63 4.1          0 0.17         1      0
## TEXTSIMU3 0.62          9        63 4.1          0 0.19         2      0
## TEXTSIMU4 0.62          9        63 4.9          0 0.17         2      0
## TEXTSIMU5 0.62          9        63 3.7          0 0.17         2      0
## TEXTSIMU6 0.62          9        63 3.7          0 0.17         2      0
##           LYMPH_NODES KIDNEYS LUNGS LIVER PLEURA OTHER PROSTATE ORCHIDECTOMY
## TEXTSIMU1           0       0     0     0      0     0        0            1
## TEXTSIMU2           0       0     0     0      0     0        0            0
## TEXTSIMU3           0       0     0     0      0     0        0            0
## TEXTSIMU4           0       0     0     0      0     1        0            0
## TEXTSIMU5           0       0     0     1      0     0        0            0
## TEXTSIMU6           1       0     0     0      0     1        0            0
##           PROSTATECTOMY LYMPHADENECTOMY BILATERAL_ORCHIDECTOMY
## TEXTSIMU1             1               0                      1
## TEXTSIMU2             0               0                      0
## TEXTSIMU3             0               0                      0
## TEXTSIMU4             0               0                      0
## TEXTSIMU5             0               0                      0
## TEXTSIMU6             0               0                      0
##           PRIOR_RADIOTHERAPY ANALGESICS ANTI_ANDROGENS GLUCOCORTICOID
## TEXTSIMU1                  1          0              0              0
## TEXTSIMU2                  1          1              0              1
## TEXTSIMU3                  1          0              0              0
## TEXTSIMU4                  0          0              0              0
## TEXTSIMU5                  0          0              0              1
## TEXTSIMU6                  1          0              0              0
##           GONADOTROPIN BISPHOSPHONATE CORTICOSTEROID IMIDAZOLE ACE_INHIBITORS
## TEXTSIMU1            0              0              0         0              0
## TEXTSIMU2            0              0              0         0              0
## TEXTSIMU3            0              0              0         0              0
## TEXTSIMU4            0              0              0         0              0
## TEXTSIMU5            0              0              0         0              0
## TEXTSIMU6            0              0              0         0              0
##           BETA_BLOCKING HMG_COA_REDUCT ESTROGENS ANTI_ESTROGENS CEREBACC CHF
## TEXTSIMU1             0              0         0              0        0   0
## TEXTSIMU2             0              0         0              0        0   0
## TEXTSIMU3             0              0         0              0        0   0
## TEXTSIMU4             0              0         0              0        0   1
## TEXTSIMU5             0              0         0              0        0   0
## TEXTSIMU6             0              0         0              0        0   0
##           DVT DIAB MI PULMEMB SPINCOMP COPD MHBLOOD MHCARD MHCONGEN MHEAR
## TEXTSIMU1   0    0  0       0        0    0       0      1        0     0
## TEXTSIMU2   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU3   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU4   0    1  0       0        0    0       0      0        0     0
## TEXTSIMU5   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU6   0    0  0       0        0    0       0      0        0     1
##           MHENDO MHGASTRO MHHEPATO MHIMMUNE MHINFECT MHINJURY MHINVEST MHMETAB
## TEXTSIMU1      0        0        0        0        0        1        0       0
## TEXTSIMU2      0        1        0        0        0        0        0       0
## TEXTSIMU3      1        1        0        0        1        0        0       0
## TEXTSIMU4      0        1        0        0        0        0        0       0
## TEXTSIMU5      0        0        0        0        0        0        0       0
## TEXTSIMU6      0        0        0        0        0        0        0       0
##           MHPSYCH MHRENAL MHRESP MHSKIN MHVASC ECOG_C AGEGRP2 RaceAsian
## TEXTSIMU1       0       0      0      0      0      0       2         0
## TEXTSIMU2       0       0      0      0      1      0       0         0
## TEXTSIMU3       0       0      0      0      0      0       1         0
## TEXTSIMU4       0       0      0      0      0      0       1         0
## TEXTSIMU5       0       0      0      0      0      0       1         0
## TEXTSIMU6       0       0      0      0      0      0       2         0
##           RaceBlack RaceOther RaceWhite RegionAsia RegionEastEuro
## TEXTSIMU1         0         0         0          0              0
## TEXTSIMU2         0         0         0          0              0
## TEXTSIMU3         0         0         0          0              0
## TEXTSIMU4         0         0         0          0              0
## TEXTSIMU5         0         0         0          0              0
## TEXTSIMU6         0         0         0          0              0
##           RegionNorthAmer RegionSouthAmer RegionWestEuro
## TEXTSIMU1               0               0              0
## TEXTSIMU2               0               0              0
## TEXTSIMU3               0               0              0
## TEXTSIMU4               0               0              0
## TEXTSIMU5               0               0              0
## TEXTSIMU6               0               0              0
##           DEATH LKADT_P  surv
## TEXTSIMU1     1     342   342
## TEXTSIMU2     0     360  360+
## TEXTSIMU3     1     682   682
## TEXTSIMU4     0    1067 1067+
## TEXTSIMU5     1     113   113
## TEXTSIMU6     0    1246 1246+
head(xMEDISIMU); head(yMEDISIMU)
##                BMI HEIGHTBL WEIGHTBL      ALP      ALT      AST   CA    CREAT
## MEDISIMU1 28.04282      175       90 5.093750 2.708050 3.349750 1.99 4.488636
## MEDISIMU2 26.57313      176       60 5.017280 3.091042 3.258097 2.41 4.174387
## MEDISIMU3 28.39506      165       65 4.418841 3.332205 3.349750 2.41 4.077537
## MEDISIMU4 24.57787      176      107 5.003946 3.295837 3.349750 2.33 4.634729
## MEDISIMU5 30.58581      188       73 4.158883 2.484907 3.367296 2.34 4.234107
## MEDISIMU6 25.18079      174       86 4.564348 4.882802 3.349750 2.33 4.499810
##             HB      LDH       NEU PLT      PSA    TBILI       TESTO      WBC
## MEDISIMU1 10.9 5.327876 1.2149127 186 6.194405 1.386294 -0.08338161 1.609438
## MEDISIMU2 13.3 5.327876 0.7030975 156 2.163323 1.609438 -0.08338161 2.041220
## MEDISIMU3 11.8 5.327876 1.0952734 126 3.713572 1.609438 -0.99425227 1.871802
## MEDISIMU4 13.1 5.327876 0.4946962 217 3.555348 1.791759 -0.08338161 1.568616
## MEDISIMU5 15.3 5.327876 1.1939225 221 3.367296 2.079442  0.78845736 1.704748
## MEDISIMU6 12.8 5.327876 1.9892433 386 3.610918 1.791759 -1.56064775 1.824549
##           CREACL NA.         MG        PHOS   ALB TPRO  RBC        LYM      BUN
## MEDISIMU1      0 140 -0.1923903  0.09531018 36.65 68.5 3.91  0.1823216 1.722767
## MEDISIMU2      0 142 -0.1923903 -0.02020271 33.60 68.5 4.28  0.8878913 1.722767
## MEDISIMU3      0 144 -0.1923903 -0.02020271 36.65 68.5 4.62  0.4946962 1.722767
## MEDISIMU4      0 142 -0.1923903 -0.02020271 36.65 69.0 4.35 -0.2744368 1.722767
## MEDISIMU5      0 143 -0.1923903 -0.06187540 36.65 68.5 4.05  0.4946962 1.722767
## MEDISIMU6      0 137 -0.1923903 -0.02020271 36.65 68.5 4.78  0.5128236 1.722767
##               CCRC      GLU SYSTOLICBP DIASTOLICBP PULSE HEMAT SPEGRA LYMperLEU
## MEDISIMU1 3.800105 1.435085      141.5          77    68  0.37      0        29
## MEDISIMU2 3.746038 1.871802      107.0          77    58  0.34      0        29
## MEDISIMU3 3.800105 1.916923      141.5          77    71  0.43      0        29
## MEDISIMU4 3.800105 1.871802      126.0          90    71  0.40      0        29
## MEDISIMU5 3.800105 1.589235      141.5          77    71  0.35      0        28
## MEDISIMU6 3.800105 1.791759      188.0          77    88  0.38      0        29
##           MONO MONOperLEU NEUperLEU POT BASOperLEU  EOS EOSperLEU TARGET
## MEDISIMU1 0.60         11      56.5 4.4          0 0.17         3      0
## MEDISIMU2 0.60         11      56.5 4.5          1 0.17         3      0
## MEDISIMU3 0.60         11      56.5 3.7          0 0.17         7      0
## MEDISIMU4 0.88         11      56.5 4.1          0 0.17         3      0
## MEDISIMU5 0.60         11      56.5 4.6          0 0.17         3      0
## MEDISIMU6 0.60         11      56.5 4.0          0 0.17         3      0
##           LYMPH_NODES KIDNEYS LUNGS LIVER PLEURA OTHER PROSTATE ORCHIDECTOMY
## MEDISIMU1           0       0     0     0      0     1        0            0
## MEDISIMU2           0       0     0     0      0     0        0            0
## MEDISIMU3           0       0     0     0      0     0        0            0
## MEDISIMU4           1       0     0     0      0     0        0            1
## MEDISIMU5           0       0     0     0      0     1        0            0
## MEDISIMU6           0       0     0     0      0     0        0            0
##           PROSTATECTOMY LYMPHADENECTOMY BILATERAL_ORCHIDECTOMY
## MEDISIMU1             0               0                      0
## MEDISIMU2             0               0                      0
## MEDISIMU3             1               0                      0
## MEDISIMU4             0               0                      0
## MEDISIMU5             0               0                      0
## MEDISIMU6             0               0                      0
##           PRIOR_RADIOTHERAPY ANALGESICS ANTI_ANDROGENS GLUCOCORTICOID
## MEDISIMU1                  1          0              0              0
## MEDISIMU2                  1          1              0              0
## MEDISIMU3                  1          0              1              1
## MEDISIMU4                  0          1              1              1
## MEDISIMU5                  1          0              1              1
## MEDISIMU6                  1          0              1              1
##           GONADOTROPIN BISPHOSPHONATE CORTICOSTEROID IMIDAZOLE ACE_INHIBITORS
## MEDISIMU1            0              1              1         0              0
## MEDISIMU2            0              0              1         0              0
## MEDISIMU3            0              0              1         0              0
## MEDISIMU4            0              0              1         0              0
## MEDISIMU5            0              0              0         0              0
## MEDISIMU6            0              0              1         0              0
##           BETA_BLOCKING HMG_COA_REDUCT ESTROGENS ANTI_ESTROGENS CEREBACC CHF
## MEDISIMU1             0              1         0              0        0   0
## MEDISIMU2             0              0         0              0        0   0
## MEDISIMU3             1              0         0              0        0   0
## MEDISIMU4             0              0         0              0        0   0
## MEDISIMU5             0              0         0              0        0   0
## MEDISIMU6             1              0         0              0        0   0
##           DVT DIAB MI PULMEMB SPINCOMP COPD MHBLOOD MHCARD MHCONGEN MHEAR
## MEDISIMU1   0    0  0       0        0    0       0      1        0     0
## MEDISIMU2   0    0  0       0        0    0       0      1        0     1
## MEDISIMU3   1    0  0       0        0    0       0      1        0     1
## MEDISIMU4   0    1  0       0        0    0       0      0        0     0
## MEDISIMU5   0    0  0       0        0    0       0      0        0     0
## MEDISIMU6   0    1  0       0        0    0       0      0        0     0
##           MHENDO MHGASTRO MHHEPATO MHIMMUNE MHINFECT MHINJURY MHINVEST MHMETAB
## MEDISIMU1      0        1        0        0        1        0        0       0
## MEDISIMU2      0        0        0        0        0        1        0       1
## MEDISIMU3      0        0        0        0        0        0        0       0
## MEDISIMU4      0        0        0        0        0        0        0       1
## MEDISIMU5      0        0        0        0        0        0        0       0
## MEDISIMU6      0        0        0        0        0        0        0       0
##           MHPSYCH MHRENAL MHRESP MHSKIN MHVASC ECOG_C AGEGRP2 RaceAsian
## MEDISIMU1       0       0      0      0      0      0       2         0
## MEDISIMU2       0       0      0      0      0      0       1         0
## MEDISIMU3       0       0      0      0      0      0       2         0
## MEDISIMU4       0       0      0      1      0      0       2         0
## MEDISIMU5       0       1      0      0      0      0       0         0
## MEDISIMU6       0       0      0      0      0      0       2         0
##           RaceBlack RaceOther RaceWhite RegionAsia RegionEastEuro
## MEDISIMU1         0         0         0          0              0
## MEDISIMU2         0         0         0          0              0
## MEDISIMU3         0         0         0          0              0
## MEDISIMU4         0         0         0          0              0
## MEDISIMU5         0         0         0          0              0
## MEDISIMU6         0         0         0          0              0
##           RegionNorthAmer RegionSouthAmer RegionWestEuro
## MEDISIMU1               0               0              0
## MEDISIMU2               0               0              0
## MEDISIMU3               0               0              0
## MEDISIMU4               0               0              0
## MEDISIMU5               0               0              0
## MEDISIMU6               0               0              0
##           DEATH LKADT_P  surv
## MEDISIMU1     0      89   89+
## MEDISIMU2     1     754   754
## MEDISIMU3     1     783   783
## MEDISIMU4     0     159  159+
## MEDISIMU5     0    1322 1322+
## MEDISIMU6     1     200   200
library(survival)
```

It is important to distinguish between the `PSP` and `PEP` objects, which represent a single penalized Cox regression model and an ensemble of Cox regression models, respectively. `PSP` objects are penalized/regularized Cox regression models fitted to a particular dataset by exploring its \(\{\lambda, \alpha\}\) parameter space. Notice that the sequence of \(\lambda\) is dependent on the \(\alpha \in [0,1]\). The regularized/penalized fitting procedure in `ePCR` is provided by the `glmnet` package (Simon et al., 2011), although custom cross-validation and other supporting functionality is provided independently.
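As a reminder of what the two parameters control, the elastic net penalized Cox objective can be written roughly as below, following the formulation that Simon et al. (2011) describe for `glmnet`. The symbols \(n\) (number of patients), \(p\) (number of predictors), \(\beta\) (coefficient vector) and \(\ell(\beta)\) (Cox partial log-likelihood) are notation introduced here for illustration, and the exact scaling of the likelihood term should be checked against the `glmnet` documentation.

\[
\hat{\beta} = \arg\max_{\beta} \left[ \frac{2}{n}\,\ell(\beta) - \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \right) \right]
\]

Here \(\lambda\) sets the overall strength of the penalty, while \(\alpha\) splits it between the \(L_1\) term and the \(L_2\) term. Because \(\alpha\) scales the \(L_1\) term, the range of useful \(\lambda\) values shifts with \(\alpha\), which is consistent with the \(\lambda\) sequence being determined separately for each \(\alpha\).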
After fitting suitable candidate `PSP` objects (Penalized Single Predictors), these will be aggregated into the ensemble structure `PEP` (Penalized Ensemble Predictor). The key input to the `PEP` constructor is the list of `PSP` objects intended for use in the ensemble. We will start off by introducing the fine-tuning and fitting of `PSP`s. For this purpose, the generic S4-class constructor `new` will be called with the main parameter indicating that we wish to construct a `PSP` object.
The key attributes provided for the `PSP` constructor are the following parameters (see `?'PSP-class'` in R for further documentation):
- `x`: The input data matrix, where rows correspond to patients and columns to potential predictors.
- `y`: The `Surv`-class response vector as required by Cox regression and `glmnet` in survival prediction.
- `seeds`: An integer vector or a single value for setting the random seed for cross-validation. Setting this is highly recommended for reproducibility. If multiple seed integers are provided, the cross-validation will be conducted separately for each. This will smooth the cross-validation surface, but will multiply the computational time required to fit a model.
- `score`: The scoring function utilized in evaluating the generalization ability of the fitted model in cross-validation; readily implemented scoring functions include `score.iAUC` and `score.cindex`, but custom scoring functions are also allowed.
- `alphaseq`: Sequence of alpha values. At the extreme ends, \(\alpha = 1\) is LASSO regression and \(\alpha = 0\) is Ridge Regression; \(\alpha \in ]0,1[\) is generally referred to as Elastic Net. Notice that LASSO and Ridge Regression have noticeably different characteristics, as they utilize only the \(L_1\) and \(L_2\) norms, respectively; for example, a Ridge Regression model will never have its coefficients exactly zero. Furthermore, for collinear predictors LASSO tends to pick a single one, while Ridge Regression picks multiple ones and spreads the overall effect over these predictors. Depending on the ultimate prediction purpose, one may prefer one or the other and can tailor `alphaseq` to suit their needs. By default we suggest utilizing an evenly spaced `alphaseq` over \([0,1]\), at least for a preliminary search.
- `nlambda`: Number of \(\lambda\) values tested as a function of the corresponding \(\alpha\). By default `glmnet` suggests 100 values, picked from a feasible range between a model that includes all coefficients and a converged model where no further penalization is possible.
- `folds`: Number of folds in the cross-validation (minimum 3; the maximum equals the number of observations, i.e. leave-one-out cross-validation).

For the sake of the example, we will construct an `ePCR` model ensemble that consists of two `PSP` objects: one from the medication-curated cohort and the other from the text-search cohort. We will leave out a small portion of the medication and text-search patients as a small test set, to later evaluate the generalization ability of the ensemble. Notice, however, that this is not a proper evaluation, as the patients are not from an independent source and therefore give an optimistic view of the generalization capability of the model(s).
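Since the `score` parameter accepts custom scoring functions, the following minimal sketch shows what such a function might look like. It is built on `survival::concordance`, and the function name `score.concordance` is hypothetical. The argument names `pred` and `real` simply mirror how `score.cindex` is called elsewhere in this document; whether the `PSP` constructor requires exactly this interface is an assumption that should be checked against `?'PSP-class'`.

```r
library(survival)

# Hypothetical custom scoring function (not part of ePCR).
# pred: numeric risk scores, higher meaning higher predicted risk
# real: Surv-class response
score.concordance <- function(pred, real) {
  # reverse = TRUE: a higher risk score should pair with shorter survival
  survival::concordance(real ~ pred, reverse = TRUE)$concordance
}

# Toy check with made-up values: risk scores ordered exactly opposite
# to the survival times, all events observed
toy_real <- survival::Surv(time = c(1, 2, 3, 4, 5), event = rep(1, 5))
toy_pred <- c(5, 4, 3, 2, 1)
toy_score <- score.concordance(pred = toy_pred, real = toy_real)
```

A function of this shape returns a single number where larger means better discrimination, which is the property a cross-validation grid search needs in order to pick an optimum.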
```r
testset <- 1:30
# Medication cohort fit
# Leaving out patients into a separate test set using negative indices
psp_medi <- new("PSP",
    # Input data matrix x (example data loaded previously)
    x = xMEDISIMU[-testset,],
    # Response vector, 'surv'-object
    y = yMEDISIMU[-testset,"surv"],
    # Seeds for reproducibility
    seeds = c(1,2),
    # If user wishes to run the CV binning multiple times,
    # this is possible by averaging over them for smoother CV heatmap.
    cvrepeat = 2,
    # Using the concordance-index as prediction accuracy in CV
    score = score.cindex,
    # Alpha sequence
    alphaseq = seq(from=0, to=1, length.out=6),
    # Using glmnet's default nlambda of 100
    nlambda = 100,
    # Running the nominal 10-fold cross-validation
    folds = 10,
    # x.expand slot is a function that would allow interaction terms
    # For the sake of the simplicity we will consider identity function
    x.expand = function(x) { as.matrix(x) }
)
## --- Initializing new PSP object ---
## 
## --- Cross-validation ( 10 -folds) repeat run 1 of 2 ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"
## --- Cross-validation ( 10 -folds) repeat run 2 of 2 ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"
## --- Computing AUCs for regularization curves for coefficients --- 
## 
## --- Generating feature list and dictionary --- 
## 
## --- New PSP object successfully created ---
```

The parameters for the second `PSP` are similar to the ones above. Notice that the user can tailor multiple parameters separately for each `PSP` member to best suit the data.
```r
# Text run similar to above
# Leaving out patients into a separate test set using negative indices
psp_text <- new("PSP",
    x = xTEXTSIMU[-testset,],
    y = yTEXTSIMU[-testset,"surv"],
    seeds = c(3,4),
    cvrepeat = 2,
    score = score.cindex,
    alphaseq = seq(from=0, to=1, length.out=6),
    nlambda = 100,
    folds = 10,
    x.expand = function(x) { as.matrix(x) }
)
## --- Initializing new PSP object ---
## 
## --- Cross-validation ( 10 -folds) repeat run 1 of 2 ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"
## --- Cross-validation ( 10 -folds) repeat run 2 of 2 ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"
## --- Computing AUCs for regularization curves for coefficients --- 
## 
## --- Generating feature list and dictionary --- 
## 
## --- New PSP object successfully created ---
# Taking a look on the show-method for PSP:
psp_medi
## PSP ePCR object
## N observations: 120 
## Optimal alpha: 1 
## Optimal lambda: 0.2578574 
## Optimal lambda index: 1
# Plot the CV-surface of the fitted PSP:
plot(psp_medi,
    # Showing only every 10th row and column name (propagated to heatcv-function)
    by.rownames=10, by.colnames=10,
    # Adjust main title and tilt the bias of the color key legend (see ?heatcv)
    main="C-index CV for psp_medi", bias=0.2)
```

Noticeably, the cross-validation surface suggests different optimized penalization parameters for the two ensemble members. This most likely stems from systematic differences in the two cohorts, to which end the `ePCR` methodology offers an ensemble-driven alternative to account for differences between patient substrata.
```r
plot(psp_text,
    # Showing only every 10th row and column name (propagated to heatcv-function)
    by.rownames=10, by.colnames=10,
    # Adjust main title and tilt the bias of the color key legend (see ?heatcv)
    main="C-index CV for psp_text", bias=0.2)
```

In addition to providing the CV-grid, the identified optimal parameters are available for downstream analyses:
```r
psp_medi@optimum
##       Alpha  AlphaIndex      Lambda LambdaIndex 
##   1.0000000   6.0000000   0.2578574   1.0000000
psp_text@optimum
##       Alpha  AlphaIndex      Lambda LambdaIndex 
##   1.0000000   6.0000000   0.4396716   1.0000000
slotNames(psp_medi)
##  [1] "description" "features"    "strata"      "alphaseq"    "cvfolds"    
##  [6] "nlambda"     "cvmean"      "cvmedian"    "cvstdev"     "cvmin"      
## [11] "cvmax"       "score"       "cvrepeat"    "impute"      "optimum"    
## [16] "seed"        "x"           "x.expand"    "y"           "fit"        
## [21] "criterion"   "dictionary"  "regAUC"
```

Once the `PSP` objects have been constructed, they are aggregated to the corresponding Penalized Ensemble Predictor (`PEP`). The `PEP` objects aggregate `PSP` objects from various data slices or optimization criteria, and create an ensemble predictor that averages over the provided single predictors. As such, its most important input is the list of desired `PSP` objects:
```r
pep_tyks <- new("PEP",
    # The main input is the list of PSP objects
    PSPs = list(psp_medi, psp_text)
)
# These PSPs were constructed using the example code above.
pep_tyks
## Penalized Ensemble Predictor
## Count of PSPs: 2
# Conduct naive test set evaluation
xtest <- rbind(xMEDISIMU[testset,], xTEXTSIMU[testset,])
ytest <- rbind(yMEDISIMU[testset,], yTEXTSIMU[testset,])
# Perform survival prediction based on the PEP-ensemble we've created
xpred <- predict(pep_tyks, newx=as.matrix(xtest), type="ensemble")
# Construct a survival object using the Surv-class
ytrue <- Surv(time = ytest[,"surv"][,"time"], event = ytest[,"surv"][,"status"])
# Test c-index between our constructed ensemble prediction and true response
tyksscore <- score.cindex(pred = xpred, real = ytrue)
print(paste("TYKS example c-index:", round(tyksscore,4)))
## [1] "TYKS example c-index: 0.5"
```

The `ePCR` R package comes with readily fitted `ePCR` ensembles from the work by Guinney, Wang, Laajala et al. (2017) as well as from hospital registry cohorts. Due to data confidentiality issues, the original data matrices or responses are not provided in the S4-objects (although normally they would be in the slots `@x` and `@y`, respectively).
In order to gain access to the original data by Guinney et al., the processed data can be accessed as raw `.csv` files or R workspaces at the corresponding Synapse workspace.
Accessing the Turku University Hospital registry cohort requires a research permit and users are encouraged to contact the Center for Clinical Informatics (Arho.Virkki@tyks.fi) for further information.
Despite not providing the original data matrices, the ensemble model fits and their coefficients as a function of \(\{\lambda, \alpha\}\) are fully functional. They are therefore suitable for conducting predictions for future patients or for studying effects within the estimated models/ensembles. These model objects can be loaded in `ePCR` using:
```r
data(ePCRmodels)
class(DREAM)
## [1] "PEP"
## attr(,"package")
## [1] ".GlobalEnv"
class(TYKS)
## [1] "PEP"
## attr(,"package")
## [1] "ePCR"
```

The `DREAM` S4-object is the top-performing mCRPC OS-predicting ensemble from Guinney et al., while the `TYKS` models are fitted to the original Turku University Hospital cohorts. These model objects can be used for prediction similarly to the novel S4 `PEP` object created in the above sections. As an example, if we apply the DREAM model, trained on controlled clinical trials, to the TYKS hospital registry patients, the OS prediction can be conducted using:
```r
# Create a DREAM-matching data input matrix from our xtest and the full data matrix
xtemp <- conforminput(DREAM, xtest)
# Predict survival for our hospital registry example dataset
dreampred <- predict(DREAM,
    # Providing full new data and average prediction over the ensemble members
    newx=xtemp, type="ensemble",
    # Defining that we don't want any further data matrix feature extraction
    # The call to conforminput above already formatted the input data
    x.expand = as.matrix)
```

Notice that we utilize the helper function `conforminput` for feature extraction/creation, as multiple interaction variables were introduced in the original DREAM data matrix and the dimensions would not match in the regression task otherwise.
The following error message is quite commonly encountered when first applying pre-built models to new data:
Error in newx %*% nbeta : Cholmod error ‘X and/or Y have wrong dimensions’ at file ../MatrixOps/cholmod_sdmult.c, line 90
It is raised by the `glmnet` package's C/Fortran implementation if the \(\beta\) coefficients do not conform to the dimensions of the new data matrix \(X\). To avoid this, the new data must be processed to have the same number of columns (variables) as the model expects, using functions such as `conforminput` or the S4-slot called `x.expand` in a `PEP` object.
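To make the dimension requirement concrete, the following base R sketch checks a new data matrix against the feature names a model expects and reorders its columns accordingly. The helper `align_columns` and the toy feature names are hypothetical and only illustrate the idea. This is not how `conforminput` is implemented, and it does not create the interaction terms that the DREAM model needs.

```r
# Hypothetical helper (not part of ePCR): align the columns of `newx`
# to the feature names that a fitted model expects
align_columns <- function(newx, expected) {
  missing_cols <- setdiff(expected, colnames(newx))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Drop surplus columns and reorder to the expected order
  newx[, expected, drop = FALSE]
}

# Toy example: the model expects three features, the new data has four
expected <- c("PSA", "ALP", "HB")
newx <- matrix(1:8, nrow = 2,
               dimnames = list(NULL, c("HB", "PSA", "LDH", "ALP")))
aligned <- align_columns(newx, expected)
```

Failing early with a message that names the missing columns is usually easier to debug than the matrix multiplication error shown above.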
```r
# Test c-index between the DREAM ensemble prediction and TYKS true response
dreamscore <- score.cindex(pred = dreampred, real = ytrue)
print(paste("DREAM example c-index:", round(dreamscore,4)))
## [1] "DREAM example c-index: 0.389"
sessionInfo()
## R Under development (unstable) (2023-09-30 r85239 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=C                     LC_CTYPE=English_Finland.utf8   
## [3] LC_MONETARY=English_Finland.utf8 LC_NUMERIC=C                    
## [5] LC_TIME=English_Finland.utf8    
## 
## time zone: Europe/Helsinki
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] survival_3.5-7 ePCR_0.11.0   
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.6-1.1      glmnet_4.1-8        future.apply_1.11.0
##  [4] jsonlite_1.8.7      compiler_4.4.0      Rcpp_1.0.11        
##  [7] parallel_4.4.0      jquerylib_0.1.4     globals_0.16.2     
## [10] splines_4.4.0       yaml_2.3.7          fastmap_1.1.1      
## [13] lattice_0.21-8      prodlim_2023.03.31  impute_1.75.1      
## [16] Bolstad2_1.0-29     R6_2.5.1            shape_1.4.6        
## [19] knitr_1.44          iterators_1.0.14    pec_2023.04.12     
## [22] future_1.33.0       bslib_0.5.0         rlang_1.1.1        
## [25] cachem_1.0.8        xfun_0.39           sass_0.4.7         
## [28] cli_3.6.1           hamlet_0.9.6        digest_0.6.33      
## [31] foreach_1.5.2       grid_4.4.0          mvtnorm_1.2-2      
## [34] lava_1.7.2.1        timereg_2.0.5       timeROC_0.4        
## [37] evaluate_0.21       pracma_2.4.2        data.table_1.14.8  
## [40] numDeriv_2016.8-1.1 listenv_0.9.0       codetools_0.2-19   
## [43] parallelly_1.36.0   rmarkdown_2.23      tools_4.4.0        
## [46] htmltools_0.5.5
```