Abstract

The MDCcure package provides statistical tools to test the effect ofcovariates on the cure rate in mixture cure models. It implementsnonparametric hypothesis tests based on the martingale differencecorrelation (MDC) and its partial version (PMDC), which measuredepartures from conditional mean independence. Since the cure status ispartially observed due to censoring, the package uses an estimator thatadjusts for this by leveraging nonparametric methods for cureprobability and latency functions. Main features include estimation ofMDC and partial MDC, hypothesis tests via permutation and chi-squareapproximations, and a goodness-of-fit test for parametric cure ratemodels. The package incorporates efficient implementations in C++ withparallel computing to handle the computational intensity of the MDCmethods. The package also offers visualization tools to compareparametric fits to flexible nonparametric estimators, aidinginterpretation and model assessment.

Introduction

Standard statistical methods for time-to-event analysis—oftenreferred to as survival models—typically assume that every subject inthe study population is susceptible to the event of interest and willeventually experience the event if the follow-up is sufficiently long.However, in practice, a fraction of individuals may never experience theevent. These subjects are considered as having infinite survival timesand are referred to ascured(Peng andYu 2021).

Over the last few decades, specific methods have been developed todeal with this type of data, collectively known ascure modelanalysis. Several reviews and methodological contributionssummarize the theory and application of these models(Maller et al. 2024; Peng and Yu 2021; Patilea and VanKeilegom 2020; Mailis Amico and Van Keilegom 2018; Morbiducci, Nardi,and Rossi 2003).

Cure models explicitly account for a subpopulation that is notsusceptible to the event, allowing direct modeling of the cureproportion and the effects of covariates on it. Additionally, thesemodels provide inference on the survival distribution of the uncured(susceptible) subjects. Most existing cure models are modifications ofstandard survival models that incorporate the cure probability. Based onhow this incorporation is done, cure models are typically categorized aseithermixture ornonmixture cure models. A keyadvantage ofmixture cure models is that they allowcovariates to affect both the cured and uncured groups differently. Thispackage focuses on mixture cure models.

Let\(Y\) denote the time to event.It is assumed that subjects are subject to right censoring, and thecensoring time\(C\) and the event time\(Y\) are conditionally independentgiven covariates\(\boldsymbol{X}\).The conditional distribution and survival functions of\(Y\) are given by\[F(t|\boldsymbol{x}) = \mathbb{P}(Y \leq t \mid \boldsymbol{X} =\boldsymbol{x}), \quad S(t|\boldsymbol{x}) = 1 - F(t|\boldsymbol{x}).\] Under right censoring, only the pair\((T, \delta)\) is observed, where\(T = \min(Y, C)\) and\(\delta = \mathbb{I}(Y \leq C)\) is thecensoring indicator. The conditional distributions of\(C\) and\(T\) are denoted by\(G(t|\boldsymbol{x})\) and\(H(t|\boldsymbol{x})\), respectively.

Define the cure indicator\(\nu =\mathbb{I}(Y = \infty)\), where\(\nu =0\) for susceptible individuals and\(\nu = 1\) for cured individuals. Theprobability of not being cured (incidence) is\(p(\boldsymbol{x}) = \mathbb{P}(\nu = 0 \mid\boldsymbol{X} = \boldsymbol{x})\), and the conditional survivalfunction of the uncured group (latency) is\[S_0(t|\boldsymbol{x}) = \mathbb{P}(Y > t \mid \nu = 0, \boldsymbol{X}= \boldsymbol{x}).\] Then, the mixture cure model can be written as:\[S(t|\boldsymbol{x}) = 1 - p(\boldsymbol{x}) + p(\boldsymbol{x})S_0(t|\boldsymbol{x}).\]

Motivation for Covariate Testing

Testing whether a covariate has an effect on the cure rate\(1 - p(\boldsymbol{x})\) can be viewed as aproblem of statistical dependence between the cure status\(\nu\) and covariates\(\boldsymbol{X}\). Various methods have beenproposed to detect such dependencies, including universally consistentmeasures such as distance correlation(Szekely,Rizzo, and Bakirov 2007), the Hilbert–Schmidt independencecriterion(Gretton et al. 2005), and theHeller–Heller–Gorfine statistic(Heller, Heller,and Gorfine 2013).

More recently,Shao and Zhang (2014)proposed themartingale difference correlation (MDC) tomeasure the departure from conditional mean independence between ascalar response\(Y\) and a vectorpredictor\(\boldsymbol{X}\):\[\mathbb{E}(Y \mid \boldsymbol{X}) = \mathbb{E}(Y) \quad \text{a.s.}\] The martingale difference correlation is defined as:\[\text{MDC}(Y \mid \boldsymbol{X})^2 =\begin{cases}\dfrac{\text{MDD}(Y \mid \boldsymbol{X})^2}{\sqrt{\text{Var}(Y)^2\mathcal{V}^2(\boldsymbol{X})}}, & \text{if } \text{Var}(Y)^2\mathcal{V}^2(\boldsymbol{X}) > 0, \\0, & \text{otherwise},\end{cases}\] where\(\mathcal{V}^2(\boldsymbol{X})\) is thedistance variance(Szekely, Rizzo, and Bakirov2007), andmartingale difference divergence(MDD) is given by:\[\text{MDD}(Y \mid \boldsymbol{X})^2 = \frac{1}{c_p} \int_{\mathbb{R}^p}\frac{|g_{Y,X}(s) - g_Y g_X(s)|^2}{|s|_p^{1+p}} ds,\] with\(g_{Y,X}(s) = \mathbb{E}(Ye^{i \langle s, \boldsymbol{X} \rangle})\),\(g_Y = \mathbb{E}(Y)\), and\(g_X(s) = \mathbb{E}(e^{i \langle s, \boldsymbol{X}\rangle})\).

Two estimators are commonly used: a biased version (MDCV) fromShao and Zhang (2014) and an unbiased version(MDCU) developed inPark, Shao, and Yao(2015).

Partial Martingale Difference Correlation

The notion of conditional mean independence has been extended tosettings where one adjusts for an additional set of covariates\(\boldsymbol{Z}\). Specifically,\(Y\) is said to beconditionallymean independent of\(\boldsymbol{X}\) given\(\boldsymbol{Z}\) if:\[\mathbb{E}(Y \mid \boldsymbol{X}, \boldsymbol{Z}) = \mathbb{E}(Y \mid\boldsymbol{Z}) \quad \text{a.s.}\]

Park, Shao, and Yao (2015) introduced ameasure called thepartial martingale difference correlation(pMDC) to quantify this conditional mean independence, allowingone to assess the dependence of\(Y\)on\(\boldsymbol{X}\) while adjustingfor\(\boldsymbol{Z}\).

This framework motivates a nonparametric test for covariate effectsin cure models, since the cure probability can be expressed as\(1 - p(\boldsymbol{x}) = \mathbb{E}(\nu \mid\boldsymbol{X} = \boldsymbol{x})\). That is, testing the effectof covariates on the cure rate reduces to testing whether the cureindicator\(\nu\) is conditionally meanindependent of\(\boldsymbol{X}\).

To extend this idea to two covariates, we use thepartialmartingale difference correlation, enabling robust and flexiblehypothesis testing in the context of mixture cure models.

Nonparametric hypothesis tests for the cure rate

The proposed hypotheses are the following:

\[\begin{equation}\label{Eq:Test} \mathcal{H}_0 : \mathbb{E}(\nu | \boldsymbol{X}) \,\, \equiv \,\, 1- p \quad \text{a.s.} \quad \text{vs} \quad \mathcal{H}_1 :\mathbb{E}(\nu | \boldsymbol{X}) \,\, \not\equiv \,\, 1 - p \quad\text{a.s.},\end{equation}\]

which tests whether the covariate vector\(\boldsymbol{X}\) has an effect on the cureprobability. The main problem with the hypotheses\(\eqref{Eq:Test}\) is that the responsevariable (the cure indicator,\(\nu\))is only partially observed due to censoring.

In this package, this challenge is addressed by estimating the cureindicator using the methodology proposed byM.Amico, Van Keilegom, and Han (2021). The idea is as follows:define\(\tau\) as the upper limit ofthe support of the lifetime for a susceptible individual, where\(\tau = \sup_{x} \tau(x)\) and\(\tau(x) = \inf\{t:S_0(t|x) = 0\}\). Weassume that\(\tau < \infty\) andthat the follow-up time is long enough so that\(\tau < \tau_{G(x)}\) for all\(x\), where\(\tau_{G(x)} = \inf\{t:G(t|x) = 1\}\).

Therefore, it is reasonable to consider that all the individuals witha censored observed time greater than\(\tau\) can be categorized as cured\((\nu = 1)\). Then, the proxy of the curerate\(\nu\) is defined as:

\[\begin{equation}\label{eq:nu_tilde_0} \tilde{\nu} = \mathbb{I}(T > \tau) + (1-\delta)\mathbb{I}(T \leq\tau) \, \frac{1 - p(\boldsymbol{X})}{1-p(\boldsymbol{X}) +p(\boldsymbol{X})S_0(T|\boldsymbol{X})}.\end{equation}\]

Since under\(\mathcal{H}_0\) thecure probability does not depend on\(\boldsymbol{X}\), the approximation of\(\nu\) becomes under\(\mathcal{H}_0\):

\[\begin{equation}\label{eq:nu_tilde} \tilde{\nu}_0 = \mathbb{I}(T > \tau) + (1-\delta)\mathbb{I}(T\leq \tau) \, \frac{1 - p}{1-p + pS_0(T|\boldsymbol{X})}.\end{equation}\]

One issue with the proposal of\(\tilde{\nu}_0\) is that it cannot becomputed in practice since\(p\),\(\tau\), and the latency\(S_0(t|\boldsymbol{X})\) are unknown. Anestimator that handles this problem is proposed as:

\[\begin{equation}\label{eq:nu_hat} \hat{\nu}_{h} = \mathbb{I}(T > \hat{\tau}) +(1-\delta)\mathbb{I}(T \leq \hat{\tau}) \, \frac{1 - \hat{p}}{1-\hat{p}+ \hat{p}\hat{S}_{0,h}(T|\boldsymbol{X})}\end{equation}\]

where\(\hat{p}\) is thenonparametric estimator proposed byLaska andMeisner (1992),\(\hat{\tau}\)is the largest observed uncensored survival time, and\(\hat{S}_{0,h}(T|\boldsymbol{X})\) is thenonparametric estimator based on the Beran estimator proposed byLópez-Cheda, Jácome, and Cao (2017).

Two statistics for the covariate hypothesis test, based on themartingale difference correlation between the estimated cure rate\(\hat{\nu}_h\) and the covariate\(\boldsymbol{X}\), are proposed:

\[ \text{MDCV}_n(\hat{\nu}_h|\boldsymbol{X})^2 \quad \text{and} \quad\text{MDCU}_n(\hat{\nu}_h|\boldsymbol{X})^2,\]

where\(\text{MDCV}_n\) and\(\text{MDCU}_n\) correspond to the biasedand bias-corrected estimators, respectively.

In practice, we do not have access to the true null distribution ofthese statistics. To address this, the null distribution is approximatedusing two methods: a permutation test (seeGretton et al. (2008),Pfister et al. (2018)) and a chi-squaretest.

The permutation test compares the statistic computed from theoriginal data to the distribution of the statistic computed overmultiple randomly permuted datasets. While this approach isnonparametric and robust, it can be computationally demanding,especially for large sample sizes or when many permutations arerequired.

To provide a faster alternative, a chi-square approximation isimplemented for the\(\text{MDCU}_n^2\)test statistic. This approach extends the idea proposed byShen, Panda, and Vogelstein (2022) and offers acomputationally efficient solution for hypothesis testing.

Goodness-of-Fit Test

As discussed byMüller and Van Keilegom(2019), most studies in the cure model literature either assumethat no cure fraction exists, or, if it does, that it follows a logisticmodel. However, both assumptions may be unrealistic in practice.

First, the existence of a cure fraction has been debated in thesurvival analysis literature. Second, even when a cure fraction ispresent, there is no theoretical reason to expect the cure probabilityto follow a monotonic function of the covariates—let alone a logisticform.

To address this issue,Müller and Van Keilegom(2019) proposegoodness-of-fit tests forevaluating whether a parametric model for the cure probability isconsistent with the observed data. These tests are flexible andwell-suited to situations where the functional form of the cure model isuncertain or possibly misspecified. Notably, their test is the first toallow the cure rate to vary with a covariate.

The hypotheses tested are:

\[\mathcal{H}_0 : p(x) = p_{\theta}(x) \quad \text{for some } \theta \in\Theta \quad \text{vs.} \quad \mathcal{H}_1 : p(x) \neq p_{\theta}(x)\quad \forall \theta \in \Theta,\]

where\(\Theta\) is afinite-dimensional parameter space, and\(p_{\theta}(x)\) is a known parametric formup to the parameter vector\(\theta\).

The proposed test statistic is a weighted\(L_2\)-distance between a nonparametricestimator\(\hat{p}_h(x)\) of the cureprobability and its parametric counterpart\(p_{\hat{\theta}}(x)\) estimated under\(\mathcal{H}_0\). The test statistic isdefined as:

\[\mathcal{T}_n = nh^{1/2} \int \left( \hat{p}_h(x) - p_{\hat{\theta}}(x)\right)^2 \pi(x)\, dx,\]

where:

\(\hat{p}_h(x)\) is a nonparametricestimator with bandwidth\(h\),
\(\hat{\theta}\) is an estimator ofthe parameter\(\theta\),
\(\pi(x)\) is a user-specifiedweight function (often a density on the covariate space).

In practice, the empirical version of the test statistic is used:

\[\tilde{\mathcal{T}}_n = nh^{1/2} \frac{1}{n} \sum_{i=1}^n \left(\hat{p}_h(X_i) - p_{\hat{\theta}}(X_i) \right)^2.\] The critical values of the test are approximated using abootstrap procedure.

MDCcure

Overview of the package

TheMDCcure package implements the methodologyproposed inMonroy-Castillo et al. (2025)to test whether a covariate or a set of covariates has an effect on thecure rate. This methodology is based on themartingale differencecorrelation (MDC). Since the MDC function is not available on CRAN,it can be obtained from older versions of theEDMeasurepackage. However, the original implementation is computationallyintensive. To address this,MDCcure leverages C++ andparallel computing usingRcppParallel to improveefficiency.

Main Features

TheMDCcure package offers functionality in threemain areas:

1. Dependence Measures

The package provides functions to estimate themartingaledifference correlation (mdc()),martingaledifference divergence (mdd()), and their partialcounterparts—partial martingale difference correlation andpartial martingale difference divergence (pmdc()andpmdd()). It also includes a permutation test for MDCand a chi-squared test based on the bias-corrected MDC(mdc_test()), which test for dependence between a responsecovariate,\(Y\), and a covariatevector\(\boldsymbol{X}\) in theconditional mean, that is, whether the conditional mean of\(Y\) given\(\boldsymbol{X}\) depends on\(\boldsymbol{X}\) is tested.

2. Covariate Hypothesis Testing for the Cure Rate in Mixture CureModels

The main function,testcov(), enables formal testing ofwhether a covariate significantly influences the cure rate in a mixturecure model. This is extended to handle two covariates with thetestcov2() function.

3. Goodness-of-Fit Testing for the Cure Rate in Mixture CureModels

The package implements the goodness-of-fit (GOF) test which assesseswhether the specified parametric model for the cure rate adequately fitsthe data.

In addition to the formal statistical test,MDCcureincludes visualization tools that compare the parametric cure ratefunction with a flexible nonparametric estimate, which is obtained usingtheprobcure() function from thenpcurepackage(López-Cheda, Jácome, andLópez-de-Ullibarri 2021). This graphical comparison provides anintuitive way to assess model adequacy and explore deviations from theassumed model.

Using the package

The following code illustrates a typical call:

library(MDCcure)

Dependence measures

For the first part, and for illustration purposes, we will simulatedata to demonstrate the main functionalities of the package. Thefunctionsmdc andmdd return the squaredmartingale difference correlation and the martingale differencedivergence, respectively. Note that two options for thecenter argument are available:"U" for theunbiased estimator and"D" for the double-centered version,which results in a biased estimator.

set.seed(123)X<-matrix(rnorm(10*2),10,2)Y<-matrix(rnorm(10*2),10,2)mdc(X, Y,center ="U")## [1] 0.2724937mdc(X, Y,center ="D")## [1] 0.3998999

To formally assess the dependence between a response variable\(Y\) and a covariate vector\(\boldsymbol{X}\), the functionmdc_test() provides hypothesis testing procedures based onthemartingale difference correlation framework.

This function supports multiple testing approaches:

MDCU: a permutation test based on thebias-corrected version of the martingale differencecorrelation.
MDCV: a permutation test based on thebiased (double-centered) version.
FMDCU: a fast approximation to the test using theasymptotic distribution of the bias-correctedmartingale difference divergence, which is dominated in the upper tailfor achi-squared distribution.

The null hypothesis in all cases is:\[\mathcal{H}_0: \mathbb{E}(Y \mid \boldsymbol{X}) = \mathbb{E}(Y) \quad\text{a.s.}\] i.e., that\(Y\) is meanindependent of\(\boldsymbol{X}\).

Thepmdc() andpmdd() functions providescalar measures to quantify conditional mean dependence while adjustingfor a set of conditioning variables.

Specifically:

pmdc() returns thesquared partial martingaledifference correlation, which measures the conditional meandependence between\(Y\) and\(\boldsymbol{X}\) adjusting on\(\boldsymbol{Z}\).
pmdd() returns thesquared partial martingaledifference divergence.

set.seed(123)X<-matrix(rnorm(10*2),10,2)Y<-matrix(rnorm(10*2),10,2)Z<-matrix(rnorm(10*2),10,2)pmdd(X, Y, Z)## [1] 0.1247584pmdc(X, Y, Z)## [1] 0.2555037

Covariate hypothesis test for the cure rate in mixture curemodels

With one covariate

Thetestcov() function performsnonparametrichypothesis testing to assess whether a covariate is associatedwith thecure probability in amixture curemodel. This function supports several types of test statistics,including those based on themartingale difference correlation(MDC) and an alternativeGOFT test, whichcompares the distance between two nonparametric estimators of the cureprobability: one under the null hypothesis (constant cure rate) and theother under the alternative hypothesis (covariate-dependent curerate).

The covariate effect is evaluated using observed survival times,censoring indicators, and kernel-based estimation of the latency. Thetest is flexible in terms of bandwidth selection and computationalstrategies.

Available Test Methods

Themethod argument allows the user to choose among thefollowing options:

"MDCU": Martingale Difference Correlation usingU-centering, evaluated via a permutation test.
"MDCV": Martingale Difference Correlation usingdouble-centering, with a permutation test.
"FMDCU": Afast approximation of"MDCU" using anasymptotic chi-squareddistribution.
"GOFT": An alternativegoodness-of-fittest specifically designed for cure models.
"All": Executes all available tests and returns theirresults in a single output.

Bandwidth Selection

The test relies on the estimation of the cure status, which involveskernel smoothing and requires a bandwidth parameterh. Youcan:

Provide a numeric value for manual bandwidth selection.
Seth = NULL to let the function select an optimalbandwidth automatically.
Useh = "bootstrap" to apply thebootstrapbandwidth selection method proposed byLópez-Cheda et al. (2017). Note: This option cansignificantly increase computation time.

By default,testcov() usesparallelcomputing to improve efficiency (parallel = TRUE).It is possible to control the number of cores used via then_cores argument. Ifn_cores = NULL, thefunction will automatically detect the number of available cores and useall but one of them.

With two covariates

Thetestcov2() function performsnonparametrichypothesis testing to assess whether a covariate is associatedwith thecure probability, while adjusting for a secondcovariate in amixture cure model. In other words, itevaluates whether a covariate has an effect on the cure rate, given thatanother covariate is already known to influence it.

\[\mathcal{H}_0 : \mathbb{E}(\nu \mid \boldsymbol{X}, \boldsymbol{Z})\equiv 1 - p(\boldsymbol{Z}) \quad \text{a.s.}\quad \text{vs} \quad\mathcal{H}_1 : \mathbb{E}(\nu \mid \boldsymbol{X}, \boldsymbol{Z})\not\equiv 1 - p(\boldsymbol{Z}) \quad \text{a.s.},\]

where\(\nu\) denotes the(unobserved) cure status, and\(p(\cdot)\) is the cure probabilityfunction.

This test extends the theory developed for a single covariate to thetwo-covariate setting. Estimation of the cure status relies on kernelsmoothing and requires abandwidth matrix\(\boldsymbol{H}\). IfH = NULL,the bandwidth matrix is automatically selected using the rule-of-thumbmethod.

Goodnnes-of-fit test for the cure rate in a mixture cure model

Thegoft() function performs agoodness-of-fittest to assess whether a given parametric model—depending on acovariate—is appropriate for modeling the cure rate in amixturecure model.

Key features include:

Automatic bandwidth selection whenh = NULL,
Adefault initial value for the parameter vector\(\theta\) iftheta0 = NULL,
Support for three parametric cure rate models via themodel argument:"logit","probit", and"cloglog".

In addition to statistical testing, the functionplotCure() provides a visual comparison between thenonparametric andparametricestimations of the cure rate, enabling users to better assess themodel’s adequacy.

Example

In order to illustrate the use of the functions defined above, wesimulate data following the setting described inMüller and Van Keilegom (2019). In this example,the cure rate is generated from a logistic model given by

\[p(x) = \frac{1}{1 + \exp(1 + x)},\]

so it follows a logistic form and depends on the covariate\(X\). Additionally, we simulate a secondcovariate\(Z\), which is independentof both\(X\) and the cure rate.

set.seed(1234)theta0=c(1,1)gamma0=0.5gamma1=0.5cT=0n<-200maxT=0.02*nX=runif(n,-1,1)Z=runif(n,-1,1)phi=exp(theta0[1]+theta0[2]*X)phi= phi/(1+phi)B= (runif(n)<= phi)aT= (X+1)^cTlambdaX=exp(gamma0+gamma1*X)bT= lambdaX^(-1/aT)tau=qweibull(0.9,shape=mean(aT),scale=mean(bT))Y=rep(100000,n)count=0for (jin1:n){if (B[j]==1){stop=0while (stop==0){Y[j]=rweibull(1,shape=aT[j],scale=bT[j])if ((Y[j]> tau)*(count<= maxT)){Y[j]= taucount= count+1}if (Y[j]<= tau) stop=1}}}aC=1bC=1.5C=rweibull(n,shape=aC,scale=bC)C=replace(C,C>tau,tau+0.001)T=apply(cbind(Y,C),1,min)Delta=as.numeric(Y<= C)#Covariate hypothesis test for the cure rate with one covariatetestcov(X, T, Delta,method ="All",P =499)## ============== Covariate hypothesis test for the cure rate ============## Method: All#### MDCU: Martingale difference correlation unbiased with permutations## Test statistic: 0.03986## p-value: 0.002## Number of permutations: 499#### MDCV: Martingale difference correlation biased with permutations## Test statistic: 0.04746## p-value: 0.008## Number of permutations: 499#### FMDCU: Fast Chi-square test based on MDC unbiased## Test statistic: 0.03986## p-value: 0.0027426#### GOFT## Test statistic: 0.769## p-value: 0.03006## Number of bootstrap replicates: 499#### ========================================================================#Covariate hypothesis test for the cure rate with two covariatestestcov2(X, T, Z, Delta,P =499)## ---------------------------------------------------------------------## Covariate hypothesis test for the cure rate with two covariates## ---------------------------------------------------------------------## Hypotheses:##   H0: The conditional mean of the cure status given X adjusting on Z does not depend on X,##       i.e., E[nu|X,Z] = E[nu|Z].##   H1: The conditional mean of the cure status depends on X adjusting on Z#### Test Statistic: 0.02869## p-value: 0.004## Number of permutations: 499testcov2(Z, T, X, Delta,P =499)## ---------------------------------------------------------------------## Covariate hypothesis test for the cure rate with two covariates## ---------------------------------------------------------------------## Hypotheses:##   H0: The conditional mean of the cure status given Z adjusting on X does not depend on Z,##       i.e., E[nu|Z,X] = E[nu|X].##   H1: The conditional mean of the cure status depends on Z adjusting on X#### Test Statistic: -0.002226## p-value: 0.744## Number of permutations: 499#Goodness-of-fit test for the cure rategoft(X, T, Delta,model ="logit")## ---------------------------------------------------------------------## Goodness-of-fit test for the cure rate in a mixture cure model## -------------------------------------------------------------------#### Model: logit#### Hypotheses:##   H0: The model fits the data, i.e., p(x) = p_theta(x)##   H1: The model does not fit the data, i.e., p(x) != p_theta(x)#### Test Statistic: 0.1192## p-value: 0.4168## Number of bootstrap replications: 499# plotCure(X, T, Delta, density = FALSE)

In the first instance, we test whether the cure fraction depends onthe covariate\(X\). All availablemethods are applied. By default, the function usesmethod = "FMDCU", which corresponds to the fast chi-squaretest for the martingale difference correlation. In all cases, theresulting p-values are below 0.05, which aligns with the data-generatingmechanism, the cure probability was simulated to depend on\(X\) through a logistic function. Thisprovides evidence against the null hypothesis of no covariate effect onthe cure rate.

Secondly, we test whether the cure fraction depends on twocovariates. The order of covariates is crucial in this test:testcov2(X, Y, Z) is not equivalent totestcov2(Z, Y, X).

In the first call,testcov2(X, Y, Z), we testwhether the cure fraction depends on\(X\), given that it depends on\(Z\). The resulting p-value is less than0.05, indicating that\(X\) providesadditional explanatory power. This aligns with the simulation setup,where the cure fraction depends on\(X\).
In the second call,testcov2(Z, Y, X), we testwhether the cure fraction depends on\(Z\), given that it depends on\(X\). The p-value is greater than 0.05,indicating that\(Z\) does not provideadditional information once\(X\) isaccounted for. Again, this agrees with the simulation design, where thecure probability is independent of\(Z\).

This example illustrates the asymmetry of the test and the importanceof specifying the covariates in the correct order.

Finally, in the third part we test if the data fit to a logisticmodel. Note that:

The p-value is greater than 0.05, which aligns with thesimulation.
The plot shows both estimations:
- Parametric (red)
- Nonparametric (blue)
For the parametric case, the parameter\(\theta\) is given by default whentheta = NULL and is obtained with thesmcure()function from thesmcure package.
Additionally, it is possible to add the density of the covariatewhendensity = TRUE.

Amico, Mailis, and Ingrid Van Keilegom. 2018.“Cure Models inSurvival Analysis.”Annual Review of Statistics and ItsApplication 5 (1): 311–42.

Amico, M, I Van Keilegom, and B Han. 2021.“Assessing Cure StatusPrediction from Survival Data Using Receiver Operating CharacteristicCurves.”Biometrika 108 (3): 727–40.

Gretton, Arthur, Kenji Fukumizu, Choon Teo, Le Song, Bernhard Schölkopf,and Alex Smola. 2008.“A Kernel Statistical Test ofIndependence.”Advances in Neural Information ProcessingSystems 20: 585–92.

Gretton, Arthur, Ralf Herbrich, Alexander Smola, Olivier Bousquet,Bernhard Schölkopf, and Aapo Hyvärinen. 2005.“Kernel Methods forMeasuring Independence.”Journal of Machine LearningResearch 6 (12): 2075–2129.

Heller, Ruth, Yair Heller, and Malka Gorfine. 2013.“A ConsistentMultivariate Test of Association Based on Ranks of Distances.”Biometrika 100 (2): 503–10.

Laska, Eugene M, and Morris J Meisner. 1992.“NonparametricEstimation and Testing in a Cure Model.”Biometrics 48(4): 1223–34.

López-Cheda, Ana, Ricardo Cao, M Amalia Jácome, and Ingrid Van Keilegom.2017.“Nonparametric Incidence Estimation and Bootstrap BandwidthSelection in Mixture Cure Models.”Computational Statistics& Data Analysis 105: 144–65.

López-Cheda, Ana, M Amalia Jácome, and Ricardo Cao. 2017.“Nonparametric Latency Estimation for Mixture Cure Models.”Test 26 (2): 353–76.

López-Cheda, Ana, M Amalia Jácome, and Ignacio López-de-Ullibarri. 2021.“Npcure: AnR Package for Nonparametric Inference inMixture Cure Models.”R Journal 13 (1): 21–41.

Maller, Ross, Sidney Resnick, Soudabeh Shemehsavar, and Muzhi Zhao.2024.“Mixture Cure Model Methodology in Survival Analysis: SomeRecent Results for the One-Sample Case.”StatisticSurveys 18: 82–138.

Monroy-Castillo, Blanca E., M. A. Jácome, Ricardo Cao, and Ingrid VanKeilegom. 2025.“Ovariate Hypothesis Tests for the Cure Rate inMixture Cure Models Based on Martingale Difference Correlation.”Submitted.

Morbiducci, Marta, Alessandra Nardi, and Carla Rossi. 2003.“Classification of‘Cured’ Individuals in SurvivalAnalysis: The Mixture Approach to the Diagnostic–PrognosticProblem.”Computational Statistics & Data Analysis41 (3-4): 515–29.

Müller, Ursula U, and Ingrid Van Keilegom. 2019.“Goodness-of-FitTests for the Cure Rate in a Mixture Cure Model.”Biometrika 106 (1): 211–27.

Park, Trevor, Xiaofeng Shao, and Shun Yao. 2015.“PartialMartingale Difference Correlation.”Electronic Journal ofStatistics 9 (1): 1492–1517.

Patilea, Valentin, and Ingrid Van Keilegom. 2020.“A GeneralApproach for Cure Models in Survival Analysis.”The Annals ofStatistics 48 (4): 2323–46.

Peng, Yingwei, and Binbing Yu. 2021.Cure Models: Methods,Applications, and Implementation. Chapman; Hall/CRC.

Pfister, Niklas, Peter Bühlmann, Bernhard Schölkopf, and Jonas Peters.2018.“Kernel-Based Tests for Joint Independence.”Journal of the Royal Statistical Society Series B: StatisticalMethodology 80 (1): 5–31.

Shao, Xiaofeng, and Jingsi Zhang. 2014.“Martingale DifferenceCorrelation and Its Use in High-Dimensional Variable Screening.”Journal of the American Statistical Association 109 (507):1302–18.https://doi.org/10.1080/01621459.2014.887012.

Shen, Cencheng, Sambit Panda, and Joshua T Vogelstein. 2022.“TheChi-Square Test of Distance Correlation.”Journal ofComputational and Graphical Statistics 31 (1): 254–62.

Szekely, Gabor J., Maria L. Rizzo, and Nail K. Bakirov. 2007.“Measuring and Testing Dependence by Correlation ofDistances.”The Annals of Statistics 35 (6): 2769–94.

Movatterモバイル変換

MDCcure

Blanca Monroy-Castillo

Introduction

Motivation for Covariate Testing

Partial Martingale Difference Correlation

Nonparametric hypothesis tests for the cure rate

Goodness-of-Fit Test

MDCcure

Overview of the package

Main Features

1. Dependence Measures

2. Covariate Hypothesis Testing for the Cure Rate in Mixture CureModels

3. Goodness-of-Fit Testing for the Cure Rate in Mixture CureModels

Using the package

Dependence measures

Covariate hypothesis test for the cure rate in mixture curemodels

With one covariate

Available Test Methods

Bandwidth Selection

With two covariates

Goodnnes-of-fit test for the cure rate in a mixture cure model

Example