Overview

A basic component for our modelling of multivariate survival data isthat many models are build around marginals that on Cox form. Themarginal Cox model can be fitted efficiently in the mets package, inparticular the handling of strata and robust standard errors isoptimized.

The basic models assumes that each subject has a marginal on Cox-form\[\lambda_{g(k,i)}(t) \exp( X_{ki}^T \beta).\] where\(g(k,i)\) gives thestrata for the subject.

We here discuss and show how to get robust standard errors of

the regression parameters
the baseline

and how to do goodness of fit test using

cumulative residuals score test

First we generate some data from the Clayton-Oakes model, with\(5\) members in each cluster and a varianceparameter at\(2\)

library(mets)options(warn=-1)set.seed(1000)# to control output in simulatins for p-values below. n<-1000 k<-5 theta<-2 data<-simClaytonOakes(n,k,theta,0.3,3)

The data is on has one subject per row.

time : time of event
status : 1 for event and 0 for censoring
x : x is a binary covariate
cluster : cluster

Now we fit the model and produce robust standard errors for bothregression parameters and baseline.

First, recall that the baseline for strata\(g\) is asymptotically equivalent to\[\begin{align}\hat A_g(t) - A_g(t) & = \sum_{k \in g} \left( \sum_{i: ki \in g}\int_0^t \frac{1}{S_{0,g}} dM_{ki}^g - P^g(t) \beta_k \right)\end{align}\] with\(P^g(t) = \int_0^tE_g(s) d \hat \Lambda_g(s)\) the derivative of\(\int_0^t 1/S_{0,g}(s) dN_{\cdot g}\) wrt to\(\beta\), and\[\begin{align}\hat \beta - \beta & = I(\tau)^{-1} \sum_k ( \sum_i \int_0^\tau(Z_{ki} - E_{g}) dM_{ki}^g ) = \sum_k \beta_{k}\end{align}\] with\[\begin{align}M_{ki}^g(t) & = N_{ki}(t) - \int_0^t Y_{ki}(s) \exp( Z_{ki} \beta)d \Lambda_{g(k,i)}(t), \\\beta_{k} & = I(\tau)^{-1} \sum_i \int_0^\tau (Z_{ki} - E_{g})dM_{ki}^g\end{align}\] the basic 0-mean processes, that are martingales inthe iid setting, and\(I(t)\) is thederivative of the total score,\(\hatU(t,\beta))\), with respect to\(\beta\) evaluated at time\(t\).

The variance of the baseline of strata g is estimated by\[\begin{align}\sum_{k \in g} ( \sum_{i: ki \in g} \int_0^t \frac{1}{S_{0,g(k,i)}}d\hat M_{ki}^g - P^g(t) \beta_k )^2\end{align}\] that can be computed using the particular structureof\[\begin{align}d \hat M_{ik}^g(t) & = dN_{ik}(t) - \frac{1}{S_{0,g(i,k)}}\exp(Z_{ik} \beta) dN_{g.}(t)\end{align}\]

This robust variance of the baseline and the iid decomposition for\(\beta\) is computed in mets as:

   out<-phreg(Surv(time,status)~x+cluster(cluster),data=data)summary(out)#>#>     n events#>  5000   4854#> coeffients:#>   Estimate     S.E.  dU^-1/2 P-value#> x 0.287859 0.028177 0.028897       0#>#> exp(coeffients):#>   Estimate   2.5%  97.5%#> x   1.3336 1.2619 1.4093# robust standard errors attached to output   rob<-robust.phreg(out)

We can get the iid decomposition of the\(\hat \beta - \beta\) by

# making iid decomposition of regression parameters   betaiid<-IC(out)head(betaiid)#>             x#> 1 -0.34616008#> 2 -1.44918926#> 3 -0.03898156#> 4  0.42156050#> 5  0.34253904#> 6 -0.07706668# robust standard errorscrossprod(betaiid/NROW(betaiid))^.5#>            x#> x 0.02817714# same as

We now look at the plot with robust standard errors

plot(rob,se=TRUE,robust=TRUE,col=3)

We can also make survival prediction with robust standard errorsusing the phreg.

  pp<-predict(out,data[1:20,],se=TRUE,robust=TRUE)plot(pp,se=TRUE,whichx=1:10)

Finally, just to check that we can recover the model we also estimatethe dependence parameter

tt<-twostageMLE(out,data=data)summary(tt)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#>                 Coef.         SE        z P-val Kendall tau        SE#> dependence1 0.5316753 0.03497789 15.20032     0   0.2100093 0.0109146#>#> $type#> NULL#>#> attr(,"class")#> [1] "summary.mets.twostage"

Goodness of fit

The observed score process is given by\[\begin{align}U(t,\hat \beta) & = \sum_k \sum_i \int_0^t (Z_{ki} - \hat E_g ) d\hat M_{ki}^g\end{align}\] where\(g\) isstrata\(g(k,i)\). The observed scorehas the iid decomposition\[\begin{align}\hat U(t) = \sum_k \sum_i \int_0^t (Z_{ki} - E_g) dM_{ki}^g - I(t) \sum_k \beta_k\end{align}\] where\(\beta_k\)is the iid decomposition of the score process for the true\(\beta\)\[\begin{align}\beta_k & = I(\tau)^{-1} \sum_i \int_0^\tau (Z_{ki} - E_g )d M_{ki}^g\end{align}\] and\(I(t)\) isthe derivative of the total score,\(\hatU(t,\beta))\), with respect to\(\beta\) evaluated at time\(t\).

This observed score can be resampled given it is on iid form in termsof clusters.

Now using the cumulative score process for checking proportionalhazards

gout<-gof(out)gout#> Cumulative score process test for Proportionality:#>   Sup|U(t)|  pval#> x  30.24353 0.401

The p-value reflects wheter the observed score process is consistentwith the model.

plot(gout)

Computational aspects

The score processes can be resampled as in Lin, Wei, Ying (1993)using the martingale structure, such that the observed score process isresampled by\[\begin{align} \sum_k \sum_i \int_0^t g_{ki} (Z_{ki} - E_g) dN_{ki} -I(t) I^{-1}(\tau) g_{ki} \int_0^{\tau} (Z_{ki} - E_g) dN_{ki} .\end{align}\] where\(g_{ki}\)are i.i.d. standard normals.

Based on the zero mean processes we more generally with clusters canresample the score process. For resampling of score process we need\[\begin{align}U(t,\beta) & = \sum_k \sum_i g_k \int_0^t (Z_{ki} - E_g ) dM_{ki}^g\end{align}\] where\(g\) isstrata. We write\(g_k\) as\(g_{ki}\) and thus repeating\(g_k\) within each cluster.

Computations are done using that\[\begin{align*}\int_0^t (Z_{ki} - E_{g}) dM_{ki}^g & = \int_0^t (Z_{ki} - E_{g})dN_{ki}^g - \int_0^{t} (Z_{ki} - E_{g}) Y_{ki}(u) d\Lambda^g(u) \end{align*}\] therefore and summing the compensator part withthe\(g_{ki}\) multipliers then givesfor each strata\(g\)\[\begin{align*} & \int_0^t \frac{S_{1g}^w(u)}{S_{0g}(u)} dN_{g.}(v) - \int_0^tE_{g}(u) \frac{S_{0g}^w(u)}{S_{0g}(u)} dN_{g.}(v)\end{align*}\] with
\[\begin{align*}S_{jg}^w(t) & = \sum_{ki \in g} \exp(Z_{ki} \beta) Z_{ki}^jY_{ki}(t) g_{ki} \\S_{jg}(t) & = \sum_{ki \in g} \exp(Z_{ki} \beta) Z_{ki}^jY_{ki}(t).\end{align*}\]

Cluster stratified Cox models

For clustered data it is possible to estimate the regressioncoefficient within clusters by using Cox’s partial likelihood stratifiedon clusters.

Note, here that the data is generated with a different subjectspecific structure, so we will not recover the\(\beta\) at 0.3 and the model will not be aproportional Cox model, we we would also expect to reject“proportionality” with the gof-test.

The model can be thought of as\[\lambda_{g(k,i)} (t) \exp( X_{ki}^T \beta)\] where\(\lambda_g(t)\) issome cluster specific baseline.

The regression coefficient\(\beta\)can be estimated by using the partial likelihood for clusters.

 out<-phreg(Surv(time,status)~x+strata(cluster),data=data)summary(out)#>#>     n events#>  5000   4854#> coeffients:#>   Estimate     S.E.  dU^-1/2 P-value#> x 0.406307 0.032925 0.039226       0#>#> exp(coeffients):#>   Estimate   2.5%  97.5%#> x   1.5013 1.4074 1.6013

SessionInfo

sessionInfo()#> R version 4.5.1 (2025-06-13)#> Platform: aarch64-apple-darwin25.0.0#> Running under: macOS Tahoe 26.0.1#>#> Matrix products: default#> BLAS:   /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRblas.dylib#> LAPACK: /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1#>#> locale:#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#>#> time zone: Europe/Copenhagen#> tzcode source: internal#>#> attached base packages:#> [1] stats     graphics  grDevices utils     datasets  methods   base#>#> other attached packages:#> [1] timereg_2.0.7  survival_3.8-3 mets_1.3.8#>#> loaded via a namespace (and not attached):#>  [1] cli_3.6.5           knitr_1.50          rlang_1.1.6#>  [4] xfun_0.53           jsonlite_2.0.0      future.apply_1.20.0#>  [7] listenv_0.9.1       lava_1.8.1          htmltools_0.5.8.1#> [10] sass_0.4.10         rmarkdown_2.30      grid_4.5.1#> [13] evaluate_1.0.5      jquerylib_0.1.4     fastmap_1.2.0#> [16] numDeriv_2016.8-1.1 yaml_2.3.10         mvtnorm_1.3-3#> [19] lifecycle_1.0.4     compiler_4.5.1      codetools_0.2-20#> [22] ucminf_1.2.2        Rcpp_1.1.0          future_1.67.0#> [25] lattice_0.22-7      digest_0.6.37       R6_2.6.1#> [28] parallelly_1.45.1   parallel_4.5.1      splines_4.5.1#> [31] Matrix_1.7-4        bslib_0.9.0         tools_4.5.1#> [34] globals_0.18.0      cachem_1.1.0

Movatterモバイル変換

Marginal modelling of clustered survivaldata

Klaus Holst & Thomas Scheike

2025-10-21

Overview

Goodness of fit

Computational aspects

Cluster stratified Cox models

SessionInfo