Binomial Regression for censored data

The binreg function can fit a logistic link model with IPCWadjustment for a specific time-point, and can thus be used fordescribing survival or competing risks data. The function can be usedfor large data and is completely scalable, that is, linear in the data.A nice feature is that influcence functions are computed and areavailable, and can thus be used for all other settings based on theseparameters.

In addition and to summarize

the censoring weights can be strata dependent
predictions can be computed with standard errors
computation time linear in data
- including standard errors
clusters can be given and then cluster corrected standard errors arecomputed

Details

The binreg function does direct binomial regression for onetime-point,\(t\), fitting the model\[\begin{align*}P(T \leq t, \epsilon=1 | X ) & = \mbox{expit}( X^T \beta) =F_1(t,X,\beta)\end{align*}\] to an IPCW adjusted estmating equation (EE) withresponse\(Y(t)=I(T \leq t, \epsilon=1)\)\[\begin{align*}U(\beta,\hat G_c) = & X ( Y(t) \frac{ \Delta(t) }{\hat G_c(T_i\wedge t)} - \mbox{expit}( X^T \beta)) = 0,\end{align*}\] with\(G_c(t)=P(C>t)\), the censoring survivaldistribution, and with\(\Delta(t) = I( C_i> T_i \wedge t)\) the indicator of being uncensored at time\(t\) (type=“I”). The default type=“II”is to augment with a censoring term, that is solve\[\begin{align*} & U(\beta,\hat G_c) + \int_0^t X \frac{\hat E(Y(t)| T>u)}{\hatG_c(u)} d\hat M_c(u) =0\end{align*}\] where\(M_c(u)\)is the censoring martingale, this typically improves the performance.This is equivlent to the pseudo-value approach (see Overgaard(2025)).

The influence function for the type=“II” estimator is\[\begin{align*} U(\beta,G_c) + \int_0^t X \frac{E(Y| T>u)}{G_c(u)} d M_c(u) - \int_0^t \frac{E(X| T>u) E(Y| T>u)}{G_c(u)} d M_c(u) -\int_0^t \frac{E( X Y| T>u)}{G_c(u)} d M_c(u) \end{align*}\] and for type=“I”\[\begin{align*} & U(\beta) + \int_0^t \frac{E( X Y| T>u)}{G_c(u)} d M_c(u).\end{align*}\] The means\(E(X Y(t) |T>u)\) and\(E(Y(t)|T>u)\) are estimated by IPCW estimators among survivors to getestimates of the influence functions.

The function logitIPCW instead considers\[\begin{align*}U^{glm}(\beta,\hat G_c) = & \frac{ \Delta(t) }{\hat G_c(T_i \wedget)} X ( Y(t) - \mbox{expit}( X^T \beta)) = 0.\end{align*}\] This score equation is quite similar to those ofthe binreg, and exactly the same when the censoring model isfully-nonparametric.

The logitIPCW has influence function\[\begin{align*} & U^{glm}(\beta,G_c) + \int_0^t \frac{E( X ( Y - F_1(t,\beta)) |T>u)}{G_c(u)} d M_c(u) \end{align*}\]

Which estimator performs the best depends on the censoringdistribution and it seems that the binreg with type=“II” performsoverall quite nicely (see Blanche et al (2023) and Overgaard (2024)).For the full estimated censoring model all estimators have the sameinfluence function (see Blanche et al (2023)).

Additional functions logitATE, and binregATE computes the averagetreatment effect. We demonstrate this in another vignette.

The functions logitATE/binregATE can be used there is no censoringand we thus have simple binary outcome.

The variance is based on sandwich formula with IPCW adjustment (usingthe influence functions), and naive.var is the variance under knowncensoring model. The influence functions are stored in the output.Clusters can be specified to get cluster corrected standard errors.

Examples

library(mets)options(warn=-1)set.seed(1000)# to control output in random noise just below.data(bmt) bmt$time<- bmt$time+runif(nrow(bmt))*0.01# logistic regresion with IPCW binomial regression out<-binreg(Event(time,cause)~tcell+platelet,bmt,time=50)summary(out)#>    n events#>  408    160#>#>  408 clusters#> coeffients:#>              Estimate   Std.Err      2.5%     97.5% P-value#> (Intercept) -0.180338  0.126748 -0.428760  0.068084  0.1548#> tcell       -0.418545  0.345480 -1.095675  0.258584  0.2257#> platelet    -0.437644  0.240978 -0.909952  0.034665  0.0694#>#> exp(coeffients):#>             Estimate    2.5%  97.5%#> (Intercept)  0.83499 0.65132 1.0705#> tcell        0.65800 0.33431 1.2951#> platelet     0.64556 0.40254 1.0353

We can also compute predictions using the estimates

predict(out,data.frame(tcell=c(0,1),platelet=c(1,1)),se=TRUE)#>        pred         se     lower     upper#> 1 0.3502406 0.04847582 0.2552280 0.4452533#> 2 0.2618207 0.06969334 0.1252217 0.3984196

Further the censoring model can depend on strata

 outs<-binreg(Event(time,cause)~tcell+platelet,bmt,time=50,cens.model=~strata(tcell,platelet))summary(outs)#>    n events#>  408    160#>#>  408 clusters#> coeffients:#>              Estimate   Std.Err      2.5%     97.5% P-value#> (Intercept) -0.180697  0.127414 -0.430424  0.069030  0.1561#> tcell       -0.365928  0.350632 -1.053154  0.321299  0.2967#> platelet    -0.433494  0.240270 -0.904415  0.037428  0.0712#>#> exp(coeffients):#>             Estimate    2.5%  97.5%#> (Intercept)  0.83469 0.65023 1.0715#> tcell        0.69355 0.34884 1.3789#> platelet     0.64824 0.40478 1.0381

Absolute risk differences and ratio

Now for illustrations I wish to consider the absolute risk differencedepending on tcell

 outs<-binreg(Event(time,cause)~tcell,bmt,time=50,cens.model=~strata(tcell))summary(outs)#>    n events#>  408    160#>#>  408 clusters#> coeffients:#>             Estimate  Std.Err     2.5%    97.5% P-value#> (Intercept) -0.30054  0.11153 -0.51914 -0.08194  0.0070#> tcell       -0.51741  0.33981 -1.18342  0.14860  0.1278#>#> exp(coeffients):#>             Estimate    2.5%  97.5%#> (Intercept)  0.74042 0.59503 0.9213#> tcell        0.59606 0.30623 1.1602

the risk difference is

ps<-predict(outs,data.frame(tcell=c(0,1)),se=TRUE)ps#>        pred         se     lower     upper#> 1 0.4254253 0.02726306 0.3719897 0.4788609#> 2 0.3061988 0.06819019 0.1725461 0.4398516sum(c(1,-1)* ps[,1])#> [1] 0.1192264

Getting the standard errors are easy enough since the two-groups areindependent. In the case where we in addition had adjusted for othercovariates, however, we would need the to apply the delta-theorem thususing the relevant covariances along the lines of

dd<-data.frame(tcell=c(0,1))p<-predict(outs,dd)riskdifratio<-function(p,contrast=c(1,-1)) {   outs$coef<- p   p<-predict(outs,dd)[,1]   pd<-sum(contrast*p)   r1<- p[1]/p[2]   r2<- p[2]/p[1]return(c(pd,r1,r2))}estimate(outs,f=riskdifratio,dd,null=c(0,1,1))#>      Estimate Std.Err     2.5%  97.5% P-value#> [p1]   0.1192 0.07344 -0.02471 0.2632 0.10448#> [p2]   1.3894 0.32197  0.75833 2.0204 0.22652#> [p3]   0.7197 0.16679  0.39284 1.0467 0.09291#>#>  Null Hypothesis:#>   [p1] = 0#>   [p2] = 1#>   [p3] = 1#>#> chisq = 12.0249, df = 3, p-value = 0.007298

same as

run<-0if (run==1) {library(prodlim)pl<-prodlim(Hist(time,cause)~tcell,bmt)spl<-summary(pl,times=50,asMatrix=TRUE)spl}

Augmenting the Binomial Regression

Rather than using a larger censoring model we can also compute anaugmentation term and then fit the binomial regression model based onthis augmentation term. Here we compute the augmentation based onstratified non-parametric estimates of\(F_1(t,S(X))\), where\(S(X)\) gives strata based on\(X\) as a working model.

Computes the augmentation term for each individual as well as the sum\[\begin{align*}A & = \int_0^t H(u,X) \frac{1}{S^*(u,s)} \frac{1}{G_c(u)} dM_c(u)\end{align*}\] with\[\begin{align*}H(u,X) & = F_1^*(t,S(X)) - F_1^*(u,S(X))\end{align*}\] using a KM for\(G_c(t)\) and a working model for cumulativebaseline related to\(F_1^*(t,s)\) and\(s\) is strata,\(S^*(t,s) = 1 - F_1^*(t,s) -F_2^*(t,s)\).

Standard errors computed under assumption of correct but estimated\(G_c(s)\) model.

data(bmt)dcut(bmt,breaks=2)<-~age out1<-BinAugmentCifstrata(Event(time,cause)~platelet+agecat.2+strata(platelet,agecat.2),data=bmt,cause=1,time=40)summary(out1)#>    n events#>  408    157#>#>  408 clusters#> coeffients:#>                      Estimate  Std.Err     2.5%    97.5% P-value#> (Intercept)          -0.51295  0.17090 -0.84791 -0.17799  0.0027#> platelet             -0.63011  0.23585 -1.09237 -0.16785  0.0075#> agecat.2(0.203,1.94]  0.55926  0.21211  0.14353  0.97500  0.0084#>#> exp(coeffients):#>                      Estimate    2.5%  97.5%#> (Intercept)           0.59873 0.42831 0.8370#> platelet              0.53253 0.33542 0.8455#> agecat.2(0.203,1.94]  1.74938 1.15434 2.6512 out2<-BinAugmentCifstrata(Event(time,cause)~platelet+agecat.2+strata(platelet,agecat.2)+strataC(platelet),data=bmt,cause=1,time=40)summary(out2)#>    n events#>  408    157#>#>  408 clusters#> coeffients:#>                      Estimate  Std.Err     2.5%    97.5% P-value#> (Intercept)          -0.51346  0.17109 -0.84879 -0.17814  0.0027#> platelet             -0.63636  0.23653 -1.09996 -0.17276  0.0071#> agecat.2(0.203,1.94]  0.56280  0.21229  0.14672  0.97889  0.0080#>#> exp(coeffients):#>                      Estimate    2.5%  97.5%#> (Intercept)           0.59842 0.42793 0.8368#> platelet              0.52922 0.33288 0.8413#> agecat.2(0.203,1.94]  1.75559 1.15803 2.6615

SessionInfo

sessionInfo()#> R version 4.5.1 (2025-06-13)#> Platform: aarch64-apple-darwin25.0.0#> Running under: macOS Tahoe 26.0.1#>#> Matrix products: default#> BLAS:   /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRblas.dylib#> LAPACK: /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1#>#> locale:#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#>#> time zone: Europe/Copenhagen#> tzcode source: internal#>#> attached base packages:#> [1] stats     graphics  grDevices utils     datasets  methods   base#>#> other attached packages:#> [1] timereg_2.0.7  survival_3.8-3 mets_1.3.8#>#> loaded via a namespace (and not attached):#>  [1] cli_3.6.5           knitr_1.50          rlang_1.1.6#>  [4] xfun_0.53           jsonlite_2.0.0      future.apply_1.20.0#>  [7] listenv_0.9.1       lava_1.8.1          htmltools_0.5.8.1#> [10] sass_0.4.10         rmarkdown_2.30      grid_4.5.1#> [13] evaluate_1.0.5      jquerylib_0.1.4     fastmap_1.2.0#> [16] numDeriv_2016.8-1.1 yaml_2.3.10         mvtnorm_1.3-3#> [19] lifecycle_1.0.4     compiler_4.5.1      codetools_0.2-20#> [22] ucminf_1.2.2        Rcpp_1.1.0          future_1.67.0#> [25] lattice_0.22-7      digest_0.6.37       R6_2.6.1#> [28] parallelly_1.45.1   parallel_4.5.1      splines_4.5.1#> [31] Matrix_1.7-4        bslib_0.9.0         tools_4.5.1#> [34] globals_0.18.0      cachem_1.1.0

Movatterモバイル変換

Binomial Regression for Survival andCompeting Risks Data