When working with survival data it is important to remember that effects may change over time. This is rarely something you notice in datasets of a few hundred observations, but as datasets grow larger the chances of this happening increase. When fitting a simple Cox regression with the survival package, this can easily be checked for using the `cox.zph` function.
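A minimal sketch of such a check (`my_data`, `x1` and `x2` are hypothetical placeholders):

```r
library(survival)

# Fit an ordinary Cox model and test the proportional hazards
# assumption using scaled Schoenfeld residuals
fit <- coxph(Surv(time, event) ~ x1 + x2, data = my_data)
cox.zph(fit) # low p-values indicate time-varying effects
```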
The main effect of non-proportional hazards is that you will get a mean estimate over the study time. This mean is not distributed evenly throughout the studied time period but according to the observed time, i.e. if your longest observation period is 20 years while more than 50% of your patients have a follow-up of less than 10 years, the estimate will mostly reflect the effect during those first 10 years. Note that this may further depend on how the events are distributed. Thus it is useful to be able to deal with non-proportional hazards.
## strata

The Cox model allows you to estimate effects in different strata and then average them together. If the confounder that breaks the proportionality assumption is a non-continuous variable, this is a convenient method as it allows you to set up the necessary strata. With one variable this is simple, `Surv(time, event) ~ x_1 + x_2 + strata(x_np)`, but with multiple variables you risk ending up with small strata and the non-informative error:

```
attempt to select less than one element
```
Note that you should not use multiple `strata` terms but combine the variables, e.g. if you have two variables called `x_np1` and `x_np2` you would set up your model with the strata as `strata(x_np1, x_np2)`. The strata are then handled as interactions, generating `nlevels(x_np1) * nlevels(x_np2)` strata, which also seems to be the core reason why this fails.
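A minimal sketch of the combined call, assuming `x_np1` and `x_np2` are factors (`my_data` is a placeholder):

```r
library(survival)

# One strata() term containing both variables rather than
# strata(x_np1) + strata(x_np2)
fit <- coxph(Surv(time, event) ~ x_1 + x_2 + strata(x_np1, x_np2),
             data = my_data)
```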
## tt argument

The survival package has an excellent vignette on time-dependent variables and time-dependent coefficients, see `vignette("timedep", package = "survival")`. Prof. Therneau explains there the common pitfalls and how to use the time-transforming option provided by the `coxph` function through the `tt` argument. It is a neat and simple solution that transforms those variables that you have indicated for transformation using the `tt` function. The vignette provides some simple approaches, but it also allows for rather sophisticated use:
```r
library(survival)
coxph(Surv(time, event) ~ age + sex + type_of_surgery +
        tt(type_of_surgery) + tt(surgery_length),
      data = my_surgical_data,
      tt = list(
        function(type_of_surgery, time, ...) {
          # As type_of_surgery is a categorical variable
          # we must transform it into a design matrix that
          # we can use for multiplication with time
          # Note: the [, -1] is for dropping the intercept
          mtrx <- model.matrix(~ type_of_surgery)[, -1]
          mtrx * time
        },
        function(surgery_length, time, ...) {
          # Note that the surgery_length and the time
          # should probably have similar ranges; if you
          # report operating time in minutes and follow-up
          # in years, the time will be dwarfed by the
          # surgery length
          pspline(surgery_length + time)
        }
      ))
```

The main problem is that it doesn’t scale all that well to larger datasets. A common error, unless you have a large amount of memory, is:
```
Could not allocate vector of *** MB
```
The `tt` approach is based upon the idea of time splitting. Time splitting is possible since the Cox proportional hazards model studies the derivative of the survival function, i.e. the hazard, and thus doesn’t care how many observations were present before the current time point. This allows including patients/observations after 2 years by simply ignoring them prior to 2 years. The method is referred to as interval time, where `Surv(time, event)` simply changes into `Surv(Start_time, Stop_time, event)`.
This allows us to adjust for time interactions as the `Start_time` is independent of the event. In the standard setting the `Start_time` is always 0, but if we split an observation into multiple time steps and use the interval time, the variable becomes meaningful in an interaction setting. Note that [!ref] suggested using the `End_time` after splitting the observation time, while I’ve found that in practice it doesn’t matter that much. It makes intuitive sense to use the `Start_time`: if we make the time interval too large, the `End_time` will convey information about the event status and thereby by nature become significant. The approach is explained in more detail below.
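In formula terms the idea boils down to something like this (a sketch; `x` is a covariate whose effect is suspected to vary over time, `split_data` is the split dataset):

```r
library(survival)

# After time splitting, Start_time varies within individuals, so
# interacting a covariate with it captures a linear change of the
# covariate's effect over time
fit <- coxph(Surv(Start_time, Stop_time, event) ~ x + x:Start_time,
             data = split_data)
```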
An alternative is to split the data into a few intervals, select one interval at a time, and fit separate models on each. This will result in multiple estimates per variable, poorer statistical power, and most likely an incomplete handling of the non-proportional hazards. Despite these downsides it may still be a viable solution when presenting to a less statistically savvy audience in order to gain acceptance for the above methods. The `timeSplitter` can help generate the necessary datasets; below is a sketch of a convenient approach using dplyr (`my_data`, `x1` and `x2` are hypothetical placeholders, and the event variable is assumed to be a factor starting at "alive"):
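```r
library(dplyr)
library(survival)
library(Greg)

# Split the follow-up into 5-year chunks (an arbitrary example length)
split_data <- my_data |>
  timeSplitter(by = 5,
               event_var = "event",
               event_start_status = "alive",
               time_var = "time")

# Fit one separate Cox model per interval
models <- split_data |>
  group_by(Start_time) |>
  group_map(~ coxph(Surv(Start_time, Stop_time, event == "dead") ~ x1 + x2,
                    data = .x),
            .keep = TRUE)
```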
These approaches are probably just a subset of possible approaches. I know of the timereg package that has some very fancy time-coefficient handling. My personal experience with the package is limited, and I’ve been discouraged by the visually less appealing graphs provided in the examples, and there isn’t a proper vignette explaining the details (last checked v 1.8.9). Similarly, flexsurv should be able to deal with deviations from the proportional hazards assumption. If you know of any others, please send me an e-mail.
First we generate a short survival dataset with 4 observations.
```r
library(tidyverse)
library(Greg)

test_data <- tibble(id = 1:4,
                    time = c(4, 3.5, 1, 5),
                    event = factor(c("censored", "dead", "alive", "dead")),
                    age = c(62.2, 55.3, 73.7, 46.3),
                    date = as.Date(c("2003-01-01", "2010-04-01",
                                     "2013-09-20", "2002-02-23"))) |>
  mutate(time_str = sprintf("0 to %.1f", time))
```

Using some grid graphics we can illustrate the dataset graphically:
Now we apply a split that divides the data into 2-year chunks. Note: 2 years, as in this example, is probably not optimal; it was only chosen to make the result easier to display.
```r
library(dplyr)
split_data <- test_data |>
  select(id, event, time, age, date) |>
  timeSplitter(by = 2, # The time that we want to split by
               event_var = "event",
               time_var = "time",
               event_start_status = "alive",
               time_related_vars = c("age", "date"))

knitr::kable(head(split_data, 10))
```

| id | event | age | date | Start_time | Stop_time |
|---|---|---|---|---|---|
| 1 | alive | 62.2 | 2002.999 | 0 | 2.0 |
| 1 | censored | 64.2 | 2004.999 | 2 | 4.0 |
| 2 | alive | 55.3 | 2010.246 | 0 | 2.0 |
| 2 | dead | 57.3 | 2012.246 | 2 | 3.5 |
| 3 | alive | 73.7 | 2013.718 | 0 | 1.0 |
| 4 | alive | 46.3 | 2002.145 | 0 | 2.0 |
| 4 | alive | 48.3 | 2004.145 | 2 | 4.0 |
| 4 | dead | 50.3 | 2006.145 | 4 | 5.0 |
Now if we plot each individual’s interval times below the original, we see multiple observation times where only the last observation time is related to the actual event. All prior intervals are assumed to keep the original (start) event status.
I haven’t found any good datasets with non-proportional hazards, but the melanoma dataset is largish and allows some exploration.
```r
# First we start with loading the dataset
data("melanoma", package = "boot")

# Then we munge it according to ?boot::melanoma
melanoma <- mutate(melanoma,
                   status = factor(status,
                                   levels = 1:3,
                                   labels = c("Died from melanoma",
                                              "Alive",
                                              "Died from other causes")),
                   ulcer = factor(ulcer,
                                  levels = 0:1,
                                  labels = c("Absent", "Present")),
                   # All variables should be in the same time unit
                   time = time / 365.25,
                   sex = factor(sex,
                                levels = 0:1,
                                labels = c("Female", "Male")))
```

Now we can fit a regular Cox regression:
```r
library(survival)
regular_model <- coxph(Surv(time, status == "Died from melanoma") ~
                         age + sex + year + thickness + ulcer,
                       data = melanoma,
                       x = TRUE, y = TRUE)
summary(regular_model)
```

```
## Call:
## coxph(formula = Surv(time, status == "Died from melanoma") ~
##     age + sex + year + thickness + ulcer, data = melanoma, x = TRUE,
##     y = TRUE)
##
##   n= 205, number of events= 57
##
##                   coef exp(coef) se(coef)      z Pr(>|z|)
## age           0.016805  1.016947 0.008578  1.959 0.050094 .
## sexMale       0.448121  1.565368 0.266861  1.679 0.093107 .
## year         -0.102566  0.902518 0.061007 -1.681 0.092719 .
## thickness     0.100312  1.105516 0.038212  2.625 0.008660 **
## ulcerPresent  1.194555  3.302087 0.309254  3.863 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##              exp(coef) exp(-coef) lower .95 upper .95
## age             1.0169     0.9833    1.0000     1.034
## sexMale         1.5654     0.6388    0.9278     2.641
## year            0.9025     1.1080    0.8008     1.017
## thickness       1.1055     0.9046    1.0257     1.191
## ulcerPresent    3.3021     0.3028    1.8012     6.054
##
## Concordance= 0.757  (se = 0.031 )
## Likelihood ratio test= 44.4  on 5 df,   p=2e-08
## Wald test            = 40.89  on 5 df,   p=1e-07
## Score (logrank) test = 48.14  on 5 df,   p=3e-09
```

If we do the same with a split dataset:
```r
spl_melanoma <- melanoma |>
  timeSplitter(by = .5,
               event_var = "status",
               event_start_status = "Alive",
               time_var = "time",
               time_related_vars = c("age", "year"))

interval_model <- update(regular_model,
                         Surv(Start_time, Stop_time, status == "Died from melanoma") ~ .,
                         data = spl_melanoma)
summary(interval_model)
```

```
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
##     age + sex + year + thickness + ulcer, data = spl_melanoma,
##     x = TRUE, y = TRUE)
##
##   n= 2522, number of events= 57
##
##                   coef exp(coef) se(coef)      z Pr(>|z|)
## age           0.016805  1.016947 0.008578  1.959 0.050094 .
## sexMale       0.448121  1.565368 0.266861  1.679 0.093107 .
## year         -0.102566  0.902518 0.061007 -1.681 0.092719 .
## thickness     0.100312  1.105516 0.038212  2.625 0.008660 **
## ulcerPresent  1.194555  3.302087 0.309254  3.863 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##              exp(coef) exp(-coef) lower .95 upper .95
## age             1.0169     0.9833    1.0000     1.034
## sexMale         1.5654     0.6388    0.9278     2.641
## year            0.9025     1.1080    0.8008     1.017
## thickness       1.1055     0.9046    1.0257     1.191
## ulcerPresent    3.3021     0.3028    1.8012     6.054
##
## Concordance= 0.757  (se = 0.03 )
## Likelihood ratio test= 44.4  on 5 df,   p=2e-08
## Wald test            = 40.89  on 5 df,   p=1e-07
## Score (logrank) test = 48.14  on 5 df,   p=3e-09
```

As you can see, the difference between the models is negligible:
```r
library(htmlTable)
cbind(Regular = coef(regular_model),
      Interval = coef(interval_model),
      Difference = coef(regular_model) - coef(interval_model)) |>
  txtRound(digits = 5) |>
  knitr::kable(align = "r")
```

|  | Regular | Interval | Difference |
|---|---|---|---|
| age | 0.01681 | 0.01681 | 0.00000 |
| sexMale | 0.44812 | 0.44812 | 0.00000 |
| year | -0.10257 | -0.10257 | 0.00000 |
| thickness | 0.10031 | 0.10031 | 0.00000 |
| ulcerPresent | 1.19455 | 1.19455 | 0.00000 |
Now we can look for time-varying coefficients using the `survival::cox.zph()` function.
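A sketch of a call that produces a table like the one below, assuming the regular model from above (`txtRound` comes from the htmlTable package loaded earlier):

```r
# Test the proportional hazards assumption per covariate
# using scaled Schoenfeld residuals
cox.zph(regular_model)$table |>
  txtRound(digits = 2) |>
  knitr::kable(align = "r")
```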
|  | chisq | df | p |
|---|---|---|---|
| age | 2.51 | 1.00 | 0.11 |
| sex | 1.50 | 1.00 | 0.22 |
| year | 1.26 | 1.00 | 0.26 |
| thickness | 4.40 | 1.00 | 0.04 |
| ulcer | 3.26 | 1.00 | 0.07 |
| GLOBAL | 9.97 | 5.00 | 0.08 |
The two variables that give a hint of time variation are age and thickness. It seems reasonable that melanoma thickness is less important as time increases; either the tumor was adequately removed or there was some remaining piece that caused the patient to die within a few years. We will therefore add a time interaction using the `:` variant (note that using `*` for interactions gives a separate variable for the time itself, and that is not of interest in this case):
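A sketch of how such a model can be fitted, reusing `update()` on the interval model from above:

```r
# Add a linear interaction between thickness and the interval start
time_int_model <- update(interval_model,
                         . ~ . + thickness:Start_time)
summary(time_int_model)
```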
```
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
##     age + sex + year + thickness + ulcer + thickness:Start_time,
##     data = spl_melanoma, x = TRUE, y = TRUE)
##
##   n= 2522, number of events= 57
##
##                           coef exp(coef) se(coef)      z Pr(>|z|)
## age                   0.014126  1.014226 0.008591  1.644 0.100115
## sexMale               0.511881  1.668427 0.268960  1.903 0.057016 .
## year                 -0.098459  0.906233 0.061241 -1.608 0.107896
## thickness             0.209025  1.232476 0.061820  3.381 0.000722 ***
## ulcerPresent          1.286214  3.619060 0.313838  4.098 4.16e-05 ***
## thickness:Start_time -0.045630  0.955395 0.022909 -1.992 0.046388 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                      exp(coef) exp(-coef) lower .95 upper .95
## age                     1.0142     0.9860    0.9973    1.0314
## sexMale                 1.6684     0.5994    0.9848    2.8265
## year                    0.9062     1.1035    0.8037    1.0218
## thickness               1.2325     0.8114    1.0918    1.3912
## ulcerPresent            3.6191     0.2763    1.9564    6.6948
## thickness:Start_time    0.9554     1.0467    0.9134    0.9993
##
## Concordance= 0.762  (se = 0.03 )
## Likelihood ratio test= 48.96  on 6 df,   p=8e-09
## Wald test            = 45.28  on 6 df,   p=4e-08
## Score (logrank) test = 56.29  on 6 df,   p=3e-10
```

As suspected, the thickness effect is reduced with time. A linear model is hard to explain from a biological standpoint, so we may want to see if the interaction follows a non-linear trajectory by adding a quadratic variable:
```r
# First we need to manually add an interaction term
spl_melanoma <- mutate(spl_melanoma,
                       thickness_start = thickness * Start_time)
anova(time_int_model,
      update(time_int_model, . ~ . + I(thickness_start^2)))
```

```
## Analysis of Deviance Table
##  Cox model: response is  Surv(Start_time, Stop_time, status == "Died from melanoma")
##  Model 1: ~ age + sex + year + thickness + ulcer + thickness:Start_time
##  Model 2: ~ age + sex + year + thickness + ulcer + I(thickness_start^2) + thickness:Start_time
##    loglik  Chisq Df Pr(>|Chi|)
## 1 -258.72
## 2 -258.51 0.4178  1      0.518
```

As you can see, this doesn’t support that the variable is non-linear. An alternative would be to use the `survival::pspline` method:
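A sketch of such a model, reusing the manually created `thickness_start` variable:

```r
# Replace the quadratic guess with a penalized spline of the
# thickness-by-time term
pspline_model <- update(interval_model,
                        . ~ . + pspline(thickness_start))
summary(pspline_model)
```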
```
## Call:
## coxph(formula = Surv(Start_time, Stop_time, status == "Died from melanoma") ~
##     age + sex + year + thickness + ulcer + pspline(thickness_start),
##     data = spl_melanoma, x = TRUE, y = TRUE)
##
##                                coef se(coef)     se2    Chisq   DF       p
## age                         0.01291  0.00857 0.00857  2.26993 1.00 0.13191
## sexMale                     0.49378  0.27138 0.27125  3.31055 1.00 0.06884
## year                       -0.09235  0.06070 0.06068  2.31457 1.00 0.12817
## thickness                   0.15334  0.07679 0.07525  3.98710 1.00 0.04585
## ulcerPresent                1.10814  0.32615 0.32381 11.54370 1.00 0.00068
## pspline(thickness_start),  -0.04236  0.02321 0.02314  3.32997 1.00 0.06803
## pspline(thickness_start),                             4.19818 2.99 0.23948
##
## Iterations: 4 outer, 14 Newton-Raphson
##      Theta= 0.107
## Degrees of freedom for terms= 1 1 1 1 1 4
## Likelihood ratio test=54  on 8.93 df, p=2e-08
## n= 2522, number of events= 57
```

If you are only investigating confounders that you want to adjust for, we are done. If you actually want to convey the results to your readers, then we need to think about how to display the interaction, especially if it turns out to follow a non-linear pattern. If you have two continuous variables you basically have two options: go with a 3-dimensional graph, where confidence intervals are hard to illustrate, or categorize one of the variables and use a regular 2-dimensional graph. I usually go for the latter:
```r
# Let's create an evenly distributed categorical thickness variable
# and interactions
spl_melanoma <- mutate(spl_melanoma,
                       thickness_cat = cut(thickness,
                                           breaks = c(0, 1, 5, Inf),
                                           labels = c("less than 1.0",
                                                      "1.0 to 4.9",
                                                      "at least 5.0")))

# Now create interaction variables
for (l in levels(spl_melanoma$thickness_cat)[-1]) {
  spl_melanoma[[sprintf("thickness_%s_time", gsub(" ", "_", l))]] <-
    (spl_melanoma$thickness_cat == l) * spl_melanoma$Start_time
}

# Now for the model specification where we use a
# pspline for the two interaction variables
adv_int_model <- coxph(Surv(Start_time, Stop_time, status == "Died from melanoma") ~
                         age + sex + year + ulcer + thickness_cat +
                         pspline(thickness_1.0_to_4.9_time) +
                         pspline(thickness_at_least_5.0_time),
                       data = spl_melanoma,
                       x = TRUE, y = TRUE,
                       iter.max = 1000)

# To get the estimates we use the predict function
new_data <- data.frame(thickness_cat = rep(levels(spl_melanoma$thickness_cat)[-1],
                                           each = 100),
                       Start_time = 2^seq(-3, 3, length.out = 100),
                       stringsAsFactors = FALSE) |>
  mutate(thickness_1.0_to_4.9_time =
           (thickness_cat == levels(spl_melanoma$thickness_cat)[2]) * Start_time,
         thickness_at_least_5.0_time =
           (thickness_cat == levels(spl_melanoma$thickness_cat)[3]) * Start_time)
new_data$sex <- "Female"
new_data$age <- median(melanoma$age)
new_data$year <- median(melanoma$year)
new_data$ulcer <- "Absent"

adv_pred <- predict(adv_int_model,
                    newdata = new_data,
                    type = "terms",
                    terms = c("thickness_cat",
                              "pspline(thickness_1.0_to_4.9_time)",
                              "pspline(thickness_at_least_5.0_time)"),
                    se.fit = TRUE)
new_data$fit <- rowSums(adv_pred$fit)
new_data$se.fit <- apply(adv_pred$se.fit, 1, function(x) x^2) |>
  colSums() |>
  sqrt()

new_data <- mutate(new_data,
                   risk = exp(fit),
                   upper = exp(fit + 1.96 * se.fit),
                   lower = exp(fit - 1.96 * se.fit))

library(ggplot2)
new_data |>
  mutate(adapted_risk = sapply(risk, function(x) min(max(2^-4, x), 2^5)),
         adapted_upper = sapply(upper, function(x) min(max(2^-4, x), 2^5)),
         adapted_lower = sapply(lower, function(x) min(max(2^-4, x), 2^5))) |>
  ggplot(aes(y = adapted_risk,
             x = Start_time,
             group = thickness_cat,
             col = thickness_cat,
             fill = thickness_cat)) +
  # The confidence intervals are too wide to display in this case
  # geom_ribbon(aes(ymax = adapted_upper, ymin = adapted_lower), fill = "red") +
  geom_line() +
  scale_x_log10(breaks = 2^(-3:4),
                limits = c(2^-3, 8),
                expand = c(0, 0)) +
  scale_y_log10(breaks = 2^(-4:5),
                labels = txtRound(2^(-4:5), 2),
                expand = c(0, 0)) +
  scale_color_brewer(type = "qual",
                     guide = guide_legend(title = "Thickness (mm)")) +
  ylab("Hazard ratio") +
  xlab("Time (years)") +
  theme_bw()
```

The main problem with both the `tt` and the `timeSplitter` approach is memory usage. Therefore, make sure to drop all variables that you won’t be using before running your regression. I’ve found that dropping variables not only limits the risk of running out of memory but also considerably speeds up the regressions.
Longer interval lengths will reduce the size of the split dataset but will increase the risk of residual non-proportional hazards. When I consulted a statistician on a dataset with follow-up between 0 and 21 years, she recommended that I use ½-year intervals. I think this was slightly overdoing it; I guess an alternative would have been to simply redo the `cox.zph` call in order to see how well the new model takes care of the non-proportionality problem, as sketched below.
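A minimal sketch of such a re-check, assuming the interaction model from the melanoma example:

```r
# Re-test the proportional hazards assumption after adding
# the time interaction
cox.zph(time_int_model)
```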
Explaining the time coefficient can be demanding. I often rely on the `rms::contrast` function, but this can be tricky since the `Start_time` can confuse the `contrast` function.
## I() option

Just to be crystal clear: the `I()` option should never be used. It will provide spuriously low p-values and doesn’t solve the non-proportionality issue. See Therneau’s vignette for more on this.