This article/vignette provides a summary of functions in the gsDesign package supporting design and evaluation of trial designs for time-to-event outcomes. We do not focus on detailed output options, but rather on what the numbers summarizing a design are based on. If you are not looking for this level of detail and just want to see how to design a fixed or group sequential design for a time-to-event endpoint, see the vignette *Basic time-to-event group sequential design using gsSurv*.
The following functions support use of the very straightforward Schoenfeld (1981) approximation for 2-arm trials:
- `nEvents()`: number of events required to achieve power, or power given a number of events, with no interim analysis.
- `zn2hr()`: approximate the observed hazard ratio (HR) required to achieve a targeted Z-value for a given number of events.
- `hrn2z()`: approximate Z-value corresponding to a specified HR and event count.
- `hrz2n()`: approximate event count corresponding to a specified HR and Z-value.

The above functions do not directly support sample size calculations. This is done with the Lachin and Foulkes (1986) method. Functions include:
- `nSurv()`: more flexible enrollment scenarios; single analysis.
- `gsSurv()`: group sequential design extension of `nSurv()`.
- `nSurvival()`: sample size restricted to a single enrollment rate, single analysis; this has been effectively replaced and generalized by `nSurv()` and `gsSurv()`.

Output for survival design information is supported in various formats:
- `gsBoundSummary()`: tabular summary of a design in a data frame.
- `plot.gsDesign()`: various plot summaries of a design.
- `gsHR()`: approximate HR required to cross a bound.

We will assume a hazard ratio \(\nu < 1\) represents a benefit of experimental treatment over control. We let \(\delta = \log\nu\) denote the so-called *natural parameter* for this case. Letting \(r\) denote the experimental/control randomization ratio and \(n\) the number of events, the asymptotic distribution of the Cox model estimate \(\hat{\delta}\) under the proportional hazards assumption is (Schoenfeld (1981))

\[\hat\delta \sim \text{Normal}(\delta = \log\nu,\ (1+r)^2/(nr)).\]

The inverse of the variance is the statistical information:

\[\mathcal I = nr/(1 + r)^2.\]

Using a Cox model to estimate \(\delta\), the Wald test for \(\text{H}_0: \delta=0\) can be approximated with the asymptotic variance from above as:
\[Z_W\approx \frac{\sqrt{nr}}{1+r}\hat\delta=\frac{\ln(\hat\nu)\sqrt{nr}}{1+r}.\]
Also, we know that the Wald test \(Z_W\) and a standard normal version of the logrank \(Z\) are both asymptotically efficient and therefore asymptotically equivalent, at least under a local hypothesis framework. We denote the *standardized effect size* as
\[\theta = \delta\sqrt r / (1+r) = \log(\nu)\sqrt r / (1+r).\]

Letting \(\hat\theta = \sqrt r/(1+r)\hat\delta\) and \(n\) represent the number of events observed, we have

\[\hat\theta \sim \text{Normal}(\theta, 1/n).\]

Thus, the standardized Z version of the logrank is approximately distributed as
\[Z \sim \text{Normal}(\sqrt{n}\,\theta, 1).\]

Treatment effect favoring experimental treatment compared to control in this notation corresponds to a hazard ratio \(\nu < 1\), as well as negative values of the standardized effect \(\theta\), natural parameter \(\delta\), and standardized Z-test.
## nEvents()

Based on the above, the power for the logrank test when \(n\) events have been observed is approximated by
\[P[Z\le z] = \Phi(z - \sqrt{n}\,\theta) = \Phi(z - \sqrt{nr}/(1+r)\log\nu).\]

Thus, assuming \(n=100\) events, \(\delta = \log\nu = \log(0.7)\), and \(r=1\) (equal randomization), we approximate power for the logrank test when \(\alpha=0.025\) as
```r
n <- 100
hr <- .7
delta <- log(hr)
alpha <- .025
r <- 1
pnorm(qnorm(alpha) - sqrt(n * r) / (1 + r) * delta)
#> [1] 0.4299155
```

We can compute this with `gsDesign::nEvents()` as:
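```r
# A sketch: when the number of events n is supplied, nEvents() returns
# power rather than an event count, so this should match the pnorm()
# computation above
nEvents(hr = hr, alpha = alpha, n = n, r = r)
```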
We solve for the number of events \(n\) required to obtain a desired power

\[1-\beta = P(Z \le \Phi^{-1}(\alpha))\]

with
\[n = \left(\frac{\Phi^{-1}(1-\alpha)+\Phi^{-1}(1-\beta)}{\theta}\right)^2 = \frac{(1+r)^2}{r(\log\nu)^2}\left(\Phi^{-1}(1-\alpha)+\Phi^{-1}(1-\beta)\right)^2.\]

Thus, the approximate number of events required to power for HR = 0.7 with \(\alpha=0.025\) one-sided and power \(1-\beta=0.9\) is
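```r
# A sketch of the direct arithmetic from the formula above;
# beta is defined here as the Type II error
beta <- 0.1
# Approximately 330.4, which rounds up to 331
(1 + r)^2 / (r * log(hr)^2) * (qnorm(1 - alpha) + qnorm(1 - beta))^2
```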
which, rounding up, matches (with tabular output):
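```r
# A sketch of a call that could produce the tabular output below;
# tbl = TRUE is assumed to request a data frame summary with the
# event count rounded up
nEvents(hr = hr, alpha = alpha, beta = beta, r = r, tbl = TRUE) %>%
  kable(row.names = FALSE)
```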
| hr | n | alpha | sided | beta | Power | delta | ratio | hr0 | se |
|---|---|---|---|---|---|---|---|---|---|
| 0.7 | 331 | 0.025 | 1 | 0.1 | 0.9 | 0.1783375 | 1 | 1 | 0.1099299 |
The notation `delta` in the above table is the standardized treatment effect \(\theta\) from above with the sign changed:
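```r
# delta in the table equals -theta = -log(hr) * sqrt(r) / (1 + r)
-log(hr) * sqrt(r) / (1 + r)
#> [1] 0.1783375
```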
The `se` in the table is the approximate standard error for the estimated log hazard ratio \(\hat\delta=\log\hat\nu\):
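```r
# se in the table is (1 + r) / sqrt(n * r) evaluated at n = 331 events
(1 + r) / sqrt(331 * r)
#> [1] 0.1099299
```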
We can create a group sequential design for the above problem either with \(\theta\) or with the fixed design sample size. The parameter `delta` in `gsDesign()` corresponds to the standardized effect size with sign changed, \(-\theta\) in the notation used above and by Jennison and Turnbull (2000), while the natural parameter \(\log(\text{HR})\) is in the parameter `delta1` passed to `gsDesign()`. The name of the effect size is specified in the `deltaname` argument of `gsBoundSummary()`, and the argument `logdelta = TRUE` indicates that the natural parameter input (`delta1`) needs to be exponentiated to obtain the HR in the output below. This example code can be useful in practice. We begin by passing the number of events for a fixed design in the parameter `n.fix` (continuous, not rounded) to adapt to a group sequential design. By rounding to integer event counts with the `toInteger()` function we increase the power slightly over the targeted 90%.
```r
Schoenfeld <- gsDesign(
  k = 2,
  n.fix = nEvents(hr = hr, alpha = alpha, beta = beta, r = 1),
  delta1 = log(hr)
) %>%
  toInteger()
#> toInteger: rounding done to nearest integer since ratio was not specified as postive integer .
Schoenfeld %>%
  gsBoundSummary(deltaname = "HR", logdelta = TRUE, Nname = "Events") %>%
  kable(row.names = FALSE)
```

| Analysis | Value | Efficacy | Futility |
|---|---|---|---|
| IA 1: 50% | Z | 2.7522 | 0.4084 |
| Events: 172 | p (1-sided) | 0.0030 | 0.3415 |
| | ~HR at bound | 0.6572 | 0.9396 |
| | P(Cross) if HR=1 | 0.0030 | 0.6585 |
| | P(Cross) if HR=0.7 | 0.3397 | 0.0268 |
| Final | Z | 1.9810 | 1.9810 |
| Events: 345 | p (1-sided) | 0.0238 | 0.0238 |
| | ~HR at bound | 0.8079 | 0.8079 |
| | P(Cross) if HR=1 | 0.0239 | 0.9761 |
| | P(Cross) if HR=0.7 | 0.9004 | 0.0996 |
Exactly the same result can be obtained with the following, passing the standardized effect size `theta` from above to the parameter `delta` in `gsDesign()`.
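A minimal sketch, assuming the `gsDesign()` defaults used implicitly above for `test.type`, `alpha`, and `beta`:

```r
theta <- log(hr) * sqrt(r) / (1 + r) # standardized effect size (negative here)
gsDesign(k = 2, delta = -theta, delta1 = log(hr)) %>% toInteger()
```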
We noted above that the asymptotic variance for \(\hat\theta\) is \(1/n\), which corresponds to statistical information \(\mathcal I = n\) for the parameter \(\theta\). Thus, the value computed below (a sketch using `nEvents()` with the settings from above)
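```r
# A sketch: the unrounded fixed-design event count, approximately 330.4
# per the formula above
nEvents(hr = hr, alpha = alpha, beta = beta, r = r)
```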
corresponds both to the number of events and the statistical information for the standardized effect size \(\theta\) required to power the trial at the desired level. Note that if you plug in the natural parameter \(\delta = -\log\nu > 0\), then `n.I` returns the statistical information for the log hazard ratio.
The reader may wish to look above to derive the exact relationship between events and statistical information for \(\delta\): from the variance of \(\hat\delta\) above, \(\mathcal I = nr/(1+r)^2\), which reduces to \(n/4\) when \(r=1\).
Another application of the Schoenfeld (1981) method is to approximate boundary characteristics of a design. We will consider `zn2hr()`, `gsHR()` and `gsBoundSummary()` to approximate the treatment effect required to cross design bounds. `zn2hr()` is complemented by the functions `hrn2z()` and `hrz2n()`. We begin with the basic approximation used across all of these functions in this section and follow with a sub-section with example code to reproduce some of what is in the table above.
We return to the following equation from above:
\[Z \approx Z_W \approx \frac{\sqrt{nr}}{1+r}\hat\delta = \frac{\ln(\hat\nu)\sqrt{nr}}{1+r}.\]

By fixing \(Z=z\) and \(n\), we can solve for \(\hat\nu\) from the above:
\[\hat{\nu} = \exp(z(1+r)/\sqrt{rn}).\]

By fixing \(\hat\nu\) and \(z\), we can solve for the corresponding number of events required:

\[n = (z(1+r)/\log(\hat{\nu}))^2/r.\]
We continue with the Schoenfeld example event counts:
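```r
# A sketch: event counts (also the statistical information for theta)
# at the two analyses of the Schoenfeld design above
Schoenfeld$n.I
#> [1] 172 345
```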
We reproduce the approximate hazard ratios required to cross efficacy bounds using the Schoenfeld approximations above:
```r
gsHR(
  z = Schoenfeld$upper$bound, # Z-values at bound
  i = 1:2,                    # Analysis number
  x = Schoenfeld,             # Group sequential design from above
  ratio = r                   # Experimental/control randomization ratio
)
#> [1] 0.6572433 0.8079049
```

For the following examples, we assume \(r=1\).
```r
hr <- .73     # Observed hr
events <- 125 # Events in analysis
z <- log(hr) * sqrt(events * r) / (1 + r)
c(z, pnorm(z)) # Z- and p-value
#> [1] -1.75928655  0.03926443
```

We replicate the Z-value with:
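```r
# A sketch: hrn2z() argument names hr, n, ratio assumed; its sign
# convention may differ from the hand-computed z above
hrn2z(hr = hr, n = events, ratio = r)
```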
We can reproduce this with `zn2hr()` by switching the sign of `z` above; note that the default is `ratio = 1` for all of these functions and is often not specified:
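```r
# A sketch: recover the observed HR by switching the sign of z;
# should approximately reproduce hr = 0.73
zn2hr(z = -z, n = events)
```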
The event count is similarly replicated with:
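```r
# A sketch: hrz2n() argument names hr, z, ratio assumed; squaring
# removes the sign, so this should approximately reproduce events = 125
hrz2n(hr = hr, z = z, ratio = r)
```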
For the purpose of sample size and power for group sequential design, the Lachin and Foulkes (1986) method is recommended based on substantial evaluation not documented further here. We try to make clear here what some of the strengths and weaknesses are of both the Lachin and Foulkes (1986) method and its implementation in the `gsDesign::nSurv()` (fixed design) and `gsDesign::gsSurv()` (group sequential) functions. For historical and testing purposes, we also discuss use of the less flexible `gsDesign::nSurvival()` function, which was independently programmed and can be used for some limited validations of `gsDesign::nSurv()`.
With the flexibility allowed by the Lachin and Foulkes (1986) method comes some detail in specification. The model assumes piecewise constant enrollment rates, piecewise exponential failure rates, and piecewise exponential dropout rates.
Other than the proportional hazards assumption, this allows a great deal of flexibility in trial design assumptions. While Lachin and Foulkes (1986) adjusts the piecewise constant enrollment rates proportionately to derive a sample size, `gsDesign::nSurv()` also enables the approach of Kim and Tsiatis (1990), which fixes enrollment rates and extends the final enrollment rate duration to power the trial; the minimum follow-up period is still assumed with this approach. We do not enable the drop-in option proposed in Lachin and Foulkes (1986).
The two practical differences the Lachin and Foulkes (1986) method has from the Schoenfeld (1981) method are:

- It provides a sample size and trial duration in addition to the required event count, based on enrollment, failure rate, dropout, and follow-up assumptions.
- Expected event accrual, and thus the variance used for power calculations, is computed under both the null and alternate hypotheses rather than from the hazard ratio alone.
We will use the same hazard ratio of 0.7 as for the Schoenfeld (1981) sample size calculations above. We assume further that the trial will enroll at a constant rate for 12 months, have a control group median of 8 months (exponential failure rate \(\lambda = \log(2)/8\)), a dropout rate of 0.001 per month, and 16 months of minimum follow-up. As before, we assume a randomization ratio \(r=1\), one-sided Type I error \(\alpha=0.025\), and 90% power, which is equivalent to Type II error \(\beta=0.1\).
```r
r <- 1               # Experimental/control randomization ratio
alpha <- 0.025       # 1-sided Type I error
beta <- 0.1          # Type II error (1 - power)
hr <- 0.7            # Hazard ratio (experimental / control)
controlMedian <- 8
dropoutRate <- 0.001 # Exponential dropout rate per time unit
enrollDuration <- 12
minfup <- 16         # Minimum follow-up
Nlf <- nSurv(
  lambdaC = log(2) / controlMedian,
  hr = hr,
  eta = dropoutRate,
  T = enrollDuration + minfup, # Trial duration
  minfup = minfup,
  ratio = r,
  alpha = alpha,
  beta = beta
)
cat(paste("Sample size: ", ceiling(Nlf$n), "Events: ", ceiling(Nlf$d), "\n"))
#> Sample size:  422 Events:  330
```

Recall that the Schoenfeld (1981) method recommended 331 events. The two methods tend to yield very similar event count recommendations, but not the same. Other methods will also differ slightly; see Lachin and Foulkes (1986). Sample size recommendations can vary more between methods.
We can get the same result with the `nSurvival()` routine since only a single enrollment, failure, and dropout rate is proposed for this example.
```r
lambda1 <- log(2) / controlMedian
nSurvival(
  lambda1 = lambda1,
  lambda2 = lambda1 * hr,
  Ts = enrollDuration + minfup,
  Tr = enrollDuration,
  eta = dropoutRate,
  ratio = r,
  alpha = alpha,
  beta = beta
)
#> Fixed design, two-arm trial with time-to-event
#> outcome (Lachin and Foulkes, 1986).
#> Study duration (fixed): Ts=28
#> Accrual duration (fixed): Tr=12
#> Uniform accrual: entry="unif"
#> Control median: log(2)/lambda1=8
#> Experimental median: log(2)/lambda2=11.4
#> Censoring median: log(2)/eta=693.1
#> Control failure rate: lambda1=0.087
#> Experimental failure rate: lambda2=0.061
#> Censoring rate: eta=0.001
#> Power: 100*(1-beta)=90%
#> Type I error (1-sided): 100*alpha=2.5%
#> Equal randomization: ratio=1
#> Sample size based on hazard ratio=0.7 (type="rr")
#> Sample size (computed): n=422
#> Events required (computed): nEvents=330
```

Now we produce a default asymmetric group sequential design with a futility bound based on \(\beta\)-spending. We round interim event counts and round up the final event count to ensure the targeted power.
```r
k <- 2 # Total number of analyses
lfgs <- gsSurv(
  k = k,
  lambdaC = log(2) / controlMedian,
  hr = hr,
  eta = dropoutRate,
  T = enrollDuration + minfup, # Trial duration
  minfup = minfup,
  ratio = r,
  alpha = alpha,
  beta = beta
) %>%
  toInteger()
lfgs %>%
  gsBoundSummary() %>%
  kable(row.names = FALSE)
```

| Analysis | Value | Efficacy | Futility |
|---|---|---|---|
| IA 1: 50% | Z | 2.7500 | 0.4150 |
| N: 440 | p (1-sided) | 0.0030 | 0.3391 |
| Events: 172 | ~HR at bound | 0.6575 | 0.9387 |
| Month: 13 | P(Cross) if HR=1 | 0.0030 | 0.6609 |
| | P(Cross) if HR=0.7 | 0.3422 | 0.0269 |
| Final | Z | 1.9811 | 1.9811 |
| N: 440 | p (1-sided) | 0.0238 | 0.0238 |
| Events: 344 | ~HR at bound | 0.8076 | 0.8076 |
| Month: 28 | P(Cross) if HR=1 | 0.0239 | 0.9761 |
| | P(Cross) if HR=0.7 | 0.9006 | 0.0994 |
Although we did not use the Schoenfeld (1981) method for sample size, it is still used for the approximate HR at bound calculation above:
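```r
# A sketch reusing gsHR() as before; should approximately reproduce the
# ~HR at bound values (0.6575, 0.8076) from the table above
gsHR(z = lfgs$upper$bound, i = 1:2, x = lfgs, ratio = r)
```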
There are various plots available. The approximate hazard ratios required to cross bounds again use the Schoenfeld (1981) approximation. For a ggplot2 version of this plot, use the default `base = FALSE`.
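A sketch of one such plot call, assuming `plottype = "hr"` selects the HR-at-bound plot in `plot.gsDesign()`:

```r
# Base-graphics version; the default base = FALSE produces a ggplot2 plot
plot(lfgs, plottype = "hr", base = TRUE)
```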
The variance calculations for the Lachin and Foulkes method are mostly determined by expected event accrual under the null and alternate hypotheses. The null hypothesis characterized above is such that event accrual will be similar under each hypothesis. You can see the expected events accrued at each analysis under the alternate hypothesis with:
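A sketch of how such a table could be assembled; it assumes the expected control and experimental event counts at each analysis are available in the design object's `eDC` and `eDE` components:

```r
# Expected events by arm at each analysis under the alternate hypothesis
data.frame(
  Analysis = 1:2,
  "Control events" = lfgs$eDC,
  "Experimental events" = lfgs$eDE,
  check.names = FALSE
) %>% kable(row.names = FALSE)
```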
| Analysis | Control events | Experimental events |
|---|---|---|
| 1 | 97.04664 | 74.95336 |
| 2 | 184.48403 | 159.51599 |
It is worth noting that if events accrue at the same rate under both the null and alternate hypotheses, then the expected duration of time to achieve the targeted events would be shortened. Keep in mind that there can be many reasons events will accrue at a different rate than in the design plan.
The expected accrual of events over time for a design can be computed as follows:
```r
Month <- seq(0.025, enrollDuration + minfup, .025)
plot(
  c(0, Month),
  c(0, sapply(Month, function(x) {
    nEventsIA(tIA = x, x = lfgs)
  })),
  type = "l",
  xlab = "Month",
  ylab = "Expected events",
  main = "Expected event accrual over time"
)
```

On the other hand, if you want to know the expected time to accrue 25% of the final events and what the expected enrollment accrual is at that time, you compute using:
```r
b <- tEventsIA(x = lfgs, timing = 0.25)
cat(paste(
  " Time: ", round(b$T, 1),
  "\n Expected enrollment:", round(b$eNC + b$eNE, 1),
  "\n Expected control events:", round(b$eDC, 1),
  "\n Expected experimental events:", round(b$eDE, 1), "\n"
))
#>  Time:  8.9
#>  Expected enrollment: 325.7
#>  Expected control events: 49.1
#>  Expected experimental events: 36.9
```

For expected accrual of events without a design returned by `gsDesign::gsSurv()`, see the help file for `gsDesign::eEvents()`.