Overview of survival endpoint design

Introduction

This article/vignette provides a summary of functions in the gsDesign package supporting design and evaluation of trial designs for time-to-event outcomes. We do not focus on detailed output options, but rather on what the numbers summarizing a design are based on. If you are not looking for this level of detail and just want to see how to design a fixed or group sequential design for a time-to-event endpoint, see the vignette Basic time-to-event group sequential design using gsSurv.

The following functions support use of the very straightforward Schoenfeld (1981) approximation for 2-arm trials: nEvents(), gsHR(), zn2hr(), hrn2z(), and hrz2n().

The above functions do not directly support sample size calculations. This is done with the Lachin and Foulkes (1986) method. Functions include nSurv() for fixed designs, gsSurv() for group sequential designs, and the legacy nSurvival() routine.

Output for survival design information is supported in various formats, including tabular bound summaries via gsBoundSummary(), tables rendered with knitr::kable(), and plots via plot().

Schoenfeld approximation support

We will assume a hazard ratio \(\nu <1\) represents a benefit of experimental treatment over control. We let \(\delta = \log\nu\) denote the so-called natural parameter for this case. Asymptotically, the distribution of the Cox model estimate \(\hat{\delta}\) under the proportional hazards assumption is (Schoenfeld (1981)) \[\hat\delta\sim\text{Normal}(\delta=\log\nu, (1+r)^2/nr).\] The inverse of the variance is the statistical information: \[\mathcal I = nr/(1 + r)^2.\] Using a Cox model to estimate \(\delta\), the Wald test for \(\text{H}_0: \delta=0\) can be approximated with the asymptotic variance from above as:

\[Z_W\approx \frac{\sqrt{nr}}{1+r}\hat\delta=\frac{\ln(\hat\nu)\sqrt{nr}}{1+r}.\]

Also, we know that the Wald test \(Z_W\) and a standard normal version of the logrank \(Z\) are both asymptotically efficient and therefore asymptotically equivalent, at least under a local hypothesis framework. We denote the standardized effect size as

\[\theta = \delta\sqrt r / (1+r)=\log(\nu)\sqrt r / (1+r).\] Letting \(\hat\theta = \sqrt r\,\hat\delta/(1+r)\) and \(n\) representing the number of events observed, we have \[\hat \theta \sim\text{Normal}(\theta, 1/n).\] Thus, the standardized Z version of the logrank is approximately distributed as

\[Z\sim\text{Normal}(\sqrt n\,\theta,1).\] A treatment effect favoring experimental treatment compared to control in this notation corresponds to a hazard ratio \(\nu < 1\), as well as negative values of the standardized effect \(\theta\), the natural parameter \(\delta\), and the standardized Z-test.
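
As an illustrative aside not in the original text, this approximation is easy to check by simulation. The sketch below assumes the survival package and complete follow-up (every subject has an event, so subjects equal events), and compares the empirical mean and standard deviation of the signed logrank statistic to \(\sqrt{n}\,\theta = 10\log(0.7)/2 \approx -1.78\) and 1.

```r
# Illustrative simulation (not from the original vignette): the signed logrank
# statistic should be approximately Normal(sqrt(n) * theta, 1) for hr = 0.7, r = 1
library(survival)
set.seed(123)
n <- 100 # events per simulated trial (complete follow-up, so subjects = events)
z <- replicate(2000, {
  grp <- rep(0:1, each = n / 2)                 # 1:1 randomization
  tte <- rexp(n, rate = (log(2) / 8) * 0.7^grp) # hazard ratio 0.7 for grp = 1
  fit <- survdiff(Surv(tte, rep(1, n)) ~ grp)
  # Signed Z: negative when the experimental arm has fewer events than expected
  sign(fit$obs[2] - fit$exp[2]) * sqrt(fit$chisq)
})
c(mean = mean(z), sd = sd(z)) # compare to sqrt(n) * theta of about -1.78, and 1
```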

Power and sample size with nEvents()

Based on the above, the power for the logrank test when \(n\) events have been observed is approximated by

\[P[Z\le z]=\Phi(z -\sqrt n\,\theta)=\Phi(z-\sqrt{nr}/(1+r)\log\nu).\] Thus, assuming \(n=100\) events, \(\delta = \log\nu=\log(0.7)\), and \(r=1\) (equal randomization), we approximate the power for the logrank test when \(\alpha=0.025\) as

```r
n <- 100
hr <- .7
delta <- log(hr)
alpha <- .025
r <- 1
pnorm(qnorm(alpha) - sqrt(n * r) / (1 + r) * delta)
#> [1] 0.4299155
```

We can compute this with gsDesign::nEvents() as:

```r
library(gsDesign)
nEvents(n = n, alpha = alpha, hr = hr, r = r)
#> [1] 0.4299155
```

We solve for the number of events \(n\) to see how many events are required to obtain a desired power

\[1-\beta=P(Z\le\Phi^{-1}(\alpha))=\Phi(\Phi^{-1}(\alpha)-\sqrt n\,\theta),\] which, noting \(\Phi^{-1}(\alpha)=-\Phi^{-1}(1-\alpha)\), solves to give

\[n = \left(\frac{\Phi^{-1}(1-\alpha)+\Phi^{-1}(1-\beta)}{\theta}\right)^2=\frac{(1+r)^2}{r(\log\nu)^2}\left(\Phi^{-1}(1-\alpha)+\Phi^{-1}(1-\beta)\right)^2.\] Thus, the approximate number of events required to power a trial for HR = 0.7 with \(\alpha=0.025\) one-sided and power \(1-\beta=0.9\) is

```r
beta <- 0.1
(1 + r)^2 / r / log(hr)^2 * (qnorm(1 - alpha) + qnorm(1 - beta))^2
#> [1] 330.3779
```

which, rounding up, matches (with tabular output):

```r
library(knitr)    # for kable()
library(magrittr) # for %>%
nEvents(hr = hr, alpha = alpha, beta = beta, r = 1, tbl = TRUE) %>% kable()
```
| hr | n | alpha | sided | beta | Power | delta | ratio | hr0 | se |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.7 | 331 | 0.025 | 1 | 0.1 | 0.9 | 0.1783375 | 1 | 1 | 0.1099299 |

The delta in the table above is the standardized treatment effect \(\theta\) computed below, with the sign changed:

```r
theta <- delta * sqrt(r) / (1 + r)
theta
#> [1] -0.1783375
```

The se in the table is the estimated standard error for the log hazard ratio estimate \(\hat\delta=\log\hat\nu\):

```r
(1 + r) / sqrt(331 * r)
#> [1] 0.1099299
```

Group sequential design

We can create a group sequential design for the above problem either with \(\theta\) or with the fixed design sample size. The parameter delta in gsDesign() corresponds to the standardized effect size with sign changed, \(-\theta\), in the notation used above and by Jennison and Turnbull (2000), while the natural parameter \(\log(\text{HR})\) is passed to gsDesign() in the parameter delta1. The name of the effect size displayed by gsBoundSummary() is specified in deltaname, and the parameter logdelta = TRUE indicates that the delta1 input needs to be exponentiated to obtain the HR in the output below. This example code can be useful in practice. We begin by passing the number of events for a fixed design in the parameter n.fix (continuous, not rounded) to adapt to a group sequential design. By rounding to integer event counts with the toInteger() function we increase the power slightly over the targeted 90%.

```r
Schoenfeld <- gsDesign(
  k = 2,
  n.fix = nEvents(hr = hr, alpha = alpha, beta = beta, r = 1),
  delta1 = log(hr)
) %>% toInteger()
#> toInteger: rounding done to nearest integer since ratio was not specified as positive integer.
Schoenfeld %>%
  gsBoundSummary(deltaname = "HR", logdelta = TRUE, Nname = "Events") %>%
  kable(row.names = FALSE)
```
| Analysis | Value | Efficacy | Futility |
|:---|:---|---:|---:|
| IA 1: 50% | Z | 2.7522 | 0.4084 |
| Events: 172 | p (1-sided) | 0.0030 | 0.3415 |
| | ~HR at bound | 0.6572 | 0.9396 |
| | P(Cross) if HR=1 | 0.0030 | 0.6585 |
| | P(Cross) if HR=0.7 | 0.3397 | 0.0268 |
| Final | Z | 1.9810 | 1.9810 |
| Events: 345 | p (1-sided) | 0.0238 | 0.0238 |
| | ~HR at bound | 0.8079 | 0.8079 |
| | P(Cross) if HR=1 | 0.0239 | 0.9761 |
| | P(Cross) if HR=0.7 | 0.9004 | 0.0996 |

Information-based design

Exactly the same result can be obtained with the following, passing the sign-changed standardized effect size \(-\theta\) from above to the parameter delta in gsDesign().

```r
Schoenfeld <- gsDesign(k = 2, delta = -theta, delta1 = log(hr)) %>% toInteger()
```

We noted above that the asymptotic variance for \(\hat\theta\) is \(1/n\), which corresponds to statistical information \(\mathcal I=n\) for the parameter \(\theta\). Thus, the value

```r
Schoenfeld$n.I
#> [1] 172 345
```

corresponds both to the number of events and the statistical information for the standardized effect size \(\theta\) required to power the trial at the desired level. Note that if you plug in the natural parameter \(\delta = -\log\nu > 0\), then n.I returns the statistical information for the log hazard ratio.

```r
gsDesign(k = 2, delta = -log(hr))$n.I
#> [1] 43.06893 86.13786
```

The reader may wish to look above to derive the exact relationship between events and statistical information for \(\delta\).
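
As a hint, recall \(\mathcal I = nr/(1+r)^2\): multiplying the information values above by \((1+r)^2/r = 4\) recovers the continuous (unrounded) event counts.

```r
# Events = information * (1 + r)^2 / r; with r = 1, events = 4 * information
gsDesign(k = 2, delta = -log(hr))$n.I * (1 + r)^2 / r
#> [1] 172.2757 344.5514
```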

Approximating boundary characteristics

Another application of the Schoenfeld (1981) method is to approximate boundary characteristics of a design. We will consider zn2hr(), gsHR() and gsBoundSummary() to approximate the treatment effect required to cross design bounds. zn2hr() is complemented by the functions hrn2z() and hrz2n(). We begin with the basic approximation used across all of these functions and follow with a sub-section of example code to reproduce some of what is in the table above.

We return to the following equation from above:

\[Z\approx Z_W\approx \frac{\sqrt{nr}}{1+r}\hat\delta=\frac{\ln(\hat\nu)\sqrt{nr}}{1+r}.\] By fixing \(Z=z\) and \(n\), we can solve for \(\hat\nu\) from the above:

\[\hat{\nu} =\exp(z(1+r)/\sqrt{rn}).\] By fixing \(\hat\nu\) and \(z\), we can solve for the corresponding number of events required: \[n =(z(1+r)/\log(\hat{\nu}))^2/r.\]

Examples

We continue with the Schoenfeld example event counts:

```r
Schoenfeld$n.I
#> [1] 172 345
```

We reproduce the approximate hazard ratios required to cross efficacy bounds using the Schoenfeld approximations above:

```r
gsHR(
  z = Schoenfeld$upper$bound, # Z-values at bound
  i = 1:2,                    # Analysis number
  x = Schoenfeld,             # Group sequential design from above
  ratio = r                   # Experimental/control randomization ratio
)
#> [1] 0.6572433 0.8079049
```

For the following examples, we assume \(r=1\).

```r
r <- 1
```
  1. Assuming a Cox model estimate \(\hat\nu\) and a corresponding event count, approximately what Z-value (p-value) does this correspond to? We use the first equation above:

```r
hr <- .73     # Observed hr
events <- 125 # Events in analysis
z <- log(hr) * sqrt(events * r) / (1 + r)
c(z, pnorm(z)) # Z- and p-value
#> [1] -1.75928655  0.03926443
```

We replicate the Z-value with

```r
hrn2z(hr = hr, n = events, ratio = r)
#> [1] -1.759287
```
  2. Assuming an efficacy bound Z-value and event count, approximately what hazard ratio must be observed to cross the bound? We use the second equation above:

```r
z <- qnorm(.025)
events <- 120
exp(z * (1 + r) / sqrt(r * events))
#> [1] 0.6991858
```

We can reproduce this with zn2hr() by switching the sign of z above; note that the default is ratio = 1 for all of these functions and it often is not specified:

```r
zn2hr(z = -z, n = events, ratio = r)
#> [1] 0.6991858
```
  3. Finally, if we want an observed hazard ratio \(\hat\nu = 0.8\) to represent a positive result, how many events would we need to observe to achieve a 1-sided p-value of 0.025, assuming 2:1 randomization? We use the third equation above:

```r
r <- 2
hr <- .8
z <- qnorm(.025)
events <- (z * (1 + r) / log(hr))^2 / r
events
#> [1] 347.1683
```

This is replicated with

```r
hrz2n(hr = hr, z = z, ratio = r)
#> [1] 347.1683
```
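
As a consistency check (ours, not from the original text), converting the required event count back to a Z-value recovers qnorm(.025) exactly:

```r
# Round trip: the events from hrz2n() map back to the original Z-value
hrn2z(hr = hr, n = hrz2n(hr = hr, z = z, ratio = r), ratio = r)
#> [1] -1.959964
```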

Lachin and Foulkes design

For the purpose of sample size and power for group sequential designs, the Lachin and Foulkes (1986) method is recommended based on substantial evaluation not documented further here. We try to make clear what some of the strengths and weaknesses are of both the Lachin and Foulkes (1986) method and its implementation in the gsDesign::nSurv() (fixed design) and gsDesign::gsSurv() (group sequential) functions. For historical and testing purposes, we also discuss use of the less flexible gsDesign::nSurvival() function, which was independently programmed and can be used for some limited validations of gsDesign::nSurv().

Model assumptions

With the flexibility allowed by the Lachin and Foulkes (1986) method comes some detail in specification. The model assumes piecewise constant enrollment rates, piecewise exponential failure rates with proportional hazards between treatment groups, and piecewise exponential dropout rates.

Other than the proportional hazards assumption, this allows a great deal of flexibility in trial design assumptions. While Lachin and Foulkes (1986) adjusts the piecewise constant enrollment rates proportionately to derive a sample size, gsDesign::nSurv() also enables the approach of Kim and Tsiatis (1990), which fixes enrollment rates and extends the final enrollment rate duration to power the trial; the minimum follow-up period is still assumed with this approach. We do not enable the drop-in option proposed in Lachin and Foulkes (1986).

The two practical differences of the Lachin and Foulkes (1986) method from the Schoenfeld (1981) method are:

  1. By assuming enrollment, failure and dropout rates, the method delivers sample size \(N\) as well as events required.
  2. The variance for the log hazard ratio \(\hat\delta\) is computed differently, and both a null hypothesis (\(\sigma^2_0\)) and alternate hypothesis (\(\sigma^2_1\)) variance are incorporated through the formula \[N = \left(\frac{\Phi^{-1}(1-\alpha)\sigma_0 +\Phi^{-1}(1-\beta)\sigma_1}{\delta}\right)^2.\] The null hypothesis variance is derived by averaging the alternate hypothesis rates, weighting according to the proportion randomized in each group. A minimal sketch of this formula in code follows this list.
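
Here is that sketch as a standalone function; this is our illustration rather than the gsDesign implementation, and the sigma0, sigma1, and delta values in the example call are hypothetical placeholders (Lachin and Foulkes derive the variances from the enrollment, failure, and dropout assumptions):

```r
# Minimal sketch of the Lachin-Foulkes sample size formula above; not the
# gsDesign implementation. The sigma0/sigma1/delta values in the example
# call are hypothetical placeholders.
lfN <- function(alpha, beta, delta, sigma0, sigma1) {
  ((qnorm(1 - alpha) * sigma0 + qnorm(1 - beta) * sigma1) / delta)^2
}
lfN(alpha = 0.025, beta = 0.1, delta = -log(0.7), sigma0 = 2, sigma1 = 2.05)
```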

Fixed design

We will use the same hazard ratio of 0.7 as for the Schoenfeld (1981) sample size calculations above. We assume further that the trial will enroll at a constant rate for 12 months, have a control group median of 8 months (exponential failure rate \(\lambda = \log(2)/8\)), a dropout rate of 0.001 per month, and 16 months of minimum follow-up. As before, we assume a randomization ratio \(r=1\), one-sided Type I error \(\alpha=0.025\), and 90% power, which is equivalent to Type II error \(\beta=0.1\).

```r
r <- 1               # Experimental/control randomization ratio
alpha <- 0.025       # 1-sided Type I error
beta <- 0.1          # Type II error (1 - power)
hr <- 0.7            # Hazard ratio (experimental / control)
controlMedian <- 8
dropoutRate <- 0.001 # Exponential dropout rate per time unit
enrollDuration <- 12
minfup <- 16         # Minimum follow-up
Nlf <- nSurv(
  lambdaC = log(2) / controlMedian,
  hr = hr,
  eta = dropoutRate,
  T = enrollDuration + minfup, # Trial duration
  minfup = minfup,
  ratio = r,
  alpha = alpha,
  beta = beta
)
cat(paste("Sample size: ", ceiling(Nlf$n), "Events: ", ceiling(Nlf$d), "\n"))
#> Sample size:  422 Events:  330
```

Recall that the Schoenfeld (1981) method recommended 331 events. The two methods tend to yield very similar event count recommendations, but not the same. Other methods will also differ slightly; see Lachin and Foulkes (1986). Sample size recommendations can vary more between methods.
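
For a direct comparison, the unrounded event requirements from the two methods can be printed side by side (output omitted here; the values round up to the 331 and 330 quoted above):

```r
# Continuous (unrounded) event requirements; ceiling() gives 331 and 330
c(
  Schoenfeld = nEvents(hr = hr, alpha = alpha, beta = beta, r = r),
  LachinFoulkes = Nlf$d
)
```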

We can get the same result with the nSurvival() routine since only a single enrollment, failure and dropout rate is proposed for this example.

```r
lambda1 <- log(2) / controlMedian
nSurvival(
  lambda1 = lambda1,
  lambda2 = lambda1 * hr,
  Ts = enrollDuration + minfup,
  Tr = enrollDuration,
  eta = dropoutRate,
  ratio = r,
  alpha = alpha,
  beta = beta
)
#> Fixed design, two-arm trial with time-to-event
#> outcome (Lachin and Foulkes, 1986).
#> Study duration (fixed):          Ts=28
#> Accrual duration (fixed):        Tr=12
#> Uniform accrual:              entry="unif"
#> Control median:      log(2)/lambda1=8
#> Experimental median: log(2)/lambda2=11.4
#> Censoring median:        log(2)/eta=693.1
#> Control failure rate:       lambda1=0.087
#> Experimental failure rate:  lambda2=0.061
#> Censoring rate:                 eta=0.001
#> Power:                 100*(1-beta)=90%
#> Type I error (1-sided):   100*alpha=2.5%
#> Equal randomization:          ratio=1
#> Sample size based on hazard ratio=0.7 (type="rr")
#> Sample size (computed):           n=422
#> Events required (computed): nEvents=330
```

Group sequential design

Now we produce a default asymmetric group sequential design with a futility bound based on \(\beta\)-spending. We round interim event counts and round up the final event count to ensure the targeted power.

```r
k <- 2 # Total number of analyses
lfgs <- gsSurv(
  k = k,
  lambdaC = log(2) / controlMedian,
  hr = hr,
  eta = dropoutRate,
  T = enrollDuration + minfup, # Trial duration
  minfup = minfup,
  ratio = r,
  alpha = alpha,
  beta = beta
) %>% toInteger()
lfgs %>%
  gsBoundSummary() %>%
  kable(row.names = FALSE)
```
| Analysis | Value | Efficacy | Futility |
|:---|:---|---:|---:|
| IA 1: 50% | Z | 2.7500 | 0.4150 |
| N: 440 | p (1-sided) | 0.0030 | 0.3391 |
| Events: 172 | ~HR at bound | 0.6575 | 0.9387 |
| Month: 13 | P(Cross) if HR=1 | 0.0030 | 0.6609 |
| | P(Cross) if HR=0.7 | 0.3422 | 0.0269 |
| Final | Z | 1.9811 | 1.9811 |
| N: 440 | p (1-sided) | 0.0238 | 0.0238 |
| Events: 344 | ~HR at bound | 0.8076 | 0.8076 |
| Month: 28 | P(Cross) if HR=1 | 0.0239 | 0.9761 |
| | P(Cross) if HR=0.7 | 0.9006 | 0.0994 |

Although we did not use the Schoenfeld (1981) method for sample size, it is still used for the approximate HR at bound calculation above:

```r
events <- lfgs$n.I
z <- lfgs$upper$bound
zn2hr(z = z, n = events) # Schoenfeld approximation to HR
#> [1] 0.6574636 0.8076464
```

Plotting

There are various plots available. The approximate hazard ratios required to cross bounds again use the Schoenfeld (1981) approximation. For a ggplot2 version of this plot, use the default base = FALSE.

```r
plot(lfgs, pl = "hr", dgt = 2, base = TRUE)
```

Event accrual

The variance calculations for the Lachin and Foulkes method are mostly determined by expected event accrual under the null and alternate hypotheses. The null hypothesis characterized above is seemingly designed so that event accrual will be similar under each hypothesis. You can see the expected events accrued at each analysis under the alternate hypothesis with:

```r
tibble::tibble(
  Analysis = 1:2,
  `Control events` = lfgs$eDC,
  `Experimental events` = lfgs$eDE
) %>% kable()
```
| Analysis | Control events | Experimental events |
|---:|---:|---:|
| 1 | 97.04664 | 74.95336 |
| 2 | 184.48403 | 159.51599 |
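
As a quick check, the expected control and experimental events sum to the integer event targets at each analysis:

```r
# Per-arm expected events sum to the targeted totals at each analysis
lfgs$eDC + lfgs$eDE
#> [1] 172 344
```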

It is worth noting that if events accrue at the rate expected under the null rather than the alternate hypothesis, then the expected time to reach the targeted event counts would be shortened. Keep in mind that there can be many reasons events will accrue at a different rate than in the design plan.

The expected accrual of events over time for a design can be computed as follows:

```r
Month <- seq(0.025, enrollDuration + minfup, .025)
plot(
  c(0, Month),
  c(0, sapply(Month, function(x) nEventsIA(tIA = x, x = lfgs))),
  type = "l",
  xlab = "Month",
  ylab = "Expected events",
  main = "Expected event accrual over time"
)
```

On the other hand, if you want to know the expected time to accrue 25% of the final events and what the expected enrollment accrual is at that time, you compute using:

```r
b <- tEventsIA(x = lfgs, timing = 0.25)
cat(paste(
  " Time: ", round(b$T, 1),
  "\n Expected enrollment:", round(b$eNC + b$eNE, 1),
  "\n Expected control events:", round(b$eDC, 1),
  "\n Expected experimental events:", round(b$eDE, 1), "\n"
))
#>  Time:  8.9
#>  Expected enrollment: 325.7
#>  Expected control events: 49.1
#>  Expected experimental events: 36.9
```

For expected accrual of events without a design returned by gsDesign::gsSurv(), see the help file for gsDesign::eEvents().
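
As a rough sketch under our assumed calling convention (the argument set below is our best understanding; consult help("eEvents") before relying on it), the expected control-arm events at the final analysis of the design above might be computed as:

```r
# Assumed eEvents() usage (see help("eEvents")): expected control-arm events
# at month 28 given the enrollment, failure, and dropout assumptions above;
# 220 is the control-arm share of the 440 subjects enrolled over 12 months
eEvents(
  lambda = log(2) / controlMedian, # control failure rate
  eta = dropoutRate,               # dropout rate
  gamma = 220 / enrollDuration,    # control-arm enrollment rate
  R = enrollDuration,              # enrollment duration
  T = enrollDuration + minfup      # analysis time
)
```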

References

Jennison, Christopher, and Bruce W. Turnbull. 2000. Group Sequential Methods with Applications to Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC.
Kim, Kyungmann, and Anastasios A. Tsiatis. 1990. "Study Duration for Clinical Trials with Survival Response and Early Stopping Rule." Biometrics 46: 81–92.
Lachin, John M., and Mary A. Foulkes. 1986. "Evaluation of Sample Size and Power for Analyses of Survival with Allowance for Nonuniform Patient Entry, Losses to Follow-up, Noncompliance, and Stratification." Biometrics 42: 507–19.
Schoenfeld, David. 1981. "The Asymptotic Properties of Nonparametric Tests for Comparing Survival Distributions." Biometrika 68 (1): 316–19.
