
Introduction

This article is intended to give a gentle mathematical and statistical introduction to group sequential design. We also provide relatively simple examples from the literature to explain clinical applications. There is no programming shown, but by accessing the source for the article all required programming can be accessed; substantial commenting is provided in the source in the hope that users can understand how to implement the concepts developed here. Hopefully, the few mathematical and statistical concepts introduced will not discourage those wishing to understand some underlying concepts for group sequential design.

A group sequential design allows repeated analysis of an endpoint in a clinical trial so that the trial can be stopped early for a positive result, for futility, or for a safety issue. This approach can

limit exposure risk to patients and clinical trial investment past the time where known unacceptable safety risks have been established for the endpoint of interest,
limit investment in a trial where interim results suggest further evaluation for a positive efficacy finding is futile, or
accelerate the availability of a highly effective treatment by enabling early approval following an early positive finding.

Examples of outcomes that might be considered include:

a continuous outcome such as change from baseline at some fixed follow-up time in the HAM-D depression score,
an absolute difference or risk ratio for a response rate (e.g., in oncology) or failure rate for a binary (yes/no) outcome, and
a hazard ratio for a time-to-event outcome such as time-to-death or disease progression in an oncology trial or time until a cardiovascular event (death, myocardial infarction or unstable angina).

Examples of the above include:

a new treatment for major depression where an interim analysis of a continuous outcome stopped the trial for futility (Binneman et al. (2008)),
a new treatment for patients with unstable angina undergoing balloon angioplasty with a positive interim finding for a binary outcome of death, myocardial infarction or urgent repeat intervention within 30 days (The CAPTURE Investigators (1997)), and
a new treatment for patients with lung cancer based on a positive interim finding for time-to-death (Gandhi et al. (2018)).

Group sequential design framework

We assume

A two-arm clinical trial with a control and experimental group.
There are \(k\) analyses planned for some integer \(k> 1.\)
There is a natural parameter \(\delta\) describing the underlying treatment difference, with an asymptotically normal and efficient estimate \(\hat\delta_j\) that has variance \(\sigma_j^2\) and corresponding statistical information \(\mathcal{I}_j=1/\sigma_j^2\) at analysis \(j=1,2,\ldots,k\). A positive value favors experimental treatment and a negative value favors control. We assume a consistent estimate \(\hat\sigma_j^2\) of \(\sigma_j^2\), \(j=1,2,\ldots,k\).
The information fraction is defined as \(t_j=\mathcal{I}_j/\mathcal{I}_k\) at analysis \(j=1,\ldots,k\).
Correlations between estimates at different analyses are \(\text{Corr}(\hat\delta_i,\hat\delta_j)=\sqrt{\mathcal{I}_i/\mathcal{I}_j}=\sqrt{t_i/t_j}\) for \(1\le i\le j\le k.\)
There is a test statistic \(Z_j\approx\hat\delta_j/\hat{\sigma}_j.\)
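As a minimal sketch of the quantities listed above, the following base R code computes statistical information, information fractions, and the implied correlation matrix for a difference in means. The sample sizes and standard deviation are chosen to match the depression example later in this article; the formula \(\mathcal{I}_j = n_j/(4\,\text{sd}^2)\) assumes 1:1 randomization with total sample size \(n_j\), and all variable names are hypothetical.

```r
# Illustrative assumptions: total sample size at each analysis and a common
# standard deviation, with 1:1 randomization
n_total <- c(67, 134)
sd_outcome <- 7.5

# Information for a difference in means with n_total / 2 subjects per arm:
# Var(delta_hat) = 4 * sd^2 / n_total, so information is its reciprocal
info <- n_total / (4 * sd_outcome^2)

# Information fraction relative to the final analysis: t_j = I_j / I_k
info_frac <- info / info[length(info)]

# Corr(delta_hat_i, delta_hat_j) = sqrt(I_i / I_j) for i <= j
corr <- outer(info, info, function(a, b) sqrt(pmin(a, b) / pmax(a, b)))

print(info_frac)
print(round(corr, 4))
```

Because the correlation depends only on ratios of information, the same matrix results from using the information fractions in place of the information itself.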

For a time-to-event outcome, \(\delta\) would typically represent the logarithm of the hazard ratio for the control group versus the experimental group. For a difference in response rates, \(\delta\) would represent the underlying difference in response rates. For a continuous outcome such as the HAM-D, we would examine the difference in change from baseline at a milestone time point (e.g., at 6 weeks as in Binneman et al. (2008)). For \(j=1,\ldots,k\), the tests \(Z_j\) are asymptotically multivariate normal with correlations as above, and for \(1\le i\le j\le k\) have \(\text{Cov}(Z_i,Z_j)=\text{Corr}(\hat\delta_i,\hat\delta_j)\) and \(E(Z_j)=\delta\sqrt{\mathcal{I}_j}.\)

This multivariate asymptotic normal distribution for \(Z_1,\ldots,Z_k\) is referred to as the canonical form by Jennison and Turnbull (2000), who have also summarized much of the surrounding literature.
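The canonical form can be checked by simulation. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it builds \(Z_1, Z_2\) from independent normal increments of a score process, with information values and \(\delta=3\) chosen to resemble the depression example, and compares simulated means and correlation with \(\delta\sqrt{\mathcal{I}_j}\) and \(\sqrt{\mathcal{I}_1/\mathcal{I}_2}\). All variable names are hypothetical.

```r
set.seed(123)

info <- c(59, 134) / (4 * 7.5^2) # Assumed information at two analyses
delta <- 3                        # Assumed treatment effect
n_sim <- 100000                   # Number of simulated trials

# Independent increments of the score process:
# S_1 ~ N(delta * I_1, I_1) and S_2 - S_1 ~ N(delta * (I_2 - I_1), I_2 - I_1)
s1 <- rnorm(n_sim, mean = delta * info[1], sd = sqrt(info[1]))
s2 <- s1 + rnorm(n_sim,
  mean = delta * (info[2] - info[1]),
  sd = sqrt(info[2] - info[1])
)

# Standardize to obtain the test statistics Z_j = S_j / sqrt(I_j)
z1 <- s1 / sqrt(info[1])
z2 <- s2 / sqrt(info[2])

sim_summary <- c(mean_z1 = mean(z1), mean_z2 = mean(z2), corr = cor(z1, z2))
theory <- c(
  mean_z1 = delta * sqrt(info[1]),
  mean_z2 = delta * sqrt(info[2]),
  corr = sqrt(info[1] / info[2])
)

print(round(rbind(sim_summary, theory), 3))
```

The simulated and theoretical rows should agree up to Monte Carlo error, which shrinks as `n_sim` grows.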

Bounds for testing

One-sided testing

We assume that the primary test is of the null hypothesis \(H_{0}\): \(\delta=0\) against the alternative \(H_{1}\): \(\delta=\delta_1\) for a fixed effect size \(\delta_1 > 0\), which represents a benefit of experimental treatment compared to control. We assume further that there is interest in stopping early if there is good evidence to reject one hypothesis in favor of the other. For \(i=1,2,\ldots,k-1\), interim cutoffs \(l_{i}< u_{i}\) are set; final cutoffs \(l_{k}\leq u_{k}\) are also set. For \(i=1,2,\ldots,k\), the trial is stopped at analysis \(i\) to reject \(H_{0}\) if \(l_{j}<Z_{j}< u_{j}\), \(j=1,2,\dots,i-1\), and \(Z_{i}\geq u_{i}\). If the trial continues until stage \(i\), \(H_{0}\) is not rejected at stage \(i\), and \(Z_{i}\leq l_{i}\), then \(H_{1}\) is rejected in favor of \(H_{0}\), \(i=1,2,\ldots,k\). Thus, \(3k\) parameters define a group sequential design: \(l_{i}\), \(u_{i}\), and \(\mathcal{I}_{i}\), \(i=1,2,\ldots,k\). Note that if \(l_{k}< u_{k}\) there is the possibility of completing the trial without rejecting \(H_{0}\) or \(H_{1}\). We will often restrict \(l_{k}= u_{k}\) so that one hypothesis is rejected.

We begin with a one-sided test. In this case there is no interest in stopping early for a lower bound and thus \(l_i= -\infty\), \(i=1,2,\ldots,k\). The probability of first crossing an upper bound at analysis \(i\), \(i=1,2,\ldots,k\), is

\[\alpha_{i}^{+}(\delta)=P_{\delta}\{\{Z_{i}\geq u_{i}\}\cap_{j=1}^{i-1}\{Z_{j}< u_{j}\}\}\]

The Type I error is the probability of ever crossing the upper bound when \(\delta=0\). The value \(\alpha^+_{i}(0)\) is commonly referred to as the amount of Type I error spent at analysis \(i\), \(1\leq i\leq k\). The total upper boundary crossing probability for a trial is denoted in this one-sided scenario by

\[\alpha^+(\delta) \equiv\sum_{i=1}^{k}\alpha^+_{i}(\delta)\]

and the total Type I error by \(\alpha^+(0)\). Assuming \(\alpha^+(0)=\alpha\), the design will be said to provide a one-sided group sequential test at level \(\alpha\).
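For \(k=2\), the one-sided crossing probabilities \(\alpha_i^+(0)\) can be evaluated with a single numerical integration. The following base R sketch is an illustration, not the algorithm used by any particular package: it takes the upper bounds and sample sizes from the updated depression example summarized at the end of this article and uses the fact that, under \(\delta=0\), \(Z_2\) given \(Z_1=z\) is normal with mean \(\rho z\) and variance \(1-\rho^2\), where \(\rho=\sqrt{t_1/t_2}\). Variable names are hypothetical.

```r
# Upper bounds and sample sizes taken from the updated depression example
u_bound <- c(2.2209, 1.3047)
n_obs <- c(59, 134)
rho <- sqrt(n_obs[1] / n_obs[2]) # Corr(Z_1, Z_2) = sqrt(t_1 / t_2)

# alpha_1^+(0): probability of crossing the upper bound at analysis 1
alpha1 <- pnorm(u_bound[1], lower.tail = FALSE)

# alpha_2^+(0): stay below u_1 at analysis 1, then cross u_2 at analysis 2.
# Given Z_1 = z and delta = 0, Z_2 ~ N(rho * z, 1 - rho^2).
alpha2 <- integrate(
  function(z) {
    dnorm(z) * pnorm(u_bound[2],
      mean = rho * z, sd = sqrt(1 - rho^2), lower.tail = FALSE
    )
  },
  lower = -Inf, upper = u_bound[1]
)$value

alpha_plus <- c(alpha1, alpha2)
print(round(alpha_plus, 4))      # Type I error spent at each analysis
print(round(sum(alpha_plus), 4)) # Total one-sided Type I error
```

Since the bounds are rounded to four decimal places, the total should be close to, but not exactly equal to, the design level of 0.10.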

Asymmetric two-sided testing

With both lower and upper bounds for testing and any real value \(\delta\) representing treatment effect, we denote the probability of crossing the upper boundary at analysis \(i\) without previously crossing a bound by

\[\alpha_{i}(\delta)=P_{\delta}\{\{Z_{i}\geq u_{i}\}\cap_{j=1}^{i-1}\{ l_{j}<Z_{j}< u_{j}\}\},\]

\(i=1,2,\ldots,k.\) The total probability of crossing an upper bound prior to crossing a lower bound is denoted by

\[\alpha(\delta)\equiv\sum_{i=1}^{k}\alpha_{i}(\delta).\]

Next, we consider analogous notation for the lower bound. For \(i=1,2,\ldots,k\) denote the probability of crossing a lower bound at analysis \(i\) without previously crossing any bound by

\[\beta_{i}(\delta)=P_{\delta}\{\{Z_{i}\leq l_{i}\}\cap_{j=1}^{i-1}\{ l_{j}<Z_{j}< u_{j}\}\}.\]

The total lower boundary crossing probability in this case is written as

\[\beta(\delta)={\sum\limits_{i=1}^{k}}\beta_{i}(\delta).\]

When a design has final bounds equal (\(l_k=u_k\)), \(\beta(\delta_1)\) is the Type II error, which is equal to 1 minus the power of the design. In this case, \(\beta_i(\delta_1)\) is referred to as the \(\beta\)-spending at analysis \(i\), \(i=1,\ldots,k\).
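The definitions of \(\alpha_i(\delta)\) and \(\beta_i(\delta)\) can be made concrete for \(k=2\). The sketch below is a hypothetical helper, `cross_prob_k2()`, written in base R for illustration only; it is not a function from any package. It assumes information \(\mathcal{I}_j=n_j/(4\,\text{sd}^2)\) for 1:1 randomization and uses the bounds, sample sizes, and standard deviation from the updated depression example summarized at the end of this article.

```r
# Boundary crossing probabilities for a two-analysis design (hypothetical helper)
cross_prob_k2 <- function(delta, info, lower, upper) {
  rho <- sqrt(info[1] / info[2])
  mu1 <- delta * sqrt(info[1]) # E(Z_1)
  # Given Z_1 = z: Z_2 ~ N(rho * z + delta * (I_2 - I_1) / sqrt(I_2), 1 - rho^2)
  cond_mean <- function(z) rho * z + delta * (info[2] - info[1]) / sqrt(info[2])
  cond_sd <- sqrt(1 - rho^2)

  # Analysis 1: cross the upper or the lower bound
  up1 <- pnorm(upper[1], mean = mu1, lower.tail = FALSE)
  lo1 <- pnorm(lower[1], mean = mu1)

  # Analysis 2: continue at analysis 1 (l_1 < Z_1 < u_1), then cross a bound
  up2 <- integrate(function(z) {
    dnorm(z, mean = mu1) *
      pnorm(upper[2], mean = cond_mean(z), sd = cond_sd, lower.tail = FALSE)
  }, lower[1], upper[1])$value
  lo2 <- integrate(function(z) {
    dnorm(z, mean = mu1) * pnorm(lower[2], mean = cond_mean(z), sd = cond_sd)
  }, lower[1], upper[1])$value

  list(alpha_i = c(up1, up2), beta_i = c(lo1, lo2))
}

info <- c(59, 134) / (4 * 7.5^2) # Assumed information at each analysis
lower <- c(-0.2304, 1.3047)      # Futility bounds from the example
upper <- c(2.2209, 1.3047)       # Efficacy bounds from the example

under_h0 <- cross_prob_k2(delta = 0, info, lower, upper)
under_h1 <- cross_prob_k2(delta = 3, info, lower, upper)

# Cumulative crossing probabilities by analysis
print(round(cumsum(under_h0$alpha_i), 4))
print(round(cumsum(under_h1$alpha_i), 4))
print(round(cumsum(under_h1$beta_i), 4))
```

Under these assumptions the cumulative values should be close to the `P(Cross)` rows of the bound summary for the updated design, with small differences due to rounding of the bounds.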

Spending function design

Type I error is most often defined with \(\alpha_i^+(0), i=1,\ldots,k\). This is referred to as non-binding Type I error since any lower bound is ignored in the calculation. This means that if a trial is continued in spite of a lower bound being crossed at an interim analysis, Type I error is still controlled at the design \(\alpha\)-level. For Phase III trials used for approvals of new treatments, non-binding Type I error calculation is generally expected by regulators.

For any given \(0<\alpha<1\) we define a non-decreasing \(\alpha\)-spending function \(f(t; \alpha)\) for \(t\geq 0\) with \(f(0; \alpha)=0\) and, for \(t\geq 1\), \(f(t; \alpha) =\alpha\). Letting \(t_0=0\), we set \(\alpha^+_j(0)\) for \(j=1,\ldots,k\) through the equation

\[\alpha^+_{j}(0) = f(t_j;\alpha)-f(t_{j-1};\alpha).\]

Assuming an asymmetric lower bound, we similarly use a \(\beta\)-spending function \(g\) to set \(\beta\)-spending at analysis \(j=1,\ldots, k\) as:

\[\beta_{j}(\delta_1) = g(t_j;\delta_1, \beta) -g(t_{j-1};\delta_1, \beta).\]

In the following example, the function \(\Phi()\) represents the cumulative distribution function for the standard normal distribution (i.e., mean 0, standard deviation 1). The major depression study of Binneman et al. (2008) considered above used the Lan and DeMets (1983) spending function approximating an O’Brien-Fleming bound for a single interim analysis halfway through the trial with

\[f(t; \alpha) =2\left( 1-\Phi\left( \frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{t}}\right) \right).\]

\[g(t; \delta_1, \beta) =2\left( 1-\Phi\left( \frac{\Phi^{-1}(1-\beta/2)}{\sqrt{t}}\right) \right).\]
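The spending calculation can be reproduced in a few lines of base R. This is a minimal sketch with a hypothetical helper name, `ldof_spend()`; it writes the Lan-DeMets O’Brien-Fleming-like function with the upper quantile \(\Phi^{-1}(1-\alpha/2)\), so that spending rises from 0 at \(t=0\) to the full amount at \(t=1\). The planned and actual information fractions are those of the depression example.

```r
# Lan-DeMets spending function approximating O'Brien-Fleming bounds.
# `total` is the total error to spend (alpha or beta); spending is capped at t = 1.
ldof_spend <- function(t, total) {
  t <- pmin(t, 1)
  2 * (1 - pnorm(qnorm(1 - total / 2) / sqrt(t)))
}

alpha <- 0.1 # One-sided Type I error
beta <- 0.17 # Type II error

t_plan <- c(0.5, 1)          # Planned information fractions
t_actual <- c(59, 134) / 134 # Information fractions with 59 observations at interim

# Incremental spending at each analysis: f(t_j) - f(t_{j-1}) with t_0 = 0
spend_increments <- function(t, total) diff(c(0, ldof_spend(t, total)))

alpha_plan <- spend_increments(t_plan, alpha)
beta_plan <- spend_increments(t_plan, beta)
alpha_actual <- spend_increments(t_actual, alpha)
beta_actual <- spend_increments(t_actual, beta)

print(round(rbind(alpha_plan, beta_plan, alpha_actual, beta_actual), 4))
```

The first column gives the interim spending and the second the amount left for the final analysis; each row sums to the corresponding total error.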

library(gsDesign)

delta1 <- 3 # Treatment effect, alternate hypothesis
delta0 <- 0 # Treatment effect, null hypothesis
ratio <- 1 # Randomization ratio (experimental / control)
sd <- 7.5 # Standard deviation for change in HAM-D score
alpha <- 0.1 # 1-sided Type I error
beta <- 0.17 # Targeted Type II error (1 - targeted power)
k <- 2 # Number of planned analyses
test.type <- 4 # Asymmetric bound design with non-binding futility bound
timing <- .5 # Information fraction at interim analyses
sfu <- sfLDOF # O'Brien-Fleming spending function for alpha-spending
sfupar <- 0 # Parameter for upper spending function
sfl <- sfLDOF # O'Brien-Fleming spending function for beta-spending
sflpar <- 0 # Parameter for lower spending function
delta <- 0
endpoint <- "normal"

# Derive normal fixed design sample size
n <- nNormal(
  delta1 = delta1,
  delta0 = delta0,
  ratio = ratio,
  sd = sd,
  alpha = alpha,
  beta = beta
)

# Derive group sequential design based on parameters above
x <- gsDesign(
  k = k,
  test.type = test.type,
  alpha = alpha,
  beta = beta,
  timing = timing,
  sfu = sfu,
  sfupar = sfupar,
  sfl = sfl,
  sflpar = sflpar,
  delta = delta, # Not used since n.fix is provided
  delta1 = delta1,
  delta0 = delta0,
  endpoint = "normal",
  n.fix = n
)
# Convert sample size at each analysis to integer values
x <- toInteger(x)

#> toInteger: rounding done to nearest integer since ratio was not specified as postive integer .

The planned design used \(\alpha=0.1\), one-sided, and Type II error 17% (83% power) with an interim analysis at 50% of the final planned observations. This leads to Type I \(\alpha\)-spending of 0.02 and \(\beta\)-spending of 0.052 at the planned interim. An advantage of the spending function approach is that bounds can be adjusted when the number of observations at analyses differs from the plan. The actual observations for experimental versus control at the analysis were 59 as opposed to the planned 67, which resulted in an interim information fraction for spending of \(t_1=0.4403\). With the Lan-DeMets spending function to approximate O’Brien-Fleming bounds this results in \(\alpha\)-spending of 0.0132 (P(Cross) if delta=0 row in Efficacy column) and \(\beta\)-spending of 0.0386 (P(Cross) if delta=3 row in Futility column).

We note that the Z-values and 1-sided p-values in the table below correspond exactly and either can be used for evaluation of statistical significance for a trial result. The rows labeled ~delta at bound are approximations that describe roughly what treatment difference is required to cross a bound; these should not be used for a formal evaluation of whether a bound has been crossed. The O’Brien-Fleming spending function is generally felt to provide conservative bounds for stopping at interim analysis. Most of the error spending is reserved for the final analysis in this example. The futility bound only required a small trend in the wrong direction to stop the trial; a nominal p-value of 0.77 was observed, which crossed the futility bound and stopped the trial since it was greater than the futility p-value bound of 0.59.

Finally, we note that at the final analysis, the cumulative probability for P(Cross) if delta=0 is less than the planned \(\alpha=0.10\). This probability represents \(\alpha(0)\), which excludes the probability of crossing the lower bound at the interim analysis and then the upper bound at the final analysis. The value of the non-binding Type I error is still \(\alpha^+(0) = 0.10\).

# Updated alpha is unchanged
alphau <- 0.1
# Updated sample size at each analysis
n.I <- c(59, 134)
# Updated number of analyses
ku <- length(n.I)
# Information fraction is used for spending
usTime <- n.I / x$n.I[x$k]
lsTime <- usTime

# Update design based on actual interim sample size and planned final sample size
xu <- gsDesign(
  k = ku,
  test.type = test.type,
  alpha = alphau,
  beta = x$beta,
  sfu = sfu,
  sfupar = sfupar,
  sfl = sfl,
  sflpar = sflpar,
  n.I = n.I,
  maxn.IPlan = x$n.I[x$k],
  delta = x$delta,
  delta1 = x$delta1,
  delta0 = x$delta0,
  endpoint = endpoint,
  n.fix = n,
  usTime = usTime,
  lsTime = lsTime
)

# Summarize bounds
gsBoundSummary(xu, Nname = "N", digits = 4, ddigits = 2, tdigits = 1)

#>   Analysis               Value Efficacy Futility
#>  IA 1: 44%                   Z   2.2209  -0.2304
#>      N: 59         p (1-sided)   0.0132   0.5911
#>                ~delta at bound   4.3370  -0.4500
#>            P(Cross) if delta=0   0.0132   0.4089
#>            P(Cross) if delta=3   0.2468   0.0386
#>      Final                   Z   1.3047   1.3047
#>     N: 134         p (1-sided)   0.0960   0.0960
#>                ~delta at bound   1.6907   1.6907
#>            P(Cross) if delta=0   0.0965   0.9035
#>            P(Cross) if delta=3   0.8350   0.1650
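The relationships among the interim rows of the summary can be verified by hand. The base R sketch below is illustrative: it assumes information \(n/(4\,\text{sd}^2)\) for 1:1 randomization, converts the interim Z-value bounds to nominal 1-sided p-values and approximate treatment differences, and then checks the reported interim p-value of 0.77 against the futility bound. Variable names are hypothetical.

```r
sd_outcome <- 7.5 # Standard deviation for change in HAM-D score
n_interim <- 59   # Observations at the interim analysis
info1 <- n_interim / (4 * sd_outcome^2) # Assumed information, 1:1 randomization

z_eff <- 2.2209  # Interim efficacy bound (Z-value)
z_fut <- -0.2304 # Interim futility bound (Z-value)

# Nominal 1-sided p-values corresponding to the Z-value bounds
p_eff <- pnorm(z_eff, lower.tail = FALSE)
p_fut <- pnorm(z_fut, lower.tail = FALSE)

# Approximate treatment difference at each bound: delta = Z / sqrt(information)
delta_eff <- z_eff / sqrt(info1)
delta_fut <- z_fut / sqrt(info1)

# Observed nominal 1-sided p-value reported for the interim analysis
p_obs <- 0.77
z_obs <- qnorm(p_obs, lower.tail = FALSE)
stop_for_futility <- z_obs <= z_fut

print(round(c(p_eff = p_eff, p_fut = p_fut), 4))
print(round(c(delta_eff = delta_eff, delta_fut = delta_fut), 2))
print(stop_for_futility)
```

Comparing `z_obs` with `z_fut` is equivalent to comparing the observed p-value with the futility p-value bound, which is why either scale can be used to evaluate a bound.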