Limited support is provided for the 2-sample design with a normally distributed random variable as the outcome. Users are encouraged to look at guidance such as in Jennison and Turnbull (2000). We provide a tool for the large sample case where a reasonable estimate of the standard deviation is available, so that a reasonable sample size can be computed based on the straightforward distribution theory outlined below.
The overall sample size notation used for gsDesign is to consider a standardized effect size parameter, referred to as \(\theta\) in Jennison and Turnbull (2000). We begin with the 2-sample normal problem where we assume a possibly different standard deviation in each treatment group. For \(j = 1, 2\), we let \(X_{j, i}\), \(i = 1, 2, \ldots, n_j\) represent independent and identically distributed observations following a normal distribution with mean \(\mu_j\) and standard deviation \(\sigma_j\). The natural parameter for comparing the two distributions is
\[\delta = \mu_2 - \mu_1\]
and we wish to test if \(\delta > 0\) in a one-sided testing scenario to test for superiority of treatment 2 over treatment 1. We could also consider testing, say, \(\delta > \delta_0\) for a non-inferiority scenario with \(\delta_0 < 0\) or super-superiority if \(\delta_0 > 0\). While normally a t-test would be used for this, for large sample sizes this would be nearly equivalent to a Z-test defined by:
\[Z=\frac{\bar X_2 - \bar X_1-\delta_0}{\sqrt{\sigma^2_2/n_2 + \sigma_1^2/n_1}}\approx \frac{\bar X_2 - \bar X_1-\delta_0}{\sqrt{s^2_2/n_2 + s_1^2/n_1}}=t\] where \(\bar X_j\) is the sample mean and \(s_j^2\) is the sample variance for group \(j=1,2\). The far right-hand side of this is Welch's t-test. For our examples we use this \(t\)-test and show that the sample size computation based on the \(Z\)-test above works well for the chosen problems.
We denote the total sample size by \(n = n_1 + n_2\) and the randomization ratio of experimental to control observations by \(r = n_2/n_1\). Thus, \(n_2=rn/(1+r)\), \(n_1=n/(1+r)\) and when \(r=1\) we have \(n_1=n_2=n/2\). Now that we have completed the needed notation, those not interested in the theory behind the sample size and power calculation used may skip the rest of this section.
We let \[\sigma^2=(1+r)(\sigma_1^2+\sigma_2^2/r)\] and define \[\theta= (\delta-\delta_0)/\sigma.\] Under the given assumptions, \[Z \sim \text{Normal}\left(\sqrt{n}\,\theta,1\right).\] Under the null hypothesis that \(\delta=\delta_0\), we have \(Z\sim \text{Normal}(0,1)\). Thus, regardless of \(n\) we have \[P_0[Z\ge \Phi^{-1}(1-\alpha)]=\alpha.\] Under the alternate hypothesis that \(\delta=\delta_1\), we denote the corresponding standardized effect by \(\theta_1=(\delta_1-\delta_0)/\sigma\). We define the Type II error \(\beta\) and power \(1-\beta\) by
\[\begin{align}
1-\beta =& P_1[Z\ge \Phi^{-1}(1-\alpha)]\\
=& P[Z-\sqrt n\,\theta_1\ge \Phi^{-1}(1-\alpha)-\sqrt n\,\theta_1]\\
=& 1-\Phi(\Phi^{-1}(1-\alpha)-\sqrt n\,\theta_1).
\end{align}\]
If the power \(1-\beta\) is fixed, we can invert this formula to compute the sample size with:
\[n=\left(\frac{\Phi^{-1}(1-\beta)+\Phi^{-1}(1-\alpha)}{\theta_1}\right)^2.\]
For 2-sided testing, we simply substitute \(\alpha/2\) for \(\alpha\) in the above two formulas.
We consider two examples to check the above formulas vs. nNormal(). We then confirm by simulation that the power and Type I error approximations are useful. Finally, we provide a simple group sequential design example.
We consider an example with \(\sigma_2=1.25\), \(\sigma_1=1.6\), \(\delta=0.8\) and \(\delta_0=0\). We let the sample size ratio be 2 experimental group observations per control observation. We compute the sample size with nNormal() assuming one-sided Type I error \(\alpha=0.025\) and 90% power (\(1-\beta=0.9\)).
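A call along the following lines should reproduce this computation; it is a sketch using the same arguments that appear in the later code chunks, and we load gsDesign here since the remaining chunks assume it (along with knitr for kable() and a pipe operator) is available:

```r
library(gsDesign)

# Fixed design sample size: delta = 0.8, control sd = 1.6, experimental sd = 1.25,
# 2:1 randomization, one-sided alpha = 0.025, 90% power
nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, beta = 0.1, ratio = 2)
```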
Checking using the sample size formula above, we have:
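A direct check in base R follows; the variable names sigma and theta are ours, and theta is reused in the group sequential example later:

```r
# sigma^2 = (1 + r) * (sigma1^2 + sigma2^2 / r) with r = 2
sigma <- sqrt((1 + 2) * (1.6^2 + 1.25^2 / 2))
# Standardized effect size theta = (delta - delta0) / sigma
theta <- 0.8 / sigma
# n = ((qnorm(1 - beta) + qnorm(1 - alpha)) / theta)^2, about 165 total
((qnorm(0.9) + qnorm(0.975)) / theta)^2
```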
Now, assume we let the sample size be 200 and compute power under the same scenario.
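With n supplied, nNormal() returns power rather than sample size (this is how the plotting chunk below uses it); a sketch with the same arguments as above:

```r
# Power for a fixed total sample size of 200 under the same assumptions
nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, n = 200, ratio = 2)
```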
From the power formula above, we duplicate this with:
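Using theta from the check above, this should agree closely with both nNormal() and the simulation estimate reported below:

```r
# power = 1 - Phi(qnorm(1 - alpha) - sqrt(n) * theta)
1 - pnorm(qnorm(0.975) - sqrt(200) * theta)
```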
If we want to plot power for a variety of sample sizes, we can input n as a vector:
```r
n <- 100:200
pwrn <- nNormal(
  delta1 = 0.8, sd = 1.6, sd2 = 1.25,
  alpha = 0.025, n = n, ratio = 2
)
plot(n, pwrn, type = "l")
```

Alternatively, you could fix the sample size at 200 and plot power under different treatment effect assumptions; a sketch of this is given below.
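This sketch assumes nNormal() is vectorized in delta1 just as it is in n, and uses an arbitrary grid of effect sizes:

```r
# Hypothetical grid of treatment effects with total sample size fixed at 200
delta1 <- seq(0.4, 1.2, by = 0.05)
pwrdelta <- nNormal(
  delta1 = delta1, sd = 1.6, sd2 = 1.25,
  alpha = 0.025, n = 200, ratio = 2
)
plot(delta1, pwrdelta, type = "l")
```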
Rather than simulate individual observations, we will take advantage of the fact that for \(j=1,2\)
\[\bar X_j\sim\text{Normal}(\mu_j,\sigma_j^2/n_j)\]
and
\[(n_j-1)s_j^2/\sigma_j^2=\sum_{i=1}^{n_j}(X_{j,i}-\bar X_j)^2/\sigma_j^2 \sim \chi ^2_{n_j-1}\]
are independent. Thus, we can simulate trial power with \(n=200\) 1 million times with a t-statistic with unequal variances quickly as follows under the alternate hypothesis:
```r
# Simulate 1 million trials under the alternate hypothesis
nsim <- 1000000
delta <- 0.8
sd1 <- 1.6
sd2 <- 1.25
n1 <- 67
n2 <- 133
# Difference in sample means
deltahat <- rnorm(n = nsim, mean = delta, sd = sd1 / sqrt(n1)) -
  rnorm(n = nsim, mean = 0, sd = sd2 / sqrt(n2))
# Standard error estimate using chi-square draws for the sample variances
s <- sqrt(
  sd1^2 * rchisq(n = nsim, df = n1 - 1) / (n1 - 1) / n1 +
    sd2^2 * rchisq(n = nsim, df = n2 - 1) / (n2 - 1) / n2
)
# Welch t-statistics and empirical power at the one-sided 0.025 level
z <- deltahat / s
mean(z >= qnorm(.975))
#> [1] 0.946407
```

The standard error for this simulation power calculation is approximately
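One way to compute this is with the usual binomial standard error formula; the value 0.946 below is the simulated power estimate from the output above, rounded:

```r
# Binomial standard error of the simulated power estimate (about 0.0002)
sqrt(0.946 * (1 - 0.946) / nsim)
```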
suggesting we should be within less than about 0.001 of the actual power, which indicates the normal power approximation is reasonable for this scenario.
Now we derive a group sequential design under the above scenario. We will largely use default parameters and show two methods. For the first, we plug in the fixed sample size above as follows:
```r
d <- gsDesign(
  k = 2,
  n.fix = nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, beta = .1, ratio = 2),
  delta1 = 0.8
)
d %>%
  gsBoundSummary(deltaname = "Mean difference") %>%
  kable(row.names = FALSE)
```

| Analysis | Value | Efficacy | Futility |
|---|---|---|---|
| IA 1: 50% | Z | 2.7500 | 0.4122 |
| N: 86 | p (1-sided) | 0.0030 | 0.3401 |
| | ~Mean difference at bound | 0.9399 | 0.1409 |
| | P(Cross) if Mean difference=0 | 0.0030 | 0.6599 |
| | P(Cross) if Mean difference=0.8 | 0.3412 | 0.0269 |
| Final | Z | 1.9811 | 1.9811 |
| N: 172 | p (1-sided) | 0.0238 | 0.0238 |
| | ~Mean difference at bound | 0.4788 | 0.4788 |
| | P(Cross) if Mean difference=0 | 0.0239 | 0.9761 |
| | P(Cross) if Mean difference=0.8 | 0.9000 | 0.1000 |
A textual summary of the design is given by:
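One way to produce this summary (assuming, as in other gsDesign usage, that summary() on a gsDesign object returns a text description) is:

```r
cat(summary(d))
```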
Asymmetric two-sided group sequential design with non-binding futility bound, 2 analyses, sample size 172, 90 percent power, 2.5 percent (1-sided) Type I error. Efficacy bounds derived using a Hwang-Shih-DeCani spending function with gamma = -4. Futility bounds derived using a Hwang-Shih-DeCani spending function with gamma = -2.
We can get the same answer by plugging in the standardized effect size we computed above:
```r
gsDesign(k = 2, delta = theta, delta1 = 0.8) %>%
  gsBoundSummary(deltaname = "Mean difference") %>%
  kable(row.names = FALSE)
```

| Analysis | Value | Efficacy | Futility |
|---|---|---|---|
| IA 1: 50% | Z | 2.7500 | 0.4122 |
| N: 86 | p (1-sided) | 0.0030 | 0.3401 |
| | ~Mean difference at bound | 0.9399 | 0.1409 |
| | P(Cross) if Mean difference=0 | 0.0030 | 0.6599 |
| | P(Cross) if Mean difference=0.8 | 0.3412 | 0.0269 |
| Final | Z | 1.9811 | 1.9811 |
| N: 172 | p (1-sided) | 0.0238 | 0.0238 |
| | ~Mean difference at bound | 0.4788 | 0.4788 |
| | P(Cross) if Mean difference=0 | 0.0239 | 0.9761 |
| | P(Cross) if Mean difference=0.8 | 0.9000 | 0.1000 |
We leave it to the reader to verify the properties of the above design using simulation as in the fixed design example.