This vignette provides an overview of binomial two-arm trial design and analysis. We consider designs for superiority, non-inferiority, and super-superiority trials. These can reflect binary endpoints measuring successful outcomes (e.g., response) or unsuccessful outcomes (e.g., failure). To limit the scope, we focus here on risk-difference and fixed (not group sequential) designs. Risk-ratio and odds-ratio methods are also available in the package, as outlined in Miettinen and Nurminen (1985).
The basic method for computing the fixed sample size that is the basis for group sequential design sizes for superiority was developed by Fleiss, Tytun, and Ury (1980); it is applied here without the continuity correction, as recommended by Gordon and Watson (1996). The method was extended to non-inferiority and super-superiority trials by Farrington and Manning (1990).
We will see that while asymptotic formulations are generally good approximations, fast simulation methods can provide more accurate results for both Type I error and power.
The R packages we use are:
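At a minimum, the code below relies on the following packages; tidyr and scales are additionally used via namespaced (`::`) calls:

```r
library(gsDesign) # nBinomial(), testBinomial(), ciBinomial(), simBinomial(), binomialPowerTable()
library(dplyr)    # %>%, mutate(), rename(), select()
library(tibble)   # tibble()
library(gt)       # display tables
library(ggplot2)  # power plot
```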
The rate arguments in nBinomial() are p1 and p2: p1 is the rate in group 1 and p2 is the rate in group 2. For a simple example, we can compute the sample size for a superiority design with a 2:1 sample size ratio where the experimental and control groups have assumed success rates of 0.2 and 0.1, respectively. One-sided Type I error is 0.025 and Type II error is 0.15 (power is 85%). This gives the same result if the roles of p1 and p2 are changed and ratio is inverted.
```r
nBinomial(p1 = 0.2, p2 = 0.1, ratio = 2, alpha = 0.025, beta = 0.15) %>% ceiling()
#> [1] 496
nBinomial(p1 = 0.1, p2 = 0.2, ratio = 0.5, alpha = 0.025, beta = 0.15) %>% ceiling()
#> [1] 496
```

The above results apply as well to a failure endpoint, but now 0.2 would be the control group rate and 0.1 would be the experimental group rate. Results can also be computed using the risk-ratio and odds-ratio methods of Miettinen and Nurminen (1985). For 2:1 randomization (Experimental:Control) we set ratio = 0.5 since group 1 represents the control group in the following. We see that the risk-ratio method gives the same result as the default risk-difference method (scale = "Difference"). For the odds-ratio method, the sample size is smaller in this case.
```r
scale <- c("Difference", "RR", "OR")
tibble(
  scale,
  "Sample size" = c(
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[1]) %>% ceiling(),
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[2]) %>% ceiling(),
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[3]) %>% ceiling()
  )
) %>%
  gt() %>%
  tab_header(
    "Sample size by scale for a superiority design",
    subtitle = "alpha = 0.025, beta = 0.15, p1 = 0.2, p2 = 0.1"
  )
```

| Sample size by scale for a superiority design | |
| alpha = 0.025, beta = 0.15, p1 = 0.2, p2 = 0.1 | |
| scale | Sample size |
|---|---|
| Difference | 525 |
| RR | 525 |
| OR | 489 |
Next we assume we have results from a trial with 20 / 30 and 10 / 30 successes in the two groups. The testBinomial() function computes the Z-value for a binomial test. We see that the scale chosen (default is "Difference") does not matter for the Z-value.
```r
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30)
#> [1] 2.581989
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "RR")
#> [1] 2.581989
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "OR")
#> [1] 2.581989
```

As opposed to nBinomial(), it makes a difference how treatment groups are assigned: swapping the two groups changes the sign of the Z-value. For example, below we reverse the group assignments (10 / 30 and 20 / 30 successes) and compute the Z-value for a binomial test of the null hypothesis that the difference between the two groups is equal to 0.
We compute a one-sided asymptotic \(p\)-value using the default method of Farrington and Manning (1990). With adj = 1, we use the Miettinen and Nurminen (1985) method, which multiplies the variance estimate of the rate difference by n / (n - 1) (continuity correction). Generally, not using the continuity correction is preferred (Gordon and Watson (1996)). The adj argument is available in testBinomial(), ciBinomial(), and simBinomial(), but is not used in nBinomial(). Since the Chi-square test is often used, we show that it gives a 2-sided \(p\)-value that is twice the one-sided \(p\)-value.
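For orientation, the Z-statistic computed by testBinomial() has the score-test form below; this is a sketch of the Farrington and Manning (1990) construction, where \(\hat{p}_1, \hat{p}_2\) are the observed rates and \(\tilde{p}_1, \tilde{p}_2\) are the maximum likelihood estimates of the rates computed under the restriction \(p_1 - p_2 = \delta_0\):

\[
Z = \frac{\hat{p}_1 - \hat{p}_2 - \delta_0}{\sqrt{\tilde{p}_1(1 - \tilde{p}_1)/n_1 + \tilde{p}_2(1 - \tilde{p}_2)/n_2}}
\]

With adj = 1, the variance under the square root is multiplied by \(n/(n - 1)\), \(n = n_1 + n_2\), per Miettinen and Nurminen (1985).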
```r
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30) %>% pnorm(lower.tail = TRUE)
#> [1] 0.004911637
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, adj = 1) %>% pnorm(lower.tail = TRUE)
#> [1] 0.005227859
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, chisq = 1) %>% pchisq(df = 1, lower.tail = FALSE) / 2
#> [1] 0.004911637
```

We can compute a confidence interval for the rate difference using the ciBinomial() function.
```r
p1 <- 20 / 30
p2 <- 10 / 30
rd <- p1 - p2
rr <- p1 / p2
orr <- (p1 * (1 - p2)) / (p2 * (1 - p1))
rbind(
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30),
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "RR"),
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "OR")
) %>%
  mutate(
    scale = c("Risk difference", "Risk-ratio", "Odds-ratio"),
    Effect = c(rd, rr, orr)
  ) %>%
  gt() %>%
  tab_header(
    "Confidence intervals for a binomial effect size",
    subtitle = "x1 = 20, n1 = 30, x2 = 10, n2 = 30"
  ) %>%
  fmt_number(columns = c(lower, upper, Effect), n_sigfig = 3)
```

| Confidence intervals for a binomial effect size |||
| x1 = 20, n1 = 30, x2 = 10, n2 = 30 | |||
| lower | upper | scale | Effect |
|---|---|---|---|
| 0.0812 | 0.545 | Risk difference | 0.333 |
| 1.17 | 3.62 | Risk-ratio | 2.00 |
| 1.38 | 11.6 | Odds-ratio | 4.00 |
Again, how treatment groups are assigned makes a difference.
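For instance, swapping the group assignments reverses the direction of the effect estimates; the following sketch (not evaluated here) should give a risk-difference interval that is roughly the mirror image of the one in the table above, and a risk-ratio interval below 1:

```r
# Same data with group labels swapped: 10 / 30 events in group 1, 20 / 30 in group 2
ciBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30)                # risk-difference CI, now negative
ciBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, scale = "RR")  # risk-ratio CI, now below 1
```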
By setting a non-inferiority margin, we can loosen the stringency on what is required compared to a superiority study. The trial need only show that the experimental treatment is close to as good as the control treatment. By setting a super-superiority margin, we can show that the experimental treatment is better than the control treatment by at least a specified amount; this increases the stringency of the design compared to a superiority study. The method of Farrington and Manning (1990) is used for both non-inferiority and super-superiority designs. The parameter delta0 sets the margin, with delta0 = 0 for superiority, delta0 < 0 for non-inferiority, and delta0 > 0 for super-superiority. We see that this has a substantial impact on the sample size required to achieve the desired power.
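In terms of the rate difference, and with rates oriented so that larger \(p_1 - p_2\) favors the experimental arm, the one-sided hypotheses being tested for a margin \(\delta_0\) are:

\[
H_0: p_1 - p_2 \le \delta_0 \quad \text{versus} \quad H_1: p_1 - p_2 > \delta_0
\]

Thus \(\delta_0 = 0\) is the usual superiority test, \(\delta_0 < 0\) requires the experimental arm only to be within \(|\delta_0|\) of control (non-inferiority), and \(\delta_0 > 0\) requires an advantage of at least \(\delta_0\) (super-superiority).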
```r
tibble(
  Design = c("Superiority", "Non-inferiority", "Super-superiority"),
  `p1 (pC)` = c(0.2, 0.2, 0.2),
  `p2 (pE)` = c(0.1, 0.1, 0.1),
  `delta0` = c(0, -0.02, 0.02),
  `Sample size` = c(
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5)),
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5, delta0 = -0.02)),
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5, delta0 = 0.02))
  )
) %>%
  gt() %>%
  tab_header(
    "Sample size for binomial two arm trial design",
    subtitle = "alpha = 0.025, beta = 0.15"
  ) %>%
  fmt_number(columns = c(`p1 (pC)`, `p2 (pE)`), decimals = 2) %>%
  cols_label(
    Design = "Design",
    `p1 (pC)` = "Control group rate",
    `p2 (pE)` = "Experimental group rate",
    delta0 = "Null hypothesis value of rate difference (delta0)",
    `Sample size` = "Sample size"
  ) %>%
  tab_footnote("Randomization ratio is 2:1 (Experimental:Control) with assumed control failure rate p1 = 0.2 and experimental rate 0.1.")
```

| Sample size for binomial two arm trial design ||||
| alpha = 0.025, beta = 0.15 | ||||
| Design | Control group rate | Experimental group rate | Null hypothesis value of rate difference (delta0) | Sample size |
|---|---|---|---|---|
| Superiority | 0.20 | 0.10 | 0.00 | 525 |
| Non-inferiority | 0.20 | 0.10 | -0.02 | 375 |
| Super-superiority | 0.20 | 0.10 | 0.02 | 796 |
| Randomization ratio is 2:1 (Experimental:Control) with assumed control failure rate p1 = 0.2 and experimental rate 0.1. | ||||
Testing for non-inferiority or super-superiority is equivalent to checking whether the confidence interval contains the margin delta0. Since both the superiority and non-inferiority Z-tests exceed the critical value of 1.96, both tests are significant. The third test fails to establish super-superiority since its Z-value is less than 1.96. The confidence interval contains 0.02 but neither 0 nor -0.02, so it reaches the same conclusions as all 3 of these tests.
```r
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = 0) # superiority
#> [1] 2.070197
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = -0.02) # non-inferiority
#> [1] 2.225796
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = 0.02) # super-superiority
#> [1] 1.915403
ciBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30) # CI
#>        lower     upper
#> 1 0.01426069 0.4872467
```

The simBinomial() function simulates trial outcomes to compute Z-values. Rather than simulating individual observations, it draws the per-group event counts directly from the binomial distribution. simBinomial() has the same arguments as testBinomial(), plus the number of simulated trials in the argument nsim. A vector of Z-values is returned.
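Conceptually, this amounts to something like the following sketch (the package's internal implementation may differ; the sketch assumes testBinomial() is vectorized over the event counts):

```r
# Sketch: draw per-group event counts for nsim trials at once, then
# compute the corresponding vector of Z-statistics in one call.
sim_z <- function(p1, p2, n1, n2, nsim, delta0 = 0) {
  x1 <- rbinom(nsim, size = n1, prob = p1) # group 1 event counts
  x2 <- rbinom(nsim, size = n2, prob = p2) # group 2 event counts
  testBinomial(x1 = x1, n1 = n1, x2 = x2, n2 = n2, delta0 = delta0)
}
```

Below, the function itself is called with nsim = 10 for illustration.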
```r
simBinomial(p1 = 0.2, p2 = 0.1, n1 = 30, n2 = 30, nsim = 10)
#> [1]  1.7213259  0.0000000  1.0846523  2.6832816
#> [5]  2.0107528  1.5668414  0.0000000  0.4670994
#> [9] -0.3615508  1.3856406
```

To see whether the asymptotic method controls Type I error at the desired level, we can compute the Type I error rate from the simulated Z-values.
```r
z <- simBinomial(p1 = 0.15, p2 = 0.15, n1 = 30, n2 = 30, nsim = 1000000)
mean(z > qnorm(0.975)) # Type I error rate
#> [1] 0.026378
```

We see that the Type I error rate is slightly inflated. To get an exact Type I error rate, we can compute the appropriate quantile of the simulated Z-values. We say exact, but the degree of exactness depends on the number of simulations, which is 1 million in this case. Since the simBinomial() function is fast, we can use a large number of simulations. In any case, this produces a slightly conservative Type I error rate.
```r
zcut <- quantile(z, 0.975)
tibble(
  "Z cutoff" = zcut,
  "p cutoff" = pnorm(zcut, lower.tail = FALSE)
) %>%
  gt() %>%
  fmt_number(columns = c("Z cutoff", "p cutoff"), n_sigfig = 3) %>%
  tab_header(
    "Exact cutoff for Type I error rate",
    subtitle = "Based on 1 million simulations"
  ) %>%
  tab_footnote("The Z cutoff is the quantile of the simulated Z-values at 0.975 using p1 = p2 = 0.15.")
```

| Exact cutoff for Type I error rate | |
| Based on 1 million simulations | |
| Z cutoff | p cutoff |
|---|---|
| 2.01 | 0.0222 |
| The Z cutoff is the quantile of the simulated Z-values at 0.975 using p1 = p2 = 0.15. | |
Now we examine power with the asymptotic and exact cutoffs.
```r
z <- simBinomial(p1 = 0.2, p2 = 0.1, n1 = 30, n2 = 30, nsim = 1000000)
cat("Power with asymptotic cutoff ", mean(z > qnorm(0.975)))
#> Power with asymptotic cutoff  0.192717
cat("\nPower with exact cutoff", mean(z > zcut))
#>
#> Power with exact cutoff 0.166431
```

Finally, we compute power based on simulation for the sample size calculations above. We see in this case that the sample size of 489 based on the odds-ratio formulation more accurately approximates the targeted 85% power.
```r
ptab <- tibble(
  Scale = c("Risk-difference", "Odds-ratio"),
  n = c(525, 489),
  Power = c(
    mean(simBinomial(p1 = 0.2, p2 = 0.1, n1 = 525 / 3, n2 = 525 * 2 / 3, nsim = 100000) > qnorm(0.975)),
    mean(simBinomial(p1 = 0.2, p2 = 0.1, n1 = 489 / 3, n2 = 489 * 2 / 3, nsim = 100000) > qnorm(0.975))
  )
)
ptab %>%
  gt() %>%
  tab_header(
    "Simulation power for sample size based on risk-difference and odds-ratio",
    subtitle = "p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15"
  ) %>%
  fmt_number(columns = c(n, Power), n_sigfig = 3) %>%
  cols_label(Scale = "Scale", n = "Sample size", Power = "Power") %>%
  tab_footnote("Power based on 100,000 simulated trials and nominal alpha = 0.025 test; 2 x simulation error = 0.002") %>%
  tab_footnote(
    "Power based on Z-test for risk-difference with no continuity correction.",
    locations = cells_column_labels("Power")
  )
```

| Simulation power for sample size based on risk-difference and odds-ratio ||
| pE = 0.2, pC = 0.1, alpha = 0.025, beta = 0.15 | ||
| Scale | Sample size | Power1 |
|---|---|---|
| Risk-difference | 525 | 0.874 |
| Odds-ratio | 489 | 0.854 |
| Power based on 100,000 simulated trials and nominal alpha = 0.025 test; 2 x simulation error = 0.002 | ||
| 1 Power based on Z-test for risk-difference with no continuity correction. | ||
The binomialPowerTable() function computes power for a given sample size and treatment effect across a range of underlying control event rates (pC) and effect sizes (delta). Both the asymptotic and simulation methods are available; the simulation method with a large simulation size serves as an accurate exact method, so binomialPowerTable() provides a shortcut for the exact power calculation. We see that the Type I error achieved using the asymptotic cutoff varies with the underlying control rate, reaching almost 0.03 when the underlying failure rate is 0.10.
```r
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0,
  delta0 = 0,
  n = 70,
  failureEndpoint = TRUE,
  ratio = 1,
  alpha = 0.025,
  simulation = TRUE,
  nsim = 1e6,
  adj = 0
) %>%
  rename("Type I error" = "Power") %>%
  gt() %>%
  fmt_number(columns = "Type I error", n_sigfig = 3) %>%
  tab_header("Type I error is not controlled with nominal p = 0.025 cutoff")
```

| Type I error is not controlled with nominal p = 0.025 cutoff |||
| pC | delta | pE | Type I error |
|---|---|---|---|
| 0.10 | 0 | 0.10 | 0.0285 |
| 0.12 | 0 | 0.12 | 0.0280 |
| 0.14 | 0 | 0.14 | 0.0273 |
| 0.16 | 0 | 0.16 | 0.0256 |
| 0.18 | 0 | 0.18 | 0.0250 |
| 0.20 | 0 | 0.20 | 0.0244 |
Adding the continuity correction (adj = 1) helps a small amount in better controlling Type I error in this case (a sketch of that call follows the table below). However, changing the nominal cutoff to alpha = 0.023 without the continuity correction is more successful at controlling Type I error consistently.
```r
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0,
  delta0 = 0,
  n = 70,
  failureEndpoint = TRUE,
  ratio = 1,
  alpha = 0.023,
  simulation = TRUE,
  nsim = 1e6,
  adj = 0
) %>%
  rename("Type I error" = "Power") %>%
  gt() %>%
  fmt_number(columns = "Type I error", n_sigfig = 3) %>%
  tab_header("Type I error is controlled at 0.025 with nominal p = 0.023 cutoff")
```

| Type I error is controlled at 0.025 with nominal p = 0.023 cutoff |||
| pC | delta | pE | Type I error |
|---|---|---|---|
| 0.10 | 0 | 0.10 | 0.0213 |
| 0.12 | 0 | 0.12 | 0.0215 |
| 0.14 | 0 | 0.14 | 0.0225 |
| 0.16 | 0 | 0.16 | 0.0233 |
| 0.18 | 0 | 0.18 | 0.0237 |
| 0.20 | 0 | 0.20 | 0.0236 |
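For reference, the continuity-corrected comparison mentioned above differs from the calls shown only in the adj argument; a sketch, not evaluated here:

```r
# Continuity-corrected (Miettinen and Nurminen) variant at the nominal 0.025 level;
# per the text above, this reduces but does not eliminate the Type I error inflation.
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0, delta0 = 0, n = 70,
  failureEndpoint = TRUE, ratio = 1, alpha = 0.025,
  simulation = TRUE, nsim = 1e6,
  adj = 1 # multiply the variance by n / (n - 1)
) %>%
  rename("Type I error" = "Power")
```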
Now we look at power for a range of control rates and treatment effects. binomialPowerTable() uses all combinations of the input control group rate (pC) and treatment effect (delta). We use a nominal alpha = 0.023 based on the simulations above to control Type I error. Initially we base power on the asymptotic approximation.
```r
power_table_asymptotic <- binomialPowerTable(
  pC = seq(0.1, 0.2, 0.025),
  delta = seq(0.15, 0.25, 0.02),
  n = 70,
  ratio = 1,
  alpha = 0.023
)
```

Now we produce the same table based on 1 million simulations.
```r
power_table_simulation <- binomialPowerTable(
  pC = seq(0.1, 0.2, 0.025),
  delta = seq(0.15, 0.25, 0.02),
  n = 70,
  ratio = 1,
  alpha = 0.023,
  simulation = TRUE,
  nsim = 1000000
)
```

We summarize these results in a plot and see that the simulations suggest the asymptotic approximation slightly underestimates power. Note, however, that with n = 70 none of the pC and delta combinations evaluated reaches the 85% power targeted earlier; simulated power ranges from roughly 28% to 73% (see the table following the plot).
```r
rbind(
  power_table_asymptotic %>% mutate(Method = "Asymptotic"),
  power_table_simulation %>% mutate(Method = "Simulation")
) %>%
  ggplot(aes(x = delta, y = Power, color = factor(pC), lty = Method)) +
  geom_line() +
  labs(x = "Treatment effect (delta)", y = "Power", color = "Control rate") +
  scale_color_brewer(palette = "Set1") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  # Grid points on the x-axis at 0.05 intervals
  scale_x_continuous(breaks = seq(0.15, 0.25, by = 0.05)) +
  # Put y-axis scale on percent scale
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = seq(0.2, 0.8, by = 0.2)
  ) +
  coord_cartesian(ylim = c(0.2, 0.8)) +
  ggtitle("Power for Binomial Two Arm Trial Design with n = 70")
```

Following is a table of the simulation results from above in a wide format.
```r
# Transform the Power values to a wide format:
# put control group rate (pC) in rows and treatment effect (delta) in columns,
# with a spanner label "Treatment effect (delta)" over all columns after the first.
power_table_simulation %>%
  select(-pE) %>%
  tidyr::pivot_wider(
    names_from = delta,
    values_from = Power
  ) %>%
  dplyr::rename(`Control group rate` = pC) %>%
  gt::gt() %>%
  gt::tab_spanner(
    label = "Treatment effect (delta)",
    columns = 2:7
  ) %>%
  gt::fmt_percent(decimals = 1) %>%
  gt::tab_header("Power by Control Group Rate and Treatment Effect")
```

| Power by Control Group Rate and Treatment Effect ||||||
| Control group rate | Treatment effect (delta) | |||||
|---|---|---|---|---|---|---|
|  | 0.15 | 0.17 | 0.19 | 0.21 | 0.23 | 0.25 |
| 10.0% | 37.3% | 44.9% | 52.4% | 59.7% | 66.5% | 72.6% |
| 12.5% | 34.2% | 41.1% | 48.4% | 55.4% | 62.3% | 68.8% |
| 15.0% | 31.7% | 38.2% | 45.2% | 52.2% | 59.2% | 65.8% |
| 17.5% | 29.7% | 36.1% | 42.9% | 49.9% | 56.8% | 63.5% |
| 20.0% | 28.3% | 34.5% | 41.1% | 47.9% | 54.6% | 61.2% |
We have demonstrated sample size and power calculations for 2-arm binomial trial designs, using both asymptotic approximations and exact methods based on large simulations. We focused on the Miettinen and Nurminen (1985) risk-difference asymptotic method for testing differences, but without the continuity correction, as recommended by Gordon and Watson (1996). This method generalizes to non-inferiority and super-superiority trials as proposed by Farrington and Manning (1990). The risk-ratio and odds-ratio methods are also available. We have also shown how to compute Type I error and power using simulation methods. Given the high speed of the simulations, there may be some value in using simulation to evaluate Type I error across a range of event rates and then using appropriate nominal cutoffs to evaluate power. These adjusted cutoffs would then also be used for testing when the trial is completed. This differs from Fisher's exact test, which conditions on the total number of events observed. Results may vary under different circumstances, so while the odds-ratio sample size method was more accurate for the given example, this result has not been generalized.