This vignette provides an overview of binomial two-arm trial design and analysis. We consider designs for superiority, non-inferiority, and super-superiority trials. These can reflect binary endpoints measuring successful outcomes (e.g., response) or unsuccessful outcomes (e.g., failure). To limit the scope, we focus here on risk-difference and fixed (not group sequential) designs. Risk-ratio and odds-ratio methods are also available in the package, as outlined in Miettinen and Nurminen (1985).
The basic method for computing the fixed sample size that is the basis for group sequential design sizes for superiority was developed by Fleiss, Tytun, and Ury (1980); it is applied here without the continuity correction, as recommended by Gordon and Watson (1996). The method was extended to non-inferiority and super-superiority trials by Farrington and Manning (1990).
We will see that while asymptotic formulations are generally good approximations, fast simulation methods can provide more accurate results for both Type I error and power.
The R packages we use are:
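At a minimum, the code below relies on the following packages; tidyr and scales are additionally used via namespaced (`::`) calls:

```r
library(gsDesign) # nBinomial(), testBinomial(), ciBinomial(), simBinomial(), binomialPowerTable()
library(dplyr)    # %>%, mutate(), rename(), select()
library(tibble)   # tibble()
library(gt)       # display tables
library(ggplot2)  # power plot
```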
The rate arguments in nBinomial() are p1 and p2: p1 is the rate in group 1 and p2 is the rate in group 2. For a simple example, we can compute the sample size for a superiority design with a 2:1 sample size ratio where the experimental and control groups have assumed success rates of 0.2 and 0.1, respectively. One-sided Type I error is 0.025 and Type II error is 0.15 (power is 85%). This gives the same result if the roles of p1 and p2 are changed and ratio is inverted.
```r
nBinomial(p1 = 0.2, p2 = 0.1, ratio = 2, alpha = 0.025, beta = 0.15) %>% ceiling()
#> [1] 496
nBinomial(p1 = 0.1, p2 = 0.2, ratio = 0.5, alpha = 0.025, beta = 0.15) %>% ceiling()
#> [1] 496
```

The above results apply as well to a failure endpoint, but now 0.2 would be the control group rate and 0.1 would be the experimental group rate. Results can also be computed using the risk-ratio and odds-ratio methods of Miettinen and Nurminen (1985). For 2:1 randomization (Experimental:Control) we set ratio = 0.5 since group 1 represents the control group in the following. We see that the risk-ratio method gives the same result as the default risk-difference method (scale = "Difference"). For the odds-ratio method, the sample size is smaller in this case.
```r
scale <- c("Difference", "RR", "OR")
tibble(
  scale,
  "Sample size" = c(
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[1]) %>% ceiling(),
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[2]) %>% ceiling(),
    nBinomial(p1 = 0.2, p2 = 0.1, ratio = 0.5, alpha = 0.025, beta = 0.15, scale = scale[3]) %>% ceiling()
  )
) %>%
  gt() %>%
  tab_header(
    "Sample size by scale for a superiority design",
    subtitle = "alpha = 0.025, beta = 0.15, p1 = 0.2, p2 = 0.1"
  )
```

| Sample size by scale for a superiority design | |
| alpha = 0.025, beta = 0.15, p1 = 0.2, p2 = 0.1 | |
| scale | Sample size |
|---|---|
| Difference | 525 |
| RR | 525 |
| OR | 489 |
Next we assume we have results from a trial with 20 / 30 and 10 / 30 successes in the two groups. The testBinomial() function computes the Z-value for a binomial test. We see that the scale chosen (default is "Difference") does not matter for the Z-value.
```r
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30)
#> [1] 2.581989
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "RR")
#> [1] 2.581989
testBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "OR")
#> [1] 2.581989
```

As opposed to nBinomial(), it makes a difference how treatment groups are assigned: swapping the two groups changes the sign of the Z-value. For example, below we reverse the group assignments (10 / 30 and 20 / 30 successes) and compute the Z-value for a binomial test of the null hypothesis that the difference between the two groups is equal to 0.
We compute a one-sided asymptotic \(p\)-value using the default method of Farrington and Manning (1990). With adj = 1, we use the Miettinen and Nurminen (1985) method, which multiplies the variance estimate of the rate difference by n / (n - 1) (continuity correction). Generally, not using the continuity correction is preferred (Gordon and Watson (1996)). The adj argument is available in testBinomial(), ciBinomial(), and simBinomial(), but is not used in nBinomial(). Since the Chi-square test is often used, we show that it gives a 2-sided \(p\)-value that is twice the one-sided \(p\)-value.
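For orientation, the Z-statistic computed by testBinomial() has the score-test form below; this is a sketch of the Farrington and Manning (1990) construction, where \(\hat{p}_1, \hat{p}_2\) are the observed rates and \(\tilde{p}_1, \tilde{p}_2\) are the maximum likelihood estimates of the rates computed under the restriction \(p_1 - p_2 = \delta_0\):

\[
Z = \frac{\hat{p}_1 - \hat{p}_2 - \delta_0}{\sqrt{\tilde{p}_1(1 - \tilde{p}_1)/n_1 + \tilde{p}_2(1 - \tilde{p}_2)/n_2}}
\]

With adj = 1, the variance under the square root is multiplied by \(n/(n - 1)\), \(n = n_1 + n_2\), per Miettinen and Nurminen (1985).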
```r
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30) %>% pnorm(lower.tail = TRUE)
#> [1] 0.004911637
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, adj = 1) %>% pnorm(lower.tail = TRUE)
#> [1] 0.005227859
testBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, chisq = 1) %>% pchisq(df = 1, lower.tail = FALSE) / 2
#> [1] 0.004911637
```

We can compute a confidence interval for the rate difference using the ciBinomial() function.
```r
p1 <- 20 / 30
p2 <- 10 / 30
rd <- p1 - p2
rr <- p1 / p2
orr <- (p1 * (1 - p2)) / (p2 * (1 - p1))
rbind(
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30),
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "RR"),
  ciBinomial(x1 = 20, n1 = 30, x2 = 10, n2 = 30, scale = "OR")
) %>%
  mutate(
    scale = c("Risk difference", "Risk-ratio", "Odds-ratio"),
    Effect = c(rd, rr, orr)
  ) %>%
  gt() %>%
  tab_header(
    "Confidence intervals for a binomial effect size",
    subtitle = "x1 = 20, n1 = 30, x2 = 10, n2 = 30"
  ) %>%
  fmt_number(columns = c(lower, upper, Effect), n_sigfig = 3)
```

| Confidence intervals for a binomial effect size |||
| x1 = 20, n1 = 30, x2 = 10, n2 = 30 | |||
| lower | upper | scale | Effect |
|---|---|---|---|
| 0.0812 | 0.545 | Risk difference | 0.333 |
| 1.17 | 3.62 | Risk-ratio | 2.00 |
| 1.38 | 11.6 | Odds-ratio | 4.00 |
Again, how treatment groups are assigned makes a difference.
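For instance, swapping the group assignments reverses the direction of the effect estimates; the following sketch (not evaluated here) should give a risk-difference interval that is roughly the mirror image of the one in the table above, and a risk-ratio interval below 1:

```r
# Same data with group labels swapped: 10 / 30 events in group 1, 20 / 30 in group 2
ciBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30)                # risk-difference CI, now negative
ciBinomial(x1 = 10, n1 = 30, x2 = 20, n2 = 30, scale = "RR")  # risk-ratio CI, now below 1
```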
By setting a non-inferiority margin, we can loosen the stringency on what is required compared to a superiority study. The trial need only show that the experimental treatment is close to as good as the control treatment. By setting a super-superiority margin, we can show that the experimental treatment is better than the control treatment by at least a specified amount; this increases the stringency of the design compared to a superiority study. The method of Farrington and Manning (1990) is used for both non-inferiority and super-superiority designs. The parameter delta0 sets the margin, with delta0 = 0 for superiority, delta0 < 0 for non-inferiority, and delta0 > 0 for super-superiority. We see that this has a substantial impact on the sample size required to achieve the desired power.
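In terms of the rate difference, and with rates oriented so that larger \(p_1 - p_2\) favors the experimental arm, the one-sided hypotheses being tested for a margin \(\delta_0\) are:

\[
H_0: p_1 - p_2 \le \delta_0 \quad \text{versus} \quad H_1: p_1 - p_2 > \delta_0
\]

Thus \(\delta_0 = 0\) is the usual superiority test, \(\delta_0 < 0\) requires the experimental arm only to be within \(|\delta_0|\) of control (non-inferiority), and \(\delta_0 > 0\) requires an advantage of at least \(\delta_0\) (super-superiority).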
```r
tibble(
  Design = c("Superiority", "Non-inferiority", "Super-superiority"),
  `p1 (pC)` = c(0.2, 0.2, 0.2),
  `p2 (pE)` = c(0.1, 0.1, 0.1),
  `delta0` = c(0, -0.02, 0.02),
  `Sample size` = c(
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5)),
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5, delta0 = -0.02)),
    ceiling(nBinomial(p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15, ratio = 0.5, delta0 = 0.02))
  )
) %>%
  gt() %>%
  tab_header(
    "Sample size for binomial two arm trial design",
    subtitle = "alpha = 0.025, beta = 0.15"
  ) %>%
  fmt_number(columns = c(`p1 (pC)`, `p2 (pE)`), decimals = 2) %>%
  cols_label(
    Design = "Design",
    `p1 (pC)` = "Control group rate",
    `p2 (pE)` = "Experimental group rate",
    delta0 = "Null hypothesis value of rate difference (delta0)",
    `Sample size` = "Sample size"
  ) %>%
  tab_footnote("Randomization ratio is 2:1 (Experimental:Control) with assumed control failure rate p1 = 0.2 and experimental rate 0.1.")
```

| Sample size for binomial two arm trial design ||||
| alpha = 0.025, beta = 0.15 | ||||
| Design | Control group rate | Experimental group rate | Null hypothesis value of rate difference (delta0) | Sample size |
|---|---|---|---|---|
| Superiority | 0.20 | 0.10 | 0.00 | 525 |
| Non-inferiority | 0.20 | 0.10 | -0.02 | 375 |
| Super-superiority | 0.20 | 0.10 | 0.02 | 796 |
| Randomization ratio is 2:1 (Experimental:Control) with assumed control failure rate p1 = 0.2 and experimental rate 0.1. | ||||
Testing for non-inferiority or super-superiority is equivalent to checking whether the confidence interval contains the margin delta0. Since both the superiority and non-inferiority Z-tests exceed the critical value of 1.96, both tests are significant. The third test fails to establish super-superiority since its Z-value is less than 1.96. The confidence interval contains 0.02 but neither 0 nor -0.02, so it reaches the same conclusions as all 3 of these tests.
```r
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = 0) # superiority
#> [1] 2.070197
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = -0.02) # non-inferiority
#> [1] 2.225796
testBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30, delta0 = 0.02) # super-superiority
#> [1] 1.915403
ciBinomial(x1 = 18, n1 = 30, x2 = 10, n2 = 30) # CI
#>        lower     upper
#> 1 0.01426069 0.4872467
```

The simBinomial() function simulates trial outcomes to compute Z-values. Rather than simulating individual observations, it draws the per-group event counts directly from the binomial distribution. simBinomial() has the same arguments as testBinomial(), plus the number of simulated trials in the argument nsim. A vector of Z-values is returned.
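Conceptually, this amounts to something like the following sketch (the package's internal implementation may differ; the sketch assumes testBinomial() is vectorized over the event counts):

```r
# Sketch: draw per-group event counts for nsim trials at once, then
# compute the corresponding vector of Z-statistics in one call.
sim_z <- function(p1, p2, n1, n2, nsim, delta0 = 0) {
  x1 <- rbinom(nsim, size = n1, prob = p1) # group 1 event counts
  x2 <- rbinom(nsim, size = n2, prob = p2) # group 2 event counts
  testBinomial(x1 = x1, n1 = n1, x2 = x2, n2 = n2, delta0 = delta0)
}
```

Below, the function itself is called with nsim = 10 for illustration.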
```r
simBinomial(p1 = 0.2, p2 = 0.1, n1 = 30, n2 = 30, nsim = 10)
#> [1]  1.7213259  0.0000000  1.0846523  2.6832816
#> [5]  2.0107528  1.5668414  0.0000000  0.4670994
#> [9] -0.3615508  1.3856406
```

To see whether the asymptotic method controls Type I error at the desired level, we can compute the Type I error rate from the simulated Z-values.
```r
z <- simBinomial(p1 = 0.15, p2 = 0.15, n1 = 30, n2 = 30, nsim = 1000000)
mean(z > qnorm(0.975)) # Type I error rate
#> [1] 0.026378
```

We see that the Type I error rate is slightly inflated. To get an exact Type I error rate, we can compute the appropriate quantile of the simulated Z-values. We say exact, but the degree of exactness depends on the number of simulations, which is 1 million in this case. Since the simBinomial() function is fast, we can use a large number of simulations. In any case, this produces a slightly conservative Type I error rate.
```r
zcut <- quantile(z, 0.975)
tibble(
  "Z cutoff" = zcut,
  "p cutoff" = pnorm(zcut, lower.tail = FALSE)
) %>%
  gt() %>%
  fmt_number(columns = c("Z cutoff", "p cutoff"), n_sigfig = 3) %>%
  tab_header(
    "Exact cutoff for Type I error rate",
    subtitle = "Based on 1 million simulations"
  ) %>%
  tab_footnote("The Z cutoff is the quantile of the simulated Z-values at 0.975 using p1 = p2 = 0.15.")
```

| Exact cutoff for Type I error rate | |
| Based on 1 million simulations | |
| Z cutoff | p cutoff |
|---|---|
| 2.01 | 0.0222 |
| The Z cutoff is the quantile of the simulated Z-values at 0.975 using p1 = p2 = 0.15. | |
Now we examine power with the asymptotic and exact cutoffs.
```r
z <- simBinomial(p1 = 0.2, p2 = 0.1, n1 = 30, n2 = 30, nsim = 1000000)
cat("Power with asymptotic cutoff ", mean(z > qnorm(0.975)))
#> Power with asymptotic cutoff  0.192717
cat("\nPower with exact cutoff", mean(z > zcut))
#>
#> Power with exact cutoff 0.166431
```

Finally, we compute power based on simulation for the sample size calculations above. We see in this case that the sample size of 489 based on the odds-ratio formulation more accurately approximates the targeted 85% power.
```r
ptab <- tibble(
  Scale = c("Risk-difference", "Odds-ratio"),
  n = c(525, 489),
  Power = c(
    mean(simBinomial(p1 = 0.2, p2 = 0.1, n1 = 525 / 3, n2 = 525 * 2 / 3, nsim = 100000) > qnorm(0.975)),
    mean(simBinomial(p1 = 0.2, p2 = 0.1, n1 = 489 / 3, n2 = 489 * 2 / 3, nsim = 100000) > qnorm(0.975))
  )
)
ptab %>%
  gt() %>%
  tab_header(
    "Simulation power for sample size based on risk-difference and odds-ratio",
    subtitle = "p1 = 0.2, p2 = 0.1, alpha = 0.025, beta = 0.15"
  ) %>%
  fmt_number(columns = c(n, Power), n_sigfig = 3) %>%
  cols_label(Scale = "Scale", n = "Sample size", Power = "Power") %>%
  tab_footnote("Power based on 100,000 simulated trials and nominal alpha = 0.025 test; 2 x simulation error = 0.002") %>%
  tab_footnote(
    "Power based on Z-test for risk-difference with no continuity correction.",
    locations = cells_column_labels("Power")
  )
```

| Simulation power for sample size based on risk-difference and odds-ratio ||
| pE = 0.2, pC = 0.1, alpha = 0.025, beta = 0.15 | ||
| Scale | Sample size | Power1 |
|---|---|---|
| Risk-difference | 525 | 0.874 |
| Odds-ratio | 489 | 0.854 |
| Power based on 100,000 simulated trials and nominal alpha = 0.025 test; 2 x simulation error = 0.002 | ||
| 1 Power based on Z-test for risk-difference with no continuity correction. | ||
The binomialPowerTable() function computes power for a given sample size and treatment effect across a range of underlying control event rates (pC) and effect sizes (delta). Both the asymptotic and simulation methods are available; the simulation method with a large simulation size serves as an accurate exact method, so binomialPowerTable() provides a shortcut for the exact power calculation. We see that the Type I error achieved using the asymptotic cutoff varies with the underlying control rate, reaching almost 0.03 when the underlying failure rate is 0.10.
```r
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0,
  delta0 = 0,
  n = 70,
  failureEndpoint = TRUE,
  ratio = 1,
  alpha = 0.025,
  simulation = TRUE,
  nsim = 1e6,
  adj = 0
) %>%
  rename("Type I error" = "Power") %>%
  gt() %>%
  fmt_number(columns = "Type I error", n_sigfig = 3) %>%
  tab_header("Type I error is not controlled with nominal p = 0.025 cutoff")
```

| Type I error is not controlled with nominal p = 0.025 cutoff |||
| pC | delta | pE | Type I error |
|---|---|---|---|
| 0.10 | 0 | 0.10 | 0.0285 |
| 0.12 | 0 | 0.12 | 0.0280 |
| 0.14 | 0 | 0.14 | 0.0273 |
| 0.16 | 0 | 0.16 | 0.0256 |
| 0.18 | 0 | 0.18 | 0.0250 |
| 0.20 | 0 | 0.20 | 0.0244 |
Adding the continuity correction (adj = 1) helps a small amount in better controlling Type I error in this case (a sketch of that call follows the table below). However, changing the nominal cutoff to alpha = 0.023 without the continuity correction is more successful at controlling Type I error consistently.
```r
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0,
  delta0 = 0,
  n = 70,
  failureEndpoint = TRUE,
  ratio = 1,
  alpha = 0.023,
  simulation = TRUE,
  nsim = 1e6,
  adj = 0
) %>%
  rename("Type I error" = "Power") %>%
  gt() %>%
  fmt_number(columns = "Type I error", n_sigfig = 3) %>%
  tab_header("Type I error is controlled at 0.025 with nominal p = 0.023 cutoff")
```

| Type I error is controlled at 0.025 with nominal p = 0.023 cutoff |||
| pC | delta | pE | Type I error |
|---|---|---|---|
| 0.10 | 0 | 0.10 | 0.0213 |
| 0.12 | 0 | 0.12 | 0.0215 |
| 0.14 | 0 | 0.14 | 0.0225 |
| 0.16 | 0 | 0.16 | 0.0233 |
| 0.18 | 0 | 0.18 | 0.0237 |
| 0.20 | 0 | 0.20 | 0.0236 |
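For reference, the continuity-corrected comparison mentioned above differs from the calls shown only in the adj argument; a sketch, not evaluated here:

```r
# Continuity-corrected (Miettinen and Nurminen) variant at the nominal 0.025 level;
# per the text above, this reduces but does not eliminate the Type I error inflation.
binomialPowerTable(
  pC = seq(0.1, 0.2, 0.02),
  delta = 0, delta0 = 0, n = 70,
  failureEndpoint = TRUE, ratio = 1, alpha = 0.025,
  simulation = TRUE, nsim = 1e6,
  adj = 1 # multiply the variance by n / (n - 1)
) %>%
  rename("Type I error" = "Power")
```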
Now we look at power for a range of control rates and treatment effects. binomialPowerTable() uses all combinations of the input control group rate (pC) and treatment effect (delta). We use a nominal alpha = 0.023 based on the simulations above to control Type I error. Initially we base power on the asymptotic approximation.
```r
power_table_asymptotic <- binomialPowerTable(
  pC = seq(0.1, 0.2, 0.025),
  delta = seq(0.15, 0.25, 0.02),
  n = 70,
  ratio = 1,
  alpha = 0.023
)
```

Now we produce the same table based on 1 million simulations.
```r
power_table_simulation <- binomialPowerTable(
  pC = seq(0.1, 0.2, 0.025),
  delta = seq(0.15, 0.25, 0.02),
  n = 70,
  ratio = 1,
  alpha = 0.023,
  simulation = TRUE,
  nsim = 1000000
)
```

We summarize these results in a plot and see that the simulations suggest the asymptotic approximation slightly underestimates power. Note, however, that with n = 70 none of the pC and delta combinations evaluated reaches the 85% power targeted earlier; simulated power ranges from roughly 28% to 73% (see the table following the plot).
```r
rbind(
  power_table_asymptotic %>% mutate(Method = "Asymptotic"),
  power_table_simulation %>% mutate(Method = "Simulation")
) %>%
  ggplot(aes(x = delta, y = Power, color = factor(pC), lty = Method)) +
  geom_line() +
  labs(x = "Treatment effect (delta)", y = "Power", color = "Control rate") +
  scale_color_brewer(palette = "Set1") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  # Grid points on the x-axis at 0.05 intervals
  scale_x_continuous(breaks = seq(0.15, 0.25, by = 0.05)) +
  # Put y-axis scale on percent scale
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = seq(0.2, 0.8, by = 0.2)
  ) +
  coord_cartesian(ylim = c(0.2, 0.8)) +
  ggtitle("Power for Binomial Two Arm Trial Design with n = 70")
```

Following is a table of the simulation results from above in a wide format.
```r
# Transform the Power values to a wide format:
# put control group rate (pC) in rows and treatment effect (delta) in columns,
# with a spanner label "Treatment effect (delta)" over all columns after the first.
power_table_simulation %>%
  select(-pE) %>%
  tidyr::pivot_wider(
    names_from = delta,
    values_from = Power
  ) %>%
  dplyr::rename(`Control group rate` = pC) %>%
  gt::gt() %>%
  gt::tab_spanner(
    label = "Treatment effect (delta)",
    columns = 2:7
  ) %>%
  gt::fmt_percent(decimals = 1) %>%
  gt::tab_header("Power by Control Group Rate and Treatment Effect")
```

| Power by Control Group Rate and Treatment Effect ||||||
| Control group rate | Treatment effect (delta) | |||||
|---|---|---|---|---|---|---|
|  | 0.15 | 0.17 | 0.19 | 0.21 | 0.23 | 0.25 |
| 10.0% | 37.3% | 44.9% | 52.4% | 59.7% | 66.5% | 72.6% |
| 12.5% | 34.2% | 41.1% | 48.4% | 55.4% | 62.3% | 68.8% |
| 15.0% | 31.7% | 38.2% | 45.2% | 52.2% | 59.2% | 65.8% |
| 17.5% | 29.7% | 36.1% | 42.9% | 49.9% | 56.8% | 63.5% |
| 20.0% | 28.3% | 34.5% | 41.1% | 47.9% | 54.6% | 61.2% |
We have demonstrated sample size and power calculations for 2-arm binomial trial designs, using both asymptotic approximations and exact methods based on large simulations. We focused on the Miettinen and Nurminen (1985) risk-difference asymptotic method for testing differences, but without the continuity correction, as recommended by Gordon and Watson (1996). This method generalizes to non-inferiority and super-superiority trials as proposed by Farrington and Manning (1990). The risk-ratio and odds-ratio methods are also available. We have also shown how to compute Type I error and power using simulation methods. Given the high speed of the simulations, there may be some value in using simulation to evaluate Type I error across a range of event rates and then using appropriate nominal cutoffs to evaluate power. These adjusted cutoffs would then also be used for testing when the trial is completed. This differs from Fisher's exact test, which conditions on the total number of events observed. Results may vary under different circumstances, so while the odds-ratio sample size method was more accurate for the given example, this result has not been generalized.