In this example, we illustrate how to use Bayesian dynamic borrowing (BDB) with the inclusion of inverse probability weighting to balance baseline covariate distributions between external and internal datasets (Psioda et al., 2025). This particular example considers a hypothetical trial with a time-to-event outcome which we assume to follow a Weibull distribution; i.e., \(Y_i \sim \mbox{Weibull}(\alpha, \sigma)\), where

\[f(y_i \mid \alpha, \sigma) = \left(\frac{\alpha}{\sigma} \right) \left( \frac{y_i}{\sigma}\right)^{\alpha - 1} \exp \left( -\left( \frac{y_i}{\sigma}\right)^\alpha\right).\]

Define \(\boldsymbol{\theta} = \{\log(\alpha), \beta\}\), where \(\beta = -\log(\sigma)\) is the intercept (i.e., log-inverse-scale) parameter of a Weibull proportional hazards regression model and \(\alpha\) is the shape parameter.
Our objective is to use BDB with IPWs to construct a posterior distribution for the probability of surviving past time \(t\) in the control arm, \(S_C(t \mid \boldsymbol{\theta})\) (hereafter denoted as \(S_C(t)\) for notational convenience). For each treatment arm, we will define our prior distributions with respect to \(\boldsymbol{\theta}\) before eventually obtaining MCMC samples from the posterior distributions of \(S_C(t)\) and \(S_T(t)\) (i.e., the survival probability at time \(t\) for the active treatment arm). In this example, suppose we are interested in survival probabilities at \(t = 12\) months.
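For reference, the survival function implied by the Weibull density above (this is a direct consequence of the parameterization, not an additional assumption) is

\[S(t \mid \boldsymbol{\theta}) = \exp\left\{ -\left( \frac{t}{\sigma} \right)^{\alpha} \right\} = \exp\left\{ -\left( t \, e^{\beta} \right)^{\alpha} \right\},\]

since \(\beta = -\log(\sigma)\) implies \(e^{\beta} = 1/\sigma\). Posterior draws of \(\boldsymbol{\theta}\) therefore translate directly into posterior draws of \(S_C(t)\) and \(S_T(t)\).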
We will use simulated internal and external datasets from the package where each dataset has a time-to-event response variable (the observed time at which a participant either had an event or was censored), an event indicator (1: event; 0: censored), the enrollment time in the study, the total time since the start of the study, and four baseline covariates which we will balance.
The external control dataset has a sample size of 150 participants, and the distributions of the four covariates are as follows:

Covariate 1: normal with a mean and standard deviation of approximately 65 and 10, respectively

Covariate 2: binary (0 vs. 1) with approximately 30% of participants with level 1

Covariate 3: binary (0 vs. 1) with approximately 40% of participants with level 1

Covariate 4: binary (0 vs. 1) with approximately 50% of participants with level 1
The internal dataset has 160 participants, with 80 participants in each of the control and active treatment arms. The covariate distributions of each arm are as follows:

Covariate 1: normal with a mean and standard deviation of approximately 62 and 8, respectively

Covariate 2: binary (0 vs. 1) with approximately 40% of participants with level 1

Covariate 3: binary (0 vs. 1) with approximately 40% of participants with level 1

Covariate 4: binary (0 vs. 1) with approximately 60% of participants with level 1
library(tibble)
library(distributional)
library(dplyr)
library(ggplot2)
library(rstan)
#> Loading required package: StanHeaders
#>
#> rstan version 2.35.0.9000 (Stan version 2.35.0)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)

set.seed(1234)

summary(int_tte_df)
#>      subjid            y               enr_time         total_time
#>  Min.   :  1.00   Min.   : 0.7026   Min.   :0.01367   Min.   : 1.309
#>  1st Qu.: 40.75   1st Qu.: 6.4188   1st Qu.:0.83781   1st Qu.: 7.268
#>  Median : 80.50   Median :12.1179   Median :1.27502   Median :14.245
#>  Mean   : 80.50   Mean   : 9.6651   Mean   :1.19144   Mean   :15.427
#>  3rd Qu.:120.25   3rd Qu.:12.7287   3rd Qu.:1.58949   3rd Qu.:20.699
#>  Max.   :160.00   Max.   :13.9913   Max.   :1.98912   Max.   :58.644
#>      event           trt          cov1            cov2             cov3
#>  Min.   :0.00   Min.   :0.0   Min.   :46.00   Min.   :0.0000   Min.   :0.0000
#>  1st Qu.:0.00   1st Qu.:0.0   1st Qu.:57.00   1st Qu.:0.0000   1st Qu.:0.0000
#>  Median :0.00   Median :0.5   Median :62.00   Median :0.0000   Median :0.0000
#>  Mean   :0.45   Mean   :0.5   Mean   :61.83   Mean   :0.3688   Mean   :0.3625
#>  3rd Qu.:1.00   3rd Qu.:1.0   3rd Qu.:67.00   3rd Qu.:1.0000   3rd Qu.:1.0000
#>  Max.   :1.00   Max.   :1.0   Max.   :85.00   Max.   :1.0000   Max.   :1.0000
#>      cov4
#>  Min.   :0.0000
#>  1st Qu.:0.0000
#>  Median :1.0000
#>  Mean   :0.5563
#>  3rd Qu.:1.0000
#>  Max.   :1.0000

summary(ex_tte_df)
#>      subjid            y                enr_time          total_time
#>  Min.   :  1.00   Min.   : 0.05804   Min.   :0.005329   Min.   : 1.191
#>  1st Qu.: 38.25   1st Qu.: 4.45983   1st Qu.:0.857692   1st Qu.: 5.782
#>  Median : 75.50   Median : 9.42004   Median :1.308708   Median :10.576
#>  Mean   : 75.50   Mean   : 8.58877   Mean   :1.232657   Mean   :12.533
#>  3rd Qu.:112.75   3rd Qu.:12.71370   3rd Qu.:1.673582   3rd Qu.:16.549
#>  Max.   :150.00   Max.   :14.00703   Max.   :1.975702   Max.   :64.793
#>      event            cov1            cov2             cov3
#>  Min.   :0.0000   Min.   :37.00   Min.   :0.0000   Min.   :0.0000
#>  1st Qu.:0.0000   1st Qu.:58.00   1st Qu.:0.0000   1st Qu.:0.0000
#>  Median :1.0000   Median :64.00   Median :0.0000   Median :0.0000
#>  Mean   :0.6267   Mean   :64.28   Mean   :0.3533   Mean   :0.4533
#>  3rd Qu.:1.0000   3rd Qu.:70.00   3rd Qu.:1.0000   3rd Qu.:1.0000
#>  Max.   :1.0000   Max.   :90.00   Max.   :1.0000   Max.   :1.0000
#>      cov4
#>  Min.   :0.0000
#>  1st Qu.:0.0000
#>  Median :0.0000
#>  Mean   :0.4733
#>  3rd Qu.:1.0000
#>  Max.   :1.0000

With the covariate data from both the external and internal datasets, we can calculate the propensity scores and ATT inverse probability weights (IPWs) for the internal and external control participants using the calc_prop_scr function. This creates a propensity score object which we can use for calculating an approximate inverse probability weighted power prior in the next step.
Note: when reading external and internal datasets into calc_prop_scr, be sure to include only the arms for which you want to balance the covariate distributions (typically the internal and external control arms). In this example, we want to balance the covariate distributions of the external control arm to be similar to those of the internal control arm, so we will exclude the internal active treatment arm data from this function.
ps_obj <- calc_prop_scr(internal_df = filter(int_tte_df, trt == 0),
                        external_df = ex_tte_df,
                        id_col = subjid,
                        model = ~ cov1 + cov2 + cov3 + cov4)
ps_obj
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> • cov1 + cov2 + cov3 + cov4
#>
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> • Effective sample size of the external arm: 81
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                 0.333                        0.500
#>  2      2 FALSE                 0.288                        0.405
#>  3      3 FALSE                 0.539                        1.17
#>  4      4 FALSE                 0.546                        1.20
#>  5      5 FALSE                 0.344                        0.524
#>  6      6 FALSE                 0.393                        0.646
#>  7      7 FALSE                 0.390                        0.639
#>  8      8 FALSE                 0.340                        0.515
#>  9      9 FALSE                 0.227                        0.294
#> 10     10 FALSE                 0.280                        0.389
#> # ℹ 140 more rows
#>
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.339  0.0461
#> 2 cov2          0.0450 0.0204
#> 3 cov3          0.160  0.000791
#> 4 cov4          0.308  0.00857

In order to check the suitability of the external data, we can create a variety of diagnostic plots. The first plot we might want is a histogram of the overlapping propensity score distributions from both datasets. To get this, we use the prop_scr_hist function. This function takes in the propensity score object made in the previous step, and we can optionally supply the variable we want to look at (either the propensity score or the IPW). By default, it will plot the propensity scores. Additionally, we can look at the densities rather than histograms by using the prop_scr_dens function. When looking at the IPWs with either the histogram or the density functions, it is important to note that only the IPWs for external control participants will be shown because the ATT IPWs for all internal control participants are equal to 1.
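A minimal sketch of these diagnostic calls follows; the argument and value used to switch to the IPW view are assumptions for illustration, so check the prop_scr_hist and prop_scr_dens documentation for the exact interface.

# Histogram of the overlapping propensity score distributions (default view)
prop_scr_hist(ps_obj)

# Density version of the same diagnostic
prop_scr_dens(ps_obj)

# IPW view; only external control weights are shown since internal ATT IPWs equal 1
# (the "variable" argument name and "ipw" value are assumed for illustration)
prop_scr_dens(ps_obj, variable = "ipw")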
The final plot we might want to look at is a love plot to visualize the absolute standardized mean differences (both unadjusted and adjusted by the IPWs) of the covariates between the internal and external data. To do this, we use the prop_scr_love function. Like the previous functions, the only required parameter for this function is the propensity score object, but we can also provide a location along the x-axis for a vertical reference line.
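A minimal sketch of this call; the argument name for the vertical reference line (reference_line) is an assumption here, so see the prop_scr_love documentation for the exact name.

# Love plot of absolute standardized mean differences (unadjusted vs. IPW-adjusted),
# with a vertical reference line at 0.1 (argument name assumed)
prop_scr_love(ps_obj, reference_line = 0.1)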
Now that we have created and assessed our propensity score object, we can read it into the calc_power_prior_weibull function to calculate an approximate inverse probability weighted power prior for \(\boldsymbol{\theta}\) under the control arm, which we denote as \(\boldsymbol{\theta}_C = \{\log(\alpha_C), \beta_C\}\). Specifically, we approximate the power prior with a bivariate normal distribution using one of two approximation methods: (1) Laplace approximation or (2) estimation of the mean vector and covariance matrix using MCMC samples from the unnormalized power prior (see the details section of the calc_power_prior_weibull documentation for more information). In this example, we use the Laplace approximation, which is considerably faster than the MCMC approach.
To approximate the power prior, we need to supply the following information (a sketch of the resulting call is shown after the list):
weighted object (the propensity score object we created above)

response variable name (in this case \(y\))

event indicator variable name (in this case \(event\))

initial prior for the intercept parameter, in the form of a normal distributional object (e.g., \(\mbox{N}(0, \mbox{sd} = 10)\))

scale hyperparameter for the half-normal initial prior for the shape parameter

approximation method (either “Laplace” or “MCMC”)
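As a rough sketch of this step, the call below creates the pwr_prior object used in the next section. The argument names intercept, shape, and approximation, as well as the half-normal scale value of 50, are assumptions for illustration; see the calc_power_prior_weibull documentation for the exact interface.

pwr_prior <- calc_power_prior_weibull(
  ps_obj,                            # weighted propensity score object from above
  response = y,                      # time-to-event response variable
  event = event,                     # event indicator (1: event; 0: censored)
  intercept = dist_normal(0, 10),    # initial N(0, sd = 10) prior for the intercept (assumed argument name)
  shape = 50,                        # half-normal scale hyperparameter for the shape (assumed name and placeholder value)
  approximation = "Laplace"          # Laplace approximation of the power prior (assumed argument name)
)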
We can robustify the approximate multivariate normal (MVN) power prior for \(\boldsymbol{\theta}_C\) by adding a vague component to create a robust mixture prior (RMP). We define the vague component to be an MVN distribution with the same mean vector as the approximate power prior and a covariance matrix that is equal to the covariance matrix of the approximate power prior multiplied by \(r_{ex}\), where \(r_{ex}\) denotes the number of observed events in the external control arm. To construct this RMP, we can use either the robustify_norm or robustify_mvnorm functions, and we place 0.5 weight on each component. The two components of the resulting RMP are labeled as “informative” and “vague”.
We can print the mean vectors and covariance matrices of each MVN component using the functions mix_means and mix_sigmas, respectively.
r_external <- sum(ex_tte_df$event)   # number of observed events
mix_prior <- robustify_mvnorm(pwr_prior, r_external, weights = c(0.5, 0.5))   # RMP
mix_means(mix_prior)    # mean vectors
#> $informative
#> [1]  0.2940907 -2.5655689
#>
#> $vague
#> [1]  0.2940907 -2.5655689
mix_sigmas(mix_prior)   # covariance matrices
#> $informative
#>             [,1]        [,2]
#> [1,] 0.016251652 0.003325476
#> [2,] 0.003325476 0.011868788
#>
#> $vague
#>           [,1]      [,2]
#> [1,] 1.5276553 0.3125948
#> [2,] 0.3125948 1.1156660
#plot_dist(mix_prior)

To create a posterior distribution for \(\boldsymbol{\theta}_C\), we can pass the resulting RMP and the internal control data to the calc_post_weibull function which returns a stanfit object from which we can extract the MCMC samples for the control parameters. In addition to returning posterior samples for \(\log(\alpha_C)\) and \(\beta_C\), the function returns posterior samples for the marginal survival probability \(S_C(t)\), where the time(s) \(t\) can be specified as either a scalar or vector of numbers using the analysis_time argument.
Note: when reading internal data directly into calc_post_weibull, be sure to include only the arm of interest (e.g., the internal control arm if creating a posterior distribution for \(\boldsymbol{\theta}_C\)).
post_control <- calc_post_weibull(filter(int_tte_df, trt == 0),
                                  response = y,
                                  event = event,
                                  prior = mix_prior,
                                  analysis_time = 12)
summary(post_control)$summary
#>                     mean      se_mean         sd         2.5%          25%
#> beta0         -2.6884167 0.0006511485 0.08942692   -2.8843141   -2.7401878
#> log_alpha      0.3618129 0.0007708218 0.10351942    0.1608290    0.2927414
#> alpha          1.4436743 0.0011283985 0.15084802    1.1744841    1.3400963
#> survProb[1]    0.4729591 0.0002977883 0.04396397    0.3933141    0.4436353
#> lp__        -148.4735997 0.0109019328 1.14076850 -151.5707465 -148.8926620
#>                      50%          75%        97.5%    n_eff      Rhat
#> beta0         -2.6826632   -2.6286451   -2.5327716 18861.51 0.9999771
#> log_alpha      0.3608862    0.4289646    0.5692678 18035.81 0.9999667
#> alpha          1.4346001    1.5356667    1.7669728 17871.22 0.9999672
#> survProb[1]    0.4706902    0.4989838    0.5696035 21796.08 0.9999688
#> lp__        -148.1031918 -147.6684684 -147.3832850 10949.34 1.0000250
#plot_dist(post_control)

We can extract and plot the posterior samples of \(S_C(t)\). Here, we plot the samples using a histogram; however, additional posterior plots (e.g., density curves, trace plots) can easily be obtained using the bayesplot package.
surv_prob_control <- as.data.frame(extract(post_control, pars = c("survProb")))[,1]
ggplot(data.frame(samp = surv_prob_control), aes(x = samp)) +
  labs(y = "Density", x = expression(paste(S[C], "(t=12)"))) +
  ggtitle(expression(paste("Posterior Samples of ", S[C], "(t=12)"))) +
  geom_histogram(aes(y = after_stat(density)),
                 color = "#5398BE", fill = "#5398BE",
                 position = "identity", binwidth = .01, alpha = 0.5) +
  geom_density(color = "black") +
  coord_cartesian(xlim = c(-0.2, 0.8)) +
  theme_bw()

Next, we create a posterior distribution for the survival probability \(S_T(t)\) for the active treatment arm at time \(t=12\) by reading the internal data for the corresponding arm into the calc_post_weibull function. In this case, we use the vague component of the RMP as our MVN prior.
As noted earlier, be sure to read in only the data for the internal active treatment arm while excluding the internal control data.
vague_prior <- dist_multivariate_normal(mu = list(mix_means(mix_prior)[[2]]),
                                        sigma = list(mix_sigmas(mix_prior)[[2]]))
post_treated <- calc_post_weibull(filter(int_tte_df, trt == 1),
                                  response = y,
                                  event = event,
                                  prior = vague_prior,
                                  analysis_time = 12)
summary(post_treated)$summary
#>                     mean      se_mean         sd          2.5%          25%
#> beta0         -2.9455467 0.0014284234 0.14611873   -3.26940110   -3.0357568
#> log_alpha      0.3522533 0.0015596465 0.15980650    0.03000401    0.2458085
#> alpha          1.4404002 0.0022059385 0.22887972    1.03045867    1.2786547
#> survProb[1]    0.5897693 0.0003813672 0.05189239    0.48621382    0.5546960
#> lp__        -141.6946571 0.0093809319 0.99696774 -144.38164422 -142.0949896
#>                      50%          75%        97.5%    n_eff      Rhat
#> beta0         -2.9321293   -2.8404786   -2.7001795 10464.00 0.9999707
#> log_alpha      0.3572260    0.4616433    0.6563496 10498.72 0.9999868
#> alpha          1.4293588    1.5866792    1.9277425 10765.34 0.9999832
#> survProb[1]    0.5908005    0.6254122    0.6889950 18514.86 0.9999667
#> lp__        -141.3863903 -140.9761950 -140.7037686 11294.58 0.9999667
#plot_dist(post_treated)

As was previously done, we can extract and plot the posterior samples of \(S_T(t)\).
surv_prob_treated <- as.data.frame(extract(post_treated, pars = c("survProb")))[,1]
ggplot(data.frame(samp = surv_prob_treated), aes(x = samp)) +
  labs(y = "Density", x = expression(paste(S[T], "(t=12)"))) +
  ggtitle(expression(paste("Posterior Samples of ", S[T], "(t=12)"))) +
  geom_histogram(aes(y = after_stat(density)),
                 color = "#FFA21F", fill = "#FFA21F",
                 position = "identity", binwidth = .01, alpha = 0.5) +
  geom_density(color = "black") +
  coord_cartesian(xlim = c(-0.2, 0.8)) +
  theme_bw()

We define our marginal treatment effect to be the difference in survival probabilities at 12 months between the active treatment and control arms (i.e., \(S_T(t=12) - S_C(t=12)\)). We can obtain a sample from the posterior distribution for \(S_T(t=12) - S_C(t=12)\) by subtracting the posterior sample of \(S_C(t=12)\) from the posterior sample of \(S_T(t=12)\).
samp_trt_diff <- surv_prob_treated - surv_prob_control
ggplot(data.frame(samp = samp_trt_diff), aes(x = samp)) +
  labs(y = "Density", x = expression(paste(S[T], "(t=12) - ", S[C], "(t=12)"))) +
  ggtitle(expression(paste("Posterior Samples of ", S[T], "(t=12) - ", S[C], "(t=12)"))) +
  geom_histogram(aes(y = after_stat(density)),
                 color = "#FF0000", fill = "#FF0000",
                 position = "identity", binwidth = .01, alpha = 0.5) +
  geom_density(color = "black") +
  coord_cartesian(xlim = c(-0.2, 0.8)) +
  theme_bw()

Suppose we want to test the hypotheses \(H_0: S_T(t=12) - S_C(t=12) \le 0\) versus \(H_1: S_T(t=12) - S_C(t=12) > 0\). We can use our posterior sample for \(S_T(t=12) - S_C(t=12)\) to calculate the posterior probability \(Pr(S_T(t=12) - S_C(t=12) > 0 \mid D)\) (i.e., the probability in favor of \(H_1\)), and we conclude that we have sufficient evidence in favor of the alternative hypothesis if \(Pr(S_T(t=12) - S_C(t=12) > 0 \mid D) > 0.975\).
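A one-line sketch of this calculation, using the posterior sample samp_trt_diff constructed above (output not shown):

# Posterior probability in favor of H1: Pr(S_T(12) - S_C(12) > 0 | D)
mean(samp_trt_diff > 0)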
We see that this posterior probability is less than 0.975, and hence we do not have sufficient evidence in support of the alternative hypothesis.
With MCMC samples from our posterior distributions, we can calculate posterior summary statistics such as the mean, median, and standard deviation. As an example, we calculate these statistics using the posterior distribution for \(S_T(t=12) - S_C(t=12)\).
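A minimal sketch of these summaries, again using the samp_trt_diff vector from above:

# Posterior mean, median, and standard deviation of S_T(t=12) - S_C(t=12)
mean(samp_trt_diff)
median(samp_trt_diff)
sd(samp_trt_diff)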
We can also calculate credible intervals using the quantile function.
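For example, an equal-tailed 95% credible interval for the marginal treatment effect can be obtained as follows:

# Equal-tailed 95% credible interval for S_T(t=12) - S_C(t=12)
quantile(samp_trt_diff, probs = c(0.025, 0.975))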
Lastly, we calculate the effective sample size of the posterior distribution for \(S_C(t=12)\) using the method by Pennello and Thompson (2008). To do so, we first must construct the posterior distribution of \(S_C(t=12)\) without borrowing from the external control data (e.g., using a vague prior).
post_ctrl_no_brrw <- calc_post_weibull(filter(int_tte_df, trt == 0),
                                       response = y,
                                       event = event,
                                       prior = vague_prior,
                                       analysis_time = 12)
surv_prob_ctrl_nb <- as.data.frame(extract(post_ctrl_no_brrw, pars = c("survProb")))[,1]
n_int_ctrl <- nrow(filter(int_tte_df, trt == 0))   # sample size of internal control arm
var_no_brrw <- var(surv_prob_ctrl_nb)              # post variance of S_C(t) without borrowing
var_brrw <- var(surv_prob_control)                 # post variance of S_C(t) with borrowing
ess <- n_int_ctrl * var_no_brrw / var_brrw         # effective sample size
ess
#> [1] 120.0989

Psioda, M. A., Bean, N. W., Wright, B. A., Lu, Y., Mantero, A., and Majumdar, A. (2025). Inverse probability weighted Bayesian dynamic borrowing for estimation of marginal treatment effects with application to hybrid control arm oncology studies. Journal of Biopharmaceutical Statistics, 1–23. DOI: 10.1080/10543406.2025.2489285.