Movatterモバイル変換

Type:

Package

Title:

Estimating Marginal Effects with Prognostic Covariate Adjustment

Version:

1.1.0

Description:

Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>.

License:

MIT + file LICENSE

URL:

https://novonordisk-opensource.github.io/postcard/,https://github.com/NovoNordisk-OpenSource/postcard

BugReports:

https://github.com/NovoNordisk-OpenSource/postcard/issues

Imports:

cli, Deriv, dplyr, earth, generics, gggrid, ggplot2, options,parsnip, rlang, rsample, scales, stats, stringr, tune, utils,workflowsets, xgboost, yardstick

Suggests:

knitr, LiblineaR, MASS, ranger, rmarkdown, testthat (≥3.0.0), vdiffr, withr

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.3.2

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-09-17 12:22:14 UTC; Mathias

Author:

Mathias Lerbech Jeppesen [aut, cre], Emilie Hojbjerre-Frandsen [aut], Novo Nordisk A/S [cph]

Maintainer:

Mathias Lerbech Jeppesen <mathiasljeppesen@outlook.com>

Repository:

CRAN

Date/Publication:

2025-09-17 12:50:02 UTC

postcard: Estimating Marginal Effects with Prognostic Covariate Adjustment

Description

logo

Author(s)

Maintainer: Mathias Lerbech Jeppesenmathiasljeppesen@outlook.com

Authors:

Emilie Hojbjerre-Frandsenehfd@novonordisk.com

Other contributors:

Novo Nordisk A/S [copyright holder]

Creates a list of learners

Description

This function creates a list of learners compatible with thelearnersargument offit_best_learner, which is used as the default argument.

Usage

default_learners()

Value

a namedlist of learners, where each element consists of a

model: Aparsnip model specification
grid: Adata.frame with columns corresponding to tuning parameters

Examples

default_learners()

Find the best learner in terms of RMSE among specified learners using cross validation

Description

Find the best learner in terms of RMSE among specified learners using cross validation

Usage

fit_best_learner(  preproc,  data,  cv_folds = 5,  learners = default_learners(),  verbose = options::opt("verbose"))

Arguments

preproc

A list (preferably named) with preprocessing objects:formulas, recipes, orworkflows::workflow_variables(). Passed toworkflowsets::workflow_set().

data

A data frame.

cv_folds

anumeric with the number of cross-validation folds used when fitting andevaluating models

learners

alist (preferably named) containing named lists of elementsmodel and optionallygrid. Themodel element should be aparsnipmodel specification, which is passed toworkflowsets::workflow_set as themodel argument, while thegrid element is passed as thegrid argumentofworkflowsets::option_add

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

Details

Ensure data compatibility with the learners.

Value

a trainedworkflow

Examples

# Generate some synthetic 2-armed RCT data along with historical controlsn <- 100dat_rct <- glm_data(  Y ~ 1+2*x1+3*a,  x1 = rnorm(n, 2),  a = rbinom (n, 1, .5),  family = gaussian())dat_hist <- glm_data(  Y ~ 1+2*x1,  x1 = rnorm(n, 2),  family = gaussian())# Fit a learner to the historical control datalearners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    )  ))fit <- fit_best_learner(  preproc = list(mod = Y ~ .),  data = dat_hist,  learners = learners)# Use it fx. to predict the "control outcome" in the 2-armed RCTpredict(fit, new_data = dat_rct)

Generate data simulated from a GLM

Description

Provide a formula, variables and a family to generate a linear predictorusing the formula and provided variables before using the inverse link ofthe family to generate the GLM modelled mean, mu, which is then used tosimulate the response with this mean from the generating function accordingto the chosen family.

Usage

glm_data(formula, ..., family = gaussian(), family_args = NULL)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation.

...

adata.frame with columns corresponding to variables usedinformula, a namedlist of those variables, or individually providednamed arguments of variables

family

thefamily of the response. this can be acharacterstring naming a family function, a familyfunction or the result ofacall to a family function

family_args

a namedlist with values of arguments passed tofamily relevant⁠r<family_name>⁠ function for simulating the data

Value

adata.frame

Examples

# Generate a gaussian response from a single covariateglm_data(Y ~ 1+2*x1,                x1 = rnorm(10))# Generate a gaussian response from a single covariate with non-linear# effects. Specify that the response should have standard deviation sqrt(3)glm_data(Y ~ 1+2*log(x1),                x1 = runif(10, min = 1, max = 10),                family_args = list(sd = sqrt(3)))# Generate a negative binomial responseglm_data(Y ~ 1+2*x1-x2,                x1 = rnorm(10),                x2 = rgamma(10, shape = 2),                family = MASS::negative.binomial(2))# Provide variables as a list/data.frame and pass a link to the negative.binomial# functionglm_data(resp ~ 1+2*x1-x2,                data.frame(                  x1 = rnorm(10),                  x2 = rgamma(10, shape = 2)                ),                family = MASS::negative.binomial(2),                family_args = list(link = "identity"))

postcard Options

Description

Internally used, package-specific options. All options will prioritize R options() values, and fall back to environment variables if undefined. If neither the option nor the environment variable is set, a default value is used.

Arguments

verbose

Checking Option Values

Option values specific topostcard can beaccessed by passing the package name toenv.

options::opts(env = "postcard")options::opt(x, default, env = "postcard")

Options

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution

default:

option:

postcard.verbose

envvar:

R_POSTCARD_VERBOSE (evaluated if possible, raw string otherwise)

Power and sample size estimation for linear models

Description

variance_ancova provides a convenient function for estimating avariance to use for power and sample size approximation.

Thepower_gs andsamplesize_gs functions calculate the Guenther-Schoutenpower approximation for ANOVA or ANCOVA.The approximation is based in (Guenther WC. Sample Size Formulas for Normal Theory T Tests.The American Statistician. 1981;35(4):243–244) and (Schouten HJA. Sample size formula witha continuous outcome for unequal group sizes and unequal variances. Statistics in Medicine.1999;18(1):87–91).

The functionpower_nc calculates the power for ANOVA or ANCOVA based on thenon-centrality parameter and the exact t-distributions.

See more details about each function inDetails and in sections afterValue.

Usage

variance_ancova(formula, data, inflation = 1, deflation = 1, ...)power_gs(variance, ate, n, r = 1, margin = 0, alpha = 0.05, ...)samplesize_gs(variance, ate, r = 1, margin = 0, power = 0.9, alpha = 0.05, ...)power_nc(variance, df, ate, n, r = 1, margin = 0, alpha = 0.05, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description used instats::model.frame() to create adata.frame withresponse and covariates. This data.frame is used to estimate theR^2, which isthen used to find the variance. See more in details.

data

a data frame, list or environment (or objectcoercible byas.data.frame to a data frame),containing the variables informula. Neither a matrix nor anarray will be accepted.

inflation

anumeric to multiply the marginal variance of the response by.Default is1 which estimates the variance directly from data. Use values above1to obtain a more conservative estimate of the marginal response variance.

deflation

anumeric to multiply theR^2 by.Default is1 which means the estimate ofR^2 is unchanged. Use valuesbelow1 to obtain a more conservative estimate of the coefficient of determination.See details about howR^2 related to the estimation.

...

Not currently used. Here to accomodate implementation of... for therepeat_power_linear() function.

variance

anumeric variance to use for the approximation. See more detailsin documentation sections of each power approximating function.

ate

anumeric minimum effect size that we should be able to detect.

n

anumeric with number of participants in total.From this number of participants in the treatment group isn1=(r/(1+r))nand the control group isn0=(1/(1+r))n.

r

anumeric allocation ratior=n1/n0. For one-to-one randomisationr=1.

margin

anumeric superiority margin(for non-inferiority margin, a negative value can be provided).

alpha

anumeric significance level. The critical value used fortesting corresponds to using the significance level for a two-sided test. I.e.we use the quantile1-\alpha/2 of the null distribution as the critical value.

power

anumeric giving the desired power when calculating the sample size

df

anumeric degrees of freedom to use in the t-distribution.

Details

This details section provides information about relation between arguments tofunctions and the formulas described in sections below for each powerapproximation formula.

Note that all entities that carry the same name as an argument and in the formulawill not be mentioned below, as they are obviously linked (n, r, alpha)

ate:\beta_1-\beta_0
margin:\Delta_s
variance:\widehat{\sigma}^2(1-\widehat{R}^2)

Finding the`variance` to use for approximation

Thevariance_ancova function estimates\sigma^2(1-R^2) in data andreturns it as anumeric that can be passed directly as thevarianceinpower_gs. Corresponds to estimating the power from using anlm withthe sameformula as specified invariance_ancova.

The user can estimate thevariance any way they see fit.

Value

All functions return anumeric.variance_ancova returns anumeric witha variance estimated from data to used for power estimation and sample sizeestimation.power_xx andsamplesize_xx functions return anumeric withthe power or sample size approximation.

Guenther-Schouten power approximation

The estimation formula in the case of an ANCOVA model with multiple covariate adjustment is (see description for reference):

n=\frac{(1+r)^2}{r}\frac{(z_{1-\alpha/2}+z_{1-\beta})^2\widehat{\sigma}^2(1-\widehat{R}^2)}{(\beta_1-\beta_0-\Delta_s)^2}+\frac{(z_{1-\alpha/2})^2}{2}

where\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.In the univariate caseR^2 is replaced by\rho^2

Power approximation using non-centrality parameter

The prospective power estimations are based on (Kieser M. Methods and Applications of Sample Size Calculation and Recalculation in Clinical Trials. Springer; 2020).The ANOVA power is calculated based on the non-centrality parameter given as

nc =\sqrt{\frac{r}{(1+r)^2}\cdot n}\cdot\frac{\beta_1-\beta_0-\Delta_s}{\sigma},

where we denote by\sigma^2 the variance of the outcome, such that the power can be estimated as

1-\beta = 1 - F_{t,n-2,nc}\left(F_{t, n-2, 0}^{-1}(1-\alpha/2)\right).

The power of ANCOVA with univariate covariate adjustment and no interaction is calculated based on the non-centrality parameter given as

nc =\sqrt{\frac{rn}{(1+r)^2}}\frac{\beta_1-\beta_0-\Delta_s}{\sigma\sqrt{1-\rho^2}},

such that the power can be estimated as

1-\beta = 1 - F_{t,n-3,nc}\left(F_{t, n-3, 0}^{-1}(1-\alpha/2)\right).

The power of ANCOVA with either univariate covariate adjustment and interaction or multiple covariate adjustement with or without interaction is calculated based on the non-centrality parameter given as

nc =\frac{\beta_1-\beta_0-\Delta_s}{\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_0} + X_d^\top\left((n-2)\Sigma_X\right)^{-1}X_d \right)\sigma^2\left(1-\widehat{R}^2\right)}}.

whereX_d = \left(\overline{X}_1^1-\overline{X}_0^1, \ldots, \overline{X}_1^p-\overline{X}_0^p\right)^\top,\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.Since we are in the case of randomized trials the expected difference between the covariatevalues between the to groups is 0. Furthermore, the elements of\Sigma_X^{-1} will be small, unless the variances are close to 0, or the covariates exhibit strong linear dependencies, so that the correlations are close to 1.These scenarios are excluded since they could lead to potentially serious problems regarding inference either way. These arguments are used by Zimmermann et. al(Zimmermann G, Kieser M, Bathke AC. Sample Size Calculation and Blinded Recalculation for Analysis of Covariance Models with Multiple Random Covariates. Journal of Biopharmaceutical Statistics. 2020;30(1):143–159.) to approximatethe non-centrality parameter as in the univariate case where\rho^2 is replaced byR^2.

Then the power for ANCOVA withd degrees of freedom can be estimated as

1-\beta = 1 - F_{t,d,nc}\left(F_{t, d,0), 0}^{-1}(1-\alpha/2)\right).

Examples

# Generate a data set to use as an examplen_train <- 1e3n_test <- 100dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A,                X1 = rnorm(n_train),                X2 = rgamma(n_train, shape = 2),                A = rbinom(n_train, size = 1, prob = 0.5),                family = gaussian())# Approximate the power using no adjustment covariatesva_nocov <- var(dat_gaus$Y)power_gs(n = n_test, variance = va_nocov, ate = 1)# Approximate the power with a model adjusting for both variables in the# data generating process## First estimate the variance sigma^2 * (1-R^2)va_cov <- variance_ancova(Y ~ X1 + X2 + A, dat_gaus)## Then estimate the power using this variancepower_gs(n = n_test, variance = va_cov, ate = 1.8, margin = 1, r = 2)# Approximate the sample size needed to obtain 90% power with same model as# abovesamplesize_gs(  variance = va_cov, ate = 1.8, power = 0.9, margin = 1, r = 2)# No adjustment covariatespower_nc(n = n_test, variance = va_nocov, df = 199, ate = 1)# Adjusting for all covariates in data generating processpower_nc(n = n_test, variance = va_cov, df = 196, ate = 1.8, margin = 1, r = 2)

Power approximation for estimating marginal effects in GLMs

Description

The functions implements the algorithm for power estimation described inPowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025). See a bit more context in details andall details in reference.

Usage

power_marginaleffect(  response,  predictions,  target_effect,  exposure_prob,  var1 = NULL,  kappa1_squared = NULL,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  inv_estimand_fun = NULL,  margin = estimand_fun(1, 1),  alpha = 0.05,  tolerance = 0.001,  verbose = options::opt("verbose"),  ...)

Arguments

response

a vector of modenumeric with the response variable fromcomparator participants

predictions

a vector of modenumeric with predictions of the response

target_effect

anumeric minimum effect size that we should be able to detect.See more in details.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0). See more in details.

var1

anumeric variance of the potential outcome corresponding to group 1,or afunction with a single argument meant to obtainvar1 as a transformationof the variance of the potential outcome corresponding to group 0.See more in details.

kappa1_squared

anumeric mean-squared error from predicting potentialoutcome corresponding to group 1, or afunction with a single arguments meantto obtainkappa1_squared as a transformation of the MSE in group 0.See more in details.

estimand_fun

afunction with argumentspsi1 andpsi0 specifyingthe estimand. Alternative, specify "ate" or "rate_ratio" as acharacterto use one of the default estimand functions. Seemore details in the "Estimand" section ofrctglm.

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

inv_estimand_fun

(optional) afunction with argumentspsi0 andtarget_effect, soestimand_fun(psi1 = y, psi0 = x) = z andinv_estimand_fun(psi0 = x, target_effect = z) = y for all x, y, z.If left asNULL, the inverse will be found numerically.

margin

anumeric superiority margin. As a default, theestimand_funis evaluated with the same counterfactual meanspsi1 andpsi0, correspondingto a superiority margin assuming no difference (fx. 0 for ATE and 1 for rate ratio).

alpha

tolerance

passed toall.equal when comparing calculatedtarget_effectfrom derivations and giventarget_effect during numeric derivation ofinv_estimand_fun. Thus only relevant wheninv_estimand_fun isNULL.

verbose

...

arguments passed tostats::uniroot, which is used to find theinverse ofestimand_fun, wheninv_estimand_fun isNULL.

Details

The reference in the description shows in its "Prospective power" section aderivation of a variance bound

v_\infty^{\uparrow 2} = r_0'^{\, 2}\sigma_0^2+r_1'^{\, 2}\sigma_1^2+\pi_0\pi_1\left(\frac{|r_0'|\kappa_0}{\pi_0} +\frac{|r_1'|\kappa_1}{\pi_1} \right)^2

wherer_a' is the derivative of theestimand_fun with respect to\Psi_a,\sigma_a^2 is the variance of the potential outcome corresponding togroupa,\pi_a is the probability of being assigned to groupa,and\kappa_a is the expected mean-squared error when predicting thepotential outcome corresponding to groupa.

The variance bound is then used for calculating a lower bound of the power usingthe distributions corresponding to the null and alternative hypotheses\mathcal{H}_0: \hat{\Psi} \sim F_0 = \mathcal{N}(\Delta ,v_\infty^{\uparrow 2} / n)and\mathcal{H}_1: \hat{\Psi} \sim F_1 = \mathcal{N}(\Psi,v_\infty^{\uparrow 2} / n).See more details in the reference.

Relating arguments to formulas

response: Used to obtain both\sigma_0^2 (by taking the samplevariance of the response) and\kappa_0.
predictions: Used when calculating the MSE\kappa_0.
var1:\sigma_1^2. As a default, chosen to be the same as\sigma_0^2. Can specify differently through this argument fx. by
- Inflating or deflating the value of\sigma_0^2 by a scalar accordingto prior beliefs. Fx. specifyvar1 = function(x) 1.2 * x to inflate\sigma_0^2 by 1.2.
- If historical data is available for group 1, an estimate of the variancefrom that data can be provided here.
kappa1_squared:\kappa_1. Same as forvar1, default is to assumethe same value askappa0_squared, which is found by using theresponseandpredictions vectors. Adjust the value according to prior beliefs ifrelevant.
target_effect:\Psi.
exposure_prob:\pi_1

Value

anumeric with the estimated power.

Examples

# Generate a data set to use as an examplen <- 100exposure_prob <- .5dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A+1.6*A*X2,                X1 = rnorm(n),                X2 = rgamma(n, shape = 2),                A = rbinom(n, size = 1, prob = exposure_prob),                family = gaussian())# ---------------------------------------------------------------------------# Obtain out-of-sample (OOS) prediction using glm model# ---------------------------------------------------------------------------gaus1 <- dat_gaus[1:(n/2), ]gaus2 <- dat_gaus[(n/2+1):n, ]glm1 <- glm(Y ~ X1 + X2 + A, data = gaus1)glm2 <- glm(Y ~ X1 + X2 + A, data = gaus2)preds_glm1 <- predict(glm2, newdata = gaus1, type = "response")preds_glm2 <- predict(glm1, newdata = gaus2, type = "response")preds_glm <- c(preds_glm1, preds_glm2)# Obtain powerpower_marginaleffect(  response = dat_gaus$Y,  predictions = preds_glm,  target_effect = 2,  exposure_prob = exposure_prob)# ---------------------------------------------------------------------------# Get OOS predictions using discrete super learner and adjust variance# ---------------------------------------------------------------------------learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    ) ),    lm = list(      model = parsnip::set_engine(        parsnip::linear_reg(),        "lm"      )    ))lrnr1 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A),                          data = gaus1,                          learners = learners)lrnr2 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A),                          data = gaus2,                          learners = learners)preds_lrnr1 <- dplyr::pull(predict(lrnr2, new_data = gaus1))preds_lrnr2 <- dplyr::pull(predict(lrnr1, new_data = gaus2))preds_lrnr <- c(preds_lrnr1, preds_lrnr2)# Estimate the power AND adjust the assumed variance in the "unknown"# group 1 to be 20% larger than in group 0power_marginaleffect(  response = dat_gaus$Y,  predictions = preds_lrnr,  target_effect = 2,  exposure_prob = exposure_prob,  var1 = function(var0) 1.2 * var0)

Extract information about the fitted prognostic model

Description

Extracts theprognostic_info list element from anrctglm_prog object. See'Value' atrctglm_with_prognosticscore for more details.

Usage

prog(x)## S3 method for class 'rctglm_prog'prog(x)

Arguments

x

an object of classrctglm_prog (returned byrctglm_with_prognosticscore)

Value

a list with the structure described ofprognostic_info in theValue section ofrctglm_with_prognosticscore.

Examples

# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exposure_prob <- .5dat_treat <- glm_data(  Y ~ b0+b1*log(W1)+b2*A,  W1 = W1,  A = rbinom(n, 1, exposure_prob))dat_notreat <- glm_data(  Y ~ b0+b1*log(W1),  W1 = W1)learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    )  ))ate <- rctglm_with_prognosticscore(  formula = Y ~ .,  exposure_indicator = A,  exposure_prob = exposure_prob,  data = dat_treat,  family = gaussian(),  estimand_fun = "ate",  data_hist = dat_notreat,  learners = learners)prog(ate)

Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation usinginfluence functions

Description

The procedure uses plug-in-estimation and influence functions to perform robust inference of any specifiedestimand in the setting of a randomised clinical trial, even in the case of heterogeneous effect ofcovariates in randomisation groups. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details on methodology.

Usage

rctglm(  formula,  exposure_indicator,  exposure_prob,  data,  family = gaussian,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  cv_variance = FALSE,  cv_variance_folds = 10,  verbose = options::opt("verbose"),  ...)

Arguments

formula

exposure_indicator

(name of) thebinary variable indata thatidentifies randomisation groups. The variable is required to be binary currently.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0) in groups defined byexposure_indicator.

data

an optional data frame, list or environment (or object coercibleby as.data.frame to a data frame) containing the variables in the model. Ifnot found in data, the variables are taken from environment(formula), typicallythe environment from which the function is called.

family

a description of the error distribution and linkfunction to be used in the model. Forglm this can be acharacter string naming a family function, a family function or theresult of a call to a family function. Forglm.fit only thethird option is supported. (Seefamily for details offamily functions.)

estimand_fun

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

cv_variance

alogical determining whether to estimate the varianceusing cross-validation (see details ofrctglm).

cv_variance_folds

anumeric with the number of folds to use for crossvalidation ifcv_variance isTRUE.

verbose

...

Additional arguments passed tostats::glm()

Details

The procedure assumes the setup of a randomised clinical trial with observations grouped by a binaryexposure_indicator variable, allocated randomly with probabilityexposure_prob. A GLM isfit and then used to predict the response of all observations in the event that theexposure_indicatoris 0 and 1, respectively. Taking means of these predictions produce thecounterfactual meanspsi0 andpsi1, and an estimandr(psi0, psi1) is calculated using any specifiedestimand_fun.

The variance of the estimand is found by taking the variance of the influence function of the estimand.Ifcv_variance isTRUE, then the counterfactual predictions for each observation (which areused to calculate the value of the influence function) is obtained as out-of-sample (OOS) predictionsusing cross validation with number of folds specified bycv_variance_folds. The cross validation splitsare performed using stratified sampling withexposure_indicator as thestrata argument inrsample::vfold_cv.

Value

rctglm returns an object of class inheriting from"rctglm".

An object of classrctglm is a list containing the following components:

estimand: Adata.frame with plug-in estimate of estimand, standarderror (SE) estimate and variance estimate of estimand
estimand_funs: Alist with
- f: Theestimand_fun used to obtain an estimate of the estimand from counterfactual means
- d0: The derivative with respect topsi0
- d1: The derivative with respect topsi1
means_counterfactual: Adata.frame with counterfactual meanspsi0 andpsi1
fitted.values_counterfactual: Adata.frame with counterfactual meanvalues, obtained by transforming the linear predictors for each groupby the inverse of the link function.
glm: Aglm object returned from runningstats::glm within the procedure
call: The matchedcall

Estimands

As noted in the description,psi0 andpsi1 are the counterfactual means found by prediction usinga fitted GLM in the binary groups defined byexposure_indicator.

Default estimand functions can be specified via"ate" (which uses the functionfunction(psi1, psi0) psi1-psi0) and"rate_ratio" (which uses the functionfunction(psi1, psi0) psi1/psi0). See more information on specifying theestimand_funinvignette("model-fit").

As a default, theDeriv package is used to perform symbolic differentiation to find the derivatives oftheestimand_fun.

Examples

# Generate some data to showcase examplen <- 100exp_prob <- .5dat_gaus <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exp_prob),  family = gaussian())# Fit the modelate <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_gaus,              family = gaussian)# Pull information on estimandestimand(ate)## Another example with different family and specification of estimand_fundat_binom <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exp_prob),  family = binomial())rr <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_binom,              family = binomial(),              estimand_fun = "rate_ratio")odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))or <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_binom,              family = binomial,              estimand_fun = odds_ratio)

Methods for objects of class`rctglm`

Description

Methods mostly to extract information from model fit and inference. Seedetails for more information on each method.

Usage

estimand(object)## S3 method for class 'rctglm'estimand(object)est(object)## S3 method for class 'rctglm'coef(object, ...)## S3 method for class 'rctglm'predict(object, ...)## S3 method for class 'rctglm'print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

object

an object of classrctglm

...

additional arguments passed to methods

x

an object of classrctglm

digits

anumeric with the number of digits to display when printing

Details

The functionestimand (or short-hand versionest) can be used to extractadata.frame with an estimated value and standard error of the estimand.

A method for the genericcoef() has been added for classrctglm, whichextracts coefficient information from the underlyingglm fitin the procedure. The same is true for the method defined for thepredict() generic.The method for anrctglm class object callspredict.glm() on theglm fitcontained within anrctglm object.

Value

estimand/est returns adata.frame with columnsEstimate and⁠Std. Error⁠ with the estimate and standard error of the estimand.

Seecoef() andpredict.glm() for details on what is returned by the correspondingmethods.

Examples

# Generate some data to showcase examplen <- 100exposure_prob <- .5dat_binom <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exposure_prob),  family = binomial())# Fit the modelate <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exposure_prob,              data = dat_binom,              family = binomial,              estimand_fun = "ate")print(ate)estimand(ate)coef(ate)

Use prognostic covariate adjustment when fitting anrctglm

Description

The procedure usesfit_best_learner to fit a prognostic model to historical data and usesthe model to produce counterfactual predictions as a prognostic score that is then adjustedfor as a covariate in therctglm procedure. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details.

Usage

rctglm_with_prognosticscore(  formula,  exposure_indicator,  exposure_prob,  data,  family = gaussian,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  cv_variance = FALSE,  cv_variance_folds = 10,  ...,  data_hist,  prog_formula = NULL,  cv_prog_folds = 5,  learners = default_learners(),  verbose = options::opt("verbose"))

Arguments

formula

exposure_indicator

(name of) thebinary variable indata thatidentifies randomisation groups. The variable is required to be binary currently.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0) in groups defined byexposure_indicator.

data

family

estimand_fun

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

cv_variance

alogical determining whether to estimate the varianceusing cross-validation (see details ofrctglm).

cv_variance_folds

anumeric with the number of folds to use for crossvalidation ifcv_variance isTRUE.

...

Additional arguments passed tostats::glm()

data_hist

adata.frame with historical data on which to fit a prognostic model

prog_formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the prognostic model to be fitted todata_hist. Passed in alist as thepreproc argument infit_best_learner(). As a default,the formula is created by modelling the response (assumed to have the same name as informula) using all columns indata_hist.

cv_prog_folds

anumeric with the number of cross-validation folds used when fitting andevaluating models

learners

verbose

Details

Prognostic covariate adjustment involves training a prognostic model (usingfit_best_learner) on historical data (data_hist) to predict the responsein that data.

Assuming that thehistorical data is representative of the comparator group in a “new” dataset (group 0 of the binaryexposure_indicator indata), we can use theprognostic model to predict the counterfactualoutcome of all observations (including the ones in the comparator groupfor which the prediction of counterfactual outcome coincides with aprediction of actual outcome).

This prediction, which is called the prognostic score, is then used as anadjustment covariate in the GLM (the prognostic score is added toformulabefore callingrctglm with the modified formula).

See much more details in the reference in the description.

Value

rctglm_with_prognosticscore returns an object of classrctglm_prog,which inherits fromrctglm.

Anrctglm_prog object is a list with the same components as anrctglm object(see theValue section ofrctglm for a breakdown of the structure),but with an additional list element of:

prognostic_info: List with information about the fitted prognostic modelon historical data. It has components:
- formula: Theformula with symbolic description of how the responseis modelled as function of covariates in the models
- model_fit: A trainedworkflow - the result offit_best_learner
- learners: Alist of learners used for the discrete super learner
- cv_folds: The amount of folds used for cross validation
- data: The historical data used for cross validation when fitting andtesting models

Examples

# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exp_prob <- .5dat_treat <- glm_data(  Y ~ b0+b1*log(W1)+b2*A,  W1 = W1,  A = rbinom (n, 1, exp_prob))dat_notreat <- glm_data(  Y ~ b0+b1*log(W1),  W1 = W1)learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    ) ),    lm = list(      model = parsnip::set_engine(        parsnip::linear_reg(),        "lm"      )    ))ate <- rctglm_with_prognosticscore(  formula = Y ~ .,  exposure_indicator = A,  exposure_prob = exp_prob,  data = dat_treat,  family = gaussian(),  estimand_fun = "ate",  data_hist = dat_notreat,  learners = learners)# Pull information on estimandestimand(ate)

Create data and plot power curves calculated using functions in`power_linear()` for a list of formulas/models

Description

Estimate a variance for power approximation usingvariance_ancova() for each formulainformula_list ontrain_data. Then calculate power using the function with namespecified inpower_fun across a number of sample sizesns for an assumed averagetreatment effect ofate.

Usage

repeat_power_linear(  ate,  formula_list,  train_data,  power_fun = c("power_gs", "power_nc"),  ns = 10:400,  desired_power = 0.9,  ...)## S3 method for class 'postcard_rpl'plot(x, cols = NULL, ...)

Arguments

ate

Passed topower_gs() orpower_nc()

formula_list

a namedlist of formulas that are element wise passed tovariance_ancova()

train_data

Passed as thedata argument invariance_ancova()

power_fun

acharacter string with value"power_gs" or"power_nc",specifying what function in thepower_linear() topic to use

ns

anumeric vector of sample sizes

desired_power

anumeric between 0 and 1 indicating the desired power level

...

Arguments passed tovariance_ancova() andpower_gs() orpower_nc()

x

an object of classpostcard_rpl created byrepeat_power_linear()

cols

a (potentially named)character vector of colors for the different modelsinformula_list

Value

repeat_power_linear returns an object of classpostcard_rpl, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.

Examples

train_data <- glm_data(  Y ~ 1+1.5*log(W)+2*X,  W = runif(1e3, min = 1, max = 10),  X = rnorm(1e3, sd = 3))rpl <- repeat_power_linear(  ate = 0.5,  formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X),  train_data = train_data)rpl_nc <- repeat_power_linear(  ate = 0.5,  formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X),  train_data = train_data,  power_fun = "power_nc",  df = 1e3-3,  deflation = 0.95,  margin = -0.2,  r = 2)## Not run: plot(rpl)plot(rpl_nc)## End(Not run)

Create data and plot power curves calculated using`power_marginaleffect()` for a list of models

Description

Iterate a process of simulating test data fromtest_data_fun, making predictionsusing models inmodel_list, and calculating power usingpower_marginaleffect()across a number of sample sizesns and iterationsn_iter. The results are averagedand used to create a plot of the resulting power curves.

Usage

repeat_power_marginaleffect(  target_effect,  exposure_prob,  model_list = default_power_model_list(),  test_data_fun = function(n) {     glm_data(Y ~ 1 + 3 * log(W), W = stats::runif(n, min    = 1, max = 50)) },  ns = seq(10, 300, 10),  desired_power = 0.9,  n_iter = 1,  ...)## S3 method for class 'postcard_rpm'plot(x, cols = NULL, ...)

Arguments

target_effect

Passed topower_marginaleffect()

exposure_prob

Passed topower_marginaleffect()

model_list

a namedlist of models used to get predictions on generated testdata sets that are then passed topower_marginaleffect() aspredictions. Theelements ofmodel_list need to have an existingpredict() method. The default isan ANCOVA and a prognostic model fitted withfit_best_learner() to a simple data setof 1000 observations generated with a non-linear effect of a single covariate usingglm_data().

test_data_fun

afunction with a single argumentn that generates testdata sets for the sample sizesns specified. The default generates data usingglm_data() with the same data generating process as the trainingdata used to fit the default models inmodel_list.

ns

anumeric vector of sample sizes

desired_power

anumeric between 0 and 1 indicating the desired power level

n_iter

anumeric indicating a number of iterations to process and average over

...

additional arguments passed topower_marginaleffect()

x

an object of classpostcard_rpm created byrepeat_power_marginaleffect()

cols

a (potentially named)character vector of colors for the different modelsinmodel_list

Value

repeat_power_marginal returns an object of classpostcard_rpm, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.

Examples

# Note everything is wrapped in dontrun to avoid long runtimes of examples (tests are# still in place). Reduce the number of sample sizes and/or iterations to avoid long# runtimes## Not run: # A simple use case with default models and test data (we run only with a few sample# sizes to reduce runtime of examples)rpm <- repeat_power_marginaleffect(  target_effect = 0.9,  exposure_prob = 0.5)plot(rpm)################################# Create model from a poisson family and estimate the power of rate ratio with# several arguments passed to power_marginaleffect################################b1 <- 0.9b2 <- 0.2b3 <- -0.4b4 <- -0.6train_pois <- glm_data(  Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3,  X1 = runif(1e3, min = 1, max = 10),  X2 = rnorm(1e3),  X3 = rgamma(1e3, shape = 1),  family = poisson())# Define models to compare fit to training dataancova_prog_list <- list(ANCOVA = glm(Y ~ X1 + X2 + X3, data = train_pois, family = poisson),"ANCOVA with prognostic score" = fit_best_learner(list(mod = Y ~ X1 + X2 + X3), data = train_pois))# Create a function that produces data to predict ontest_pois_fun <- function(n) { glm_data(   Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3,   X1 = runif(n, min = 1, max = 10),   X2 = rnorm(n),   X3 = rgamma(n, shape = 1),   family = poisson() )}# Specify a bunch of different arguments that are passed to power_marginaleffect()## Run for 2 sample sizes to reduce runtimerpm_rr <- repeat_power_marginaleffect(  model_list = ancova_prog_list,  test_data_fun = test_pois_fun,  ns = seq(100, 200), n_iter = 1,  var1 = function(var0) 1.1 * var0,  kappa1_squared = function(kap0) 1.1 * kap0,  estimand_fun = "rate_ratio",  target_effect = 1.4,  exposure_prob = 1/2,  margin = 0.8)plot(rpm_rr2)## End(Not run)

Movatterモバイル変換

postcard: Estimating Marginal Effects with Prognostic Covariate Adjustment

Description

Author(s)

See Also

Creates a list of learners

Description

Usage

Value

Examples

Find the best learner in terms of RMSE among specified learners using cross validation

Description

Usage

Arguments

Details

Value

See Also

Examples

Generate data simulated from a GLM

Description

Usage

Arguments

Value

Examples

postcard Options

Description

Arguments

Checking Option Values

Options

See Also

Power and sample size estimation for linear models

Description

Usage

Arguments

Details

Finding thevariance to use for approximation

Value

Guenther-Schouten power approximation

Power approximation using non-centrality parameter

See Also

Examples

Power approximation for estimating marginal effects in GLMs

Description

Usage

Arguments

Details

Relating arguments to formulas

Value

See Also

Examples

Extract information about the fitted prognostic model

Description

Usage

Arguments

Value

See Also

Examples

Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation usinginfluence functions

Description

Usage

Arguments

Details

Value

Estimands

See Also

Examples

Methods for objects of classrctglm

Description

Usage

Arguments

Details

Value

See Also

Examples

Use prognostic covariate adjustment when fitting anrctglm

Description

Usage

Arguments

Details

Value

Finding the`variance` to use for approximation

Methods for objects of class`rctglm`

Create data and plot power curves calculated using functions in`power_linear()` for a list of formulas/models

Create data and plot power curves calculated using`power_marginaleffect()` for a list of models