Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Title:Estimating Marginal Effects with Prognostic Covariate Adjustment
Version:1.1.0
Description:Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>.
License:MIT + file LICENSE
URL:https://novonordisk-opensource.github.io/postcard/,https://github.com/NovoNordisk-OpenSource/postcard
BugReports:https://github.com/NovoNordisk-OpenSource/postcard/issues
Imports:cli, Deriv, dplyr, earth, generics, gggrid, ggplot2, options,parsnip, rlang, rsample, scales, stats, stringr, tune, utils,workflowsets, xgboost, yardstick
Suggests:knitr, LiblineaR, MASS, ranger, rmarkdown, testthat (≥3.0.0), vdiffr, withr
Config/testthat/edition:3
Encoding:UTF-8
RoxygenNote:7.3.2
VignetteBuilder:knitr
NeedsCompilation:no
Packaged:2025-09-17 12:22:14 UTC; Mathias
Author:Mathias Lerbech Jeppesen [aut, cre], Emilie Hojbjerre-Frandsen [aut], Novo Nordisk A/S [cph]
Maintainer:Mathias Lerbech Jeppesen <mathiasljeppesen@outlook.com>
Repository:CRAN
Date/Publication:2025-09-17 12:50:02 UTC

postcard: Estimating Marginal Effects with Prognostic Covariate Adjustment

Description

logo

Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025)doi:10.48550/arXiv.2503.22284.

Author(s)

Maintainer: Mathias Lerbech Jeppesenmathiasljeppesen@outlook.com

Authors:

Other contributors:

See Also

Useful links:


Creates a list of learners

Description

This function creates a list of learners compatible with thelearnersargument offit_best_learner, which is used as the default argument.

Usage

default_learners()

Value

a namedlist of learners, where each element consists of a

Examples

default_learners()

Find the best learner in terms of RMSE among specified learners using cross validation

Description

Find the best learner in terms of RMSE among specified learners using cross validation

Usage

fit_best_learner(  preproc,  data,  cv_folds = 5,  learners = default_learners(),  verbose = options::opt("verbose"))

Arguments

preproc

A list (preferably named) with preprocessing objects:formulas, recipes, orworkflows::workflow_variables(). Passed toworkflowsets::workflow_set().

data

A data frame.

cv_folds

anumeric with the number of cross-validation folds used when fitting andevaluating models

learners

alist (preferably named) containing named lists of elementsmodel and optionallygrid. Themodel element should be aparsnipmodel specification, which is passed toworkflowsets::workflow_set as themodel argument, while thegrid element is passed as thegrid argumentofworkflowsets::option_add

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

Details

Ensure data compatibility with the learners.

Value

a trainedworkflow

See Also

Seerctglm_with_prognosticscore() for a function that utilises thisfunction to perform prognostic covariate adjustment.

Examples

# Generate some synthetic 2-armed RCT data along with historical controlsn <- 100dat_rct <- glm_data(  Y ~ 1+2*x1+3*a,  x1 = rnorm(n, 2),  a = rbinom (n, 1, .5),  family = gaussian())dat_hist <- glm_data(  Y ~ 1+2*x1,  x1 = rnorm(n, 2),  family = gaussian())# Fit a learner to the historical control datalearners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    )  ))fit <- fit_best_learner(  preproc = list(mod = Y ~ .),  data = dat_hist,  learners = learners)# Use it fx. to predict the "control outcome" in the 2-armed RCTpredict(fit, new_data = dat_rct)

Generate data simulated from a GLM

Description

Provide a formula, variables and a family to generate a linear predictorusing the formula and provided variables before using the inverse link ofthe family to generate the GLM modelled mean, mu, which is then used tosimulate the response with this mean from the generating function accordingto the chosen family.

Usage

glm_data(formula, ..., family = gaussian(), family_args = NULL)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation.

...

adata.frame with columns corresponding to variables usedinformula, a namedlist of those variables, or individually providednamed arguments of variables

family

thefamily of the response. this can be acharacterstring naming a family function, a familyfunction or the result ofacall to a family function

family_args

a namedlist with values of arguments passed tofamily relevant⁠r<family_name>⁠ function for simulating the data

Value

adata.frame

Examples

# Generate a gaussian response from a single covariateglm_data(Y ~ 1+2*x1,                x1 = rnorm(10))# Generate a gaussian response from a single covariate with non-linear# effects. Specify that the response should have standard deviation sqrt(3)glm_data(Y ~ 1+2*log(x1),                x1 = runif(10, min = 1, max = 10),                family_args = list(sd = sqrt(3)))# Generate a negative binomial responseglm_data(Y ~ 1+2*x1-x2,                x1 = rnorm(10),                x2 = rgamma(10, shape = 2),                family = MASS::negative.binomial(2))# Provide variables as a list/data.frame and pass a link to the negative.binomial# functionglm_data(resp ~ 1+2*x1-x2,                data.frame(                  x1 = rnorm(10),                  x2 = rgamma(10, shape = 2)                ),                family = MASS::negative.binomial(2),                family_args = list(link = "identity"))

postcard Options

Description

Internally used, package-specific options. All options will prioritize R options() values, and fall back to environment variables if undefined. If neither the option nor the environment variable is set, a default value is used.

Arguments

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

Checking Option Values

Option values specific topostcard can beaccessed by passing the package name toenv.

options::opts(env = "postcard")options::opt(x, default, env = "postcard")

Options

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution

default:
2
option:

postcard.verbose

envvar:

R_POSTCARD_VERBOSE (evaluated if possible, raw string otherwise)

See Also

options getOption Sys.setenv Sys.getenv


Power and sample size estimation for linear models

Description

variance_ancova provides a convenient function for estimating avariance to use for power and sample size approximation.

Thepower_gs andsamplesize_gs functions calculate the Guenther-Schoutenpower approximation for ANOVA or ANCOVA.The approximation is based in (Guenther WC. Sample Size Formulas for Normal Theory T Tests.The American Statistician. 1981;35(4):243–244) and (Schouten HJA. Sample size formula witha continuous outcome for unequal group sizes and unequal variances. Statistics in Medicine.1999;18(1):87–91).

The functionpower_nc calculates the power for ANOVA or ANCOVA based on thenon-centrality parameter and the exact t-distributions.

See more details about each function inDetails and in sections afterValue.

Usage

variance_ancova(formula, data, inflation = 1, deflation = 1, ...)power_gs(variance, ate, n, r = 1, margin = 0, alpha = 0.05, ...)samplesize_gs(variance, ate, r = 1, margin = 0, power = 0.9, alpha = 0.05, ...)power_nc(variance, df, ate, n, r = 1, margin = 0, alpha = 0.05, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description used instats::model.frame() to create adata.frame withresponse and covariates. This data.frame is used to estimate theR^2, which isthen used to find the variance. See more in details.

data

a data frame, list or environment (or objectcoercible byas.data.frame to a data frame),containing the variables informula. Neither a matrix nor anarray will be accepted.

inflation

anumeric to multiply the marginal variance of the response by.Default is1 which estimates the variance directly from data. Use values above1to obtain a more conservative estimate of the marginal response variance.

deflation

anumeric to multiply theR^2 by.Default is1 which means the estimate ofR^2 is unchanged. Use valuesbelow1 to obtain a more conservative estimate of the coefficient of determination.See details about howR^2 related to the estimation.

...

Not currently used. Here to accomodate implementation of... for therepeat_power_linear() function.

variance

anumeric variance to use for the approximation. See more detailsin documentation sections of each power approximating function.

ate

anumeric minimum effect size that we should be able to detect.

n

anumeric with number of participants in total.From this number of participants in the treatment group isn1=(r/(1+r))nand the control group isn0=(1/(1+r))n.

r

anumeric allocation ratior=n1/n0. For one-to-one randomisationr=1.

margin

anumeric superiority margin(for non-inferiority margin, a negative value can be provided).

alpha

anumeric significance level. The critical value used fortesting corresponds to using the significance level for a two-sided test. I.e.we use the quantile1-\alpha/2 of the null distribution as the critical value.

power

anumeric giving the desired power when calculating the sample size

df

anumeric degrees of freedom to use in the t-distribution.

Details

This details section provides information about relation between arguments tofunctions and the formulas described in sections below for each powerapproximation formula.

Note that all entities that carry the same name as an argument and in the formulawill not be mentioned below, as they are obviously linked (n, r, alpha)

Finding thevariance to use for approximation

Thevariance_ancova function estimates\sigma^2(1-R^2) in data andreturns it as anumeric that can be passed directly as thevarianceinpower_gs. Corresponds to estimating the power from using anlm withthe sameformula as specified invariance_ancova.

The user can estimate thevariance any way they see fit.

Value

All functions return anumeric.variance_ancova returns anumeric witha variance estimated from data to used for power estimation and sample sizeestimation.power_xx andsamplesize_xx functions return anumeric withthe power or sample size approximation.

Guenther-Schouten power approximation

The estimation formula in the case of an ANCOVA model with multiple covariate adjustment is (see description for reference):

n=\frac{(1+r)^2}{r}\frac{(z_{1-\alpha/2}+z_{1-\beta})^2\widehat{\sigma}^2(1-\widehat{R}^2)}{(\beta_1-\beta_0-\Delta_s)^2}+\frac{(z_{1-\alpha/2})^2}{2}

where\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.In the univariate caseR^2 is replaced by\rho^2

Power approximation using non-centrality parameter

The prospective power estimations are based on (Kieser M. Methods and Applications of Sample Size Calculation and Recalculation in Clinical Trials. Springer; 2020).The ANOVA power is calculated based on the non-centrality parameter given as

nc =\sqrt{\frac{r}{(1+r)^2}\cdot n}\cdot\frac{\beta_1-\beta_0-\Delta_s}{\sigma},

where we denote by\sigma^2 the variance of the outcome, such that the power can be estimated as

1-\beta = 1 - F_{t,n-2,nc}\left(F_{t, n-2, 0}^{-1}(1-\alpha/2)\right).

The power of ANCOVA with univariate covariate adjustment and no interaction is calculated based on the non-centrality parameter given as

nc =\sqrt{\frac{rn}{(1+r)^2}}\frac{\beta_1-\beta_0-\Delta_s}{\sigma\sqrt{1-\rho^2}},

such that the power can be estimated as

1-\beta = 1 - F_{t,n-3,nc}\left(F_{t, n-3, 0}^{-1}(1-\alpha/2)\right).

The power of ANCOVA with either univariate covariate adjustment and interaction or multiple covariate adjustement with or without interaction is calculated based on the non-centrality parameter given as

nc =\frac{\beta_1-\beta_0-\Delta_s}{\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_0} + X_d^\top\left((n-2)\Sigma_X\right)^{-1}X_d \right)\sigma^2\left(1-\widehat{R}^2\right)}}.

whereX_d = \left(\overline{X}_1^1-\overline{X}_0^1, \ldots, \overline{X}_1^p-\overline{X}_0^p\right)^\top,\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.Since we are in the case of randomized trials the expected difference between the covariatevalues between the to groups is 0. Furthermore, the elements of\Sigma_X^{-1} will be small, unless the variances are close to 0, or the covariates exhibit strong linear dependencies, so that the correlations are close to 1.These scenarios are excluded since they could lead to potentially serious problems regarding inference either way. These arguments are used by Zimmermann et. al(Zimmermann G, Kieser M, Bathke AC. Sample Size Calculation and Blinded Recalculation for Analysis of Covariance Models with Multiple Random Covariates. Journal of Biopharmaceutical Statistics. 2020;30(1):143–159.) to approximatethe non-centrality parameter as in the univariate case where\rho^2 is replaced byR^2.

Then the power for ANCOVA withd degrees of freedom can be estimated as

1-\beta = 1 - F_{t,d,nc}\left(F_{t, d,0), 0}^{-1}(1-\alpha/2)\right).

See Also

Seepower_marginaleffect for a power approximation function that works for a largerclass of models.

Examples

# Generate a data set to use as an examplen_train <- 1e3n_test <- 100dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A,                X1 = rnorm(n_train),                X2 = rgamma(n_train, shape = 2),                A = rbinom(n_train, size = 1, prob = 0.5),                family = gaussian())# Approximate the power using no adjustment covariatesva_nocov <- var(dat_gaus$Y)power_gs(n = n_test, variance = va_nocov, ate = 1)# Approximate the power with a model adjusting for both variables in the# data generating process## First estimate the variance sigma^2 * (1-R^2)va_cov <- variance_ancova(Y ~ X1 + X2 + A, dat_gaus)## Then estimate the power using this variancepower_gs(n = n_test, variance = va_cov, ate = 1.8, margin = 1, r = 2)# Approximate the sample size needed to obtain 90% power with same model as# abovesamplesize_gs(  variance = va_cov, ate = 1.8, power = 0.9, margin = 1, r = 2)# No adjustment covariatespower_nc(n = n_test, variance = va_nocov, df = 199, ate = 1)# Adjusting for all covariates in data generating processpower_nc(n = n_test, variance = va_cov, df = 196, ate = 1.8, margin = 1, r = 2)

Power approximation for estimating marginal effects in GLMs

Description

The functions implements the algorithm for power estimation described inPowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025). See a bit more context in details andall details in reference.

Usage

power_marginaleffect(  response,  predictions,  target_effect,  exposure_prob,  var1 = NULL,  kappa1_squared = NULL,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  inv_estimand_fun = NULL,  margin = estimand_fun(1, 1),  alpha = 0.05,  tolerance = 0.001,  verbose = options::opt("verbose"),  ...)

Arguments

response

a vector of modenumeric with the response variable fromcomparator participants

predictions

a vector of modenumeric with predictions of the response

target_effect

anumeric minimum effect size that we should be able to detect.See more in details.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0). See more in details.

var1

anumeric variance of the potential outcome corresponding to group 1,or afunction with a single argument meant to obtainvar1 as a transformationof the variance of the potential outcome corresponding to group 0.See more in details.

kappa1_squared

anumeric mean-squared error from predicting potentialoutcome corresponding to group 1, or afunction with a single arguments meantto obtainkappa1_squared as a transformation of the MSE in group 0.See more in details.

estimand_fun

afunction with argumentspsi1 andpsi0 specifyingthe estimand. Alternative, specify "ate" or "rate_ratio" as acharacterto use one of the default estimand functions. Seemore details in the "Estimand" section ofrctglm.

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

inv_estimand_fun

(optional) afunction with argumentspsi0 andtarget_effect, soestimand_fun(psi1 = y, psi0 = x) = z andinv_estimand_fun(psi0 = x, target_effect = z) = y for all x, y, z.If left asNULL, the inverse will be found numerically.

margin

anumeric superiority margin. As a default, theestimand_funis evaluated with the same counterfactual meanspsi1 andpsi0, correspondingto a superiority margin assuming no difference (fx. 0 for ATE and 1 for rate ratio).

alpha

anumeric significance level. The critical value used fortesting corresponds to using the significance level for a two-sided test. I.e.we use the quantile1-\alpha/2 of the null distribution as the critical value.

tolerance

passed toall.equal when comparing calculatedtarget_effectfrom derivations and giventarget_effect during numeric derivation ofinv_estimand_fun. Thus only relevant wheninv_estimand_fun isNULL.

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

...

arguments passed tostats::uniroot, which is used to find theinverse ofestimand_fun, wheninv_estimand_fun isNULL.

Details

The reference in the description shows in its "Prospective power" section aderivation of a variance bound

v_\infty^{\uparrow 2} = r_0'^{\, 2}\sigma_0^2+r_1'^{\, 2}\sigma_1^2+\pi_0\pi_1\left(\frac{|r_0'|\kappa_0}{\pi_0} +\frac{|r_1'|\kappa_1}{\pi_1} \right)^2

wherer_a' is the derivative of theestimand_fun with respect to\Psi_a,\sigma_a^2 is the variance of the potential outcome corresponding togroupa,\pi_a is the probability of being assigned to groupa,and\kappa_a is the expected mean-squared error when predicting thepotential outcome corresponding to groupa.

The variance bound is then used for calculating a lower bound of the power usingthe distributions corresponding to the null and alternative hypotheses\mathcal{H}_0: \hat{\Psi} \sim F_0 = \mathcal{N}(\Delta ,v_\infty^{\uparrow 2} / n)and\mathcal{H}_1: \hat{\Psi} \sim F_1 = \mathcal{N}(\Psi,v_\infty^{\uparrow 2} / n).See more details in the reference.

Relating arguments to formulas

Value

anumeric with the estimated power.

See Also

Seepower_linear for power approximation functionalities specific to linear models.

Examples

# Generate a data set to use as an examplen <- 100exposure_prob <- .5dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A+1.6*A*X2,                X1 = rnorm(n),                X2 = rgamma(n, shape = 2),                A = rbinom(n, size = 1, prob = exposure_prob),                family = gaussian())# ---------------------------------------------------------------------------# Obtain out-of-sample (OOS) prediction using glm model# ---------------------------------------------------------------------------gaus1 <- dat_gaus[1:(n/2), ]gaus2 <- dat_gaus[(n/2+1):n, ]glm1 <- glm(Y ~ X1 + X2 + A, data = gaus1)glm2 <- glm(Y ~ X1 + X2 + A, data = gaus2)preds_glm1 <- predict(glm2, newdata = gaus1, type = "response")preds_glm2 <- predict(glm1, newdata = gaus2, type = "response")preds_glm <- c(preds_glm1, preds_glm2)# Obtain powerpower_marginaleffect(  response = dat_gaus$Y,  predictions = preds_glm,  target_effect = 2,  exposure_prob = exposure_prob)# ---------------------------------------------------------------------------# Get OOS predictions using discrete super learner and adjust variance# ---------------------------------------------------------------------------learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    ) ),    lm = list(      model = parsnip::set_engine(        parsnip::linear_reg(),        "lm"      )    ))lrnr1 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A),                          data = gaus1,                          learners = learners)lrnr2 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A),                          data = gaus2,                          learners = learners)preds_lrnr1 <- dplyr::pull(predict(lrnr2, new_data = gaus1))preds_lrnr2 <- dplyr::pull(predict(lrnr1, new_data = gaus2))preds_lrnr <- c(preds_lrnr1, preds_lrnr2)# Estimate the power AND adjust the assumed variance in the "unknown"# group 1 to be 20% larger than in group 0power_marginaleffect(  response = dat_gaus$Y,  predictions = preds_lrnr,  target_effect = 2,  exposure_prob = exposure_prob,  var1 = function(var0) 1.2 * var0)

Extract information about the fitted prognostic model

Description

Extracts theprognostic_info list element from anrctglm_prog object. See'Value' atrctglm_with_prognosticscore for more details.

Usage

prog(x)## S3 method for class 'rctglm_prog'prog(x)

Arguments

x

an object of classrctglm_prog (returned byrctglm_with_prognosticscore)

Value

a list with the structure described ofprognostic_info in theValue section ofrctglm_with_prognosticscore.

See Also

The genericrctglm_with_prognosticscore() for which this methodworks.

Examples

# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exposure_prob <- .5dat_treat <- glm_data(  Y ~ b0+b1*log(W1)+b2*A,  W1 = W1,  A = rbinom(n, 1, exposure_prob))dat_notreat <- glm_data(  Y ~ b0+b1*log(W1),  W1 = W1)learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    )  ))ate <- rctglm_with_prognosticscore(  formula = Y ~ .,  exposure_indicator = A,  exposure_prob = exposure_prob,  data = dat_treat,  family = gaussian(),  estimand_fun = "ate",  data_hist = dat_notreat,  learners = learners)prog(ate)

Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation usinginfluence functions

Description

The procedure uses plug-in-estimation and influence functions to perform robust inference of any specifiedestimand in the setting of a randomised clinical trial, even in the case of heterogeneous effect ofcovariates in randomisation groups. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details on methodology.

Usage

rctglm(  formula,  exposure_indicator,  exposure_prob,  data,  family = gaussian,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  cv_variance = FALSE,  cv_variance_folds = 10,  verbose = options::opt("verbose"),  ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation.

exposure_indicator

(name of) thebinary variable indata thatidentifies randomisation groups. The variable is required to be binary currently.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0) in groups defined byexposure_indicator.

data

an optional data frame, list or environment (or object coercibleby as.data.frame to a data frame) containing the variables in the model. Ifnot found in data, the variables are taken from environment(formula), typicallythe environment from which the function is called.

family

a description of the error distribution and linkfunction to be used in the model. Forglm this can be acharacter string naming a family function, a family function or theresult of a call to a family function. Forglm.fit only thethird option is supported. (Seefamily for details offamily functions.)

estimand_fun

afunction with argumentspsi1 andpsi0 specifyingthe estimand. Alternative, specify "ate" or "rate_ratio" as acharacterto use one of the default estimand functions. Seemore details in the "Estimand" section ofrctglm.

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

cv_variance

alogical determining whether to estimate the varianceusing cross-validation (see details ofrctglm).

cv_variance_folds

anumeric with the number of folds to use for crossvalidation ifcv_variance isTRUE.

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

...

Additional arguments passed tostats::glm()

Details

The procedure assumes the setup of a randomised clinical trial with observations grouped by a binaryexposure_indicator variable, allocated randomly with probabilityexposure_prob. A GLM isfit and then used to predict the response of all observations in the event that theexposure_indicatoris 0 and 1, respectively. Taking means of these predictions produce thecounterfactual meanspsi0 andpsi1, and an estimandr(psi0, psi1) is calculated using any specifiedestimand_fun.

The variance of the estimand is found by taking the variance of the influence function of the estimand.Ifcv_variance isTRUE, then the counterfactual predictions for each observation (which areused to calculate the value of the influence function) is obtained as out-of-sample (OOS) predictionsusing cross validation with number of folds specified bycv_variance_folds. The cross validation splitsare performed using stratified sampling withexposure_indicator as thestrata argument inrsample::vfold_cv.

Read more invignette("model-fit").

Value

rctglm returns an object of class inheriting from"rctglm".

An object of classrctglm is a list containing the following components:

Estimands

As noted in the description,psi0 andpsi1 are the counterfactual means found by prediction usinga fitted GLM in the binary groups defined byexposure_indicator.

Default estimand functions can be specified via"ate" (which uses the functionfunction(psi1, psi0) psi1-psi0) and"rate_ratio" (which uses the functionfunction(psi1, psi0) psi1/psi0). See more information on specifying theestimand_funinvignette("model-fit").

As a default, theDeriv package is used to perform symbolic differentiation to find the derivatives oftheestimand_fun.

See Also

See how to extract information using methods inrctglm_methods.

Userctglm_with_prognosticscore() to include prognostic covariate adjustment.

See vignettes

Examples

# Generate some data to showcase examplen <- 100exp_prob <- .5dat_gaus <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exp_prob),  family = gaussian())# Fit the modelate <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_gaus,              family = gaussian)# Pull information on estimandestimand(ate)## Another example with different family and specification of estimand_fundat_binom <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exp_prob),  family = binomial())rr <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_binom,              family = binomial(),              estimand_fun = "rate_ratio")odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))or <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exp_prob,              data = dat_binom,              family = binomial,              estimand_fun = odds_ratio)

Methods for objects of classrctglm

Description

Methods mostly to extract information from model fit and inference. Seedetails for more information on each method.

Usage

estimand(object)## S3 method for class 'rctglm'estimand(object)est(object)## S3 method for class 'rctglm'coef(object, ...)## S3 method for class 'rctglm'predict(object, ...)## S3 method for class 'rctglm'print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

object

an object of classrctglm

...

additional arguments passed to methods

x

an object of classrctglm

digits

anumeric with the number of digits to display when printing

Details

The functionestimand (or short-hand versionest) can be used to extractadata.frame with an estimated value and standard error of the estimand.

A method for the genericcoef() has been added for classrctglm, whichextracts coefficient information from the underlyingglm fitin the procedure. The same is true for the method defined for thepredict() generic.The method for anrctglm class object callspredict.glm() on theglm fitcontained within anrctglm object.

Value

estimand/est returns adata.frame with columnsEstimate and⁠Std. Error⁠ with the estimate and standard error of the estimand.

Seecoef() andpredict.glm() for details on what is returned by the correspondingmethods.

See Also

The genericrctglm() which producesrctglm class objects.

Examples

# Generate some data to showcase examplen <- 100exposure_prob <- .5dat_binom <- glm_data(  Y ~ 1+1.5*X1+2*A,  X1 = rnorm(n),  A = rbinom(n, 1, exposure_prob),  family = binomial())# Fit the modelate <- rctglm(formula = Y ~ .,              exposure_indicator = A,              exposure_prob = exposure_prob,              data = dat_binom,              family = binomial,              estimand_fun = "ate")print(ate)estimand(ate)coef(ate)

Use prognostic covariate adjustment when fitting anrctglm

Description

The procedure usesfit_best_learner to fit a prognostic model to historical data and usesthe model to produce counterfactual predictions as a prognostic score that is then adjustedfor as a covariate in therctglm procedure. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details.

Usage

rctglm_with_prognosticscore(  formula,  exposure_indicator,  exposure_prob,  data,  family = gaussian,  estimand_fun = "ate",  estimand_fun_deriv0 = NULL,  estimand_fun_deriv1 = NULL,  cv_variance = FALSE,  cv_variance_folds = 10,  ...,  data_hist,  prog_formula = NULL,  cv_prog_folds = 5,  learners = default_learners(),  verbose = options::opt("verbose"))

Arguments

formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation.

exposure_indicator

(name of) thebinary variable indata thatidentifies randomisation groups. The variable is required to be binary currently.

exposure_prob

anumeric with the probability of being in"group 1" (rather than group 0) in groups defined byexposure_indicator.

data

an optional data frame, list or environment (or object coercibleby as.data.frame to a data frame) containing the variables in the model. Ifnot found in data, the variables are taken from environment(formula), typicallythe environment from which the function is called.

family

a description of the error distribution and linkfunction to be used in the model. Forglm this can be acharacter string naming a family function, a family function or theresult of a call to a family function. Forglm.fit only thethird option is supported. (Seefamily for details offamily functions.)

estimand_fun

afunction with argumentspsi1 andpsi0 specifyingthe estimand. Alternative, specify "ate" or "rate_ratio" as acharacterto use one of the default estimand functions. Seemore details in the "Estimand" section ofrctglm.

estimand_fun_deriv0

afunction specifying the derivative ofestimand_fun wrt.psi0. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

estimand_fun_deriv1

afunction specifying the derivative ofestimand_fun wrt.psi1. As a defaultthe algorithm will use symbolic differentiation to automatically find the derivative fromestimand_fun

cv_variance

alogical determining whether to estimate the varianceusing cross-validation (see details ofrctglm).

cv_variance_folds

anumeric with the number of folds to use for crossvalidation ifcv_variance isTRUE.

...

Additional arguments passed tostats::glm()

data_hist

adata.frame with historical data on which to fit a prognostic model

prog_formula

an object of class "formula" (or one that can be coerced to that class):a symbolic description of the prognostic model to be fitted todata_hist. Passed in alist as thepreproc argument infit_best_learner(). As a default,the formula is created by modelling the response (assumed to have the same name as informula) using all columns indata_hist.

cv_prog_folds

anumeric with the number of cross-validation folds used when fitting andevaluating models

learners

alist (preferably named) containing named lists of elementsmodel and optionallygrid. Themodel element should be aparsnipmodel specification, which is passed toworkflowsets::workflow_set as themodel argument, while thegrid element is passed as thegrid argumentofworkflowsets::option_add

verbose

numeric verbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution (Defaults to2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE')

Details

Prognostic covariate adjustment involves training a prognostic model (usingfit_best_learner) on historical data (data_hist) to predict the responsein that data.

Assuming that thehistorical data is representative of the comparator group in a “new” dataset (group 0 of the binaryexposure_indicator indata), we can use theprognostic model to predict the counterfactualoutcome of all observations (including the ones in the comparator groupfor which the prediction of counterfactual outcome coincides with aprediction of actual outcome).

This prediction, which is called the prognostic score, is then used as anadjustment covariate in the GLM (the prognostic score is added toformulabefore callingrctglm with the modified formula).

See much more details in the reference in the description.

Value

rctglm_with_prognosticscore returns an object of classrctglm_prog,which inherits fromrctglm.

Anrctglm_prog object is a list with the same components as anrctglm object(see theValue section ofrctglm for a breakdown of the structure),but with an additional list element of:

See Also

Method to extract information of the prognostic model inprog. Functionused to fit the prognostic model isfit_best_learner().

Seerctglm() for the function and class this inherits from.

Examples

# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exp_prob <- .5dat_treat <- glm_data(  Y ~ b0+b1*log(W1)+b2*A,  W1 = W1,  A = rbinom (n, 1, exp_prob))dat_notreat <- glm_data(  Y ~ b0+b1*log(W1),  W1 = W1)learners <- list(  mars = list(    model = parsnip::set_engine(      parsnip::mars(        mode = "regression", prod_degree = 3      ),      "earth"    ) ),    lm = list(      model = parsnip::set_engine(        parsnip::linear_reg(),        "lm"      )    ))ate <- rctglm_with_prognosticscore(  formula = Y ~ .,  exposure_indicator = A,  exposure_prob = exp_prob,  data = dat_treat,  family = gaussian(),  estimand_fun = "ate",  data_hist = dat_notreat,  learners = learners)# Pull information on estimandestimand(ate)

Create data and plot power curves calculated using functions inpower_linear() for a list of formulas/models

Description

Estimate a variance for power approximation usingvariance_ancova() for each formulainformula_list ontrain_data. Then calculate power using the function with namespecified inpower_fun across a number of sample sizesns for an assumed averagetreatment effect ofate.

Usage

repeat_power_linear(  ate,  formula_list,  train_data,  power_fun = c("power_gs", "power_nc"),  ns = 10:400,  desired_power = 0.9,  ...)## S3 method for class 'postcard_rpl'plot(x, cols = NULL, ...)

Arguments

ate

Passed topower_gs() orpower_nc()

formula_list

a namedlist of formulas that are element wise passed tovariance_ancova()

train_data

Passed as thedata argument invariance_ancova()

power_fun

acharacter string with value"power_gs" or"power_nc",specifying what function in thepower_linear() topic to use

ns

anumeric vector of sample sizes

desired_power

anumeric between 0 and 1 indicating the desired power level

...

Arguments passed tovariance_ancova() andpower_gs() orpower_nc()

x

an object of classpostcard_rpl created byrepeat_power_linear()

cols

a (potentially named)character vector of colors for the different modelsinformula_list

Value

repeat_power_linear returns an object of classpostcard_rpl, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.

See Also

repeat_power_marginaleffect() for a similar implementation to iterate theprocess of approximating power with thepower_marginaleffect()

Examples

train_data <- glm_data(  Y ~ 1+1.5*log(W)+2*X,  W = runif(1e3, min = 1, max = 10),  X = rnorm(1e3, sd = 3))rpl <- repeat_power_linear(  ate = 0.5,  formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X),  train_data = train_data)rpl_nc <- repeat_power_linear(  ate = 0.5,  formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X),  train_data = train_data,  power_fun = "power_nc",  df = 1e3-3,  deflation = 0.95,  margin = -0.2,  r = 2)## Not run: plot(rpl)plot(rpl_nc)## End(Not run)

Create data and plot power curves calculated usingpower_marginaleffect() for a list of models

Description

Iterate a process of simulating test data fromtest_data_fun, making predictionsusing models inmodel_list, and calculating power usingpower_marginaleffect()across a number of sample sizesns and iterationsn_iter. The results are averagedand used to create a plot of the resulting power curves.

Usage

repeat_power_marginaleffect(  target_effect,  exposure_prob,  model_list = default_power_model_list(),  test_data_fun = function(n) {     glm_data(Y ~ 1 + 3 * log(W), W = stats::runif(n, min    = 1, max = 50)) },  ns = seq(10, 300, 10),  desired_power = 0.9,  n_iter = 1,  ...)## S3 method for class 'postcard_rpm'plot(x, cols = NULL, ...)

Arguments

target_effect

Passed topower_marginaleffect()

exposure_prob

Passed topower_marginaleffect()

model_list

a namedlist of models used to get predictions on generated testdata sets that are then passed topower_marginaleffect() aspredictions. Theelements ofmodel_list need to have an existingpredict() method. The default isan ANCOVA and a prognostic model fitted withfit_best_learner() to a simple data setof 1000 observations generated with a non-linear effect of a single covariate usingglm_data().

test_data_fun

afunction with a single argumentn that generates testdata sets for the sample sizesns specified. The default generates data usingglm_data() with the same data generating process as the trainingdata used to fit the default models inmodel_list.

ns

anumeric vector of sample sizes

desired_power

anumeric between 0 and 1 indicating the desired power level

n_iter

anumeric indicating a number of iterations to process and average over

...

additional arguments passed topower_marginaleffect()

x

an object of classpostcard_rpm created byrepeat_power_marginaleffect()

cols

a (potentially named)character vector of colors for the different modelsinmodel_list

Value

repeat_power_marginal returns an object of classpostcard_rpm, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.

See Also

repeat_power_linear() for a similar implementation to iterate the processof approximating power with the functions inpower_linear()

Examples

# Note everything is wrapped in dontrun to avoid long runtimes of examples (tests are# still in place). Reduce the number of sample sizes and/or iterations to avoid long# runtimes## Not run: # A simple use case with default models and test data (we run only with a few sample# sizes to reduce runtime of examples)rpm <- repeat_power_marginaleffect(  target_effect = 0.9,  exposure_prob = 0.5)plot(rpm)################################# Create model from a poisson family and estimate the power of rate ratio with# several arguments passed to power_marginaleffect################################b1 <- 0.9b2 <- 0.2b3 <- -0.4b4 <- -0.6train_pois <- glm_data(  Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3,  X1 = runif(1e3, min = 1, max = 10),  X2 = rnorm(1e3),  X3 = rgamma(1e3, shape = 1),  family = poisson())# Define models to compare fit to training dataancova_prog_list <- list(ANCOVA = glm(Y ~ X1 + X2 + X3, data = train_pois, family = poisson),"ANCOVA with prognostic score" = fit_best_learner(list(mod = Y ~ X1 + X2 + X3), data = train_pois))# Create a function that produces data to predict ontest_pois_fun <- function(n) { glm_data(   Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3,   X1 = runif(n, min = 1, max = 10),   X2 = rnorm(n),   X3 = rgamma(n, shape = 1),   family = poisson() )}# Specify a bunch of different arguments that are passed to power_marginaleffect()## Run for 2 sample sizes to reduce runtimerpm_rr <- repeat_power_marginaleffect(  model_list = ancova_prog_list,  test_data_fun = test_pois_fun,  ns = seq(100, 200), n_iter = 1,  var1 = function(var0) 1.1 * var0,  kappa1_squared = function(kap0) 1.1 * kap0,  estimand_fun = "rate_ratio",  target_effect = 1.4,  exposure_prob = 1/2,  margin = 0.8)plot(rpm_rr2)## End(Not run)

[8]ページ先頭

©2009-2025 Movatter.jp