| Type: | Package |
| Title: | Estimating Marginal Effects with Prognostic Covariate Adjustment |
| Version: | 1.1.0 |
| Description: | Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025) <doi:10.48550/arXiv.2503.22284>. |
| License: | MIT + file LICENSE |
| URL: | https://novonordisk-opensource.github.io/postcard/,https://github.com/NovoNordisk-OpenSource/postcard |
| BugReports: | https://github.com/NovoNordisk-OpenSource/postcard/issues |
| Imports: | cli, Deriv, dplyr, earth, generics, gggrid, ggplot2, options,parsnip, rlang, rsample, scales, stats, stringr, tune, utils,workflowsets, xgboost, yardstick |
| Suggests: | knitr, LiblineaR, MASS, ranger, rmarkdown, testthat (≥3.0.0), vdiffr, withr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-09-17 12:22:14 UTC; Mathias |
| Author: | Mathias Lerbech Jeppesen [aut, cre], Emilie Hojbjerre-Frandsen [aut], Novo Nordisk A/S [cph] |
| Maintainer: | Mathias Lerbech Jeppesen <mathiasljeppesen@outlook.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-09-17 12:50:02 UTC |
postcard: Estimating Marginal Effects with Prognostic Covariate Adjustment
Description

Conduct power analyses and inference of marginal effects. Uses plug-in estimation and influence functions to perform robust inference, optionally leveraging historical data to increase precision with prognostic covariate adjustment. The methods are described in Højbjerre-Frandsen et al. (2025)doi:10.48550/arXiv.2503.22284.
Author(s)
Maintainer: Mathias Lerbech Jeppesenmathiasljeppesen@outlook.com
Authors:
Emilie Hojbjerre-Frandsenehfd@novonordisk.com
Other contributors:
Novo Nordisk A/S [copyright holder]
See Also
Useful links:
Creates a list of learners
Description
This function creates a list of learners compatible with thelearnersargument offit_best_learner, which is used as the default argument.
Usage
default_learners()Value
a namedlist of learners, where each element consists of a
model: Aparsnipmodel specificationgrid: Adata.framewith columns corresponding to tuning parameters
Examples
default_learners()Find the best learner in terms of RMSE among specified learners using cross validation
Description
Find the best learner in terms of RMSE among specified learners using cross validation
Usage
fit_best_learner( preproc, data, cv_folds = 5, learners = default_learners(), verbose = options::opt("verbose"))Arguments
preproc | A list (preferably named) with preprocessing objects:formulas, recipes, or |
data | A data frame. |
cv_folds | a |
learners | a |
verbose |
|
Details
Ensure data compatibility with the learners.
Value
a trainedworkflow
See Also
Seerctglm_with_prognosticscore() for a function that utilises thisfunction to perform prognostic covariate adjustment.
Examples
# Generate some synthetic 2-armed RCT data along with historical controlsn <- 100dat_rct <- glm_data( Y ~ 1+2*x1+3*a, x1 = rnorm(n, 2), a = rbinom (n, 1, .5), family = gaussian())dat_hist <- glm_data( Y ~ 1+2*x1, x1 = rnorm(n, 2), family = gaussian())# Fit a learner to the historical control datalearners <- list( mars = list( model = parsnip::set_engine( parsnip::mars( mode = "regression", prod_degree = 3 ), "earth" ) ))fit <- fit_best_learner( preproc = list(mod = Y ~ .), data = dat_hist, learners = learners)# Use it fx. to predict the "control outcome" in the 2-armed RCTpredict(fit, new_data = dat_rct)Generate data simulated from a GLM
Description
Provide a formula, variables and a family to generate a linear predictorusing the formula and provided variables before using the inverse link ofthe family to generate the GLM modelled mean, mu, which is then used tosimulate the response with this mean from the generating function accordingto the chosen family.
Usage
glm_data(formula, ..., family = gaussian(), family_args = NULL)Arguments
formula | an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation. |
... | a |
family | the |
family_args | a named |
Value
adata.frame
Examples
# Generate a gaussian response from a single covariateglm_data(Y ~ 1+2*x1, x1 = rnorm(10))# Generate a gaussian response from a single covariate with non-linear# effects. Specify that the response should have standard deviation sqrt(3)glm_data(Y ~ 1+2*log(x1), x1 = runif(10, min = 1, max = 10), family_args = list(sd = sqrt(3)))# Generate a negative binomial responseglm_data(Y ~ 1+2*x1-x2, x1 = rnorm(10), x2 = rgamma(10, shape = 2), family = MASS::negative.binomial(2))# Provide variables as a list/data.frame and pass a link to the negative.binomial# functionglm_data(resp ~ 1+2*x1-x2, data.frame( x1 = rnorm(10), x2 = rgamma(10, shape = 2) ), family = MASS::negative.binomial(2), family_args = list(link = "identity"))postcard Options
Description
Internally used, package-specific options. All options will prioritize R options() values, and fall back to environment variables if undefined. If neither the option nor the environment variable is set, a default value is used.
Arguments
verbose |
|
Checking Option Values
Option values specific topostcard can beaccessed by passing the package name toenv.
options::opts(env = "postcard")options::opt(x, default, env = "postcard")
Options
- verbose
numericverbosity level. Higher values means more information isprinted in console. A value of 0 means nothing is printed to console duringexecution- default:
2
- option:
postcard.verbose
- envvar:
R_POSTCARD_VERBOSE (evaluated if possible, raw string otherwise)
See Also
options getOption Sys.setenv Sys.getenv
Power and sample size estimation for linear models
Description
variance_ancova provides a convenient function for estimating avariance to use for power and sample size approximation.
Thepower_gs andsamplesize_gs functions calculate the Guenther-Schoutenpower approximation for ANOVA or ANCOVA.The approximation is based in (Guenther WC. Sample Size Formulas for Normal Theory T Tests.The American Statistician. 1981;35(4):243–244) and (Schouten HJA. Sample size formula witha continuous outcome for unequal group sizes and unequal variances. Statistics in Medicine.1999;18(1):87–91).
The functionpower_nc calculates the power for ANOVA or ANCOVA based on thenon-centrality parameter and the exact t-distributions.
See more details about each function inDetails and in sections afterValue.
Usage
variance_ancova(formula, data, inflation = 1, deflation = 1, ...)power_gs(variance, ate, n, r = 1, margin = 0, alpha = 0.05, ...)samplesize_gs(variance, ate, r = 1, margin = 0, power = 0.9, alpha = 0.05, ...)power_nc(variance, df, ate, n, r = 1, margin = 0, alpha = 0.05, ...)Arguments
formula | an object of class "formula" (or one that can be coerced to that class):a symbolic description used in |
data | a data frame, list or environment (or objectcoercible by |
inflation | a |
deflation | a |
... | Not currently used. Here to accomodate implementation of |
variance | a |
ate | a |
n | a |
r | a |
margin | a |
alpha | a |
power | a |
df | a |
Details
This details section provides information about relation between arguments tofunctions and the formulas described in sections below for each powerapproximation formula.
Note that all entities that carry the same name as an argument and in the formulawill not be mentioned below, as they are obviously linked (n, r, alpha)
ate:\beta_1-\beta_0margin:\Delta_svariance:\widehat{\sigma}^2(1-\widehat{R}^2)
Finding thevariance to use for approximation
Thevariance_ancova function estimates\sigma^2(1-R^2) in data andreturns it as anumeric that can be passed directly as thevarianceinpower_gs. Corresponds to estimating the power from using anlm withthe sameformula as specified invariance_ancova.
The user can estimate thevariance any way they see fit.
Value
All functions return anumeric.variance_ancova returns anumeric witha variance estimated from data to used for power estimation and sample sizeestimation.power_xx andsamplesize_xx functions return anumeric withthe power or sample size approximation.
Guenther-Schouten power approximation
The estimation formula in the case of an ANCOVA model with multiple covariate adjustment is (see description for reference):
n=\frac{(1+r)^2}{r}\frac{(z_{1-\alpha/2}+z_{1-\beta})^2\widehat{\sigma}^2(1-\widehat{R}^2)}{(\beta_1-\beta_0-\Delta_s)^2}+\frac{(z_{1-\alpha/2})^2}{2}
where\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.In the univariate caseR^2 is replaced by\rho^2
Power approximation using non-centrality parameter
The prospective power estimations are based on (Kieser M. Methods and Applications of Sample Size Calculation and Recalculation in Clinical Trials. Springer; 2020).The ANOVA power is calculated based on the non-centrality parameter given as
nc =\sqrt{\frac{r}{(1+r)^2}\cdot n}\cdot\frac{\beta_1-\beta_0-\Delta_s}{\sigma},
where we denote by\sigma^2 the variance of the outcome, such that the power can be estimated as
1-\beta = 1 - F_{t,n-2,nc}\left(F_{t, n-2, 0}^{-1}(1-\alpha/2)\right).
The power of ANCOVA with univariate covariate adjustment and no interaction is calculated based on the non-centrality parameter given as
nc =\sqrt{\frac{rn}{(1+r)^2}}\frac{\beta_1-\beta_0-\Delta_s}{\sigma\sqrt{1-\rho^2}},
such that the power can be estimated as
1-\beta = 1 - F_{t,n-3,nc}\left(F_{t, n-3, 0}^{-1}(1-\alpha/2)\right).
The power of ANCOVA with either univariate covariate adjustment and interaction or multiple covariate adjustement with or without interaction is calculated based on the non-centrality parameter given as
nc =\frac{\beta_1-\beta_0-\Delta_s}{\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_0} + X_d^\top\left((n-2)\Sigma_X\right)^{-1}X_d \right)\sigma^2\left(1-\widehat{R}^2\right)}}.
whereX_d = \left(\overline{X}_1^1-\overline{X}_0^1, \ldots, \overline{X}_1^p-\overline{X}_0^p\right)^\top,\widehat{R}^2= \frac{\widehat{\sigma}_{XY}^\top \widehat{\Sigma}_X^{-1}\widehat{\sigma}_{XY}}{\widehat{\sigma}^2},we denote by\widehat{\sigma^2} an estimate of the variance of the outcome,\widehat{\Sigma_X} and estimate of the covariance matrix of thecovariates, and\widehat{\sigma_{XY}} ap-dimensional column vector consisting ofan estimate of the covariancebetween the outcome variable and each covariate.Since we are in the case of randomized trials the expected difference between the covariatevalues between the to groups is 0. Furthermore, the elements of\Sigma_X^{-1} will be small, unless the variances are close to 0, or the covariates exhibit strong linear dependencies, so that the correlations are close to 1.These scenarios are excluded since they could lead to potentially serious problems regarding inference either way. These arguments are used by Zimmermann et. al(Zimmermann G, Kieser M, Bathke AC. Sample Size Calculation and Blinded Recalculation for Analysis of Covariance Models with Multiple Random Covariates. Journal of Biopharmaceutical Statistics. 2020;30(1):143–159.) to approximatethe non-centrality parameter as in the univariate case where\rho^2 is replaced byR^2.
Then the power for ANCOVA withd degrees of freedom can be estimated as
1-\beta = 1 - F_{t,d,nc}\left(F_{t, d,0), 0}^{-1}(1-\alpha/2)\right).
See Also
Seepower_marginaleffect for a power approximation function that works for a largerclass of models.
Examples
# Generate a data set to use as an examplen_train <- 1e3n_test <- 100dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A, X1 = rnorm(n_train), X2 = rgamma(n_train, shape = 2), A = rbinom(n_train, size = 1, prob = 0.5), family = gaussian())# Approximate the power using no adjustment covariatesva_nocov <- var(dat_gaus$Y)power_gs(n = n_test, variance = va_nocov, ate = 1)# Approximate the power with a model adjusting for both variables in the# data generating process## First estimate the variance sigma^2 * (1-R^2)va_cov <- variance_ancova(Y ~ X1 + X2 + A, dat_gaus)## Then estimate the power using this variancepower_gs(n = n_test, variance = va_cov, ate = 1.8, margin = 1, r = 2)# Approximate the sample size needed to obtain 90% power with same model as# abovesamplesize_gs( variance = va_cov, ate = 1.8, power = 0.9, margin = 1, r = 2)# No adjustment covariatespower_nc(n = n_test, variance = va_nocov, df = 199, ate = 1)# Adjusting for all covariates in data generating processpower_nc(n = n_test, variance = va_cov, df = 196, ate = 1.8, margin = 1, r = 2)Power approximation for estimating marginal effects in GLMs
Description
The functions implements the algorithm for power estimation described inPowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025). See a bit more context in details andall details in reference.
Usage
power_marginaleffect( response, predictions, target_effect, exposure_prob, var1 = NULL, kappa1_squared = NULL, estimand_fun = "ate", estimand_fun_deriv0 = NULL, estimand_fun_deriv1 = NULL, inv_estimand_fun = NULL, margin = estimand_fun(1, 1), alpha = 0.05, tolerance = 0.001, verbose = options::opt("verbose"), ...)Arguments
response | a vector of mode |
predictions | a vector of mode |
target_effect | a |
exposure_prob | a |
var1 | a |
kappa1_squared | a |
estimand_fun | a |
estimand_fun_deriv0 | a |
estimand_fun_deriv1 | a |
inv_estimand_fun | (optional) a |
margin | a |
alpha | a |
tolerance | passed toall.equal when comparing calculated |
verbose |
|
... | arguments passed tostats::uniroot, which is used to find theinverse of |
Details
The reference in the description shows in its "Prospective power" section aderivation of a variance bound
v_\infty^{\uparrow 2} = r_0'^{\, 2}\sigma_0^2+r_1'^{\, 2}\sigma_1^2+\pi_0\pi_1\left(\frac{|r_0'|\kappa_0}{\pi_0} +\frac{|r_1'|\kappa_1}{\pi_1} \right)^2
wherer_a' is the derivative of theestimand_fun with respect to\Psi_a,\sigma_a^2 is the variance of the potential outcome corresponding togroupa,\pi_a is the probability of being assigned to groupa,and\kappa_a is the expected mean-squared error when predicting thepotential outcome corresponding to groupa.
The variance bound is then used for calculating a lower bound of the power usingthe distributions corresponding to the null and alternative hypotheses\mathcal{H}_0: \hat{\Psi} \sim F_0 = \mathcal{N}(\Delta ,v_\infty^{\uparrow 2} / n)and\mathcal{H}_1: \hat{\Psi} \sim F_1 = \mathcal{N}(\Psi,v_\infty^{\uparrow 2} / n).See more details in the reference.
Relating arguments to formulas
response: Used to obtain both\sigma_0^2(by taking the samplevariance of the response) and\kappa_0.predictions: Used when calculating the MSE\kappa_0.var1:\sigma_1^2. As a default, chosen to be the same as\sigma_0^2. Can specify differently through this argument fx. byInflating or deflating the value of
\sigma_0^2by a scalar accordingto prior beliefs. Fx. specifyvar1 = function(x) 1.2 * xto inflate\sigma_0^2by 1.2.If historical data is available for group 1, an estimate of the variancefrom that data can be provided here.
kappa1_squared:\kappa_1. Same as forvar1, default is to assumethe same value askappa0_squared, which is found by using theresponseandpredictionsvectors. Adjust the value according to prior beliefs ifrelevant.target_effect:\Psi.exposure_prob:\pi_1
Value
anumeric with the estimated power.
See Also
Seepower_linear for power approximation functionalities specific to linear models.
Examples
# Generate a data set to use as an examplen <- 100exposure_prob <- .5dat_gaus <- glm_data(Y ~ 1+2*X1-X2+3*A+1.6*A*X2, X1 = rnorm(n), X2 = rgamma(n, shape = 2), A = rbinom(n, size = 1, prob = exposure_prob), family = gaussian())# ---------------------------------------------------------------------------# Obtain out-of-sample (OOS) prediction using glm model# ---------------------------------------------------------------------------gaus1 <- dat_gaus[1:(n/2), ]gaus2 <- dat_gaus[(n/2+1):n, ]glm1 <- glm(Y ~ X1 + X2 + A, data = gaus1)glm2 <- glm(Y ~ X1 + X2 + A, data = gaus2)preds_glm1 <- predict(glm2, newdata = gaus1, type = "response")preds_glm2 <- predict(glm1, newdata = gaus2, type = "response")preds_glm <- c(preds_glm1, preds_glm2)# Obtain powerpower_marginaleffect( response = dat_gaus$Y, predictions = preds_glm, target_effect = 2, exposure_prob = exposure_prob)# ---------------------------------------------------------------------------# Get OOS predictions using discrete super learner and adjust variance# ---------------------------------------------------------------------------learners <- list( mars = list( model = parsnip::set_engine( parsnip::mars( mode = "regression", prod_degree = 3 ), "earth" ) ), lm = list( model = parsnip::set_engine( parsnip::linear_reg(), "lm" ) ))lrnr1 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A), data = gaus1, learners = learners)lrnr2 <- fit_best_learner(preproc = list(mod = Y ~ X1 + X2 + A), data = gaus2, learners = learners)preds_lrnr1 <- dplyr::pull(predict(lrnr2, new_data = gaus1))preds_lrnr2 <- dplyr::pull(predict(lrnr1, new_data = gaus2))preds_lrnr <- c(preds_lrnr1, preds_lrnr2)# Estimate the power AND adjust the assumed variance in the "unknown"# group 1 to be 20% larger than in group 0power_marginaleffect( response = dat_gaus$Y, predictions = preds_lrnr, target_effect = 2, exposure_prob = exposure_prob, var1 = function(var0) 1.2 * var0)Extract information about the fitted prognostic model
Description
Extracts theprognostic_info list element from anrctglm_prog object. See'Value' atrctglm_with_prognosticscore for more details.
Usage
prog(x)## S3 method for class 'rctglm_prog'prog(x)Arguments
x | an object of class |
Value
a list with the structure described ofprognostic_info in theValue section ofrctglm_with_prognosticscore.
See Also
The genericrctglm_with_prognosticscore() for which this methodworks.
Examples
# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exposure_prob <- .5dat_treat <- glm_data( Y ~ b0+b1*log(W1)+b2*A, W1 = W1, A = rbinom(n, 1, exposure_prob))dat_notreat <- glm_data( Y ~ b0+b1*log(W1), W1 = W1)learners <- list( mars = list( model = parsnip::set_engine( parsnip::mars( mode = "regression", prod_degree = 3 ), "earth" ) ))ate <- rctglm_with_prognosticscore( formula = Y ~ ., exposure_indicator = A, exposure_prob = exposure_prob, data = dat_treat, family = gaussian(), estimand_fun = "ate", data_hist = dat_notreat, learners = learners)prog(ate)Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation usinginfluence functions
Description
The procedure uses plug-in-estimation and influence functions to perform robust inference of any specifiedestimand in the setting of a randomised clinical trial, even in the case of heterogeneous effect ofcovariates in randomisation groups. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details on methodology.
Usage
rctglm( formula, exposure_indicator, exposure_prob, data, family = gaussian, estimand_fun = "ate", estimand_fun_deriv0 = NULL, estimand_fun_deriv1 = NULL, cv_variance = FALSE, cv_variance_folds = 10, verbose = options::opt("verbose"), ...)Arguments
formula | an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation. |
exposure_indicator | (name of) thebinary variable in |
exposure_prob | a |
data | an optional data frame, list or environment (or object coercibleby as.data.frame to a data frame) containing the variables in the model. Ifnot found in data, the variables are taken from environment(formula), typicallythe environment from which the function is called. |
family | a description of the error distribution and linkfunction to be used in the model. For |
estimand_fun | a |
estimand_fun_deriv0 | a |
estimand_fun_deriv1 | a |
cv_variance | a |
cv_variance_folds | a |
verbose |
|
... | Additional arguments passed to |
Details
The procedure assumes the setup of a randomised clinical trial with observations grouped by a binaryexposure_indicator variable, allocated randomly with probabilityexposure_prob. A GLM isfit and then used to predict the response of all observations in the event that theexposure_indicatoris 0 and 1, respectively. Taking means of these predictions produce thecounterfactual meanspsi0 andpsi1, and an estimandr(psi0, psi1) is calculated using any specifiedestimand_fun.
The variance of the estimand is found by taking the variance of the influence function of the estimand.Ifcv_variance isTRUE, then the counterfactual predictions for each observation (which areused to calculate the value of the influence function) is obtained as out-of-sample (OOS) predictionsusing cross validation with number of folds specified bycv_variance_folds. The cross validation splitsare performed using stratified sampling withexposure_indicator as thestrata argument inrsample::vfold_cv.
Read more invignette("model-fit").
Value
rctglm returns an object of class inheriting from"rctglm".
An object of classrctglm is a list containing the following components:
estimand: Adata.framewith plug-in estimate of estimand, standarderror (SE) estimate and variance estimate of estimandestimand_funs: Alistwithf: Theestimand_funused to obtain an estimate of the estimand from counterfactual meansd0: The derivative with respect topsi0d1: The derivative with respect topsi1
means_counterfactual: Adata.framewith counterfactual meanspsi0andpsi1fitted.values_counterfactual: Adata.framewith counterfactual meanvalues, obtained by transforming the linear predictors for each groupby the inverse of the link function.glm: Aglmobject returned from runningstats::glm within the procedurecall: The matchedcall
Estimands
As noted in the description,psi0 andpsi1 are the counterfactual means found by prediction usinga fitted GLM in the binary groups defined byexposure_indicator.
Default estimand functions can be specified via"ate" (which uses the functionfunction(psi1, psi0) psi1-psi0) and"rate_ratio" (which uses the functionfunction(psi1, psi0) psi1/psi0). See more information on specifying theestimand_funinvignette("model-fit").
As a default, theDeriv package is used to perform symbolic differentiation to find the derivatives oftheestimand_fun.
See Also
See how to extract information using methods inrctglm_methods.
Userctglm_with_prognosticscore() to include prognostic covariate adjustment.
See vignettes
Examples
# Generate some data to showcase examplen <- 100exp_prob <- .5dat_gaus <- glm_data( Y ~ 1+1.5*X1+2*A, X1 = rnorm(n), A = rbinom(n, 1, exp_prob), family = gaussian())# Fit the modelate <- rctglm(formula = Y ~ ., exposure_indicator = A, exposure_prob = exp_prob, data = dat_gaus, family = gaussian)# Pull information on estimandestimand(ate)## Another example with different family and specification of estimand_fundat_binom <- glm_data( Y ~ 1+1.5*X1+2*A, X1 = rnorm(n), A = rbinom(n, 1, exp_prob), family = binomial())rr <- rctglm(formula = Y ~ ., exposure_indicator = A, exposure_prob = exp_prob, data = dat_binom, family = binomial(), estimand_fun = "rate_ratio")odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))or <- rctglm(formula = Y ~ ., exposure_indicator = A, exposure_prob = exp_prob, data = dat_binom, family = binomial, estimand_fun = odds_ratio)Methods for objects of classrctglm
Description
Methods mostly to extract information from model fit and inference. Seedetails for more information on each method.
Usage
estimand(object)## S3 method for class 'rctglm'estimand(object)est(object)## S3 method for class 'rctglm'coef(object, ...)## S3 method for class 'rctglm'predict(object, ...)## S3 method for class 'rctglm'print(x, digits = max(3L, getOption("digits") - 3L), ...)Arguments
object | an object of class |
... | additional arguments passed to methods |
x | an object of class |
digits | a |
Details
The functionestimand (or short-hand versionest) can be used to extractadata.frame with an estimated value and standard error of the estimand.
A method for the genericcoef() has been added for classrctglm, whichextracts coefficient information from the underlyingglm fitin the procedure. The same is true for the method defined for thepredict() generic.The method for anrctglm class object callspredict.glm() on theglm fitcontained within anrctglm object.
Value
estimand/est returns adata.frame with columnsEstimate andStd. Error with the estimate and standard error of the estimand.
Seecoef() andpredict.glm() for details on what is returned by the correspondingmethods.
See Also
The genericrctglm() which producesrctglm class objects.
Examples
# Generate some data to showcase examplen <- 100exposure_prob <- .5dat_binom <- glm_data( Y ~ 1+1.5*X1+2*A, X1 = rnorm(n), A = rbinom(n, 1, exposure_prob), family = binomial())# Fit the modelate <- rctglm(formula = Y ~ ., exposure_indicator = A, exposure_prob = exposure_prob, data = dat_binom, family = binomial, estimand_fun = "ate")print(ate)estimand(ate)coef(ate)Use prognostic covariate adjustment when fitting anrctglm
Description
The procedure usesfit_best_learner to fit a prognostic model to historical data and usesthe model to produce counterfactual predictions as a prognostic score that is then adjustedfor as a covariate in therctglm procedure. SeePowering RCTs for marginal effects with GLMs using prognostic score adjustmentby Højbjerre-Frandsen et. al (2025) for more details.
Usage
rctglm_with_prognosticscore( formula, exposure_indicator, exposure_prob, data, family = gaussian, estimand_fun = "ate", estimand_fun_deriv0 = NULL, estimand_fun_deriv1 = NULL, cv_variance = FALSE, cv_variance_folds = 10, ..., data_hist, prog_formula = NULL, cv_prog_folds = 5, learners = default_learners(), verbose = options::opt("verbose"))Arguments
formula | an object of class "formula" (or one that can be coerced to that class):a symbolic description of the model to be fitted. The details of model specification aregiven under ‘Details’ in theglm documentation. |
exposure_indicator | (name of) thebinary variable in |
exposure_prob | a |
data | an optional data frame, list or environment (or object coercibleby as.data.frame to a data frame) containing the variables in the model. Ifnot found in data, the variables are taken from environment(formula), typicallythe environment from which the function is called. |
family | a description of the error distribution and linkfunction to be used in the model. For |
estimand_fun | a |
estimand_fun_deriv0 | a |
estimand_fun_deriv1 | a |
cv_variance | a |
cv_variance_folds | a |
... | Additional arguments passed to |
data_hist | a |
prog_formula | an object of class "formula" (or one that can be coerced to that class):a symbolic description of the prognostic model to be fitted to |
cv_prog_folds | a |
learners | a |
verbose |
|
Details
Prognostic covariate adjustment involves training a prognostic model (usingfit_best_learner) on historical data (data_hist) to predict the responsein that data.
Assuming that thehistorical data is representative of the comparator group in a “new” dataset (group 0 of the binaryexposure_indicator indata), we can use theprognostic model to predict the counterfactualoutcome of all observations (including the ones in the comparator groupfor which the prediction of counterfactual outcome coincides with aprediction of actual outcome).
This prediction, which is called the prognostic score, is then used as anadjustment covariate in the GLM (the prognostic score is added toformulabefore callingrctglm with the modified formula).
See much more details in the reference in the description.
Value
rctglm_with_prognosticscore returns an object of classrctglm_prog,which inherits fromrctglm.
Anrctglm_prog object is a list with the same components as anrctglm object(see theValue section ofrctglm for a breakdown of the structure),but with an additional list element of:
prognostic_info: List with information about the fitted prognostic modelon historical data. It has components:formula: Theformulawith symbolic description of how the responseis modelled as function of covariates in the modelsmodel_fit: A trainedworkflow- the result offit_best_learnerlearners: Alistof learners used for the discrete super learnercv_folds: The amount of folds used for cross validationdata: The historical data used for cross validation when fitting andtesting models
See Also
Method to extract information of the prognostic model inprog. Functionused to fit the prognostic model isfit_best_learner().
Seerctglm() for the function and class this inherits from.
Examples
# Generate some datan <- 100b0 <- 1b1 <- 1.5b2 <- 2W1 <- runif(n, min = 1, max = 10)exp_prob <- .5dat_treat <- glm_data( Y ~ b0+b1*log(W1)+b2*A, W1 = W1, A = rbinom (n, 1, exp_prob))dat_notreat <- glm_data( Y ~ b0+b1*log(W1), W1 = W1)learners <- list( mars = list( model = parsnip::set_engine( parsnip::mars( mode = "regression", prod_degree = 3 ), "earth" ) ), lm = list( model = parsnip::set_engine( parsnip::linear_reg(), "lm" ) ))ate <- rctglm_with_prognosticscore( formula = Y ~ ., exposure_indicator = A, exposure_prob = exp_prob, data = dat_treat, family = gaussian(), estimand_fun = "ate", data_hist = dat_notreat, learners = learners)# Pull information on estimandestimand(ate)Create data and plot power curves calculated using functions inpower_linear() for a list of formulas/models
Description
Estimate a variance for power approximation usingvariance_ancova() for each formulainformula_list ontrain_data. Then calculate power using the function with namespecified inpower_fun across a number of sample sizesns for an assumed averagetreatment effect ofate.
Usage
repeat_power_linear( ate, formula_list, train_data, power_fun = c("power_gs", "power_nc"), ns = 10:400, desired_power = 0.9, ...)## S3 method for class 'postcard_rpl'plot(x, cols = NULL, ...)Arguments
ate | Passed to |
formula_list | a named |
train_data | Passed as the |
power_fun | a |
ns | a |
desired_power | a |
... | Arguments passed to |
x | an object of class |
cols | a (potentially named) |
Value
repeat_power_linear returns an object of classpostcard_rpl, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.
See Also
repeat_power_marginaleffect() for a similar implementation to iterate theprocess of approximating power with thepower_marginaleffect()
Examples
train_data <- glm_data( Y ~ 1+1.5*log(W)+2*X, W = runif(1e3, min = 1, max = 10), X = rnorm(1e3, sd = 3))rpl <- repeat_power_linear( ate = 0.5, formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X), train_data = train_data)rpl_nc <- repeat_power_linear( ate = 0.5, formula_list = list("ANCOVA 1 covariate" = Y ~ X, "ANCOVA 2 covariates" = Y ~ W + X), train_data = train_data, power_fun = "power_nc", df = 1e3-3, deflation = 0.95, margin = -0.2, r = 2)## Not run: plot(rpl)plot(rpl_nc)## End(Not run)Create data and plot power curves calculated usingpower_marginaleffect() for a list of models
Description
Iterate a process of simulating test data fromtest_data_fun, making predictionsusing models inmodel_list, and calculating power usingpower_marginaleffect()across a number of sample sizesns and iterationsn_iter. The results are averagedand used to create a plot of the resulting power curves.
Usage
repeat_power_marginaleffect( target_effect, exposure_prob, model_list = default_power_model_list(), test_data_fun = function(n) { glm_data(Y ~ 1 + 3 * log(W), W = stats::runif(n, min = 1, max = 50)) }, ns = seq(10, 300, 10), desired_power = 0.9, n_iter = 1, ...)## S3 method for class 'postcard_rpm'plot(x, cols = NULL, ...)Arguments
target_effect | Passed to |
exposure_prob | Passed to |
model_list | a named |
test_data_fun | a |
ns | a |
desired_power | a |
n_iter | a |
... | additional arguments passed to |
x | an object of class |
cols | a (potentially named) |
Value
repeat_power_marginal returns an object of classpostcard_rpm, which isjust adata.frame with aplot method defined. Theplot method returns aggplot2 object.
See Also
repeat_power_linear() for a similar implementation to iterate the processof approximating power with the functions inpower_linear()
Examples
# Note everything is wrapped in dontrun to avoid long runtimes of examples (tests are# still in place). Reduce the number of sample sizes and/or iterations to avoid long# runtimes## Not run: # A simple use case with default models and test data (we run only with a few sample# sizes to reduce runtime of examples)rpm <- repeat_power_marginaleffect( target_effect = 0.9, exposure_prob = 0.5)plot(rpm)################################# Create model from a poisson family and estimate the power of rate ratio with# several arguments passed to power_marginaleffect################################b1 <- 0.9b2 <- 0.2b3 <- -0.4b4 <- -0.6train_pois <- glm_data( Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3, X1 = runif(1e3, min = 1, max = 10), X2 = rnorm(1e3), X3 = rgamma(1e3, shape = 1), family = poisson())# Define models to compare fit to training dataancova_prog_list <- list(ANCOVA = glm(Y ~ X1 + X2 + X3, data = train_pois, family = poisson),"ANCOVA with prognostic score" = fit_best_learner(list(mod = Y ~ X1 + X2 + X3), data = train_pois))# Create a function that produces data to predict ontest_pois_fun <- function(n) { glm_data( Y ~ b1*log(X1)+b2*X2+b3*X3+b4*X2*X3, X1 = runif(n, min = 1, max = 10), X2 = rnorm(n), X3 = rgamma(n, shape = 1), family = poisson() )}# Specify a bunch of different arguments that are passed to power_marginaleffect()## Run for 2 sample sizes to reduce runtimerpm_rr <- repeat_power_marginaleffect( model_list = ancova_prog_list, test_data_fun = test_pois_fun, ns = seq(100, 200), n_iter = 1, var1 = function(var0) 1.1 * var0, kappa1_squared = function(kap0) 1.1 * kap0, estimand_fun = "rate_ratio", target_effect = 1.4, exposure_prob = 1/2, margin = 0.8)plot(rpm_rr2)## End(Not run)