| Type: | Package |
| Title: | Methods for Evaluating Principal Surrogates of TreatmentResponse |
| Version: | 1.3.3 |
| Date: | 2025-04-08 |
| Maintainer: | Michael C Sachs <sachsmc@gmail.com> |
| Description: | Contains the core methods for the evaluation of principal surrogates in a single clinical trial. Provides a flexible interface for defining models for the risk given treatment and the surrogate, the models for integration over the missing counterfactual surrogate responses, and the estimation methods. Estimated maximum likelihood and pseudo-score can be used for estimation, and the bootstrap for inference. A variety of post-estimation summary methods are provided, including print, summary, plot, and testing. |
| License: | MIT + file LICENSE |
| URL: | https://sachsmc.github.io/pseval/ |
| BugReports: | https://github.com/sachsmc/pseval/issues/ |
| Suggests: | ggplot2, testthat, knitr, printr, rmarkdown |
| Imports: | survival |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Packaged: | 2025-04-08 07:59:28 UTC; micsac |
| Author: | Michael C Sachs [aut, cre], Erin E Gabriel [aut] |
| Repository: | CRAN |
| Date/Publication: | 2025-04-08 10:20:02 UTC |
Modify a psdesign object by adding on new components.
Description
This operator allows you to add objects to a psdesign object, such as integration models and risk models
Usage
## S3 method for class 'ps'p1 + p2Arguments
p1 | An object of classpsdesign |
p2 | Another object to be added to If the first object is an object of class
|
Treatment efficacy contrast functions
Description
Treatment efficacy contrast functions
Usage
TE(R0, R1)Arguments
R0 | A vector of risks in the control arm |
R1 | A vector of risks in the treatment arm |
Details
These functions take the risk in the two treatment arms, andcomputes a one-dimensional summary of those risks. Built-in choices are"TE" for treatment efficacy = 1 - risk_1(s)/risk_0(s),"RR" forrelative risk = risk_0(s)/risk_1(s),"logRR" for log of the relativerisk, and"RD" for the risk difference = risk_0(s) - risk_1(s).
Value
A vector the same length as R0 and R1.
Bootstrap resampling parameters
Description
Bootstrap resampling parameters
Usage
add_bootstrap(psdesign, bootstrap)Arguments
psdesign | A psdesign object, it must have risk model and integration model components |
bootstrap | A bootstrap object created byps_bootstrap |
Examples
## Not run: test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)est1 <- test + integrate_parametric(S.1 ~ BIP) + risk_binary() + ps_estimate(method = "BFGS")est1 + ps_bootstrap(method = "BFGS", start = est1$estimates$par, n.boots = 50)## End(Not run)Estimate parameters
Description
Estimate parameters
Usage
add_estimate(psdesign, estimate)Arguments
psdesign | A psdesign object, it must have risk model and integration model components |
estimate | An estimate object created byps_estimate |
Examples
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)test + integrate_parametric(S.1 ~ BIP) + risk_binary(D = 50) + ps_estimate(method = "BFGS")Integration models
Description
Add integration model to a psdesign object
Usage
add_integration(psdesign, integration)Arguments
psdesign | A psdesign object |
integration | An integration object |
Details
This is a list of the available integration models. The fundamental problem in surrogate evaluation is that there are unobserved values of the counterfactual surrogate responses S(1). In the estimated maximum likelihood framework, for subjects missing the S(1) values, we use an auxiliary pre-treatment variable or set of variables W that is observed for every subject to estimate the distribution of S(1) | W. Typically, this W is a BIP. Then for each missing S(1), we integrate likelihood contributions over each non-missing S(1) given their value of W, and average over the contributions.
integrate_parametric This is a parametric integration model that fits a linear model for the mean of S(1) | W and assumes a Gaussian distribution.
integrate_bivnorm This is another parametric integration model that assumes that S(1) and W are jointly normally distributed. The user must specify their mean, variances and correlation.
integrate_nonparametric This is a non-parametric integration model that is only valid for categorical S(1) and W. It uses the observed proportions to estimate the joint distribution of S(1), W.
integrate_semiparametric This is a semi-parametric model that uses the semi-parametric location scale model of Heagerty and Pepe (1999). Models are specified for the location of S(1) | W and the scale of S(1) | W. Then integrations are drawn from the empirical distribution of the residuals from that model, which are then transformed to the appropriate location and scale.
Examples
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP)add_integration(test, integrate_parametric(S.1 ~ BIP))test + integrate_parametric(S.1 ~ BIP) # same as aboveAdd risk model to a psdesign object
Description
Add risk model to a psdesign object
Usage
add_riskmodel(psdesign, riskmodel)Arguments
psdesign | A psdesign object |
riskmodel | A risk model object, from the list above |
Details
The risk model component specifies the likelihood for the data. This involves specifying the distribution of the outcome variable, whether it is binary or time-to-event, and specifying how the surrogate S(1) and the treatment Z interact and affect the outcome. We use the formula notation to be consistent with other regression type models in R. Below is a list of available risk models.
risk_binary This is a generic risk model for binary outcomes. The user can specify the formula, and link function using eitherrisk.logit for the logistic link, orrisk.probit for the probit link. Custom link functions may also be specified, which take a single numeric vector argument, and returns a vector of corresponding probabilities.
risk_weibull This is a parameterization of the Weibull model for time-to-event outcomes that is consistent with that ofrweibull. The user specifies the formula for the linear predictor of the scale parameter.
risk_exponential This is a simple exponential model for a time-to-event outcome.
risk_poisson This is a Poisson model for count outcomes. It allows for offsets in the formula.
risk_continuous This is a Gaussian model for continuous outcomes. It assumes that larger values of the outcome are harmful (e.g. blood pressure)
Examples
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) + integrate_parametric(S.1 ~ BIP)add_riskmodel(test, risk_binary())test + risk_binary() # same as aboveCalculate the Standardized total gain
Description
Computes the standardized total gain for the risk difference. Optionally produces bootstrap standard errors and confidence intervals. The standardized total gain is the area between the risk difference curve and the horizontal line at the marginal risk difference. If the outcome is time to event then the STG is time-dependent, and a time point for evaluation is needed. If one is not provided then the restricted mean survival is estimated from the data and used.
Usage
calc_STG( psdesign, t, sig.level = 0.05, n.samps = 5000, bootstraps = TRUE, permute = TRUE, permute.times = 2000, progress.bar = TRUE)Arguments
psdesign | A psdesign object. It must contain a risk model, anintegration model, and estimated parameters. Bootstrapped parameters areoptional |
t | For time to event outcomes, a fixed time |
sig.level | Significance level for bootstrap confidence intervals |
n.samps | The number of samples to take over the range of S.1 at whichthe VE is calculated |
bootstraps | If true, and bootstrapped estimates are present, willcalculate bootstrap standard errors and confidence interval. |
permute | Not used, included for backwards compatibility |
permute.times | Not used, included for backwards compatibility |
progress.bar | Not used, included for backwards compatibility |
Calculate the risk and functions of the risk
Description
Computes the treatment efficacy (TE) and other functions of the risk in eachtreatment arm over the range of surrogate values observed in the data. TE(s)is defined as 1 - risk(s, z = 1)/risk(s, z = 0), where z is the treatmentindicator. If any other variables are present in the risk model, then therisk is computed at their median value.
Usage
calc_risk( psdesign, contrast = "TE", t, sig.level = 0.05, CI.type = "band", n.samps = 5000, bootstraps = TRUE, newdata = NULL)Arguments
psdesign | A psdesign object. It must contain a risk model, anintegration model, and estimated parameters. Bootstrapped parameters areoptional |
contrast | The contrast function, or the name of the contrast function.See details. |
t | For time to event outcomes, a fixed time |
sig.level | Significance level for bootstrap confidence intervals |
CI.type | Character string, "pointwise" for pointwise confidenceintervals, and "band" for simultaneous confidence band. |
n.samps | The number of samples to take over the range of S.1 at whichthe contrast is calculated |
bootstraps | If true, and bootstrapped estimates are present, willcalculate bootstrap standard errors and confidence bands. |
newdata | Vector of S values. If present, will calculate the contrastfunction at values of newdata instead of the observed S.1 |
Details
The contrast function is a function that takes 2 inputs, the risk_0and risk_1, and returns some one dimensional function of those two inputs.It must be vectorized. Some built-in functions are"TE" for treatmentefficacy = 1 - risk_1(s)/risk_0(s),"RR" for relative risk =risk_1(s)/risk_0(s),"logRR" for log of the relative risk, and"RD" for the risk difference = risk_1(s) - risk_0(s).
Value
A data frame containing columns for the S values, the computedcontrast function at S, R0, and R1 at those S values, and optionallystandard errors and confidence intervals computed using bootstrappedestimates.
Examples
## Not run: # same result passing function name or functioncalc_risk(binary.boot, contrast = "TE", n.samps = 20)calc_risk(binary.boot, contrast = function(R0, R1) 1 - R1/R0, n.samps = 20)## End(Not run)Compute the empirical Treatment Efficacy
Description
Compute the empirical Treatment Efficacy
Usage
empirical_TE(psdesign, t)Arguments
psdesign | An object of classpsdesign |
t | Fixed time for time to event outcomes to compute TE. If missing, uses restricted mean survival. |
Compute the empirical Treatment Efficacy
Description
Included for backwards compatibility
Usage
empirical_VE(psdesign, t)Arguments
psdesign | An object of classpsdesign |
t | Fixed time for time to event outcomes to compute TE. If missing, uses restricted mean survival. |
Expand augmented data using the integration function
Description
Expand augmented data using the integration function
Usage
expand_augdata(model, psdesign, D = 500)Arguments
model | Formula defining the risk model |
psdesign | An object of classpsdesign, that contains at least 1 integration model |
D | The number of samples to take for the simulated annealing |
Generate sample data used for testing
Description
Generate sample data used for testing
Usage
generate_example_data(n)Arguments
n | Integer, the sample size |
Bivariate normal integration models for the missing S(1)
Description
This model assumes that the pair [S(1), W] is bivariate normal, where W isthe BIP. The means, standard deviations, and correlation are estimated orfixed before calling this function. Then the conditional normal formula isapplied in order to get the distribution of S(1) | W. That distribution isused to integrate over the missing S(1) values. This method requires a BIP in thedesign.
Usage
integrate_bivnorm(x = "S.1", mu = c(0, 0), sd = c(1, 1), rho = 0.2)Arguments
x | expression identifying the variable to be integrated. Typically this is S.1 or S.0 |
mu | means of the pair of surrogates, missing one first |
sd | standard deviations of the pair, missing one first |
rho | the correlation between X1 and X2 |
Nonparametric integration model for the missing S(1)
Description
Both S(1) and the BIP or set of BIPs must be categorical. This model integrates over the estimated distribution of S(1) | BIP
Usage
integrate_nonparametric(formula, ...)Arguments
formula | Formula specifying the integration model for the surrogateunder treatment. Generally the candidate surrogate will be on the left sidein the formula, and the BIP or BIPs will be on the right side. In this casethe BIP and the S(1) must be categorical. |
... | Not currently used |
Parametric integration model for the missing S(1)
Description
Parametric integration model for the missing S(1)
Usage
integrate_parametric(formula, family = gaussian, ...)Arguments
formula | Formula specifying the integration model for the surrogateunder treatment. Generally the candidate surrogate will be on the left sidein the formula, and the BIP or BIPs will be on the right side |
family | Assumed distribution for the integration model. Must becompatible with the |
... | Arguments passed toglm |
Semiparametric integration model using the location-scale model
Description
Semiparametric integration model using the location-scale model
Usage
integrate_semiparametric(formula.location, formula.scale, ...)Arguments
formula.location | Formula specifying the integration model for the location component of the surrogateunder treatment. Generally the candidate surrogate will be on the left sidein the formula, and the BIP or BIPs will be on the right side |
formula.scale | Formula specifying the integration model for the scale component of the surrogateunder treatment. Generally the candidate surrogate will be on the left sidein the formula, and the BIP or BIPs will be on the right side |
... | Other parameters passed tosp_locscale |
Plot summary statistics for a psdesign object
Description
Plot the treatment efficacy or another contrast of risk versus S.1 for anestimated psdesign object
Usage
## S3 method for class 'psdesign'plot( x, t, contrast = "TE", sig.level = 0.05, CI.type = "band", n.samps = 500, xlab = "S.1", ylab = contrast, col = 1, lty = 1, lwd = 1, ...)Arguments
x | Apsdesign object that contains a risk model, integrationmodel, and valid estimates |
t | For time to event outcomes, a fixed time |
contrast | Name of contrast function to plot. |
sig.level | Significance level used for confidence bands on the contrastcurve. This is only used if bootstrapped estimates are available. |
CI.type | Character string, "pointwise" for pointwise confidenceintervals, and "band" for simultaneous confidence band. |
n.samps | Number of samples to use over the range of S.1 for plottingthe curve |
xlab | X-axis label |
ylab | Y-axis label |
col | Vector of integers specifying colors for each curve. |
lty | Vector of integers specifying linetypes for each curve. |
lwd | Vector of numeric values for line widths. |
... | Other arguments passed toplot |
Concisely print information about a psdesign object
Description
Concisely print information about a psdesign object
Usage
## S3 method for class 'psdesign'print(x, digits = 3, sig.level = 0.05, ...)Arguments
x | An object of classpsdesign |
digits | Number of significant digits to display |
sig.level | Significance level to use for computing bootstrapped confidence intervals |
... | Currently unused |
Estimate parameters from a specified model using bootstrap resampling and estimated maximum likelihood
Description
Estimate parameters from a specified model using bootstrap resampling and estimated maximum likelihood
Usage
ps_bootstrap( n.boots = 200, progress.bar = TRUE, start = NULL, method = "BFGS", control = list(), ...)Arguments
n.boots | Number of bootstrap replicates |
progress.bar | Logical, if true will display a progress bar in the console |
start | Vector of starting values, if NULL, will come up with starting values |
method | Method to use for optimization, can be "pseudo-score" forcategorical S with nonparametric integration, or any of the methodsavailable inoptim. Defaults to "BFGS" |
control | List of control parameters for passed tooptim |
... | Arguments passed tooptim |
Estimate parameters from a specified model using estimated maximum likelihood
Description
Estimate parameters from a specified model using estimated maximum likelihood
Usage
ps_estimate(start = NULL, method = "BFGS", control = list(), ...)Arguments
start | Vector of starting values, if NULL, will come up with startingvalues |
method | Method to use for optimization, can be "pseudo-score" forcategorical BIP, or any of the methodsavailable inoptim. Defaults to "BFGS" |
control | List of control parameters for passed tooptim |
... | Arguments passed tooptim orpseudo_score. |
Specify a design for a principal surrogate evaluation
Description
Generate mappings that describe how variables in the data are mapped tocomponents of the principal surrogate analysis. Other thandata, thisis a list of key-value pairs describing the common elements of a ps analysis.The required keys are Z, Y, and S. Optional keys are BIP, CPV, BSM, andweights. These elements are described in details below. Additional keys-valuepairs can be included in.... This function generates an augmenteddataset and additional information for subsequent steps in the analysis. Inthe subsequent steps, refer to the variables by the keys. Seeadd_integration andadd_riskmodel for information on how toproceed in the analysis.
Usage
psdesign( data, Z, Y, S, BIP = NULL, CPV = NULL, BSM = NULL, weights = NULL, tau, ...)Arguments
data | Data frame containing data to be analyzed |
Z | Expression defining the treatment variable which has 2 levels |
Y | Expression defining the outcome variable. For binary events thisshould be coded as 0/1 or a factor with 2 levels. For censoredtime-to-event outcomes this can be a call toSurv |
S | Expression defining the candidate surrogate |
BIP | Optional expression defining the baseline irrelevant predictor |
CPV | Optional expression defining the closeout placebo vaccinationmeasurement |
BSM | Optional expression defining the baseline surrogate measurement |
weights | optional expression defining weights to accommodate nonrandomsubsampling, such as case control or two phase |
tau | numeric, When the outcome Y is a survival time, it is possiblethat the surrogate was measured at some time tau after enrollment. Use theargument tau to specify the time when the surrogate was measured, in studytime. Not required for binary Y. |
... | Other key-value pairs that will be included in the augmented data,e.g. additional candidate surrogates, covariates for adjustment, variablesused for integration |
Estimate parameters from a specified model using pseudo-score
Description
Estimate parameters from a specified model using pseudo-score
Usage
pseudo_score(psdesign, start = NULL, epsilon = 1e-05, maxit = 50)Arguments
psdesign | An object of class psdesign |
start | Vector of starting values, if NULL, will come up with starting values |
epsilon | Convergence criteria |
maxit | Maximum number of iterations |
Logit link function
Description
Logit link function
Usage
risk.logit(x)Arguments
x | A vector of linear predictors |
Value
A vector of probabilities
Probit link function
Description
Probit link function
Usage
risk.probit(x)Arguments
x | A vector of linear predictors |
Value
A vector of probabilities
Risk model for binary outcome
Description
Risk model for binary outcome
Usage
risk_binary(model = Y ~ S.1 * Z, D = 5000, risk = risk.logit)Arguments
model | Formula specifying the risk model |
D | number of samples for the simulated annealing integration |
risk | Function for transforming a linear predictor into a probability.E.g., risk.logit for the logistic model, risk.probit for the probit model |
Risk model for continuous outcome
Description
This model assumes that the outcome Y is normally distributed conditional onS.1 and Z, with mean determined by the model formula. It also assumes thatlarger values of Y are more indicative of poor outcomes, e.g., bloodpressure.
Usage
risk_continuous(model = Y ~ S.1 * Z, D = 5000)Arguments
model | Formula specifying the risk model for the mean |
D | number of samples for the simulated annealing integration |
Exponential risk model for time to event outcome
Description
Exponential risk model for time to event outcome
Usage
risk_exponential(model = Y ~ S.1 * Z, D = 5000)Arguments
model | Formula specifying the risk model. The outcome should be aSurv object specifying right censoring |
D | number of samples for simulated annealing |
Poisson risk model for count outcomes
Description
Poisson risk model for count outcomes
Usage
risk_poisson(model = Y ~ S.1 * Z, D = 5000)Arguments
model | Formula specifying the risk model. The outcome should be an integer of counts. This right side of the formula may contain anoffset term. |
D | number of samples for simulated annealing |
Weibull risk model for time to event outcome
Description
Weibull risk model for time to event outcome
Usage
risk_weibull(model = Y ~ S.1 * Z, D = 5000)Arguments
model | Formula specifying the risk model. The outcome should be aSurv object specifying right censoring |
D | number of samples for simulated annealing |
Calculate risks with handlers for survival data
Description
Calculate risks with handlers for survival data
Usage
riskcalc(risk.function, Y, par, t, dat0, dat1)Arguments
risk.function | Function taking three arguments, a data.frame, parameters, and time. It should return a vector the same number of rows as the data frame |
Y | The outcome variable |
par | the vector of parameter values |
t | Time for a survival outcome, may be missing |
dat0 | Data frame containing S and Z = 1 |
dat1 | Data frame containing S and Z = 0 |
Fit the semi-parametric location-scale model
Description
This estimates the location-scale model as described in Heagerty and Pepe (1999) using the Newton-Raphson method. The location and scale formulas must have the same outcome, but they may have different predictors.
Usage
sp_locscale( formula.location, formula.scale, data, weights, tol = 1e-06, maxit = 100)Arguments
formula.location | Formula specifying the model for the location |
formula.scale | Formula specifying the model for the scale |
data | Data used to estimate the model |
weights | Weights applied to the estimating equations |
tol | Convergence tolerance |
maxit | Maximum number of iterations |
Value
A list containing the parameter estimates, the convergence indicator, and residuals
Compute the standardized total gain
Description
Compute the standardized total gain
Usage
stg(R1, R0, stand = TRUE)Arguments
R1 | Risk in the treatment group |
R0 | Risk in the control group |
stand | Standardize? |
Summarize bootstrap samples
Description
Summarize bootstrap samples
Usage
summarize_bs(bootdf, estdf = NULL, sig.level = 0.05, CI.type = "band")Arguments
bootdf | Data frame containing bootstrapped estimates, with a column containing a convergence indicator |
estdf | Data frame containing full sample estimate |
sig.level | Significance level to use for confidence intervals |
CI.type | Character string, "pointwise" for pointwise confidence intervals, and "band" for simultaneous confidence band. |
Summary method for psdesign objects
Description
Summary method for psdesign objects
Usage
## S3 method for class 'psdesign'summary(object, digits = 3, sig.level = 0.05, ...)Arguments
object | An object of classpsdesign |
digits | Number of significant digits to display |
sig.level | Significance level to use for computing bootstrapped confidence intervals |
... | Currently unused |
Value
Invisibly returns the printed table, along with the three estimates of vaccine efficacy. The empirical TE is 1 minus the relative risk comparing the treatment arm to the control arm. The risk is estimated as the proportion in the binary outcome case, or with the Kaplan-Meier estimate at the restricted mean survival in the time-to-event case. The marginal TE estimate is the TE estimate under the specified parametric risk model, ignoring the effect of S.1. The model based average TE is the TE estimate from the specified risk model, averaged over the distribution of S.1. The point of displaying these three is to assess the validity of the parametric model, and to assess the validity of the model estimation. Wild differences among these estimates may indicate problems with the model or convergence.
Check that a variable is suitable for using as binary treatment indicator
Description
Checks for two classes and gives a warning message indicating which level is assumed to be 0/1
Usage
verify_trt(D)Arguments
D | Vector that will be checked for 2-class labels |
Test for wide effect modification
Description
This runs a multivariate Wald test on the interaction terms of the model, using the bootstrap covariance
Usage
wem_test(x)Arguments
x | An object of classpsdesign with bootstrap replicates |