Title:

Tidy Estimation of Heterogeneous Treatment Effects

Version:

1.0.4

Description:

Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (2023) <doi:10.1214/23-EJS2157>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.

URL:

https://github.com/ddimmery/tidyhte https://ddimmery.github.io/tidyhte/index.html

BugReports:

https://github.com/ddimmery/tidyhte/issues

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.2

Suggests:

covr, devtools, estimatr, ggplot2, glmnet, knitr, mockr, nprobust, palmerpenguins, quadprog, quickblock, rmarkdown, testthat (≥ 3.0.0), vimp, WeightedROC

Config/testthat/edition:

Imports:

checkmate, dplyr, lifecycle, magrittr, progress, purrr, R6, rlang, SuperLearner, tibble

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-07-29 13:55:01 UTC; drewd

Author:

Drew Dimmery

[aut, cre, cph]

Maintainer:

Drew Dimmery <cran@ddimmery.com>

Repository:

CRAN

Date/Publication:

2025-07-29 19:20:02 UTC

tidyhte: Tidy Estimation of Heterogeneous Treatment Effects

Description

Estimates heterogeneous treatment effects using tidy semantics on experimental or observational data. Methods are based on the doubly-robust learner of Kennedy (2023) <doi:10.1214/23-EJS2157>. You provide a simple recipe for what machine learning algorithms to use in estimating the nuisance functions and 'tidyhte' will take care of cross-validation, estimation, model selection, diagnostics and construction of relevant quantities of interest about the variability of treatment effects.
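To give a rough sense of what a doubly-robust learner involves, the sketch below builds a doubly-robust pseudo-outcome in base R and regresses it on a covariate. It is only an illustration: the simulated data, the linear nuisance models and every variable name are assumptions made for this example, cross-fitting is omitted, and it does not reproduce how 'tidyhte' implements the method.

```r
set.seed(123)
n <- 5000
x <- rnorm(n)
ps <- rep(0.5, n)                      # assumed known propensity score (randomized design)
a <- rbinom(n, 1, ps)                  # treatment indicator
y <- x + a * (1 + 0.5 * x) + rnorm(n)  # simulated outcome; true CATE is 1 + 0.5 * x

# Nuisance functions: outcome regressions fit separately by treatment arm.
fit1 <- lm(y ~ x, subset = a == 1)
fit0 <- lm(y ~ x, subset = a == 0)
mu1 <- predict(fit1, newdata = data.frame(x = x))
mu0 <- predict(fit0, newdata = data.frame(x = x))

# Doubly-robust pseudo-outcome: plug-in contrast plus inverse-propensity-weighted residuals.
pseudo <- mu1 - mu0 + a * (y - mu1) / ps - (1 - a) * (y - mu0) / (1 - ps)

ate_hat <- mean(pseudo)     # average treatment effect estimate
cate_fit <- lm(pseudo ~ x)  # second-stage regression of the pseudo-outcome on x
```

In this toy setup the second-stage coefficients should land close to the simulated intercept of 1 and slope of 0.5.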

Details

The best place to get started with tidyhte is vignette("experimental_analysis"), which walks through a full analysis of HTE on simulated data, or vignette("methodological_details"), which gets into more of the details underlying the method.

Author(s)

Maintainer: Drew Dimmery <cran@ddimmery.com> (ORCID) [copyright holder]

References

Kennedy, E. H. (2020). Towards optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497.

Configuration of a Constant Estimator

Description

Constant_cfg is a configuration class for estimating a constant model. That is, the model is a simple, one-parameter mean model.
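As a small illustration of what a one-parameter mean model amounts to, the base R sketch below fits an intercept-only regression and compares it with the sample mean. The data and weights are made up for the example, and using lm() here is an assumption for illustration rather than a description of what the package does internally.

```r
y <- c(2, 4, 6, 8)
w <- c(1, 1, 2, 4)  # illustrative case weights

fit <- lm(y ~ 1)                 # intercept-only model
fit_w <- lm(y ~ 1, weights = w)  # weighted intercept-only model

unweighted <- unname(coef(fit))   # the single parameter equals mean(y)
weighted <- unname(coef(fit_w))   # equals weighted.mean(y, w)
```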

Super class

tidyhte::Model_cfg -> Constant_cfg

Public fields

model_class: The class of the model, required for all classes which inherit from Model_cfg.

Methods

Public methods

Method `new()`

Create a new Constant_cfg object.

Usage

Constant_cfg$new()

Returns

A new Constant_cfg object.

Examples

Constant_cfg$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Constant_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of Model Diagnostics

Description

Diagnostics_cfg is a configuration class for estimating a variety of diagnostics for the models trained in the course of HTE estimation.

Public fields

ps: Model diagnostics for the propensity score model.
outcome: Model diagnostics for the outcome models.
effect: Model diagnostics for the joint effect model.
params: Parameters for any requested diagnostics.

Methods

Public methods

Method `new()`

Create a new Diagnostics_cfg object with specified diagnostics to estimate.

Usage

Diagnostics_cfg$new(ps = NULL, outcome = NULL, effect = NULL, params = NULL)

Arguments

ps: Model diagnostics for the propensity score model.
outcome: Model diagnostics for the outcome models.
effect: Model diagnostics for the joint effect model.
params: List providing values for parameters to any requested diagnostics.

Returns

A new Diagnostics_cfg object.

Examples

Diagnostics_cfg$new(
  outcome = c("SL_risk", "SL_coefs", "MSE", "RROC"),
  ps = c("SL_risk", "SL_coefs", "AUC")
)
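For readers unfamiliar with two of the diagnostic names used above, the sketch below computes a mean squared error and an AUC by hand in base R. The helper names mse() and auc() and the toy vectors are hypothetical and introduced only for illustration; this is not the package's implementation of these diagnostics, and the AUC formula assumes no ties between scores of different classes.

```r
# Mean squared error between observed outcomes and predictions.
mse <- function(y, pred) mean((y - pred)^2)

# AUC via the rank-sum (Mann-Whitney) identity.
auc <- function(label, score) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

treated <- c(0, 0, 1, 1)          # toy treatment indicator
ps_hat <- c(0.1, 0.4, 0.35, 0.8)  # toy propensity score predictions

auc_value <- auc(treated, ps_hat)
mse_value <- mse(c(1, 2, 3), c(1.5, 2, 2.5))
```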

Method `add()`

Add diagnostics to the Diagnostics_cfg object.

Usage

Diagnostics_cfg$add(ps = NULL, outcome = NULL, effect = NULL)

Arguments

ps: Model diagnostics for the propensity score model.
outcome: Model diagnostics for the outcome models.
effect: Model diagnostics for the joint effect model.

Returns

An updated Diagnostics_cfg object.

Examples

cfg <- Diagnostics_cfg$new(
  outcome = c("SL_risk", "SL_coefs", "MSE", "RROC"),
  ps = c("SL_risk", "SL_coefs")
)
cfg <- cfg$add(ps = "AUC")

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Diagnostics_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Predictor class for the cross-fit predictor of "partial" CATEs

Description

Predictor class for the cross-fit predictor of "partial" CATEs

Details

The class makes it easier to manage the K predictors for retrieving K-fold cross-validated estimates, as well as to measure how treatment effects change when only a single covariate is changed from its "natural" levels (in the sense of "natural" used by the direct / indirect effects literature).

Public fields

models: A list of the K model fits
num_splits: The number of folds used in cross-fitting.
num_mc_samples: The number of samples to retrieve across the covariate space. If num_mc_samples is larger than the sample size, then the entire dataset will be used.
covariates: The unquoted names of the covariates used in the second-stage model.
model_class: The model class (in the sense of Model_cfg). For instance, a SuperLearner model will have model class "SL".

Methods

Public methods

Method `new()`

FX.Predictor is a class which simplifies the management of a set of cross-fit prediction models of treatment effects and provides the ability to get the "partial" effects of particular covariates.

Usage

FX.Predictor$new(models, num_splits, num_mc_samples, covariates, model_class)

Arguments

models: A list of the K model fits.
num_splits: Integer number of cross-fitting folds.
num_mc_samples: Integer number of Monte-Carlo samples across the covariate space. If this is larger than the sample size, then the whole dataset will be used.
covariates: The unquoted names of the covariates.
model_class: The model class (in the sense of Model_cfg).

Method `predict()`

Predicts the PCATE surface over a particular covariate, returning a tibble with the predicted HTE for every Monte-Carlo sample.

Usage

FX.Predictor$predict(data, covariate)

Arguments

data: The full dataset
covariate: The unquoted covariate name for which to calculate predicted treatment effects.

Returns

A tibble with columns:

covariate_value - The value of the covariate of interest
.hte - An estimated HTE
.id - The identifier for the original row (which had covariate modified to covariate_value).
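The shape of this output can be pictured with a partial-dependence style computation: take a Monte-Carlo sample of rows, overwrite one covariate with each candidate value, and predict the effect for every modified row. The base R sketch below does this with a single lm() fit standing in for the cross-fit effect models. The function partial_surface(), the simulated data and the choice of candidate values are all hypothetical, so this shows the idea rather than the method's actual code.

```r
set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$tau <- 1 + 0.5 * df$x1 - 0.25 * df$x2 + rnorm(n, sd = 0.1)  # stand-in effect estimates
fit <- lm(tau ~ x1 + x2, data = df)                             # stand-in effect model

# Hypothetical helper: predicted effects when `covariate` is set to each sampled value.
partial_surface <- function(fit, data, covariate, num_mc_samples = 50) {
  m <- min(num_mc_samples, nrow(data))  # use the whole dataset if the request is larger
  ids <- sample(nrow(data), m)
  values <- sort(unique(data[[covariate]][ids]))
  out <- lapply(values, function(v) {
    modified <- data[ids, , drop = FALSE]
    modified[[covariate]] <- v          # move the covariate away from its natural level
    data.frame(
      covariate_value = v,
      .hte = unname(predict(fit, newdata = modified)),
      .id = ids
    )
  })
  do.call(rbind, out)
}

res <- partial_surface(fit, df, "x1", num_mc_samples = 20)
```

Each row of res pairs one sampled unit (.id) with one counterfactual value of x1, which is why the result has one row per combination of unit and value.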

Method `clone()`

The objects of this class are cloneable with this method.

Usage

FX.Predictor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

R6 class to represent partitions of the data between training and held-out

Description

R6 class to represent partitions of the data between training and held-out

Details

This takes a set of folds calculated elsewhere and represents these folds in a consistent format.

Public fields

train: A dataframe containing only the training set
holdout: A dataframe containing only the held-out data
in_holdout: A logical vector indicating if the initial data lies in the holdout set.

Methods

Public methods

Method `new()`

Creates an R6 object of the data split between training and test set.

Usage

HTEFold$new(data, split_id)

Arguments

data: The dataset to be split
split_id: An identifier indicating which data should lie in the holdout set.

Returns

Returns an object of class HTEFold.
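The train / holdout partition this class represents can be sketched in a few lines of base R. The helper split_fold() and the column name split are hypothetical names chosen for this example; the sketch only mirrors the three fields listed above and is not the class's actual code.

```r
df <- data.frame(
  id = 1:10,
  x = seq(0.1, 1, by = 0.1),
  split = rep(1:5, each = 2)  # assumed pre-computed fold assignment
)

# Hypothetical helper: rows whose fold equals split_id form the holdout set.
split_fold <- function(data, split_col, split_id) {
  in_holdout <- data[[split_col]] == split_id
  list(
    train = data[!in_holdout, , drop = FALSE],
    holdout = data[in_holdout, , drop = FALSE],
    in_holdout = in_holdout
  )
}

fold <- split_fold(df, "split", split_id = 3)
```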

Method `clone()`

The objects of this class are cloneable with this method.

Usage

HTEFold$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of an HTE Analysis

Description

HTE_cfg is a configuration class that pulls everything together, indicating the full configuration for a given HTE analysis. This includes how to estimate models and what Quantities of Interest to calculate based on those underlying models.

Public fields

outcome: Model_cfg object indicating how outcome models should be estimated.
treatment: Model_cfg object indicating how the propensity score model should be estimated.
effect: Model_cfg object indicating how the joint effect model should be estimated.
qoi: QoI_cfg object indicating what the Quantities of Interest are and providing all necessary detail on how they should be estimated.
verbose: Logical indicating whether to print debugging information.

Methods

Public methods

Method `new()`

Create a new HTE_cfg object with all necessary information about how to carry out an HTE analysis.

Usage

HTE_cfg$new(
  outcome = NULL,
  treatment = NULL,
  effect = NULL,
  qoi = NULL,
  verbose = FALSE
)

Arguments

outcome: Model_cfg object indicating how outcome models should be estimated.
treatment: Model_cfg object indicating how the propensity score model should be estimated.
effect: Model_cfg object indicating how the joint effect model should be estimated.
qoi: QoI_cfg object indicating what the Quantities of Interest are and providing all necessary detail on how they should be estimated.
verbose: Logical indicating whether to print debugging information.

Examples

mcate_cfg <- MCATE_cfg$new(cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)))
pcate_cfg <- PCATE_cfg$new(
  cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)),
  model_covariates = c("x1", "x2", "x3"),
  num_mc_samples = list(x1 = 100)
)
vimp_cfg <- VIMP_cfg$new()
diag_cfg <- Diagnostics_cfg$new(
  outcome = c("SL_risk", "SL_coefs", "MSE"),
  ps = c("SL_risk", "SL_coefs", "AUC")
)
qoi_cfg <- QoI_cfg$new(
  mcate = mcate_cfg,
  pcate = pcate_cfg,
  vimp = vimp_cfg,
  diag = diag_cfg
)
ps_cfg <- SLEnsemble_cfg$new(
  learner_cfgs = list(SLLearner_cfg$new("SL.glm"), SLLearner_cfg$new("SL.gam"))
)
y_cfg <- SLEnsemble_cfg$new(
  learner_cfgs = list(SLLearner_cfg$new("SL.glm"), SLLearner_cfg$new("SL.gam"))
)
fx_cfg <- SLEnsemble_cfg$new(
  learner_cfgs = list(SLLearner_cfg$new("SL.glm"), SLLearner_cfg$new("SL.gam"))
)
HTE_cfg$new(outcome = y_cfg, treatment = ps_cfg, effect = fx_cfg, qoi = qoi_cfg)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

HTE_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration for a Kernel Smoother

Description

KernelSmooth_cfg is a configuration class for non-parametric local-linear regression to construct a smooth representation of the relationship between two variables. This is typically used for displaying a surface of the conditional average treatment effect over a continuous covariate.

Kernel smoothing is handled by the nprobust package.

Super class

tidyhte::Model_cfg -> KernelSmooth_cfg

Public fields

model_class: The class of the model, required for all classes which inherit from Model_cfg.
neval: The number of points at which to evaluate the local regression. More points will provide a smoother line at the cost of somewhat higher computation.
eval_min_quantile: Minimum quantile at which to evaluate the smoother.

Methods

Public methods

Method `new()`

Create a new KernelSmooth_cfg object with specified number of evaluation points.

Usage

KernelSmooth_cfg$new(neval = 100, eval_min_quantile = 0.05)

Arguments

neval: The number of points at which to evaluate the local regression. More points will provide a smoother line at the cost of somewhat higher computation.
eval_min_quantile: Minimum quantile at which to evaluate the smoother. A value of zero will do no clipping. Clipping is performed from both the top and the bottom of the empirical distribution. A value of alpha would evaluate over [alpha, 1 - alpha].

Returns

A new KernelSmooth_cfg object.

Examples

KernelSmooth_cfg$new(neval = 100)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

KernelSmooth_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of Known Model

Description

Known_cfg is a configuration class for when a particular model is known a priori. The prototypical usage of this class is when heterogeneous treatment effects are estimated in the context of a randomized controlled trial with known propensity scores.

Super class

tidyhte::Model_cfg -> Known_cfg

Public fields

covariate_name: The name of the column in the dataset which corresponds to the known model score.
model_class: The class of the model, required for all classes which inherit from Model_cfg.

Methods

Public methods

Method `new()`

Create a new Known_cfg object with specified covariate column.

Usage

Known_cfg$new(covariate_name)

Arguments

covariate_name: The name of the column, a string, in the dataset corresponding to the known model score (i.e. the true conditional expectation).

Returns

A new Known_cfg object.

Examples

Known_cfg$new("propensity_score")

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Known_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of Marginal CATEs

Description

MCATE_cfg is a configuration class for estimating marginal response surfaces based on heterogeneous treatment effect estimates. "Marginal" in this context implies that all other covariates are marginalized. Thus, if two covariates are highly correlated, it is likely that their MCATE surfaces will be extremely similar.

Public fields

cfgs: Named list mapping covariate names to a Model_cfg object defining how to present that covariate's CATE surface (while marginalizing over all other covariates).
std_errors: Boolean indicating whether the results should be returned with standard errors or not.
estimand: String indicating the estimand to target.

Methods

Public methods

Method `new()`

Create a new MCATE_cfg object with the specified moderator configurations.

Usage

MCATE_cfg$new(cfgs, std_errors = TRUE)

Arguments

cfgs: Named list from moderator name to a Model_cfg object defining how to present that covariate's CATE surface (while marginalizing over all other covariates).
std_errors: Boolean indicating whether the results should be returned with standard errors or not.

Returns

A new MCATE_cfg object.

Examples

MCATE_cfg$new(cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)))

Method `add_moderator()`

Add a moderator to the MCATE_cfg object. This entails defining a configuration for displaying the effect surface for that moderator.

Usage

MCATE_cfg$add_moderator(var_name, cfg)

Arguments

var_name: The name of the moderator to add (and the name of the column in the dataset).
cfg: A Model_cfg defining how to display the selected moderator's effect surface.

Returns

An updated MCATE_cfg object.

Examples

cfg <- MCATE_cfg$new(cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)))
cfg <- cfg$add_moderator("x2", KernelSmooth_cfg$new(neval = 100))

Method `clone()`

The objects of this class are cloneable with this method.

Usage

MCATE_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Base Class of Model Configurations

Description

Model_cfg is the base class from which all other model configurations inherit.

Public fields

model_class: The class of the model, required for all classes which inherit from Model_cfg.

Methods

Public methods

Method `new()`

Create a new Model_cfg object with any necessary parameters.

Usage

Model_cfg$new()

Returns

A new Model_cfg object.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Model_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

R6 class to represent data to be used in estimating a model

Description

R6 class to represent data to be used in estimating a model

Details

This class provides consistent names and interfaces to data which will be used in a supervised regression / classification model.

Public fields

label: The labels for the eventual model as a vector.
features: The matrix representation of the data to be used for model fitting. Constructed using stats::model.matrix.
model_frame: The data-frame representation of the data as constructed by stats::model.frame.
split_id: The split identifiers as a vector.
num_splits: The integer number of splits in the data.
cluster: A cluster ID as a vector, constructed using the unit identifiers.
weights: The case-weights as a vector.
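The label, features and model_frame fields correspond to standard base R constructs, which the sketch below builds directly with stats::model.frame and stats::model.matrix. The toy data frame and the formula are assumptions for illustration; in particular, whether the class keeps the intercept column is not stated here, so the column layout shown is only what base R produces by default.

```r
df <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4),
  x1 = c(0.1, -0.3, 1.5, 0.8),
  x3 = factor(c("a", "b", "c", "a"))
)

mf <- stats::model.frame(y ~ x1 + x3, data = df)         # data-frame representation
label <- stats::model.response(mf)                       # outcome vector
features <- stats::model.matrix(y ~ x1 + x3, data = mf)  # numeric design matrix

colnames(features)  # the factor x3 is expanded into indicator columns
```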

Methods

Public methods

Method `new()`

Creates an R6 object to represent data to be used in a prediction model.

Usage

Model_data$new(data, label_col, ..., .weight_col = NULL)

Arguments

data: The full dataset to populate the class with.
label_col: The unquoted name of the column to use as the label in supervised learning models.
...: The unquoted names of features to use in the model.
.weight_col: The unquoted name of the column to use as case-weights in subsequent models.

Returns

A Model_data object.

Examples

library("dplyr")
df <- dplyr::tibble(
  uid = 1:100,
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = sample(4, 100, replace = TRUE)
) %>% dplyr::mutate(
  y = x1 + x2 + x3 + rnorm(100),
  x3 = factor(x3)
)
df <- make_splits(df, uid, .num_splits = 5)
data <- Model_data$new(df, y, x1, x2, x3)

Method `SL_cv_control()`

A helper function to create the cross-validation options to be used by SuperLearner.

Usage

Model_data$SL_cv_control()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Model_data$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of Partial CATEs

Description

PCATE_cfg is a configuration class for estimating partial response surfaces based on heterogeneous treatment effect estimates. "Partial" in this context is used similarly to the use in partial dependence plots or in partial regression. In essence, a PCATE attempts to partial out the contribution to the CATE from all other covariates. Two highly correlated variables may have very different PCATE surfaces.
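The contrast between marginal and partial surfaces for correlated covariates can be seen in a small simulation. In the sketch below the effect depends only on x1, while x2 is merely correlated with x1. Plain linear regressions stand in for the surfaces, and all names and numbers are assumptions chosen for illustration, not output from the package.

```r
set.seed(42)
n <- 2000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # strongly correlated with x1
tau <- 2 * x1 + rnorm(n, sd = 0.1)           # effect depends on x1 only

# Marginal view: x2 alone picks up the x1 signal through their correlation.
marginal_slope <- coef(lm(tau ~ x2))[["x2"]]

# Partial view: once x1 is in the joint model, x2 contributes almost nothing.
partial_slope <- coef(lm(tau ~ x1 + x2))[["x2"]]
```

Here the marginal slope on x2 is large while the partial slope is near zero, which is the sense in which two highly correlated variables can have similar marginal surfaces but very different partial ones.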

Public fields

cfgs: Named list mapping covariate names to a Model_cfg object defining how to present that covariate's CATE surface.
model_covariates: A character vector of all the covariates to be included in the second-level effect regression.
num_mc_samples: A named list from covariate name to the number of Monte Carlo samples to take to calculate the double integral (See Details).
estimand: String indicating the estimand to target.

Methods

Public methods

Method `new()`

Create a new PCATE_cfg object with the specified effect-model covariates and moderator configurations.

Usage

PCATE_cfg$new(model_covariates, cfgs, num_mc_samples = 100)

Arguments

model_covariates: A character vector of all the covariates to be included in the second-level effect regression.
cfgs: Named list from moderator name to a Model_cfg object defining how to present that covariate's CATE surface.
num_mc_samples: A named list from covariate name to the number of Monte Carlo samples to take to calculate the double integral (See Details). If all covariates should use the same number of samples, simply pass the (integer) number of samples.
effect_cfg: A Model_cfg object indicating how to fit the second level effect regression (joint across all selected covariates).

Returns

A new PCATE_cfg object.

Examples

PCATE_cfg$new(
  cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)),
  model_covariates = c("x1", "x2", "x3"),
  num_mc_samples = list(x1 = 100)
)

Method `add_moderator()`

Add a moderator to the PCATE_cfg object. This entails adding it to the joint model of effects and defining a configuration for displaying the effect surface for that moderator.

Usage

PCATE_cfg$add_moderator(var_name, cfg)

Arguments

var_name: The name of the moderator to add (and the name of the column in the dataset).
cfg: A Model_cfg defining how to display the selected moderator's effect surface.

Returns

An updated PCATE_cfg object.

Examples

cfg <- PCATE_cfg$new(
  cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)),
  model_covariates = c("x1", "x2", "x3"),
  num_mc_samples = list(x1 = 100)
)
cfg <- cfg$add_moderator("x2", KernelSmooth_cfg$new(neval = 100))

Method `clone()`

The objects of this class are cloneable with this method.

Usage

PCATE_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of Quantities of Interest

Description

QoI_cfg is a configuration class for the Quantities of Interest to be generated by the HTE analysis.

Public fields

mcate: A configuration object of type MCATE_cfg of marginal effects to calculate.
pcate: A configuration object of type PCATE_cfg of partial effects to calculate.
vimp: A configuration object of type VIMP_cfg of variable importance to calculate.
diag: A configuration object of type Diagnostics_cfg of model diagnostics to calculate.
ate: Logical flag indicating whether an estimate of the ATE should be returned.
predictions: Logical flag indicating whether estimates of the CATE for every unit should be returned.

Methods

Public methods

Method `new()`

Create a new QoI_cfg object with specified Quantities of Interest to estimate.

Usage

QoI_cfg$new(
  mcate = NULL,
  pcate = NULL,
  vimp = NULL,
  diag = NULL,
  ate = TRUE,
  predictions = FALSE
)

Arguments

mcate: A configuration object of type MCATE_cfg of marginal effects to calculate.
pcate: A configuration object of type PCATE_cfg of partial effects to calculate.
vimp: A configuration object of type VIMP_cfg of variable importance to calculate.
diag: A configuration object of type Diagnostics_cfg of model diagnostics to calculate.
ate: A logical flag for whether to calculate the Average Treatment Effect (ATE) or not.
predictions: A logical flag for whether to return predictions of the CATE for every unit or not.

Returns

A new QoI_cfg object.

Examples

mcate_cfg <- MCATE_cfg$new(cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)))
pcate_cfg <- PCATE_cfg$new(
  cfgs = list(x1 = KernelSmooth_cfg$new(neval = 100)),
  model_covariates = c("x1", "x2", "x3"),
  num_mc_samples = list(x1 = 100)
)
vimp_cfg <- VIMP_cfg$new()
diag_cfg <- Diagnostics_cfg$new(
  outcome = c("SL_risk", "SL_coefs", "MSE"),
  ps = c("SL_risk", "SL_coefs", "AUC")
)
QoI_cfg$new(
  mcate = mcate_cfg,
  pcate = pcate_cfg,
  vimp = vimp_cfg,
  diag = diag_cfg
)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

QoI_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Elastic net regression with pairwise interactions

Description

Penalized regression using elastic net. Alpha = 0 corresponds to ridge regression and alpha = 1 corresponds to Lasso. Included in the model are pairwise interactions between covariates.

See vignette("glmnet_beta", package = "glmnet") for a nice tutorial on glmnet.

Usage

SL.glmnet.interaction(
  Y,
  X,
  newX,
  family,
  obsWeights,
  id,
  alpha = 1,
  nfolds = 10,
  nlambda = 100,
  useMin = TRUE,
  loss = "deviance",
  ...
)

Arguments

Y

Outcome variable

X

Covariate dataframe

newX

Dataframe to predict the outcome

family

"gaussian" for regression, "binomial" for binary classification. Untested options: "multinomial" for multiple classification or "mgaussian" for multiple response, "poisson" for non-negative outcome with proportional mean and variance, "cox".

obsWeights

Optional observation-level weights

id

Optional id to group observations from the same unit (not used currently).

alpha

Elastic net mixing parameter, range [0, 1]. 0 = ridge regression and 1 = lasso.

nfolds

Number of folds for internal cross-validation to optimize lambda.

nlambda

Number of lambda values to check, recommended to be 100 or more.

useMin

If TRUE, use the lambda that minimizes risk; otherwise use the 1 standard-error rule, which chooses a higher penalty with performance within one standard error of the minimum (see Breiman et al. 1984 on CART for background).

loss

Loss function, can be "deviance", "mse", or "mae". If family = binomial can also be "auc" or "class" (misclassification error).

...

Any additional arguments are passed through to cv.glmnet.
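One common way to obtain pairwise interactions between covariates in base R is to expand the covariate data frame with a `.^2` formula before handing the matrix to a penalized regression. The sketch below shows only that expansion step on made-up data; whether this learner builds its design matrix in exactly this way is an assumption, and the penalized fit itself is not run here.

```r
X <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(0, 1, 0, 1),
  c = c(2.5, 1.5, 0.5, 3.5)
)

# Main effects plus all pairwise products, without an intercept column.
X_int <- stats::model.matrix(~ .^2 - 1, data = X)

colnames(X_int)  # main effects followed by a:b, a:c and b:c
```

With three covariates the expanded matrix has six columns, so the number of candidate terms grows quadratically with the number of covariates, which is part of why a penalized fit is a natural companion to this expansion.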

Configuration for a SuperLearner Ensemble

Description

SLEnsemble_cfg is a configuration class for estimating a model as an ensemble of learners using SuperLearner.

Super class

tidyhte::Model_cfg -> SLEnsemble_cfg

Public fields

cvControl: A list of parameters for controlling the cross-validation used in SuperLearner.
SL.library: A vector of the names of learners to include in the SuperLearner ensemble.
SL.env: An environment containing all of the programmatically generated learners to be included in the SuperLearner ensemble.
family: stats::family object to determine how SuperLearner should be fitted.
model_class: The class of the model, required for all classes which inherit from Model_cfg.

Methods

Public methods

Method `new()`

Create a new SLEnsemble_cfg object with specified settings.

Usage

SLEnsemble_cfg$new(
  cvControl = NULL,
  learner_cfgs = NULL,
  family = stats::gaussian()
)

Arguments

cvControl: A list of parameters for controlling the cross-validation used in SuperLearner. For more details, see SuperLearner::SuperLearner.CV.control.
learner_cfgs: A list of SLLearner_cfg objects.
family: stats::family object to determine how SuperLearner should be fitted.

Returns

A new SLEnsemble_cfg object.

Examples

SLEnsemble_cfg$new(learner_cfgs = list(SLLearner_cfg$new("SL.glm"), SLLearner_cfg$new("SL.gam")))

Method `add_sublearner()`

Adds a model (or class of models) to the SuperLearner ensemble. If hyperparameter values are specified, this method will add a learner for every element in the cross-product of provided hyperparameter values.

Usage

SLEnsemble_cfg$add_sublearner(learner_name, hps = NULL)

Arguments

learner_name: Possible values use SuperLearner naming conventions. A full list is available with SuperLearner::listWrappers("SL").
hps: A named list of hyper-parameters. Every element of the cross-product of these hyper-parameters will be included in the ensemble.

Examples

cfg <- SLEnsemble_cfg$new(learner_cfgs = list(SLLearner_cfg$new("SL.glm")))
cfg <- cfg$add_sublearner("SL.gam", list(deg.gam = c(2, 3)))
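The cross-product of hyperparameter values can be pictured with base R's expand.grid(), where each row is one combination that would become its own learner in the ensemble. The second hyperparameter and the generated learner names below are illustrative assumptions, and the sketch does not show how the package itself generates or names learners.

```r
# Illustrative hyperparameter lists for a single base learner.
hps <- list(deg.gam = c(2, 3), cts.num = c(4, 8))

# One row per element of the cross-product.
grid <- expand.grid(hps, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)

# Hypothetical names for the resulting learners, one per combination.
learner_names <- paste0("SL.gam_", seq_len(nrow(grid)))
```

Two values for each of two hyperparameters give four combinations, so the number of learners multiplies quickly as more values are added.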

Method `clone()`

The objects of this class are cloneable with this method.

Usage

SLEnsemble_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Configuration of SuperLearner Submodel

Description

SLLearner_cfg is a configuration class for a singlesublearner to be included in SuperLearner. By constructing with a named listof hyperparameters, this configuration allows distinct submodelsfor each unique combination of hyperparameters. To understand what modelsand hyperparameters are available, examine the methods listed inSuperLearner::listWrappers("SL").

Public fields

model_name: The name of the model as passed to SuperLearner through the SL.library parameter.
hyperparameters: Named list from hyperparameter name to a vector of values that should be swept over.

Methods

Public methods

Method `new()`

Create a new SLLearner_cfg object with specified model name and hyperparameters.

Usage

SLLearner_cfg$new(model_name, hp = NULL)

Arguments

model_name: The name of the model as passed to SuperLearner through the SL.library parameter.
hp: Named list from hyperparameter name to a vector of values that should be swept over. Hyperparameters not included in this list are left at their SuperLearner default values.

Returns

A new SLLearner_cfg object.

Examples

SLLearner_cfg$new("SL.glm")
SLLearner_cfg$new("SL.gam", list(deg.gam = c(2, 3)))

Method `clone()`

The objects of this class are cloneable with this method.

Usage

SLLearner_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.


Configuration for a Stratification Estimator

Description

Stratified_cfg is a configuration class for stratifying a covariate and calculating statistics within each cell.
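
The idea of stratifying on a covariate and summarising within each cell can be sketched in base R as follows. This is only a conceptual illustration on simulated data: the column names (group, psi) are hypothetical, and the statistics tidyhte actually reports for a stratified moderator may differ.

```r
set.seed(1)

# Hypothetical data: a discrete moderator and a per-unit effect estimate.
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 50),
  psi = rnorm(150, mean = rep(c(0, 1, 2), each = 50))
)

# Mean, standard error and count within each level of the moderator.
cell_stats <- do.call(rbind, lapply(split(df, df$group), function(cell) {
  data.frame(
    level = cell$group[1],
    estimate = mean(cell$psi),
    std_error = sd(cell$psi) / sqrt(nrow(cell)),
    n = nrow(cell)
  )
}))
print(cell_stats)
```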

Super class

tidyhte::Model_cfg -> Stratified_cfg

Public fields

model_class: The class of the model, required for all classes which inherit from Model_cfg.
covariate: The name of the column in the dataset which corresponds to the covariate on which to stratify.

Methods

Public methods

Method `new()`

Create a new Stratified_cfg object for the specified covariate.

Usage

Stratified_cfg$new(covariate)

Arguments

covariate: The name of the column in the dataset which corresponds to the covariate on which to stratify.

Returns

A new Stratified_cfg object.

Examples

Stratified_cfg$new(covariate = "test_covariate")

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Stratified_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.


Configuration of Variable Importance

Description

VIMP_cfg is a configuration class for estimating a variable importance measure across all moderators. This provides a meaningful measure of which moderators explain the most variation in the CATE surface.

Public fields

estimand: String indicating the estimand to target.
sample_splitting: Logical indicating whether to use sample splitting in the calculation of variable importance.
linear: Logical indicating whether the variable importance assuming a linear model should be estimated.

Methods

Public methods

Method `new()`

Create a new VIMP_cfg object with specified model configuration.

Usage

VIMP_cfg$new(sample_splitting = TRUE, linear_only = FALSE)

Arguments

sample_splitting: Logical indicating whether to use sample splitting in the calculation of variable importance. Choosing not to use sample splitting means that inference will only be valid for moderators with non-null importance.
linear_only: Logical indicating whether the variable importance should use only a single linear-only model. The variable importance measure will only be consistent for the population quantity if the true model of pseudo-outcomes is linear.

Returns

A new VIMP_cfg object.

Examples

VIMP_cfg$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

VIMP_cfg$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Williamson, B. D., Gilbert, P. B., Carone, M., & Simon, N. (2021). Nonparametric variable importance assessment using machine learning techniques. Biometrics, 77(1), 9-22.
Williamson, B. D., Gilbert, P. B., Simon, N. R., & Carone, M. (2021). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 1-14.


Add an additional diagnostic to the effect model

Description

This adds a diagnostic to the effect model.

Usage

add_effect_diagnostic(hte_cfg, diag)

Arguments

hte_cfg

HTE_cfg object to update.

diag

Character indicating the name of the diagnostic to include. Possible values are "MSE", "RROC" and, for SuperLearner ensembles, "SL_risk" and "SL_coefs".

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_effect_diagnostic("RROC") -> hte_cfg

Add an additional model to the joint effect ensemble

Description

This adds a learner to the ensemble used for estimating a model of the conditional expectation of the pseudo-outcome.

Usage

add_effect_model(hte_cfg, model_name, ...)

Arguments

hte_cfg

HTE_cfg object to update.

model_name

Character indicating the name of the model to incorporate into the joint effect ensemble. Possible values use SuperLearner naming conventions. A full list is available with SuperLearner::listWrappers("SL").

...

Parameters over which to grid-search for this model class.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_effect_model("SL.glm.interaction") -> hte_cfg

Uses a known propensity score

Description

This replaces the propensity score model with a known value of the propensity score.

Usage

add_known_propensity_score(hte_cfg, covariate_name)

Arguments

hte_cfg

HTE_cfg object to update.

covariate_name

Character indicating the name of the column in the dataframe corresponding to the known propensity score.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_known_propensity_score("ps") -> hte_cfg

Adds moderators to the configuration

Description

This adds a definition of how to display a moderator to the MCATE config. A moderator is any variable with respect to which you want to view information about CATEs.

Usage

add_moderator(hte_cfg, model_type, ..., .model_arguments = NULL)

Arguments

hte_cfg

HTE_cfg object to update.

model_type

Character indicating the model type for these moderators. Currently two model types are supported: "Stratified" for discrete moderators and "KernelSmooth" for continuous ones.

...

The (unquoted) names of the moderator variables.

.model_arguments

A named list from argument name to value to pass into the constructor for the model. See Stratified_cfg and KernelSmooth_cfg for more details.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_moderator("Stratified", x2, x3) %>%
  add_moderator("KernelSmooth", x1, x4, x5) -> hte_cfg

Add an additional diagnostic to the outcome model

Description

This adds a diagnostic to the outcome model.

Usage

add_outcome_diagnostic(hte_cfg, diag)

Arguments

hte_cfg

HTE_cfg object to update.

diag

Character indicating the name of the diagnostic to include. Possible values are "MSE", "RROC" and, for SuperLearner ensembles, "SL_risk" and "SL_coefs".

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_outcome_diagnostic("RROC") -> hte_cfg

Add an additional model to the outcome ensemble

Description

This adds a learner to the ensemble used for estimating a model of the conditional expectation of the outcome.

Usage

add_outcome_model(hte_cfg, model_name, ...)

Arguments

hte_cfg

HTE_cfg object to update.

model_name

Character indicating the name of the model to incorporate into the outcome ensemble. Possible values use SuperLearner naming conventions. A full list is available with SuperLearner::listWrappers("SL").

...

Parameters over which to grid-search for this model class.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_outcome_model("SL.glm.interaction") -> hte_cfg

Add an additional diagnostic to the propensity score

Description

This adds a diagnostic to the propensity score.

Usage

add_propensity_diagnostic(hte_cfg, diag)

Arguments

hte_cfg

HTE_cfg object to update.

diag

Character indicating the name of the diagnostic to include. Possible values are "MSE", "AUC" and, for SuperLearner ensembles, "SL_risk" and "SL_coefs".

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_propensity_diagnostic(c("AUC", "MSE")) -> hte_cfg

Add an additional model to the propensity score ensemble

Description

This adds a learner to the ensemble used for estimating propensity scores.

Usage

add_propensity_score_model(hte_cfg, model_name, ...)

Arguments

hte_cfg

HTE_cfg object to update.

model_name

Character indicating the name of the model to incorporate into the propensity score ensemble. Possible values use SuperLearner naming conventions. A full list is available with SuperLearner::listWrappers("SL").

...

Parameters over which to grid-search for this model class.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_propensity_score_model("SL.glmnet", alpha = c(0, 0.5, 1)) -> hte_cfg

Adds variable importance information

Description

This adds a variable importance quantity of interest to the outputs.

Usage

add_vimp(hte_cfg, sample_splitting = TRUE, linear_only = FALSE)

Arguments

hte_cfg

HTE_cfg object to update.

sample_splitting

Logical indicating whether to use sample splitting or not. Choosing not to use sample splitting means that inference will only be valid for moderators with non-null importance.

linear_only

Logical indicating whether the variable importance should use only a single linear-only model. The variable importance measure will only be consistent for the population quantity if the true model of pseudo-outcomes is linear.

Value

Updated HTE_cfg object

References

Williamson, B. D., Gilbert, P. B., Carone, M., & Simon, N. (2021). Nonparametric variable importance assessment using machine learning techniques. Biometrics, 77(1), 9-22.
Williamson, B. D., Gilbert, P. B., Simon, N. R., & Carone, M. (2021). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 1-14.

Examples

library("dplyr")
basic_config() %>%
  add_vimp(sample_splitting = FALSE) -> hte_cfg

Attach an `HTE_cfg` to a dataframe

Description

This adds a configuration attribute to a dataframe for HTE estimation. This configuration details the full analysis of HTE that should be performed.

Usage

attach_config(data, .HTE_cfg)

Arguments

data

dataframe

.HTE_cfg

HTE_cfg object representing the full configuration of the HTE analysis.

Details

For information about how to set up an HTE_cfg object, see the Recipe API documentation for basic_config().

To see an example analysis, read vignette("experimental_analysis") in the context of an experiment, vignette("observational_analysis") for an observational study, or vignette("methodological_details") for a deeper dive under the hood.

Examples

library("dplyr")
if (require("palmerpenguins")) {
  data(package = 'palmerpenguins')
  penguins$unitid = seq_len(nrow(penguins))
  penguins$propensity = rep(0.5, nrow(penguins))
  penguins$treatment = rbinom(nrow(penguins), 1, penguins$propensity)
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()
  attach_config(penguins, cfg) %>%
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}

Create a basic config for HTE estimation

Description

This provides a basic recipe for HTE estimation that can be extended by providing additional information about models to be estimated and what quantities of interest should be returned based on those models. This basic model includes only linear models for nuisance function estimation, and basic diagnostics.

Usage

basic_config()

Details

Additional models, diagnostics and quantities of interest should be added using their respective helper functions provided as part of the Recipe API.

Value

HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  add_known_propensity_score("ps") %>%
  add_outcome_model("SL.glm.interaction") %>%
  add_outcome_model("SL.glmnet", alpha = c(0.05, 0.15, 0.2, 0.25, 0.5, 0.75)) %>%
  add_outcome_model("SL.glmnet.interaction", alpha = c(0.05, 0.15, 0.2, 0.25, 0.5, 0.75)) %>%
  add_outcome_diagnostic("RROC") %>%
  add_effect_model("SL.glm.interaction") %>%
  add_effect_model("SL.glmnet", alpha = c(0.05, 0.15, 0.2, 0.25, 0.5, 0.75)) %>%
  add_effect_model("SL.glmnet.interaction", alpha = c(0.05, 0.15, 0.2, 0.25, 0.5, 0.75)) %>%
  add_effect_diagnostic("RROC") %>%
  add_moderator("Stratified", x2, x3) %>%
  add_moderator("KernelSmooth", x1, x4, x5) %>%
  add_vimp(sample_splitting = FALSE) -> hte_cfg

Calculates a SATE and a PATE using AIPW

Description

This function takes fully prepared data (with all auxiliary columns from the necessary models) and estimates average treatment effects using AIPW.
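
For orientation, a minimal base-R sketch of an AIPW average treatment effect on simulated data is given below. It is not the package's implementation: the nuisance models are fit in-sample with lm for brevity (tidyhte cross-fits them), weights are ignored, and the variable names are illustrative stand-ins for the .pi_hat, .mu0_hat and .mu1_hat columns.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
a <- rbinom(n, 1, 0.5)            # randomized treatment, known propensity 0.5
y <- 1 + x + 2 * a + rnorm(n)     # true average treatment effect is 2

# Nuisance predictions (in-sample here; a real analysis would cross-fit).
pi_hat <- rep(0.5, n)
fit1 <- lm(y ~ x, subset = a == 1)
fit0 <- lm(y ~ x, subset = a == 0)
mu1_hat <- predict(fit1, newdata = data.frame(x = x))
mu0_hat <- predict(fit0, newdata = data.frame(x = x))

# AIPW score for each unit; its sample mean estimates the ATE.
psi <- mu1_hat - mu0_hat +
  a * (y - mu1_hat) / pi_hat -
  (1 - a) * (y - mu0_hat) / (1 - pi_hat)

ate_hat <- mean(psi)
ate_se <- sd(psi) / sqrt(n)
```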

Usage

calculate_ate(data)

Arguments

data

The dataset of interest after it has been prepared fully.

References

Kennedy, E. H. (2020). Towards optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497.
Tsiatis, A. A., Davidian, M., Zhang, M., & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statistics in Medicine, 27(23), 4658-4677.

Calculate diagnostics

Description

This function calculates the diagnostics requested by the Diagnostics_cfg object.

Usage

calculate_diagnostics(data, treatment, outcome, .diag.cfg)

Arguments

data

Data frame with all additional columns (such as model predictions) included.

treatment

Unquoted treatment variable name

outcome

Unquoted outcome variable name

.diag.cfg

Diagnostics_cfg object

Value

Returns a tibble with columns:

estimand - Character indicating the diagnostic that was calculated
level - Indicates the scope of this diagnostic (e.g. does it apply only to the model of the outcome under treatment).
term - Indicates a more granular descriptor of what the value is for, such as the specific model within the SuperLearner ensemble.
estimate - Point estimate of the diagnostic.
std_error - Standard error of the diagnostic.

Calculate Linear Variable Importance of HTEs

Description

calculate_linear_vimp estimates the linear hypothesis test of removing a particular moderator from a linear model containing all moderators. Unlike calculate_vimp, this will only be unbiased and have correct asymptotic coverage rates if the true model is linear. This linear approach is also substantially faster, so may be useful when prototyping an analysis.

Usage

calculate_linear_vimp(
  full_data,
  weight_col,
  pseudo_outcome,
  ...,
  .VIMP_cfg,
  .Model_cfg
)

Arguments

full_data

dataframe

weight_col

Unquoted name of the weight column.

pseudo_outcome

Unquoted name of the pseudo-outcome.

...

Unquoted names of covariates to include in the joint effect model. The variable importance will be calculated for each of these covariates.

.VIMP_cfg

A VIMP_cfg object defining how VIMP should be estimated.

.Model_cfg

A Model_cfg object defining how the joint effect model should be estimated.

References

Williamson, B. D., Gilbert, P. B., Carone, M., & Simon, N. (2021). Nonparametric variable importance assessment using machine learning techniques. Biometrics, 77(1), 9-22.
Williamson, B. D., Gilbert, P. B., Simon, N. R., & Carone, M. (2021). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 1-14.

Calculate "partial" CATE estimates

Description

Usage

calculate_pcate_quantities(
  full_data,
  .weights,
  .outcome,
  fx_model,
  ...,
  .MCATE_cfg
)

Regression ROC Curve calculation

Description

This function calculates the Regression ROC (RROC) curve of Hernández-Orallo (doi:10.1016/j.patcog.2013.06.014). It provides estimates for the positive and negative errors when predictions are shifted by a variety of constants (which range across the domain of observed residuals). Curves closer to the axes are generally to be preferred. The curve provides a simple way to visualize the error properties of a regression model.

Usage

calculate_rroc(label, prediction, nbins = 100)

Arguments

label

True label

prediction

Model prediction of the label (out of sample)

nbins

Number of shift values to sweep over

Details

The dot shows the errors when no shift is applied, corresponding to the base model predictions.

Value

A tibble with nbins rows.
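
The shifting idea behind the curve can be sketched in base R as below. This is an illustrative reconstruction, not the function's source: the shift grid, the column names (shift, over, under) and the use of a plain data frame are assumptions. For each shift, the total positive error ("over") and total negative error ("under") of the shifted predictions are recorded.

```r
set.seed(7)
label <- rnorm(200)
prediction <- label + rnorm(200, sd = 0.5)
nbins <- 100

# Signed errors of the unshifted predictions.
err <- prediction - label

# Assumed grid of shifts spanning the range of observed errors:
# at the smallest shift no prediction overshoots, at the largest none undershoots.
shifts <- seq(-max(err), -min(err), length.out = nbins)

rroc <- data.frame(
  shift = shifts,
  over = vapply(shifts, function(s) sum(pmax(err + s, 0)), numeric(1)),
  under = vapply(shifts, function(s) sum(pmin(err + s, 0)), numeric(1))
)
```

Plotting over against under traces the curve; the point with shift closest to zero corresponds to the unshifted model.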

References

Hernández-Orallo, J. (2013). ROC curves for regression. Pattern Recognition, 46(12), 3395-3411.

Calculate Variable Importance of HTEs

Description

calculate_vimp estimates the reduction in (population) $R^2$ from removing a particular moderator from a model containing all moderators.
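
To make the "reduction in $R^2$" quantity concrete, the following sketch compares a full and a reduced model on simulated pseudo-outcomes. It is a deliberate simplification: linear models stand in for the flexible learners, everything is computed in-sample without sample splitting, and no standard errors are produced, so it should not be read as the estimator the package uses.

```r
set.seed(3)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
psi <- 2 * x1 + rnorm(n)   # simulated pseudo-outcome; x2 is irrelevant by construction

r2 <- function(fit) summary(fit)$r.squared

full <- lm(psi ~ x1 + x2)

# Importance of each moderator: drop in R^2 when it is removed from the full model.
vimp_x1 <- r2(full) - r2(lm(psi ~ x2))
vimp_x2 <- r2(full) - r2(lm(psi ~ x1))
```

Here vimp_x1 is large because x1 drives the simulated effects, while vimp_x2 is close to zero.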

Usage

calculate_vimp(
  full_data,
  weight_col,
  pseudo_outcome,
  ...,
  .VIMP_cfg,
  .Model_cfg
)

Arguments

full_data

dataframe

weight_col

Unquoted name of the weight column.

pseudo_outcome

Unquoted name of the pseudo-outcome.

...

Unquoted names of covariates to include in the joint effect model. The variable importance will be calculated for each of these covariates.

.VIMP_cfg

A VIMP_cfg object defining how VIMP should be estimated.

.Model_cfg

A Model_cfg object defining how the joint effect model should be estimated.

References

Williamson, B. D., Gilbert, P. B., Carone, M., & Simon, N. (2021). Nonparametric variable importance assessment using machine learning techniques. Biometrics, 77(1), 9-22.
Williamson, B. D., Gilbert, P. B., Simon, N. R., & Carone, M. (2021). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 1-14.

Checks that a dataframe has an attached configuration for HTEs

Description

This helper function ensures that the provided dataframe has the necessary auxiliary configuration information for HTE estimation.

Usage

check_data_has_hte_cfg(data)

Arguments

data

Dataframe of interest.

Value

Returns NULL. Errors if a problem is discovered.

Checks that an appropriate identifier has been provided

Description

This helper function makes a few simple checks to identify obvious issues with the provided column of unit identifiers.

Usage

check_identifier(data, id_col)

Arguments

data

Dataframe of interest.

id_col

Quoted name of identifier column.

Value

Returns NULL. Errors if a problem is discovered.

Checks that nuisance models have been estimated and exist in the supplied dataset.

Description

This helper function makes a few simple checks to identify obvious issues with the way that nuisance functions are created and prepared.

Usage

check_nuisance_models(data)

Arguments

data

Dataframe which should have appropriate columns of nuisance function predictions: .pi_hat, .mu0_hat, and .mu1_hat

Value

Returns NULL. Errors if a problem is discovered.

Checks that splits have been properly created.

Description

This helper function makes a few simple checks to identify obvious issues with the way that splits have been made in the supplied data.

Usage

check_splits(data)

Arguments

data

Dataframe which should have appropriate .split_id column.

Value

Returns NULL. Errors if a problem is discovered.

Checks that an appropriate weighting variable has been provided

Description

This helper function makes a few simple checks to identify obvious issues with the weights provided.

Usage

check_weights(data, weight_col)

Arguments

data

Dataframe of interest.

weight_col

Quoted name of weights column.

Value

Returns NULL. Errors if a problem is discovered.

Construct Pseudo-outcomes

Description

construct_pseudo_outcomes takes a dataset which has been prepared with plugin estimators of nuisance parameters and transforms these into a "pseudo-outcome": an unbiased estimator of the conditional average treatment effect under exogeneity.

Usage

construct_pseudo_outcomes(data, outcome, treatment, type = "dr")

Arguments

data

dataframe (already prepared with attach_config, make_splits, and produce_plugin_estimates)

outcome

Unquoted name of outcome variable.

treatment

Unquoted name of treatment variable.

type

String representing how to construct the pseudo-outcome. Valid values are "dr" (the default), "ipw" and "plugin". See "Details" for more discussion of these options.

Details

Taking averages of these pseudo-outcomes (or fitting a model to them) will approximate averages (or models) of the underlying treatment effect.
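
The three type options can be illustrated with the textbook forms of the plugin, inverse-propensity-weighted and doubly-robust pseudo-outcomes. The sketch below uses toy numbers and plain vectors; that the package's "plugin", "ipw" and "dr" options correspond exactly to these expressions is an assumption, and weights are ignored.

```r
# Toy inputs: outcome, treatment and nuisance predictions for four units.
y      <- c(3.0, 1.0, 4.0, 2.0)
a      <- c(1, 0, 1, 0)
pi_hat <- c(0.5, 0.5, 0.5, 0.5)
mu1    <- c(2.5, 2.0, 3.5, 3.0)
mu0    <- c(1.0, 1.5, 2.0, 2.5)

# Plugin: difference of the two outcome-model predictions.
plugin <- mu1 - mu0

# IPW: outcomes reweighted by the inverse probability of the received treatment.
ipw <- a * y / pi_hat - (1 - a) * y / (1 - pi_hat)

# Doubly robust: plugin estimate plus an inverse-propensity-weighted residual correction.
dr <- plugin + a * (y - mu1) / pi_hat - (1 - a) * (y - mu0) / (1 - pi_hat)

print(data.frame(plugin, ipw, dr))
```

The doubly-robust version stays centred on the treatment effect if either the propensity model or the outcome models are well estimated, which is why it is a natural default.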

Estimate Quantities of Interest

Description

estimate_QoI takes a dataframe already prepared with split IDs, plugin estimates and pseudo-outcomes and calculates the requested quantities of interest (QoIs).

Usage

estimate_QoI(data, ...)

Arguments

data

data frame (already prepared with attach_config, make_splits, produce_plugin_estimates and construct_pseudo_outcomes)

...

Unquoted names of moderators to calculate QoIs for.

Details

Examples

library("dplyr")
if (require("palmerpenguins")) {
  data(package = 'palmerpenguins')
  penguins$unitid = seq_len(nrow(penguins))
  penguins$propensity = rep(0.5, nrow(penguins))
  penguins$treatment = rbinom(nrow(penguins), 1, penguins$propensity)
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()
  attach_config(penguins, cfg) %>%
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}

Function to calculate diagnostics based on model outputs

Description

This function defines the calculations of common model diagnostics which are available.

Usage

estimate_diagnostic(data, label, prediction, diag_name, params)

Arguments

data

The full data frame with all auxiliary columns.

label

The (string) column name for the labels to evaluate against.

prediction

The (string) column name of predictions from the model to diagnose.

diag_name

The (string) name of the diagnostic to calculate. Currently available are "AUC", "MSE", "SL_coefs", "SL_risk", "RROC".

params

Any other necessary options to pass to the given diagnostic.

Examples

df <- dplyr::tibble(y = rbinom(100, 1, 0.5), p = rep(0.5, 100), w = rexp(100), u = 1:100)
attr(df, "weights") <- "w"
attr(df, "identifier") <- "u"
estimate_diagnostic(df, "y", "p", "AUC")

Fits a treatment effect model using the appropriate settings

Description

This function prepares data, fits the appropriate model and returns the resulting estimates in a standardized format.

Usage

fit_effect(full_data, weight_col, fx_col, ..., .Model_cfg)

Arguments

full_data

The full dataset of interest for the modelling problem.

weight_col

The unquoted weighting variable name to use in model fitting.

fx_col

The unquoted column name of the pseudo-outcome.

...

The unquoted names of covariates to use in the model.

.Model_cfg

A Model_cfg object configuring the appropriate model type to use.

Value

A list with one element, fx. This element contains a Predictor object of the appropriate subclass corresponding to the Model_cfg fit to the data.

Fit a predictor for treatment effects

Description

This function predicts treatment effects in a second stage model.

Usage

fit_fx_predictor(full_data, weights, psi_col, ..., .pcate.cfg, .Model_cfg)

Arguments

full_data

The full original data with all auxiliary columns.

weights

Weights to be used in the analysis.

psi_col

The unquoted column name of the calculated pseudo-outcome.

...

Covariate data, passed in as the unquoted names of columns in full_data

.pcate.cfg

A PCATE_cfg object describing what PCATEs to calculate (and how)

.Model_cfg

A Model_cfg object describing how the effect model should be estimated.

Value

A list with two items:

model - The FX.Predictor model object used internally for PCATE estimation.
data - The data augmented with column .pseudo_outcome_hat for the cross-fit predictions of the HTE for each unit.

Fits a plugin model using the appropriate settings

Description

This function prepares data, fits the appropriate models and returns the resulting estimates in a standardized format.

Usage

fit_plugin(full_data, weight_col, outcome_col, ..., .Model_cfg)

Arguments

full_data

The full dataset of interest for the modelling problem.

weight_col

The unquoted weighting variable name to use in model fitting.

outcome_col

The unquoted column name to use as a label for the supervised learning problem.

...

The unquoted names of covariates to use in the model.

.Model_cfg

A Model_cfg object configuring the appropriate model type to use.

Value

A new Predictor object of the appropriate subclass corresponding to the Model_cfg fit to the data.

Fits a propensity score model using the appropriate settings

Description

This function prepares data, fits the appropriate model and returns the resulting estimates in a standardized format.

Usage

fit_plugin_A(full_data, weight_col, a_col, ..., .Model_cfg)

Arguments

full_data

The full dataset of interest for the modelling problem.

weight_col

The unquoted weighting variable name to use in model fitting.

a_col

The unquoted column name of the treatment.

...

The unquoted names of covariates to use in the model.

.Model_cfg

A Model_cfg object configuring the appropriate model type to use.

Value

A list with one element, ps. This element contains a Predictor object of the appropriate subclass corresponding to the Model_cfg fit to the data.

Fits a T-learner using the appropriate settings

Description

This function prepares data, fits the appropriate model and returns the resulting estimates in a standardized format.

Usage

fit_plugin_Y(full_data, weight_col, y_col, a_col, ..., .Model_cfg)

Arguments

full_data

The full dataset of interest for the modelling problem.

weight_col

The unquoted weighting variable name to use in model fitting.

y_col

The unquoted column name of the outcome.

a_col

The unquoted column name of the treatment.

...

The unquoted names of covariates to use in the model.

.Model_cfg

A Model_cfg object configuring the appropriate model type to use.

Value

A list with two elements, mu1 and mu0, corresponding to the models fit to the treatment and control potential outcomes, respectively. Each is a new Predictor object of the appropriate subclass corresponding to the Model_cfg fit to the data.
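
The T-learner idea (one outcome model per treatment arm, both used to predict for every unit) can be sketched in base R as follows. This is a conceptual illustration only: lm replaces whatever learner the Model_cfg specifies, the data are simulated, and no weights or cross-fitting are used.

```r
set.seed(11)
n <- 500
d <- data.frame(x = rnorm(n), a = rbinom(n, 1, 0.5))
d$y <- 1 + d$x + d$a * (1 + d$x) + rnorm(n)   # true CATE is 1 + x

# One model per treatment arm.
mu1 <- lm(y ~ x, data = d[d$a == 1, ])
mu0 <- lm(y ~ x, data = d[d$a == 0, ])

# Predict both potential outcomes for every unit, then difference them.
d$mu1_hat <- predict(mu1, newdata = d)
d$mu0_hat <- predict(mu0, newdata = d)
cate_hat <- d$mu1_hat - d$mu0_hat
```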

Removes rows which have missing data on any of the supplied columns.

Description

This function removes rows with missingness based on the columns provided. If rows are dropped, a message is displayed to the user to inform them of this fact.

Usage

listwise_deletion(data, ...)

Arguments

data

The dataset from which to drop cases which are not fully observed.

...

Unquoted column names which must be non-missing. Missingness in these columns will result in dropped observations. Missingness in other columns will not.

Value

The original data, restricted to observations that are fully observed on the supplied columns.
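
The behaviour described above (drop a row only when a named column is missing, and tell the user) can be approximated in base R as below. The data, the character vector of required columns and the message wording are illustrative; the real function takes unquoted column names.

```r
d <- data.frame(
  y = c(1, NA, 3, 4),
  x = c(1, 2, NA, 4),
  z = c(NA, 2, 3, 4)   # missingness here is ignored because z is not required
)

# Keep rows that are complete on the required columns only.
required <- c("y", "x")
keep <- complete.cases(d[, required, drop = FALSE])
cleaned <- d[keep, ]

# Inform the user when rows were dropped.
if (any(!keep)) message("Dropped ", sum(!keep), " of ", nrow(d), " rows.")
```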

Define splits for cross-fitting

Description

This takes a dataset, a column with a unique identifier and an arbitrary number of covariates on which to stratify the splits. It returns the original dataset with an additional column .split_id corresponding to an identifier for the split.

Usage

make_splits(data, identifier, ..., .num_splits)

Arguments

data

dataframe

identifier

Unquoted name of unique identifier column

...

variables on which to stratify (requires that quickblock be installed.)

.num_splits

number of splits to create. If VIMP is requested in QoI_cfg, this must be an even number.

Details

Value

original dataframe with additional .split_id column
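
A simple unstratified version of this split assignment can be sketched in base R as follows. It only illustrates the shape of the output: folds are assigned at random with near-equal sizes, and the stratification on covariates that the function supports through quickblock is not reproduced.

```r
set.seed(5)
d <- data.frame(unitid = 1:10, outcome = rnorm(10))
num_splits <- 4

# Randomly assign each unit to one of num_splits folds of near-equal size.
d$.split_id <- sample(rep(seq_len(num_splits), length.out = nrow(d)))

table(d$.split_id)
```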

Examples

library("dplyr")
if (require("palmerpenguins")) {
  data(package = 'palmerpenguins')
  penguins$unitid = seq_len(nrow(penguins))
  penguins$propensity = rep(0.5, nrow(penguins))
  penguins$treatment = rbinom(nrow(penguins), 1, penguins$propensity)
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()
  attach_config(penguins, cfg) %>%
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}

Prediction for an SL.glmnet object

Description

Prediction for the glmnet wrapper.

Usage

## S3 method for class 'SL.glmnet.interaction'
predict(
  object,
  newdata,
  remove_extra_cols = TRUE,
  add_missing_cols = TRUE,
  ...
)

Arguments

object

Result object from SL.glmnet

newdata

Dataframe or matrix that will generate predictions.

remove_extra_cols

Remove any extra columns in the new data that were not part of the original model.

add_missing_cols

Add any columns from original data that do not exist in the new data, and set values to 0.
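
What the two column-alignment arguments do can be pictured with the base-R sketch below. The column names and the train_cols vector are hypothetical, and the method's actual internals may differ; the sketch only shows the effect on newdata.

```r
# Columns the model was trained on (hypothetical).
train_cols <- c("x1", "x2", "x3")

# New data with one column missing (x2) and one extra column.
newdata <- data.frame(x1 = 1:2, x3 = 3:4, extra = 5:6)

# remove_extra_cols: drop columns the model never saw.
aligned <- newdata[, intersect(names(newdata), train_cols), drop = FALSE]

# add_missing_cols: add absent training columns, filled with 0.
for (col in setdiff(train_cols, names(aligned))) aligned[[col]] <- 0

# Restore the training column order before predicting.
aligned <- aligned[, train_cols, drop = FALSE]
```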

...

Any additional arguments (not used).

Estimate models of nuisance functions

Description

This takes a dataset with an identified outcome and treatment column along with any number of covariates and appends three columns to the dataset corresponding to an estimate of the conditional expectation of treatment (.pi_hat), along with the conditional expectation of the control and treatment potential outcome surfaces (.mu0_hat and .mu1_hat respectively).

Usage

produce_plugin_estimates(data, outcome, treatment, ..., .weights = NULL)

Arguments

data

dataframe (already prepared with attach_config and make_splits)

outcome

Unquoted name of the outcome variable.

treatment

Unquoted name of the treatment variable.

...

Unquoted names of covariates to include in the models of the nuisance functions.

.weights

Unquoted name of weights column. If NULL, all analysis will assume weights are all equal to one and sample-based quantities will be returned.

Details

Examples

library("dplyr")
if (require("palmerpenguins")) {
  data(package = 'palmerpenguins')
  penguins$unitid = seq_len(nrow(penguins))
  penguins$propensity = rep(0.5, nrow(penguins))
  penguins$treatment = rbinom(nrow(penguins), 1, penguins$propensity)
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()
  attach_config(penguins, cfg) %>%
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}

Removes variable importance information

Description

This removes the variable importance quantity of interest from an HTE_cfg.

Usage

remove_vimp(hte_cfg)

Arguments

hte_cfg

HTE_cfg object to update.

Value

Updated HTE_cfg object

Examples

library("dplyr")
basic_config() %>%
  remove_vimp() -> hte_cfg

Partition the data into folds

Description

This takes a dataset and a split ID and generates two subsets of the data corresponding to a training set and a holdout.

Usage

split_data(data, split_id)

Arguments

data

dataframe

split_id

integer representing the split to construct

Value

Returns an R6 object HTEFold with three public fields:

train - The split to be used for training the plugin estimates
holdout - The split not used for training
in_holdout - A logical vector indicating for each unit whether they lie in the holdout.
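
The partition can be pictured with the base-R sketch below. It assumes that the holdout consists of the rows whose .split_id equals the requested split and that all other rows form the training set; a plain list stands in for the HTEFold R6 object.

```r
d <- data.frame(id = 1:8, .split_id = rep(1:4, times = 2))
split_id <- 2

# Rows belonging to the requested split are held out; the rest are used for training.
in_holdout <- d$.split_id == split_id

fold <- list(
  train = d[!in_holdout, ],
  holdout = d[in_holdout, ],
  in_holdout = in_holdout
)
```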
