Title: Generalized Random Forests
Version: 2.5.0
BugReports: https://github.com/grf-labs/grf/issues
Description: Forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.
Depends: R (≥ 3.5.0)
License: GPL-3
LinkingTo: Rcpp, RcppEigen
Imports: DiceKriging, lmtest, Matrix, methods, Rcpp (≥ 0.12.15), sandwich (≥ 2.4-0)
RoxygenNote: 7.3.2
Suggests: DiagrammeR, MASS, rdrobust, survival (≥ 3.2-8), testthat (≥ 3.0.4)
SystemRequirements: GNU make
URL: https://github.com/grf-labs/grf
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2025-10-07 20:21:49 UTC; erikcs
Author: Julie Tibshirani [aut], Susan Athey [aut], Rina Friedberg [ctb], Vitor Hadad [ctb], David Hirshberg [ctb], Luke Miner [ctb], Erik Sverdrup [aut, cre], Stefan Wager [aut], Marvin Wright [ctb]
Maintainer: Erik Sverdrup <erik.sverdrup@monash.edu>
Repository: CRAN
Date/Publication: 2025-10-09 05:10:17 UTC

grf: Generalized Random Forests

Description

A package for forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.

In addition, GRF supports 'honest' estimation (where one subset of the data is used for choosing splits, and another for populating the leaves of the tree), and confidence intervals for least-squares regression and treatment effect estimation.

Some helpful links for getting started:

* The R package documentation contains usage examples and method reference (https://grf-labs.github.io/grf/).

* The GRF reference gives a detailed description of the GRF algorithm and includes troubleshooting suggestions (https://grf-labs.github.io/grf/REFERENCE.html).

* For community questions and answers around usage, see GitHub issues labelled 'question' (https://github.com/grf-labs/grf/issues?q=label%3Aquestion).

Author(s)

Maintainer: Erik Sverdrup <erik.sverdrup@monash.edu>

Authors:

* Julie Tibshirani
* Susan Athey
* Stefan Wager

Other contributors:

* Rina Friedberg [contributor]
* Vitor Hadad [contributor]
* David Hirshberg [contributor]
* Luke Miner [contributor]
* Marvin Wright [contributor]

See Also

Useful links:

* https://github.com/grf-labs/grf
* Report bugs at https://github.com/grf-labs/grf/issues

Examples

# The following script demonstrates how to use GRF for heterogeneous treatment
# effect estimation. For examples of how to use other types of forest,
# please consult the documentation on the relevant forest methods (quantile_forest,
# instrumental_forest, etc.).

# Generate data.
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)

# Train a causal forest.
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)

# Estimate treatment effects for the training data using out-of-bag prediction.
tau.hat.oob <- predict(tau.forest)
hist(tau.hat.oob$predictions)

# Estimate treatment effects for the test sample.
tau.hat <- predict(tau.forest, X.test)
plot(X.test[, 1], tau.hat$predictions, ylim = range(tau.hat$predictions, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
average_treatment_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more
# trees is now recommended.
tau.forest <- causal_forest(X, Y, W, num.trees = 4000)
tau.hat <- predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(tau.hat$variance.estimates)
ylim <- range(tau.hat$predictions + 1.96 * sigma.hat,
              tau.hat$predictions - 1.96 * sigma.hat, 0, 2)
plot(X.test[, 1], tau.hat$predictions, ylim = ylim, xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 1)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).

# Generate new data.
n <- 4000; p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")

# See if a causal forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))

Get doubly robust estimates of average treatment effects.

Description

In the case of a causal forest with binary treatment, we provide estimates of one of the following:

* The (sample) average treatment effect (target.sample = "all")
* The (sample) average treatment effect on the treated (target.sample = "treated")
* The (sample) average treatment effect on the controls (target.sample = "control")
* The overlap-weighted (sample) average treatment effect (target.sample = "overlap"), which weights the treatment effects by e(x)(1 - e(x)), where e(x) is the propensity score

This last estimand is recommended by Li, Morgan, and Zaslavsky (2018) in case of poor overlap (i.e., when the propensities e(x) may be very close to 0 or 1), as it doesn't involve dividing by estimated propensities.

Usage

average_treatment_effect(
  forest,
  target.sample = c("all", "treated", "control", "overlap"),
  method = c("AIPW", "TMLE"),
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500
)

Arguments

forest

The trained forest.

target.sample

Which sample to aggregate treatment effects over. Note: Options other than "all" are only currently implemented for causal forests.

method

Method used for doubly robust inference. Can be either augmented inverse-propensity weighting (AIPW), or targeted maximum likelihood estimation (TMLE). Note: TMLE is currently only implemented for causal forests with a binary treatment.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score

Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights

In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

Details

In the case of a causal forest with continuous treatment, we provide estimates of the average partial effect, i.e., E[Cov[W, Y | X] / Var[W | X]]. In the case of a binary treatment, the average partial effect matches the average treatment effect. Computing the average partial effect is somewhat more involved, as the relevant doubly robust scores require an estimate of Var[Wi | Xi = x]. By default, we get such estimates by training an auxiliary forest; however, these weights can also be passed manually by specifying debiasing.weights.

In the case of instrumental forests with a binary treatment, we provide an estimate of the Average (Conditional) Local Average Treatment Effect (ACLATE). Specifically, given an outcome Y, treatment W and instrument Z, the (conditional) local average treatment effect is tau(x) = Cov[Y, Z | X = x] / Cov[W, Z | X = x]. This is the quantity that is estimated with an instrumental forest. It can be interpreted causally in various ways. Given a homogeneity assumption, tau(x) is simply the CATE at x. When W is binary and there are no "defiers", Imbens and Angrist (1994) show that tau(x) can be interpreted as an average treatment effect on compliers. This function provides an estimate of tau = E[tau(X)]. See Chernozhukov et al. (2022) for a discussion, and Section 5.2 of Athey and Wager (2021) for an example using forests.

If clusters are specified, then each unit gets equal weight by default. For example, if there are 10 clusters with 1 unit each and per-cluster ATE = 1, and there are 10 clusters with 19 units each and per-cluster ATE = 0, then the overall ATE is 0.05 (additional sample.weights allow for custom weighting). If equalize.cluster.weights = TRUE each cluster gets equal weight and the overall ATE is 0.5.
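The weighting arithmetic in the example above can be verified directly in base R; this is just a check of the numbers quoted in the paragraph, not a call into grf.

# 10 clusters of size 1 with per-cluster ATE = 1, and 10 clusters of size 19 with ATE = 0.
cluster.sizes <- c(rep(1, 10), rep(19, 10))
cluster.ates <- c(rep(1, 10), rep(0, 10))

# Default: every unit gets equal weight, so the ATE is a size-weighted average of cluster ATEs.
weighted.mean(cluster.ates, w = cluster.sizes)  # 10 / 200 = 0.05

# equalize.cluster.weights = TRUE: every cluster gets equal weight.
mean(cluster.ates)  # 0.5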

Value

An estimate of the average treatment effect, along with standard error.

References

Athey, Susan, and Stefan Wager. "Policy Learning With Observational Data." Econometrica 89.1 (2021): 133-161.

Chernozhukov, Victor, Juan Carlos Escanciano, Hidehiko Ichimura, Whitney K. Newey, and James M. Robins. "Locally robust semiparametric estimation." Econometrica 90(4), 2022.

Imbens, Guido W., and Joshua D. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62(2), 1994.

Li, Fan, Kari Lock Morgan, and Alan M. Zaslavsky. "Balancing covariates via propensity score weighting." Journal of the American Statistical Association 113(521), 2018.

Mayer, Imke, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager, and Julie Josse. "Doubly robust treatment effect estimation with missing attributes." Annals of Applied Statistics, 14(3), 2020.

Robins, James M., and Andrea Rotnitzky. "Semiparametric efficiency in multivariate regression models with missing data." Journal of the American Statistical Association 90(429), 1995.

Examples

# Train a causal forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
c.forest <- causal_forest(X, Y, W)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
c.pred <- predict(c.forest, X.test)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(c.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
# We don't expect much difference between the CATE and the CATT in this example,
# since treatment assignment was randomized.
average_treatment_effect(c.forest, target.sample = "treated")

# Estimate the conditional average treatment effect on samples with positive X[, 1].
average_treatment_effect(c.forest, target.sample = "all", subset = X[, 1] > 0)

# Example for causal forests with a continuous treatment.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 2]))) + rnorm(n)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)
tau.hat <- predict(tau.forest)
average_treatment_effect(tau.forest)
average_treatment_effect(tau.forest, subset = X[, 1] > 0)

Estimate the best linear projection of a conditional average treatment effect.

Description

Let tau(Xi) = E[Y(1) - Y(0) | X = Xi] be the CATE, and Ai be a vector of user-provided covariates. This function provides a (doubly robust) fit to the linear model tau(Xi) ~ beta_0 + Ai * beta.

Usage

best_linear_projection(
  forest,
  A = NULL,
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500,
  vcov.type = "HC3",
  target.sample = c("all", "overlap")
)

Arguments

forest

The trained forest.

A

The covariates we want to project the CATE onto.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score

Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights

In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

vcov.type

Optional covariance type for standard errors. The possible options are HC0, ..., HC3. The default is "HC3", which is recommended in small samples and corresponds to the "shortcut formula" for the jackknife (see MacKinnon & White for more discussion, and Cameron & Miller for a review). For large data sets with clusters, "HC0" or "HC1" are significantly faster to compute.

target.sample

Which sample to compute the BLP over. The default is "all". Option "overlap" uses weights equal to e(X)(1 - e(X)), where e(x) are estimates of the propensity score.

Details

Procedurally, we do so by regressing doubly robust scores derived from the forest against the Ai. Note the covariates Ai may consist of a subset of the Xi, or they may be distinct. The case of the null model tau(Xi) ~ beta_0 is equivalent to fitting an average treatment effect via AIPW.

In the event the treatment is continuous, the inverse-propensity weight component of the doubly robust scores is replaced with a component based on a forest-based estimate of Var[Wi | Xi = x]. These weights can also be passed manually by specifying debiasing.weights.
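As a rough illustration of the procedure described above (and only that; best_linear_projection handles weights, clusters, and the target.sample option and should be used in practice), one can regress the doubly robust scores from get_scores on the covariates A with heteroskedasticity-robust standard errors. The simulated data below is made up for illustration.

library(grf)
library(lmtest)
library(sandwich)

n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)
forest <- causal_forest(X, Y, W)

A <- X[, 1:2]
dr.scores <- get_scores(forest)    # doubly robust scores for each observation
blp.fit <- lm(dr.scores ~ A)       # tau(Xi) ~ beta_0 + Ai * beta
coeftest(blp.fit, vcov = vcovHC(blp.fit, type = "HC3"))

# Compare with the built-in function:
best_linear_projection(forest, A)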

Value

An estimate of the best linear projection, along with coefficient standard errors.

References

Cameron, A. Colin, and Douglas L. Miller. "A practitioner's guide to cluster-robust inference." Journal of Human Resources 50, no. 2 (2015): 317-372.

Cui, Yifan, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. "Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests." Journal of the Royal Statistical Society: Series B, 85(2), 2023.

MacKinnon, James G., and Halbert White. "Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties." Journal of Econometrics 29.3 (1985): 305-325.

Semenova, Vira, and Victor Chernozhukov. "Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions." The Econometrics Journal 24.2 (2021).

Examples

n <- 800
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.25 + 0.5 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
forest <- causal_forest(X, Y, W)
best_linear_projection(forest, X[, 1:2])

Boosted regression forest

Description

Trains a boosted regression forest that can be used to estimate the conditional mean function mu(x) = E[Y | X = x]. Selects the number of boosting iterations based on cross-validation.

Usage

boosted_regression_forest(
  X,
  Y,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  ci.group.size = 2,
  tune.parameters = "none",
  tune.num.trees = 10,
  tune.num.reps = 100,
  tune.num.draws = 1000,
  boost.steps = NULL,
  boost.error.reduction = 0.97,
  boost.max.steps = 5,
  boost.trees.tune = 10,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the regression.

Y

The outcome.

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

sample.weights

Weights given to each observation in estimation. If NULL, each observation receives the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same. Note that, if this argument is FALSE, sample weights may also be directly adjusted via the sample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

tune.parameters

If TRUE, NULL parameters are tuned by cross-validation; if FALSE, NULL parameters are set to defaults. Default is FALSE.

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model. Default is 10.

tune.num.reps

The number of forests used to fit the tuning model. Default is 100.

tune.num.draws

The number of random parameter values considered when using the model to select the optimal parameters. Default is 1000.

boost.steps

The number of boosting iterations. If NULL, selected by cross-validation. Default is NULL.

boost.error.reduction

If boost.steps is NULL, a new boosting step is only taken if its cross-validated error is below this fraction of the previous step's error. Default is 0.97.

boost.max.steps

The maximum number of boosting iterations to try when boost.steps = NULL. Default is 5.

boost.trees.tune

If boost.steps is NULL, the number of trees used to test a new boosting step when tuning boost.steps. Default is 10.

num.threads

Number of threads used in training. If set to NULL, the software automatically selects an appropriate amount.

seed

The seed for the C++ random number generator.

Value

A boosted regression forest object. $error contains the mean debiased error for each step, and $forests contains the trained regression forest for each step.

Examples

# Train a boosted regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
boosted.forest <- boosted_regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
boost.pred <- predict(boosted.forest, X.test)

# Predict on out-of-bag training samples.
boost.pred <- predict(boosted.forest)

# Check how many boosting iterations were used.
print(length(boosted.forest$forests))

Simple clustered bootstrap.

Description

Inspired by the 'boot' function in the bootstrap package with clusters + half-sampling added. A future TODO could be to add parallel (not necessarily worth it): https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

Usage

boot_grf(data, statistic, R, clusters, half.sample = TRUE, ...)

Arguments

data

A data frame with the original data.

statistic

A function computing estimate(s) with signature (data, indices, ...) where data is the original data, and indices a vector which defines the bootstrap sample.

R

The number of bootstrap replications.

clusters

Integer vector of cluster assignments; setting this to 1:N corresponds to an ordinary unclustered bootstrap.

half.sample

Whether to do half sample bootstrap (half the clusters are drawn). Default is TRUE.

...

Additional arguments passed on to statistic.

Value

A list with the original estimate t0, and bootstrap estimates t.
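boot_grf is an internal helper, so the sketch below assumes access via the ::: operator and simply follows the documented (data, indices, ...) statistic interface; the statistic here is an ordinary sample mean and the clustering is trivial.

# Hypothetical usage sketch; boot_grf is not exported, hence grf:::.
df <- data.frame(y = rnorm(100))
mean.stat <- function(data, indices) mean(data[indices, "y"])
boot.out <- grf:::boot_grf(df, mean.stat, R = 200,
                           clusters = 1:nrow(df), half.sample = FALSE)
boot.out$t0                     # original estimate
sd(as.numeric(boot.out$t))      # bootstrap standard error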

References

Angelo Canty and Brian Ripley (2021). boot: Bootstrap R (S-Plus) Functions.


Causal forest

Description

Trains a causal forest that can be used to estimate conditional average treatment effects tau(X). When the treatment assignment W is binary and unconfounded, we have tau(X) = E[Y(1) - Y(0) | X = x], where Y(0) and Y(1) are potential outcomes corresponding to the two possible treatment states. When W is continuous, we effectively estimate an average partial effect Cov[Y, W | X = x] / Var[W | X = x], and interpret it as a treatment effect given unconfoundedness.

Usage

causal_forest(
  X,
  Y,
  W,
  Y.hat = NULL,
  W.hat = NULL,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  stabilize.splits = TRUE,
  ci.group.size = 2,
  tune.parameters = "none",
  tune.num.trees = 200,
  tune.num.reps = 50,
  tune.num.draws = 1000,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the causal regression.

Y

The outcome (must be a numeric vector with no NAs).

W

The treatment assignment (must be a binary or real numeric vector with no NAs).

Y.hat

Estimates of the expected responses E[Y | Xi], marginalizing over treatment. If Y.hat = NULL, these are estimated using a separate regression forest. See section 6.1.1 of the GRF paper for further discussion of this quantity. Default is NULL.

W.hat

Estimates of the treatment propensities E[W | Xi]. If W.hat = NULL, these are estimated using a separate regression forest. Default is NULL.

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

sample.weights

Weights given to each sample in estimation. If NULL, each observation receives the same weight. Note: To avoid introducing confounding, weights should be independent of the potential outcomes given X. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same. Note that, if this argument is FALSE, sample weights may also be directly adjusted via the sample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

stabilize.splits

Whether or not the treatment should be taken into account when determining the imbalance of a split. Default is TRUE.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

tune.parameters

A vector of parameter names to tune. If "all": all tunable parameters are tuned by cross-validation. The following parameters are tunable: ("sample.fraction", "mtry", "min.node.size", "honesty.fraction", "honesty.prune.leaves", "alpha", "imbalance.penalty"). If honesty is FALSE the honesty.* parameters are not tuned. Default is "none" (no parameters are tuned).

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model. Default is 200.

tune.num.reps

The number of forests used to fit the tuning model. Default is 50.

tune.num.draws

The number of random parameter values considered when using the model to select the optimal parameters. Default is 1000.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained causal forest object. If tune.parameters is enabled, then tuning information will be included through the 'tuning.output' attribute.
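A brief sketch of requesting tuning for a subset of the tunable parameters listed under tune.parameters, and inspecting the tuning information described above (simulated data for illustration only):

n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

# Tune only two of the tunable parameters; the rest keep their defaults.
cf.tuned <- causal_forest(X, Y, W,
                          tune.parameters = c("sample.fraction", "min.node.size"))
# Tuning information (the 'tuning.output' attribute described in the Value section).
cf.tuned$tuning.output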

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests." Annals of Statistics, 47(2), 2019.

Wager, Stefan, and Susan Athey. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests." Journal of the American Statistical Association, 113(523), 2018.

Nie, Xinkun, and Stefan Wager. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects." Biometrika, 108(2), 2021.

Examples

# Train a causal forest.
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
c.forest <- causal_forest(X, Y, W)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
c.pred <- predict(c.forest, X.test)

# Predict on out-of-bag training samples.
c.pred <- predict(c.forest)

# Predict with confidence intervals; growing more trees is now recommended.
c.forest <- causal_forest(X, Y, W, num.trees = 4000)
c.pred <- predict(c.forest, X.test, estimate.variance = TRUE)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).
n <- 2000
p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")
tau.hat <- predict(tau.forest)$predictions

# See if a causal forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))

Causal survival forest

Description

Trains a causal survival forest that can be used to estimate conditional treatment effects tau(X) with right-censored outcomes. We estimate either 1) tau(X) = E[min(T(1), horizon) - min(T(0), horizon) | X = x], where T(1) and T(0) are potential outcomes corresponding to the two possible treatment states and 'horizon' is the maximum follow-up time, or 2) tau(X) = P[T(1) > horizon | X = x] - P[T(0) > horizon | X = x], for a chosen time point 'horizon'.

Usage

causal_survival_forest(
  X,
  Y,
  W,
  D,
  W.hat = NULL,
  target = c("RMST", "survival.probability"),
  horizon = NULL,
  failure.times = NULL,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  stabilize.splits = TRUE,
  ci.group.size = 2,
  tune.parameters = "none",
  compute.oob.predictions = TRUE,
  fast.logrank = FALSE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates.

Y

The event time (must be non-negative).

W

The treatment assignment (must be a binary or real numeric vector with no NAs).

D

The event type (0: censored, 1: failure/observed event).

W.hat

Estimates of the treatment propensities E[W | X = x]. If W.hat = NULL, these are estimated using a separate regression forest. Default is NULL.

target

The target estimand. Choices are Restricted Mean Survival Time ("RMST"), which estimates 1) E[min(T(1), horizon) - min(T(0), horizon) | X = x], or "survival.probability", which estimates 2) P[T(1) > horizon | X = x] - P[T(0) > horizon | X = x]. Default is "RMST".

horizon

A scalar that defines the estimand (required). If target is "RMST" then this defines the maximum follow-up time. If target is "survival.probability", then this defines the time point for the absolute risk difference estimate.

failure.times

A vector of event times to fit the survival curves at. If NULL, then all the unique event times are used. This speeds up forest estimation by constraining the event grid. Observed event times are rounded down to the last sorted occurrence less than or equal to the specified failure time. The time points should be in increasing order. Default is NULL.

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

sample.weights

Weights given to each sample in estimation. If NULL, each observation receives the same weight. Note: To avoid introducing confounding, weights should be independent of the potential outcomes given X. Sample weights are not used in survival splitting. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same. Note that, if this argument is FALSE, sample weights may also be directly adjusted via the sample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. This parameter plays the same role as in causal forest and survival forest, where for the latter the number of failures in each child has to be at least one or 'alpha' times the number of samples in the parent node. Default is 0.05. (On data with very low event rate the default value may be too high for the forest to split and lowering it may be beneficial).

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

stabilize.splits

Whether or not the treatment and censoring status should be taken into account when determining the imbalance of a split. The requirement for valid split candidates is the same as in causal_forest, with the additional constraint that num.failures(child) >= num.samples(parent) * alpha. Default is TRUE.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

tune.parameters

(Currently only applies to the regression forest used in W.hat estimation) A vector of parameter names to tune. If "all": all tunable parameters are tuned by cross-validation. The following parameters are tunable: ("sample.fraction", "mtry", "min.node.size", "honesty.fraction", "honesty.prune.leaves", "alpha", "imbalance.penalty"). If honesty is FALSE the honesty.* parameters are not tuned. Default is "none" (no parameters are tuned).

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

fast.logrank

Whether to use the 'fast.logrank' option when estimating censoring corrections with 'survival_forest'. Default is FALSE.

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Details

When W is continuous, we effectively estimate an average partial effect corresponding to 1) Cov[min(T, horizon), W | X = x] / Var[W | X = x] or 2) Cov[1(T > horizon), W | X = x] / Var[W | X = x], and interpret it as a treatment effect given unconfoundedness.
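The Examples section below targets the RMST estimand; the brief sketch here shows the 'survival.probability' target on a made-up simulation, using the same horizon argument to set the time point.

n <- 1000; p <- 5
X <- matrix(runif(n * p), n, p)
W <- rbinom(n, 1, 0.5)
failure.time <- rexp(n, rate = 1 / (1 + W * X[, 1]))
censor.time <- 2 * runif(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)

# Estimate differences in survival probability at time point 0.5.
csf.prob <- causal_survival_forest(X, round(Y, 2), W, D,
                                   target = "survival.probability", horizon = 0.5)
head(predict(csf.prob)$predictions)
average_treatment_effect(csf.prob)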

Value

A trained causal_survival_forest forest object.

References

Cui, Yifan, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. "Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests." Journal of the Royal Statistical Society: Series B, 85(2), 2023.

Sverdrup, Erik, and Stefan Wager. "Treatment Heterogeneity with Right-Censored Outcomes Using grf." ASA Lifetime Data Science Newsletter, January 2024 (arXiv:2312.02482).

Examples

# Train a causal survival forest targeting a Restricted Mean Survival Time (RMST)
# with maximum follow-up time set to `horizon`.
n <- 2000
p <- 5
X <- matrix(runif(n * p), n, p)
W <- rbinom(n, 1, 0.5)
horizon <- 1
failure.time <- pmin(rexp(n) * X[, 1] + W, horizon)
censor.time <- 2 * runif(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)

# Save computation time by constraining the event grid by discretizing (rounding) continuous events.
cs.forest <- causal_survival_forest(X, round(Y, 2), W, D, horizon = horizon)
# Or do so more flexibly by defining your own time grid using the failure.times argument.
# grid <- seq(min(Y), max(Y), length.out = 150)
# cs.forest <- causal_survival_forest(X, Y, W, D, horizon = horizon, failure.times = grid)

# Predict using the forest.
X.test <- matrix(0.5, 10, p)
X.test[, 1] <- seq(0, 1, length.out = 10)
cs.pred <- predict(cs.forest, X.test)

# Predict on out-of-bag training samples.
cs.pred <- predict(cs.forest)

# Predict with confidence intervals; growing more trees is now recommended.
c.pred <- predict(cs.forest, X.test, estimate.variance = TRUE)

# Compute a doubly robust estimate of the average treatment effect.
average_treatment_effect(cs.forest)

# Compute the best linear projection on the first covariate.
best_linear_projection(cs.forest, X[, 1])

# See if a causal survival forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
eval <- -train
train.forest <- causal_survival_forest(X[train, ], Y[train], W[train], D[train], horizon = horizon)
eval.forest <- causal_survival_forest(X[eval, ], Y[eval], W[eval], D[eval], horizon = horizon)
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[eval, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))

Writes each node's information. If it is a leaf node: show it in a different color, show the number of samples, and show the leaf id. If it is a non-leaf node: show its splitting variable and splitting value. If trained with missing values, the edge arrow is filled according to which direction the NAs are sent.

Description

Writes each node's information. If it is a leaf node: show it in a different color, show the number of samples, and show the leaf id. If it is a non-leaf node: show its splitting variable and splitting value. If trained with missing values, the edge arrow is filled according to which direction the NAs are sent.

Usage

create_dot_body(tree, index = 1, include.na.path)

Arguments

tree

the tree to convert

index

the index of the current node

include.na.path

A boolean toggling whether to include the path of missing values or not.


Compute rate estimates, a function to be passed on to bootstrap routine.

Description

Compute rate estimates, a function to be passed on to bootstrap routine.

Usage

estimate_rate(data, indices, q, wtd.mean)

Arguments

data

A data.frame with the original data, column 1: DR.scores * sample.weights, column 2: sample.weights (positive), column 3: priority scores (integer vector with scores 1,...,num.unique.prios).

indices

A vector of indices which define the bootstrap sample.

q

A vector of fractions to compute TOC on, with last entry = 1.

wtd.mean

A weighting function determining the type of RATE estimate.

Value

An estimate of RATE, together with the TOC curve.


Compute E[T | X]

Description

Compute E[T | X]

Usage

expected_survival(S.hat, Y.grid)

Arguments

S.hat

The estimated survival curve.

Y.grid

The time values corresponding to S.hat.

Value

A vector of expected values.
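The quantity E[T | X] is the integral of the survival curve over time; a standalone sketch of the usual discrete approximation, summing S(t_k) times the grid spacing, is shown below. It is meant only to illustrate the quantity being computed, not to reproduce the package's internal implementation.

# Made-up survival curve and time grid for illustration.
S.hat <- c(0.95, 0.80, 0.55, 0.30, 0.10)   # survival probabilities at the grid points
Y.grid <- c(0.5, 1.0, 1.5, 2.0, 2.5)       # corresponding time values
sum(S.hat * diff(c(0, Y.grid)))            # approximate E[T | X]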


Export a tree in DOT format. This function generates a GraphViz representation of the tree, which is then written into 'dot_string'.

Description

Export a tree in DOT format. This function generates a GraphViz representation of the tree, which is then written into 'dot_string'.

Usage

export_graphviz(tree, include.na.path)

Arguments

tree

the tree to convert

include.na.path

A boolean toggling whether to include the path of missing values or not.


Generate causal forest data

Description

The following DGPs are available for benchmarking purposes (see the dgp argument below for the full list of names):

Usage

generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)

Arguments

n

The number of observations.

p

The number of covariates (note: the minimum varies by DGP).

sigma.m

The standard deviation of the unconditional mean of Y. Default is 1.

sigma.tau

The standard deviation of the treatment effect. Default is 0.1.

sigma.noise

The conditional variance of Y. Default is 1.

dgp

The kind of dgp. Default is "simple".

Details

Each DGP is parameterized by X: observables, m: conditional mean of Y, tau: treatment effect, e: propensity scores, V: conditional variance of Y.

The following rescaled data is returned: m = m / sd(m) * sigma.m, tau = tau / sd(tau) * sigma.tau, V = V / mean(V) * sigma.noise^2, W = rbinom(e), Y = m + (W - e) * tau + sqrt(V) + rnorm(n).
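A quick check of the rescaling above: the standard deviation of the returned treatment effects should be close to the requested sigma.tau (a sketch, using the "simple" DGP).

data <- generate_causal_data(5000, 10, sigma.tau = 0.5, dgp = "simple")
sd(data$tau)   # approximately 0.5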

Value

A list consisting of: X, Y, W, tau, m, e, dgp.

Examples

# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")

Simulate causal survival data

Description

The following DGPs are available for benchmarking purposes, where T is the failure time and C is the censoring time (see the dgp argument below for the available names):

Usage

generate_causal_survival_data(
  n,
  p,
  Y.max = NULL,
  y0 = NULL,
  X = NULL,
  rho = 0,
  n.mc = 10000,
  dgp = c("simple1", "type1", "type2", "type3", "type4", "type5")
)

Arguments

n

The number of samples.

p

The number of covariates.

Y.max

The maximum follow-up time (optional).

y0

Query time to estimate P(T(1) > y0 | X) - P(T(0) > y0 | X) (optional).

X

The covariates (optional).

rho

The correlation coefficient of the X's covariance matrix V_(ij) = rho^|i-j|. Default is 0.

n.mc

The number of Monte Carlo draws to estimate the treatment effect with. Default is 10000.

dgp

The type of DGP.

Value

A list with entries: 'X': the covariates, 'Y': the event times, 'W': the treatment indicator, 'D': the censoring indicator, 'cate': the treatment effect (RMST) estimated by Monte Carlo, 'cate.prob': the difference in survival probability, 'cate.sign': the true sign of the cate for ITR comparison, 'dgp': the dgp name, 'Y.max': the maximum follow-up time, 'y0': the query time for difference in survival probability.

Examples

# Generate data
n <- 1000
p <- 5
data <- generate_causal_survival_data(n, p)

# Get true CATE on a test set
X.test <- matrix(seq(0, 1, length.out = 5), 5, p)
cate.test <- generate_causal_survival_data(n, p, X = X.test)$cate

Given a trained forest and test data, compute the kernel weights for each test point.

Description

During normal prediction, these weights (named alpha in the GRF paper) are computed as an intermediate step towards producing estimates. This function allows for examining the weights directly, so they could potentially be used as the input to a different analysis.

Usage

get_forest_weights(forest, newdata = NULL, num.threads = NULL)

Arguments

forest

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example).

num.threads

Number of threads used in training. If set to NULL, the software automatically selects an appropriate amount.

Value

A sparse matrix where each row represents a test sample, and each column is a sample in the training data. The value at (i, j) gives the weight of training sample j for test sample i.
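As a sanity check of the interpretation above, the weights in each row are nonnegative and should sum to approximately one for every test (or out-of-bag) point; a small sketch:

n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)
rf <- regression_forest(X, Y, num.trees = 200)
w <- get_forest_weights(rf)
summary(Matrix::rowSums(w))   # each row sums to about 1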

Examples

p <- 10
n <- 100
X <- matrix(2 * runif(n * p) - 1, n, p)
Y <- (X[, 1] > 0) + 2 * rnorm(n)
rrf <- regression_forest(X, Y, mtry = p)
forest.weights.oob <- get_forest_weights(rrf)

n.test <- 15
X.test <- matrix(2 * runif(n.test * p) - 1, n.test, p)
forest.weights <- get_forest_weights(rrf, X.test)

Find the leaf node for a test sample.

Description

Given a GRF tree object, compute the leaf node a test sample falls into. The nodes in a GRF tree are numbered breadth first, and the returned numbers will be the leaf integer according to this ordering. To get kernel weights based on leaf membership, see the function get_forest_weights.

Usage

get_leaf_node(tree, newdata, node.id = TRUE)

Arguments

tree

A GRF tree object (retrieved by 'get_tree').

newdata

Points at which leaf predictions should be made.

node.id

Boolean indicating whether to return the node.id for each query sample (default), or if FALSE, a list of node numbers with the samples contained.

Value

A vector of integers indicating the leaf number for each sample in the given tree.

Examples

p <- 10
n <- 100
X <- matrix(2 * runif(n * p) - 1, n, p)
Y <- (X[, 1] > 0) + 2 * rnorm(n)
r.forest <- regression_forest(X, Y, num.trees = 50)

n.test <- 5
X.test <- matrix(2 * runif(n.test * p) - 1, n.test, p)
tree <- get_tree(r.forest, 1)
# Get a vector of node numbers for each sample.
get_leaf_node(tree, X.test)
# Get a list of samples per node.
get_leaf_node(tree, X.test, node.id = FALSE)

Compute doubly robust scores for a GRF forest object

Description

Compute doubly robust scores for a GRF forest object

Usage

get_scores(forest, ...)

Arguments

forest

A grf forest object

...

Additional arguments

Value

A vector of scores


Compute doubly robust scores for a causal forest.

Description

Compute doubly robust (AIPW) scores for average treatment effect estimation or average partial effect estimation with continuous treatment, using a causal forest. Under regularity conditions, the average of the DR.scores is an efficient estimate of the average treatment effect.

Usage

## S3 method for class 'causal_forest'
get_scores(
  forest,
  subset = NULL,
  debiasing.weights = NULL,
  num.trees.for.weights = 500,
  ...
)

Arguments

forest

A trained causal forest.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) they are obtained via inverse-propensity weighting in the case of binary treatment or by estimating Var[W | X = x] using a new forest in the case of a continuous treatment.

num.trees.for.weights

Number of trees used to estimate Var[W | X = x]. Note: this argument is only used when debiasing.weights = NULL.

...

Additional arguments (currently ignored).

Value

A vector of scores.
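A small sketch connecting the scores to average_treatment_effect: averaging the doubly robust scores gives the AIPW point estimate, with standard error sd(scores) / sqrt(n). Simulated data for illustration only; the built-in function should be preferred in practice.

n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)
cf <- causal_forest(X, Y, W)

dr.scores <- get_scores(cf)
c(estimate = mean(dr.scores), std.err = sd(dr.scores) / sqrt(n))
average_treatment_effect(cf, target.sample = "all")   # should closely match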

References

Farrell, Max H. "Robust inference on average treatment effects with possibly more covariates than observations." Journal of Econometrics 189(1), 2015.

Graham, Bryan S., and Cristine Campos de Xavier Pinto. "Semiparametrically efficient estimation of the average linear regression function." Journal of Econometrics 226(1), 2022.

Hirshberg, David A., and Stefan Wager. "Augmented minimax linear estimation." The Annals of Statistics 49(6), 2021.

Robins, James M., and Andrea Rotnitzky. "Semiparametric efficiency in multivariate regression models with missing data." Journal of the American Statistical Association 90(429), 1995.


Compute doubly robust scores for a causal survival forest.

Description

For details see section 3.2 in the causal survival forest paper.

Usage

## S3 method for class 'causal_survival_forest'
get_scores(forest, subset = NULL, num.trees.for.weights = 500, ...)

Arguments

forest

A trained causal survival forest.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

num.trees.for.weights

Number of trees used to estimate Var[W | X = x]. Note: this argument is only used in the case of a continuous treatment (see get_scores.causal_forest for details).

...

Additional arguments (currently ignored).

Value

A vector of scores.


Doubly robust scores for estimating the average conditional local average treatment effect.

Description

Given an outcome Y, treatment W and instrument Z, the (conditional) local average treatment effect is tau(x) = Cov[Y, Z | X = x] / Cov[W, Z | X = x]. This is the quantity that is estimated with an instrumental forest. It can be interpreted causally in various ways. Given a homogeneity assumption, tau(x) is simply the CATE at x. When W is binary and there are no "defiers", Imbens and Angrist (1994) show that tau(x) can be interpreted as an average treatment effect on compliers. The doubly robust scores provided here are for estimating tau = E[tau(X)].

Usage

## S3 method for class 'instrumental_forest'
get_scores(
  forest,
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500,
  ...
)

Arguments

forest

A trained instrumental forest.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score

An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights

In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

...

Additional arguments (currently ignored).

Value

A vector of scores.

References

Aronow, Peter M., and Allison Carnegie. "Beyond LATE: Estimation of the average treatment effect with an instrumental variable." Political Analysis 21(4), 2013.

Chernozhukov, Victor, Juan Carlos Escanciano, Hidehiko Ichimura, Whitney K. Newey, and James M. Robins. "Locally robust semiparametric estimation." Econometrica 90(4), 2022.

Imbens, Guido W., and Joshua D. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62(2), 1994.


Compute doubly robust scores for a multi arm causal forest.

Description

Compute doubly robust (AIPW) scores for average treatment effect estimation using a multi arm causal forest. Under regularity conditions, the average of the DR.scores is an efficient estimate of the average treatment effect.

Usage

## S3 method for class 'multi_arm_causal_forest'
get_scores(forest, subset = NULL, drop = FALSE, ...)

Arguments

forest

A trained multi arm causal forest.

subset

Specifies subset of the training examples over which we estimate the ATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

drop

If TRUE, coerce the result to the lowest possible dimension. Default is FALSE.

...

Additional arguments (currently ignored).

Value

An array of scores for each contrast and outcome.


Retrieve a single tree from a trained forest object.

Description

Retrieve a single tree from a trained forest object.

Usage

get_tree(forest, index)

Arguments

forest

The trained forest.

index

The index of the tree to retrieve.

Value

A GRF tree object containing the below attributes.

drawn_samples: a list of examples that were used in training the tree. This includes examples that were used in choosing splits, as well as the examples that populate the leaf nodes. Put another way, if honesty is enabled, this list includes both subsamples from the split (J1 and J2 in the notation of the paper).

num_samples: the number of examples used in training the tree.

nodes: a list of objects representing the nodes in the tree, starting with the root node. Each node will contain an 'is_leaf' attribute, which indicates whether it is an interior or leaf node. Interior nodes contain the attributes 'left_child' and 'right_child', which give the indices of their children in the list, as well as 'split_variable', and 'split_value', which describe the split that was chosen. Leaf nodes only have the attribute 'samples', which is a list of the training examples that the leaf contains. Note that if honesty is enabled, this list will only contain examples from the second subsample that was used to 'repopulate' the tree (J2 in the notation of the paper).

Examples

# Train a quantile forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

# Examine a particular tree.
q.tree <- get_tree(q.forest, 3)
q.tree$nodes
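A small, self-contained sketch of walking the 'nodes' list described in the Value section, following left children from the root until a leaf is reached (a regression forest is used here purely for illustration):

n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)
rf <- regression_forest(X, Y, num.trees = 50)
tree <- get_tree(rf, 1)

node <- 1
while (!tree$nodes[[node]]$is_leaf) {
  cat("node", node, ": split on variable", tree$nodes[[node]]$split_variable,
      "at", tree$nodes[[node]]$split_value, "\n")
  node <- tree$nodes[[node]]$left_child
}
cat("reached leaf", node, "with", length(tree$nodes[[node]]$samples), "samples\n")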

grf package options

Description

grf package options can be set using R's options command. The currently available option is grf.legacy.seed, which controls the random seed behavior described in the examples below.

Usage

grf_options()

Value

Prints the current grf package options.

Examples

# Use random seed behavior prior to version 2.4.0.
options(grf.legacy.seed = TRUE)
# Print current package options.
grf_options()
# Use random seed independent of num.threads (default as of version 2.4.0 and higher).
options(grf.legacy.seed = FALSE)

Instrumental forest

Description

Trains an instrumental forest that can be used to estimate conditional local average treatment effects tau(X) identified using instruments. Formally, the forest estimates tau(X) = Cov[Y, Z | X = x] / Cov[W, Z | X = x]. Note that when the instrument Z and treatment assignment W coincide, an instrumental forest is equivalent to a causal forest.
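A minimal sketch of fitting an instrumental forest on simulated data with a binary instrument (the data-generating design here is made up for illustration):

n <- 2000; p <- 5
X <- matrix(rnorm(n * p), n, p)
Z <- rbinom(n, 1, 0.5)                 # instrument
W <- rbinom(n, 1, 0.2 + 0.6 * Z)       # treatment take-up influenced by Z
tau <- pmax(X[, 1], 0)
Y <- tau * W + X[, 2] + rnorm(n)

iv.forest <- instrumental_forest(X, Y, W, Z)
iv.pred <- predict(iv.forest)          # out-of-bag estimates of tau(X)
average_treatment_effect(iv.forest)    # doubly robust estimate of E[tau(X)]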

Usage

instrumental_forest(
  X,
  Y,
  W,
  Z,
  Y.hat = NULL,
  W.hat = NULL,
  Z.hat = NULL,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  stabilize.splits = TRUE,
  ci.group.size = 2,
  reduced.form.weight = 0,
  tune.parameters = "none",
  tune.num.trees = 200,
  tune.num.reps = 50,
  tune.num.draws = 1000,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the instrumental regression.

Y

The outcome.

W

The treatment assignment (may be binary or real).

Z

The instrument (may be binary or real).

Y.hat

Estimates of the expected responses E[Y | Xi], marginalizing over treatment. If Y.hat = NULL, these are estimated using a separate regression forest. Default is NULL.

W.hat

Estimates of the treatment propensities E[W | Xi]. If W.hat = NULL, these are estimated using a separate regression forest. Default is NULL.

Z.hat

Estimates of the instrument propensities E[Z | Xi]. If Z.hat = NULL, these are estimated using a separate regression forest. Default is NULL.

num.trees

Number of trees grown in the forest. Note: Getting accurateconfidence intervals generally requires more trees thangetting accurate predictions. Default is 2000.

sample.weights

Weights given to each observation in estimation.If NULL, each observation receives equal weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that biggerclusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case,during training, each tree uses the same number of observations from each drawn cluster: If thesmallest cluster has K units, then when we sample a cluster during training, we only give a randomK elements of the cluster to the tree-growing procedure. When estimating average treatment effects,each observation is given weight 1/cluster size, so that the total weight of each cluster is thesame. Note that, if this argument is FALSE, sample weights may also be directly adjusted via thesample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default isFALSE.

sample.fraction

Fraction of the data used to build each tree.Note: If honesty = TRUE, these subsamples willfurther be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is\sqrt p + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodeswith size smaller than min.node.size can occur, as in the original randomForest package.Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE.For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations forparameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Correspondsto set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used fordetermining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

stabilize.splits

Whether or not the instrument should be taken into account whendetermining the imbalance of a split. Default is TRUE.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

reduced.form.weight

Whether splits should be regularized towards a naive splitting criterion that ignores the instrument (and instead emulates a causal forest).

tune.parameters

A vector of parameter names to tune. If "all": all tunable parameters are tuned by cross-validation. The following parameters are tunable: ("sample.fraction", "mtry", "min.node.size", "honesty.fraction", "honesty.prune.leaves", "alpha", "imbalance.penalty"). If honesty is FALSE the honesty.* parameters are not tuned. Default is "none" (no parameters are tuned).

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model. Default is 200.

tune.num.reps

The number of forests used to fit the tuning model. Default is 50.

tune.num.draws

The number of random parameter values considered when using the modelto select the optimal parameters. Default is 1000.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is setto the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained instrumental forest object.

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests". Annals of Statistics, 47(2), 2019.

Examples

# Train an instrumental forest.
n <- 2000
p <- 5
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
Z <- rbinom(n, 1, 0.5)
Q <- rbinom(n, 1, 0.5)
W <- Q * Z
tau <- X[, 1] / 2
Y <- rowSums(X[, 1:3]) + tau * W + Q + rnorm(n)
iv.forest <- instrumental_forest(X, Y, W, Z)

# Predict on out-of-bag training samples.
iv.pred <- predict(iv.forest)

# Estimate a (local) average treatment effect.
average_treatment_effect(iv.forest)

Calculate summary stats given a set of samples for causal forests.

Description

Calculate summary stats given a set of samples for causal forests.

Usage

## S3 method for class 'causal_forest'
leaf_stats(forest, samples, ...)

Arguments

forest

The GRF forest

samples

The samples to include in the calculations.

...

Additional arguments (currently ignored).

Value

A named vector containing summary stats


A default leaf_stats for forest classes without a leaf_stats method; it always returns NULL.

Description

A default leaf_stats for forest classes without a leaf_stats method; it always returns NULL.

Usage

## Default S3 method:
leaf_stats(forest, samples, ...)

Arguments

forest

Any forest

samples

The samples to include in the calculations.

...

Additional arguments (currently ignored).


Calculate summary stats given a set of samples for instrumental forests.

Description

Calculate summary stats given a set of samples for instrumental forests.

Usage

## S3 method for class 'instrumental_forest'
leaf_stats(forest, samples, ...)

Arguments

forest

The GRF forest

samples

The samples to include in the calculations.

...

Additional arguments (currently ignored).

Value

A named vector containing summary stats


Calculate summary stats given a set of samples for regression forests.

Description

Calculate summary stats given a set of samples for regression forests.

Usage

## S3 method for class 'regression_forest'
leaf_stats(forest, samples, ...)

Arguments

forest

The GRF forest

samples

The samples to include in the calculations.

...

Additional arguments (currently ignored).

Value

A named vector containing summary stats


Local linear forest

Description

Trains a local linear forest that can be used to estimate the conditional mean function mu(x) = E[Y | X = x].

Usage

ll_regression_forest(
  X,
  Y,
  enable.ll.split = FALSE,
  ll.split.weight.penalty = FALSE,
  ll.split.lambda = 0.1,
  ll.split.variables = NULL,
  ll.split.cutoff = NULL,
  num.trees = 2000,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  ci.group.size = 2,
  tune.parameters = "none",
  tune.num.trees = 50,
  tune.num.reps = 100,
  tune.num.draws = 1000,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the regression.

Y

The outcome.

enable.ll.split

(experimental) Optional choice to make forest splits based on ridge residuals as opposed to standard CART splits. Defaults to FALSE.

ll.split.weight.penalty

If using local linear splits, user can specify whether or not to use a covariance ridge penalty, analogously to the prediction case. Defaults to FALSE.

ll.split.lambda

Ridge penalty for splitting. Defaults to 0.1.

ll.split.variables

Linear correction variables for splitting. Defaults to all variables.

ll.split.cutoff

Enables the option to use regression coefficients from the full dataset for LL splitting once leaves get sufficiently small. Leaf size after which we use the overall beta. Defaults to the square root of the number of samples. If desired, users can enforce no regulation (i.e., using the leaf betas at each step) by setting this parameter to zero.

num.trees

Number of trees grown in the forest. Note: Getting accurateconfidence intervals generally requires more trees thangetting accurate predictions. Default is 2000.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that biggerclusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case,during training, each tree uses the same number of observations from each drawn cluster: If thesmallest cluster has K units, then when we sample a cluster during training, we only give a randomK elements of the cluster to the tree-growing procedure. When estimating average treatment effects,each observation is given weight 1/cluster size, so that the total weight of each cluster is thesame. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree.Note: If honesty = TRUE, these subsamples willfurther be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is\sqrt p + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodeswith size smaller than min.node.size can occur, as in the original randomForest package.Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE.For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations forparameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Correspondsto set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used fordetermining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

tune.parameters

A vector of parameter names to tune. If "all", all tunable parameters are tuned by cross-validation; if "none", parameters are set to defaults. Default is "none". Currently, local linear tuning is based on a regression forest fit, and is only supported for 'enable.ll.split = FALSE'.

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model. Default is 50.

tune.num.reps

The number of forests used to fit the tuning model. Default is 100.

tune.num.draws

The number of random parameter values considered when using the modelto select the optimal parameters. Default is 1000.

num.threads

Number of threads used in training. By default, the number of threads is setto the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained local linear forest object.

References

Friedberg, Rina, Julie Tibshirani, Susan Athey, and Stefan Wager. "Local Linear Forests". Journal of Computational and Graphical Statistics, 30(2), 2020.

Examples

# Train a standard regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
forest <- ll_regression_forest(X, Y)
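# As a brief illustrative addition, out-of-bag predictions can be obtained as with
# other grf forests (see predict.ll_regression_forest further below).
ll.pred <- predict(forest)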

LM Forest

Description

Trains a linear model forest that can be used to estimate h_k(x), k = 1..K, at X = x in the conditional linear model Y = c(x) + h_1(x)W_1 + ... + h_K(x)W_K, where Y is a (potentially vector-valued) response and W a set of regressors.

Usage

lm_forest(
  X,
  Y,
  W,
  Y.hat = NULL,
  W.hat = NULL,
  num.trees = 2000,
  sample.weights = NULL,
  gradient.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  stabilize.splits = FALSE,
  ci.group.size = 2,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the regression.

Y

The outcome (must be a numeric vector or matrix [one column per outcome] with no NAs).Multiple outcomes should be on the same scale.

W

The conditional regressors (must be a vector or matrix with no NAs).

Y.hat

Estimates of the conditional means E[Y | Xi].If Y.hat = NULL, these are estimated usinga separate multi-task regression forest. Default is NULL.

W.hat

Estimates of the conditional means E[Wk | Xi].If W.hat = NULL, these are estimated usinga separate multi-task regression forest. Default is NULL.

num.trees

Number of trees grown in the forest. Note: Getting accurateconfidence intervals generally requires more trees thangetting accurate predictions. Default is 2000.

sample.weights

Weights given to each sample in estimation.If NULL, each observation receives the same weight.Default is NULL.

gradient.weights

Weights given to each coefficient h_k(x) when targeting heterogeneity in the estimates. These enter the GRF algorithm through the split criterion \Delta: the k-th coordinate of this is \Delta_k * gradient.weights[k]. If NULL, each coefficient is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that biggerclusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case,during training, each tree uses the same number of observations from each drawn cluster: If thesmallest cluster has K units, then when we sample a cluster during training, we only give a randomK elements of the cluster to the tree-growing procedure. When estimating average treatment effects,each observation is given weight 1/cluster size, so that the total weight of each cluster is thesame. Note that, if this argument is FALSE, sample weights may also be directly adjusted via thesample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default isFALSE.

sample.fraction

Fraction of the data used to build each tree.Note: If honesty = TRUE, these subsamples willfurther be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is\sqrt p + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodeswith size smaller than min.node.size can occur, as in the original randomForest package.Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE.For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations forparameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Correspondsto set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used fordetermining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

stabilize.splits

Whether or not Wk should be taken into account when determining the imbalance of a split. It is an exact extension of the single-arm constraints (detailed in the causal forest algorithm reference) to multiple arms, where the constraints apply to each regressor Wk. Default is FALSE.

ci.group.size

The forest will grow ci.group.size trees on each subsample.In order to provide confidence intervals, ci.group.size mustbe at least 2. Default is 2. (Confidence intervals arecurrently only supported for univariate outcomes Y).

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is setto the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained lm forest object.

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests". Annals of Statistics, 47(2), 2019.

Zeileis, Achim, Torsten Hothorn, and Kurt Hornik. "Model-based Recursive Partitioning." Journal of Computational and Graphical Statistics 17(2), 2008.

Examples

if (require("rdrobust", quietly = TRUE)) {# Train a LM Forest to estimate CATEs in a regression discontinuity design.# Simulate a simple example with a heterogeneous jump in the CEF.n <- 2000p <- 5X <- matrix(rnorm(n * p), n, p)Z <- runif(n, -4, 4)cutoff <- 0W <- as.numeric(Z >= cutoff)tau <- pmax(0.5 * X[, 1], 0)Y <- tau * W  + 1 / (1 + exp(2 * Z)) + 0.2 * rnorm(n)# Compute the MSE-optimal bandwidth for a local linear regression.bandwidth <- rdrobust::rdbwselect(Y, Z, cutoff)$bws[[1]] # Alternatively, specify bandwith manually.# Compute kernel weights for a triangular kernel.dist <- abs((Z - cutoff) / bandwidth)sample.weights <- (1 - dist) * (dist <= 1) / bandwidth# Estimate a local linear regression with the running variable Z conditional on covariates X = x:# Y = c(x) + tau(x) W + b(x) Z.# Specify gradient.weights = c(1, 0) to target heterogeneity in the RDD coefficient tau(x).# Also, fit forest on subset with non-zero weights for faster estimation.subset <- sample.weights > 0lmf <- lm_forest(X[subset, ], Y[subset], cbind(W, Z)[subset, ],                 sample.weights = sample.weights[subset], gradient.weights = c(1, 0))tau.hat <- predict(lmf)$predictions[, 1, ]# Plot estimated tau(x) vs simulated ground truth.plot(X[subset, 1], tau.hat)points(X[subset, 1], tau[subset], col = "red", cex = 0.1)}

Merges a list of forests that were grown using the same data into one large forest.

Description

Merges a list of forests that were grown using the same data into one large forest.

Usage

merge_forests(forest_list, compute.oob.predictions = TRUE)

Arguments

forest_list

A 'list' of forests to be concatenated. All forests must be of the same type, and the type must be a subclass of 'grf'. In addition, all forests must have the same 'ci.group.size'. Other tuning parameters (e.g. alpha, mtry, min.node.size, imbalance.penalty) are allowed to differ across forests.

compute.oob.predictions

Whether OOB predictions on the training set should be precomputed. Note that even if OOB predictions have already been precomputed for the forests in 'forest_list', those predictions are not used. Instead, a new set of OOB predictions is computed using the larger merged forest. Default is TRUE.

Value

A single forest containing all the trees in each forest in the input list.

Examples

# Train standard regression forests.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.forest1 <- regression_forest(X, Y, compute.oob.predictions = FALSE, num.trees = 100)
r.forest2 <- regression_forest(X, Y, compute.oob.predictions = FALSE, num.trees = 100)

# Join the forests together. The resulting forest will contain 200 trees.
big_rf <- merge_forests(list(r.forest1, r.forest2))

Multi-arm/multi-outcome causal forest

Description

Trains a causal forest that can be used to estimate conditional average treatment effects tau_k(X). When the treatment assignment W is {1, ..., K} and unconfounded, we have tau_k(X) = E[Y(k) - Y(1) | X = x] where Y(k) and Y(1) are potential outcomes corresponding to the treatment state for arm k and the baseline arm 1.

Usage

multi_arm_causal_forest(
  X,
  Y,
  W,
  Y.hat = NULL,
  W.hat = NULL,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  stabilize.splits = TRUE,
  ci.group.size = 2,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the causal regression.

Y

The outcome (must be a numeric vector or matrix [one column per outcome] with no NAs). Multiple outcomes should be on the same scale.

W

The treatment assignment (must be a factor vector with no NAs). The reference treatment is set to the first treatment according to the ordinality of the factors; this can be changed with the 'relevel' function in R.

Y.hat

Estimates of the expected responses E[Y | Xi], marginalizing over treatment. If Y.hat = NULL, these are estimated using a separate multi-task regression forest. Default is NULL.

W.hat

Matrix with estimates of the treatment propensities E[Wk | Xi]. If W.hat = NULL, these are estimated using a probability forest.

num.trees

Number of trees grown in the forest. Note: Getting accurateconfidence intervals generally requires more trees thangetting accurate predictions. Default is 2000.

sample.weights

Weights given to each sample in estimation. If NULL, each observation receives the same weight. Note: To avoid introducing confounding, weights should be independent of the potential outcomes given X. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that biggerclusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case,during training, each tree uses the same number of observations from each drawn cluster: If thesmallest cluster has K units, then when we sample a cluster during training, we only give a randomK elements of the cluster to the tree-growing procedure. When estimating average treatment effects,each observation is given weight 1/cluster size, so that the total weight of each cluster is thesame. Note that, if this argument is FALSE, sample weights may also be directly adjusted via thesample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default isFALSE.

sample.fraction

Fraction of the data used to build each tree.Note: If honesty = TRUE, these subsamples willfurther be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is\sqrt p + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodeswith size smaller than min.node.size can occur, as in the original randomForest package.Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE.For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations forparameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Correspondsto set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used fordetermining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

stabilize.splits

Whether or not the treatment should be taken into account when determining the imbalance of a split. It is an exact extension of the single-arm constraints (detailed in the causal forest algorithm reference) to multiple arms, where the constraints apply to each treatment arm independently. Default is TRUE.

ci.group.size

The forest will grow ci.group.size trees on each subsample.In order to provide confidence intervals, ci.group.size mustbe at least 2. Default is 2. (Confidence intervals arecurrently only supported for univariate outcomes Y).

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is setto the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Details

This forest fits a multi-arm treatment estimate following the multivariate extension of the "R-learner" suggested in Nie and Wager (2021), with kernel weights derived by the GRF algorithm (Athey, Tibshirani, and Wager, 2019). In particular, with K arms, and W encoded as {0, 1}^(K-1), we estimate, for a target sample x, and a chosen baseline arm:

\hat\tau(x) = argmin_{\tau} \left\{ \sum_{i=1}^{n} \alpha_i(x) \left( Y_i - \hat m^{(-i)}(X_i) - c(x) - \left\langle W_i - \hat e^{(-i)}(X_i), \, \tau(X_i) \right\rangle \right)^2 \right\},

where the angle brackets indicate an inner product, e(X) = E[W | X = x] is a (vector-valued) generalized propensity score, and m(x) = E[Y | X = x]. The forest weights alpha(x) are derived from a generalized random forest splitting on the vector-valued gradient of tau(x). (The intercept c(x) is a nuisance parameter not directly estimated.) By default, e(X) and m(X) are estimated using two separate random forests, a probability forest and a regression forest respectively (optionally provided through the arguments W.hat and Y.hat). The k-th element of tau(x) measures the conditional average treatment effect of the k-th treatment arm at X = x for k = 1, ..., K-1. The treatment effect for multiple outcomes can be estimated jointly (i.e., Y can be vector-valued), in which case the splitting rule takes into account all outcomes simultaneously (specifically, we concatenate the gradient vector for each outcome).

For a single treatment and outcome, this forest is equivalent to a causal forest; however, they may produce different results due to differences in numerics.

Value

A trained multi arm causal forest object.

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests". Annals of Statistics, 47(2), 2019.

Nie, Xinkun, and Stefan Wager. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects". Biometrika, 108(2), 2021.

Examples

# Train a multi arm causal forest.
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(c("A", "B", "C"), n, replace = TRUE))
Y <- X[, 1] + X[, 2] * (W == "B") - 1.5 * X[, 2] * (W == "C") + rnorm(n)
mc.forest <- multi_arm_causal_forest(X, Y, W)

# Predict contrasts (out-of-bag) using the forest.
# Fitting several outcomes jointly is supported, and the returned prediction array has
# dimension [num.samples, num.contrasts, num.outcomes]. Since num.outcomes is one in
# this example, we use drop = TRUE to ignore this singleton dimension.
mc.pred <- predict(mc.forest, drop = TRUE)

# By default, the first ordinal treatment is used as baseline ("A" in this example),
# giving two contrasts tau_B = Y(B) - Y(A), tau_C = Y(C) - Y(A).
tau.hat <- mc.pred$predictions

plot(X[, 2], tau.hat[, "B - A"], ylab = "tau.contrast")
abline(0, 1, col = "red")
points(X[, 2], tau.hat[, "C - A"], col = "blue")
abline(0, -1.5, col = "red")
legend("topleft", c("B - A", "C - A"), col = c("black", "blue"), pch = 19)

# A doubly robust estimate (AIPW) of the average treatment effect of the arms.
average_treatment_effect(mc.forest)

# The conditional response surfaces mu_k(X) for a single outcome can be reconstructed from
# the contrasts tau_k(x), the treatment propensities e_k(x), and the conditional mean m(x).
# Given treatment "A" as baseline we have:
# m(x) := E[Y | X] = E[Y(A) | X] + E[W_B (Y(B) - Y(A))] + E[W_C (Y(C) - Y(A))]
# which given unconfoundedness is equal to:
# m(x) = mu(A, x) + e_B(x) tau_B(X) + e_C(x) tau_C(x)
# Rearranging and plugging in the above expressions, we obtain the following estimates
# * mu(A, x) = m(x) - e_B(x) tau_B(x) - e_C(x) tau_C(x)
# * mu(B, x) = m(x) + (1 - e_B(x)) tau_B(x) - e_C(x) tau_C(x)
# * mu(C, x) = m(x) - e_B(x) tau_B(x) + (1 - e_C(x)) tau_C(x)
Y.hat <- mc.forest$Y.hat
W.hat <- mc.forest$W.hat
muA <- Y.hat - W.hat[, "B"] * tau.hat[, "B - A"] - W.hat[, "C"] * tau.hat[, "C - A"]
muB <- Y.hat + (1 - W.hat[, "B"]) * tau.hat[, "B - A"] - W.hat[, "C"] * tau.hat[, "C - A"]
muC <- Y.hat - W.hat[, "B"] * tau.hat[, "B - A"] + (1 - W.hat[, "C"]) * tau.hat[, "C - A"]

# These can also be obtained with some array manipulations.
# (the first column is always the baseline arm)
Y.hat.baseline <- Y.hat - rowSums(W.hat[, -1, drop = FALSE] * tau.hat)
mu.hat.matrix <- cbind(Y.hat.baseline, c(Y.hat.baseline) + tau.hat)
colnames(mu.hat.matrix) <- levels(W)
head(mu.hat.matrix)

# The reference level for contrast prediction can be changed with `relevel`.
# Fit and predict with treatment B as baseline:
W <- relevel(W, ref = "B")
mc.forest.B <- multi_arm_causal_forest(X, Y, W)

Multi-task regression forest

Description

Trains a regression forest that can be used to estimate the conditional mean functions mu_i(x) = E[Y_i | X = x].

Usage

multi_regression_forest(
  X,
  Y,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the regression.

Y

The outcomes (must be a numeric vector/matrix with no NAs).

num.trees

Number of trees grown in the forest. Note: Getting accurateconfidence intervals generally requires more trees thangetting accurate predictions. Default is 2000.

sample.weights

Weights given to each observation in estimation. If NULL, each observation is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to.Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that biggerclusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case,during training, each tree uses the same number of observations from each drawn cluster: If thesmallest cluster has K units, then when we sample a cluster during training, we only give a randomK elements of the cluster to the tree-growing procedure. When estimating average treatment effects,each observation is given weight 1/cluster size, so that the total weight of each cluster is thesame. Note that, if this argument is FALSE, sample weights may also be directly adjusted via thesample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default isFALSE.

sample.fraction

Fraction of the data used to build each tree.Note: If honesty = TRUE, these subsamples willfurther be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is\sqrt p + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodeswith size smaller than min.node.size can occur, as in the original randomForest package.Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE.For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations forparameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Correspondsto set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used fordetermining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keep the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is setto the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained multi regression forest object.

Examples

# Train a standard regression forest.
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1, drop = FALSE] %*% cbind(1, 2) + rnorm(n)
mr.forest <- multi_regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
mr.pred <- predict(mr.forest, X.test)

# Predict on out-of-bag training samples.
mr.pred <- predict(mr.forest)

Plot a GRF tree object.

Description

The direction in which NAs are sent is indicated by the arrow fill: an empty arrow indicates that NAs are sent that way. If the forest was trained without missing values, both arrows are filled.

Usage

## S3 method for class 'grf_tree'
plot(x, include.na.path = NULL, ...)

Arguments

x

The tree to plot

include.na.path

A boolean toggling whether to include the path of missing values or not. It defaults to whether the forest was trained with NAs.

...

Additional arguments (currently ignored).

Examples

## Not run:
# Plot a tree in the forest (requires the `DiagrammeR` package).
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
c.forest <- causal_forest(X, Y, W)
plot(tree <- get_tree(c.forest, 1))

# Compute the leaf nodes the first five samples fall into.
leaf.nodes <- get_leaf_node(tree, X[1:5, ])

# Saving a plot in .svg can be done with the `DiagrammeRsvg` package.
install.packages("DiagrammeRsvg")
tree.plot <- plot(tree)
cat(DiagrammeRsvg::export_svg(tree.plot), file = 'plot.svg')

## End(Not run)

Plot the Targeting Operator Characteristic curve.

Description

Plot the Targeting Operator Characteristic curve.

Usage

## S3 method for class 'rank_average_treatment_effect'
plot(x, ..., ci.args = list(), abline.args = list(), legend.args = list())

Arguments

x

The output of rank_average_treatment_effect.

...

Additional arguments passed to plot.

ci.args

Additional arguments passed to points.

abline.args

Additional arguments passed to abline.

legend.args

Additional arguments passed to legend.
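Examples

# A minimal illustrative sketch (not a shipped example) of customizing the TOC plot.
# The extra arguments below are ordinary base-graphics arguments, passed on to plot,
# abline, and legend respectively; the data-generating setup is purely illustrative.
n <- 1500
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate, main = "TOC", abline.args = list(lty = 3), legend.args = list(cex = 0.8))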


Predict with a boosted regression forest.

Description

Gets estimates of E[Y|X=x] using a trained regression forest.

Usage

## S3 method for class 'boosted_regression_forest'
predict(
  object,
  newdata = NULL,
  boost.predict.steps = NULL,
  num.threads = NULL,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

boost.predict.steps

Number of boosting iterations to use for prediction. If NULL (the default), uses the full number of boosting steps for the given object.

num.threads

Number of threads used in prediction.

...

Additional arguments (currently ignored).

Value

A vector of predictions.

Examples

# Train a boosted regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.boosted.forest <- boosted_regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
r.pred <- predict(r.boosted.forest, X.test)

# Predict on out-of-bag training samples.
r.pred <- predict(r.boosted.forest)

Predict with a causal forest

Description

Gets estimates of tau(x) using a trained causal forest.

Usage

## S3 method for class 'causal_forest'
predict(
  object,
  newdata = NULL,
  linear.correction.variables = NULL,
  ll.lambda = NULL,
  ll.weight.penalty = FALSE,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

linear.correction.variables

Optional subset of indexes for variables to be used in local linear prediction. If NULL, standard GRF prediction is used. Otherwise, we run a locally weighted linear regression on the included variables. Please note that this is a beta feature still in development, and may slow down prediction considerably. Defaults to NULL.

ll.lambda

Ridge penalty for local linear predictions. Defaults to NULL and will be cross-validated.

ll.weight.penalty

Option to standardize ridge penalty by covariance (TRUE), or penalize all covariates equally (FALSE). Penalizes equally by default.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat\tau(x) are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

Vector of predictions, along with estimates of the error and (optionally) its variance estimates. Column 'predictions' contains estimates of the conditional average treatment effect (CATE). The square-root of column 'variance.estimates' is the standard error of the CATE. For out-of-bag estimates, we also output the following error measures. First, column 'debiased.error' contains estimates of the 'R-loss' criterion (see Nie and Wager, 2021 for a justification). Second, column 'excess.error' contains jackknife estimates of the Monte Carlo error (Wager, Hastie, Efron 2014), a measure of how unstable estimates are if we grow forests of the same size on the same data set. The sum of 'debiased.error' and 'excess.error' is the raw error attained by the current forest, and 'debiased.error' alone is an estimate of the error attained by a forest with an infinite number of trees. We recommend that users grow large enough forests to make the 'excess.error' negligible.

References

Friedberg, Rina, Julie Tibshirani, Susan Athey, and Stefan Wager. "Local Linear Forests". Journal of Computational and Graphical Statistics, 30(2), 2020.

Wager, Stefan, Trevor Hastie, and Bradley Efron. "Confidence intervals for random forests: The jackknife and the infinitesimal jackknife." The Journal of Machine Learning Research 15(1), 2014.

Nie, Xinkun, and Stefan Wager. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects". Biometrika, 108(2), 2021.

Examples

# Train a causal forest.
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
c.forest <- causal_forest(X, Y, W)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
c.pred <- predict(c.forest, X.test)

# Predict on out-of-bag training samples.
c.pred <- predict(c.forest)

# Predict with confidence intervals; growing more trees is now recommended.
c.forest <- causal_forest(X, Y, W, num.trees = 500)
c.pred <- predict(c.forest, X.test, estimate.variance = TRUE)
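# As a brief illustrative sketch, the out-of-bag error measures described under Value
# ('debiased.error' and 'excess.error') can be inspected as follows.
c.pred.oob <- predict(c.forest)
mean(c.pred.oob$debiased.error)
mean(c.pred.oob$excess.error)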

Predict with a causal survival forest

Description

Gets estimates of tau(X) using a trained causal survival forest.

Usage

## S3 method for class 'causal_survival_forest'
predict(
  object,
  newdata = NULL,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat\tau(x) are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

Vector of predictions along with optional variance estimates.

Examples

# Train a causal survival forest targeting a Restricted Mean Survival Time (RMST)
# with maximum follow-up time set to `horizon`.
n <- 2000
p <- 5
X <- matrix(runif(n * p), n, p)
W <- rbinom(n, 1, 0.5)
horizon <- 1
failure.time <- pmin(rexp(n) * X[, 1] + W, horizon)
censor.time <- 2 * runif(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)

# Save computation time by constraining the event grid by discretizing (rounding) continuous events.
cs.forest <- causal_survival_forest(X, round(Y, 2), W, D, horizon = horizon)
# Or do so more flexibly by defining your own time grid using the failure.times argument.
# grid <- seq(min(Y), max(Y), length.out = 150)
# cs.forest <- causal_survival_forest(X, Y, W, D, horizon = horizon, failure.times = grid)

# Predict using the forest.
X.test <- matrix(0.5, 10, p)
X.test[, 1] <- seq(0, 1, length.out = 10)
cs.pred <- predict(cs.forest, X.test)

# Predict on out-of-bag training samples.
cs.pred <- predict(cs.forest)

# Predict with confidence intervals; growing more trees is now recommended.
c.pred <- predict(cs.forest, X.test, estimate.variance = TRUE)

# Compute a doubly robust estimate of the average treatment effect.
average_treatment_effect(cs.forest)

# Compute the best linear projection on the first covariate.
best_linear_projection(cs.forest, X[, 1])

# See if a causal survival forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
eval <- -train
train.forest <- causal_survival_forest(X[train, ], Y[train], W[train], D[train], horizon = horizon)
eval.forest <- causal_survival_forest(X[eval, ], Y[eval], W[eval], D[eval], horizon = horizon)
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[eval, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))

Predict with an instrumental forest

Description

Gets estimates of tau(x) using a trained instrumental forest.

Usage

## S3 method for class 'instrumental_forest'
predict(
  object,
  newdata = NULL,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat\tau(x) are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

Vector of predictions, along with (optional) variance estimates.

Examples

# Train an instrumental forest.
n <- 2000
p <- 5
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
Z <- rbinom(n, 1, 0.5)
Q <- rbinom(n, 1, 0.5)
W <- Q * Z
tau <- X[, 1] / 2
Y <- rowSums(X[, 1:3]) + tau * W + Q + rnorm(n)
iv.forest <- instrumental_forest(X, Y, W, Z)

# Predict on out-of-bag training samples.
iv.pred <- predict(iv.forest)

# Estimate a (local) average treatment effect.
average_treatment_effect(iv.forest)

Predict with a local linear forest

Description

Gets estimates of E[Y|X=x] using a trained regression forest.

Usage

## S3 method for class 'll_regression_forest'
predict(
  object,
  newdata = NULL,
  linear.correction.variables = NULL,
  ll.lambda = NULL,
  ll.weight.penalty = FALSE,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

linear.correction.variables

Optional subset of indexes for variables to be used in local linear prediction. If left NULL, all variables are used. We run a locally weighted linear regression on the included variables. Please note that this is a beta feature still in development, and may slow down prediction considerably. Defaults to NULL.

ll.lambda

Ridge penalty for local linear predictions. Defaults to NULL and will be cross-validated.

ll.weight.penalty

Option to standardize ridge penalty by covariance (TRUE), or penalize all covariates equally (FALSE). Defaults to FALSE.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat\tau(x) are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

A vector of predictions.

Examples

# Train the forest.
n <- 50
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
forest <- ll_regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
predictions <- predict(forest, X.test)

# Predict on out-of-bag training samples.
predictions.oob <- predict(forest)

Predict with a lm forest

Description

Gets estimates of h_k(x), k = 1..K, in the conditionally linear model Y = c(x) + h_1(x)W_1 + ... + h_K(x)W_K, for a target sample X = x.

Usage

## S3 method for class 'lm_forest'
predict(
  object,
  newdata = NULL,
  num.threads = NULL,
  estimate.variance = FALSE,
  drop = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat h_k(x) are desired (for confidence intervals). This option is currently only supported for univariate outcomes Y.

drop

If TRUE, coerce the prediction result to the lowest possible dimension. Default is FALSE.

...

Additional arguments (currently ignored).

Value

A list with elements 'predictions': a 3d array of dimension [num.samples, K, M] with predictions for regressor W, for each outcome 1,..,M (singleton dimensions in this array can be dropped by passing the 'drop' argument to '[', or with the shorthand '$predictions[,,]'), and optionally 'variance.estimates': a matrix with K columns with variance estimates.

Examples

if (require("rdrobust", quietly = TRUE)) {# Train a LM Forest to estimate CATEs in a regression discontinuity design.# Simulate a simple example with a heterogeneous jump in the CEF.n <- 2000p <- 5X <- matrix(rnorm(n * p), n, p)Z <- runif(n, -4, 4)cutoff <- 0W <- as.numeric(Z >= cutoff)tau <- pmax(0.5 * X[, 1], 0)Y <- tau * W  + 1 / (1 + exp(2 * Z)) + 0.2 * rnorm(n)# Compute the MSE-optimal bandwidth for a local linear regression.bandwidth <- rdrobust::rdbwselect(Y, Z, cutoff)$bws[[1]] # Alternatively, specify bandwith manually.# Compute kernel weights for a triangular kernel.dist <- abs((Z - cutoff) / bandwidth)sample.weights <- (1 - dist) * (dist <= 1) / bandwidth# Estimate a local linear regression with the running variable Z conditional on covariates X = x:# Y = c(x) + tau(x) W + b(x) Z.# Specify gradient.weights = c(1, 0) to target heterogeneity in the RDD coefficient tau(x).# Also, fit forest on subset with non-zero weights for faster estimation.subset <- sample.weights > 0lmf <- lm_forest(X[subset, ], Y[subset], cbind(W, Z)[subset, ],                 sample.weights = sample.weights[subset], gradient.weights = c(1, 0))tau.hat <- predict(lmf)$predictions[, 1, ]# Plot estimated tau(x) vs simulated ground truth.plot(X[subset, 1], tau.hat)points(X[subset, 1], tau[subset], col = "red", cex = 0.1)}

Predict with a multi arm causal forest

Description

Gets estimates of contrasts tau_k(x) using a trained multi arm causal forest (k = 1,...,K-1 where K is the number of treatments).

Usage

## S3 method for class 'multi_arm_causal_forest'
predict(
  object,
  newdata = NULL,
  num.threads = NULL,
  estimate.variance = FALSE,
  drop = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for \hat\tau(x) are desired (for confidence intervals). This option is currently only supported for univariate outcomes Y.

drop

If TRUE, coerce the prediction result to the lowest possible dimension. Default is FALSE.

...

Additional arguments (currently ignored).

Value

A list with elements 'predictions': a 3d array of dimension [num.samples, K-1, M] with predictions for each contrast, for each outcome 1,..,M (singleton dimensions in this array can be dropped by passing the 'drop' argument to '[', or with the shorthand '$predictions[,,]'), and optionally 'variance.estimates': a matrix with K-1 columns with variance estimates for each contrast.

Examples

# Train a multi arm causal forest.
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(c("A", "B", "C"), n, replace = TRUE))
Y <- X[, 1] + X[, 2] * (W == "B") - 1.5 * X[, 2] * (W == "C") + rnorm(n)
mc.forest <- multi_arm_causal_forest(X, Y, W)

# Predict contrasts (out-of-bag) using the forest.
# Fitting several outcomes jointly is supported, and the returned prediction array has
# dimension [num.samples, num.contrasts, num.outcomes]. Since num.outcomes is one in
# this example, we use drop = TRUE to ignore this singleton dimension.
mc.pred <- predict(mc.forest, drop = TRUE)

# By default, the first ordinal treatment is used as baseline ("A" in this example),
# giving two contrasts tau_B = Y(B) - Y(A), tau_C = Y(C) - Y(A).
tau.hat <- mc.pred$predictions

plot(X[, 2], tau.hat[, "B - A"], ylab = "tau.contrast")
abline(0, 1, col = "red")
points(X[, 2], tau.hat[, "C - A"], col = "blue")
abline(0, -1.5, col = "red")
legend("topleft", c("B - A", "C - A"), col = c("black", "blue"), pch = 19)

# The average treatment effect of the arms with "A" as baseline.
average_treatment_effect(mc.forest)

# The conditional response surfaces mu_k(X) for a single outcome can be reconstructed from
# the contrasts tau_k(x), the treatment propensities e_k(x), and the conditional mean m(x).
# Given treatment "A" as baseline we have:
# m(x) := E[Y | X] = E[Y(A) | X] + E[W_B (Y(B) - Y(A))] + E[W_C (Y(C) - Y(A))]
# which given unconfoundedness is equal to:
# m(x) = mu(A, x) + e_B(x) tau_B(X) + e_C(x) tau_C(x)
# Rearranging and plugging in the above expressions, we obtain the following estimates
# * mu(A, x) = m(x) - e_B(x) tau_B(x) - e_C(x) tau_C(x)
# * mu(B, x) = m(x) + (1 - e_B(x)) tau_B(x) - e_C(x) tau_C(x)
# * mu(C, x) = m(x) - e_B(x) tau_B(x) + (1 - e_C(x)) tau_C(x)
Y.hat <- mc.forest$Y.hat
W.hat <- mc.forest$W.hat
muA <- Y.hat - W.hat[, "B"] * tau.hat[, "B - A"] - W.hat[, "C"] * tau.hat[, "C - A"]
muB <- Y.hat + (1 - W.hat[, "B"]) * tau.hat[, "B - A"] - W.hat[, "C"] * tau.hat[, "C - A"]
muC <- Y.hat - W.hat[, "B"] * tau.hat[, "B - A"] + (1 - W.hat[, "C"]) * tau.hat[, "C - A"]

# These can also be obtained with some array manipulations.
# (the first column is always the baseline arm)
Y.hat.baseline <- Y.hat - rowSums(W.hat[, -1, drop = FALSE] * tau.hat)
mu.hat.matrix <- cbind(Y.hat.baseline, c(Y.hat.baseline) + tau.hat)
colnames(mu.hat.matrix) <- levels(W)
head(mu.hat.matrix)

# The reference level for contrast prediction can be changed with `relevel`.
# Fit and predict with treatment B as baseline:
W <- relevel(W, ref = "B")
mc.forest.B <- multi_arm_causal_forest(X, Y, W)

Predict with a multi regression forest

Description

Gets estimates of E[Y_i | X = x] using a trained multi regression forest.

Usage

## S3 method for class 'multi_regression_forest'
predict(object, newdata = NULL, num.threads = NULL, drop = FALSE, ...)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

drop

If TRUE, coerce the prediction result to the lowest possible dimension. Default is FALSE.

...

Additional arguments (currently ignored).

Value

A list containing 'predictions': a matrix of predictions for each outcome.

Examples

# Train a standard regression forest.
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1, drop = FALSE] %*% cbind(1, 2) + rnorm(n)
mr.forest <- multi_regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
mr.pred <- predict(mr.forest, X.test)

# Predict on out-of-bag training samples.
mr.pred <- predict(mr.forest)

Predict with a probability forest

Description

Gets estimates of P[Y = k | X = x] using a trained forest.

Usage

## S3 method for class 'probability_forest'
predict(
  object,
  newdata = NULL,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for P[Y = k | X] are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

A list with attributes 'predictions': a matrix of predictions for each class, and optionally the attribute 'variance.estimates': a matrix of variance estimates for each class.

Examples

# Train a probability forest.
p <- 5
n <- 2000
X <- matrix(rnorm(n * p), n, p)
prob <- 1 / (1 + exp(-X[, 1] - X[, 2]))
Y <- as.factor(rbinom(n, 1, prob))
p.forest <- probability_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 10, p)
X.test[, 1] <- seq(-1.5, 1.5, length.out = 10)
p.hat <- predict(p.forest, X.test, estimate.variance = TRUE)

# Plot the estimated success probabilities with 95% confidence bands.
prob.test <- 1 / (1 + exp(-X.test[, 1] - X.test[, 2]))
p.true <- cbind(`0` = 1 - prob.test, `1` = prob.test)
plot(X.test[, 1], p.true[, "1"], col = 'red', ylim = c(0, 1))
points(X.test[, 1], p.hat$predictions[, "1"], pch = 16)
lines(X.test[, 1], (p.hat$predictions + 2 * sqrt(p.hat$variance.estimates))[, "1"])
lines(X.test[, 1], (p.hat$predictions - 2 * sqrt(p.hat$variance.estimates))[, "1"])

# Predict on out-of-bag training samples.
p.hat <- predict(p.forest)

Predict with a quantile forest

Description

Gets estimates of the conditional quantiles of Y given X using a trained forest.

Usage

## S3 method for class 'quantile_forest'
predict(object, newdata = NULL, quantiles = NULL, num.threads = NULL, ...)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

quantiles

Vector of quantiles at which estimates are required. If NULL, the quantiles used to train the forest are used. Default is NULL.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

...

Additional arguments (currently ignored).

Value

A list with elements 'predictions': a matrix with predictions at each test point for each desired quantile.

Examples

# Train a quantile forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

# Predict on out-of-bag training samples.
q.pred <- predict(q.forest)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
q.pred <- predict(q.forest, X.test)

Predict with a regression forest

Description

Gets estimates of E[Y|X=x] using a trained regression forest.

Usage

## S3 method for class 'regression_forest'
predict(
  object,
  newdata = NULL,
  linear.correction.variables = NULL,
  ll.lambda = NULL,
  ll.weight.penalty = FALSE,
  num.threads = NULL,
  estimate.variance = FALSE,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

linear.correction.variables

Optional subset of indexes for variables to be used in local linear prediction. If NULL, standard GRF prediction is used. Otherwise, we run a locally weighted linear regression on the included variables. Please note that this is a beta feature still in development, and may slow down prediction considerably. Defaults to NULL.

ll.lambda

Ridge penalty for local linear predictions. Defaults to NULL and will be cross-validated.

ll.weight.penalty

Option to standardize ridge penalty by covariance (TRUE), or penalize all covariates equally (FALSE). Defaults to FALSE.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

estimate.variance

Whether variance estimates for the predictions are desired (for confidence intervals).

...

Additional arguments (currently ignored).

Value

Vector of predictions, along with estimates of the error and (optionally) its variance estimates. Column 'predictions' contains estimates of E[Y|X=x], and the square-root of column 'variance.estimates' is the standard error of these estimates. Column 'debiased.error' contains out-of-bag estimates of the test mean-squared error, and column 'excess.error' contains jackknife estimates of the Monte Carlo error. The sum of 'debiased.error' and 'excess.error' is the raw error attained by the current forest, and 'debiased.error' alone is an estimate of the error attained by a forest with an infinite number of trees. We recommend that users grow enough forests to make the 'excess.error' negligible.
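As a minimal sketch of how these columns might be used (illustrative data and variable names, not from the package documentation), one can compare the Monte Carlo component of the error to the debiased error to judge whether more trees are needed:

n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)
r.forest <- regression_forest(X, Y, num.trees = 500)
r.pred <- predict(r.forest, estimate.variance = TRUE)
# A ratio near zero suggests the Monte Carlo error is negligible relative to the
# estimated test error; otherwise consider increasing num.trees.
mean(r.pred$excess.error) / mean(r.pred$debiased.error)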

Examples

# Train a standard regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.forest <- regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
r.pred <- predict(r.forest, X.test)

# Predict on out-of-bag training samples.
r.pred <- predict(r.forest)

# Predict with confidence intervals; growing more trees is now recommended.
r.forest <- regression_forest(X, Y, num.trees = 100)
r.pred <- predict(r.forest, X.test, estimate.variance = TRUE)

Predict with a survival forest

Description

Gets estimates of the conditional survival function S(t, x) = P[T > t | X = x] using a trained survival forest. The curve can be estimated by Kaplan-Meier or Nelson-Aalen.

Usage

## S3 method for class 'survival_forest'
predict(
  object,
  newdata = NULL,
  failure.times = NULL,
  prediction.times = c("curve", "time"),
  prediction.type = c("Kaplan-Meier", "Nelson-Aalen"),
  num.threads = NULL,
  ...
)

Arguments

object

The trained forest.

newdata

Points at which predictions should be made. If NULL, makes out-of-bag predictions on the training set instead (i.e., provides predictions at Xi using only trees that did not use the i-th training example). Note that this matrix should have the same number of columns as the training matrix, and that the columns must appear in the same order.

failure.times

A vector of survival times to make predictions at. If NULL, then the failure times used for training the forest are used. If prediction.times = "curve" then the time points should be in increasing order. Default is NULL.

prediction.times

"curve" predicts the survival curve S(t, x) on the grid t = failure.times for each sample Xi. "time" predicts S(t, x) at an event time t = failure.times[i] for each sample Xi. Default is "curve". (A short usage sketch follows this argument list.)

prediction.type

The type of estimate of the survival function, choices are "Kaplan-Meier" or "Nelson-Aalen". The default is the prediction.type used to train the forest.

num.threads

Number of threads used in prediction. If set to NULL, the software automatically selects an appropriate amount.

...

Additional arguments (currently ignored).
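A minimal sketch of the "time" prediction mode (illustrative data; the variable names are placeholders):

n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
failure.time <- exp(0.5 * X[, 1]) * rexp(n)
censor.time <- 2 * rexp(n)
Y <- round(pmin(failure.time, censor.time), 2)
D <- as.integer(failure.time <= censor.time)
s.forest <- survival_forest(X, Y, D)
# One out-of-bag survival estimate per sample, evaluated at that sample's own time Y[i].
s.at.own.time <- predict(s.forest, failure.times = Y, prediction.times = "time")
head(s.at.own.time$predictions)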

Value

A list with elements 'predictions' (the estimated survival function values for each sample) and 'failure.times' (the time grid at which the curves are evaluated).

Examples

# Train a standard survival forest.
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
failure.time <- exp(0.5 * X[, 1]) * rexp(n)
censor.time <- 2 * rexp(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)
# Save computation time by constraining the event grid by discretizing (rounding) continuous events.
s.forest <- survival_forest(X, round(Y, 2), D)
# Or do so more flexibly by defining your own time grid using the failure.times argument.
# grid <- seq(min(Y[D==1]), max(Y[D==1]), length.out = 150)
# s.forest <- survival_forest(X, Y, D, failure.times = grid)

# Predict using the forest.
X.test <- matrix(0, 3, p)
X.test[, 1] <- seq(-2, 2, length.out = 3)
s.pred <- predict(s.forest, X.test)

# Plot the survival curve.
plot(NA, NA, xlab = "failure time", ylab = "survival function",
     xlim = range(s.pred$failure.times),
     ylim = c(0, 1))
for(i in 1:3) {
  lines(s.pred$failure.times, s.pred$predictions[i, ], col = i)
  s.true = exp(-s.pred$failure.times / exp(0.5 * X.test[i, 1]))
  lines(s.pred$failure.times, s.true, col = i, lty = 2)
}

# Predict on out-of-bag training samples.
s.pred <- predict(s.forest)

# Compute OOB concordance based on the mortality score in Ishwaran et al. (2008).
s.pred.nelson.aalen <- predict(s.forest, prediction.type = "Nelson-Aalen")
chf.score <- rowSums(-log(s.pred.nelson.aalen$predictions))

if (require("survival", quietly = TRUE)) {
  concordance(Surv(Y, D) ~ chf.score, reverse = TRUE)
}

Print a boosted regression forest

Description

Print a boosted regression forest

Usage

## S3 method for class 'boosted_regression_forest'
print(x, ...)

Arguments

x

The boosted forest to print.

...

Additional arguments (currently ignored).


Print a GRF forest object.

Description

Print a GRF forest object.

Usage

## S3 method for class 'grf'
print(x, decay.exponent = 2, max.depth = 4, ...)

Arguments

x

The forest object to print.

decay.exponent

A tuning parameter that controls the importance of split depth.

max.depth

The maximum depth of splits to consider.

...

Additional arguments (currently ignored).


Print a GRF tree object.

Description

Print a GRF tree object.

Usage

## S3 method for class 'grf_tree'
print(x, ...)

Arguments

x

The tree to print.

...

Additional arguments (currently ignored).


Print the Rank-Weighted Average Treatment Effect (RATE).

Description

Print the Rank-Weighted Average Treatment Effect (RATE).

Usage

## S3 method for class 'rank_average_treatment_effect'
print(x, ...)

Arguments

x

The output of rank_average_treatment_effect.

...

Additional arguments (currently ignored).


Print tuning output. Displays average error for q-quantiles of tuned parameters.

Description

Print tuning output. Displays average error for q-quantiles of tuned parameters.

Usage

## S3 method for class 'tuning_output'
print(x, tuning.quantiles = seq(0, 1, 0.2), ...)

Arguments

x

The tuning output to print.

tuning.quantiles

Vector of quantiles to display average error over. Default: seq(0, 1, 0.2) (quintiles).

...

Additional arguments (currently ignored).


Probability forest

Description

Trains a probability forest that can be used to estimate the conditional class probabilities P[Y = k | X = x].

Usage

probability_forest(
  X,
  Y,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  ci.group.size = 2,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates.

Y

The class label (must be a factor vector with no NAs).

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

sample.weights

Weights given to an observation in estimation. If NULL, each observation is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored). (A short usage sketch follows this argument list.)

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keeps the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.
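A minimal sketch of cluster-aware training (illustrative data and variable names):

n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- as.factor(rbinom(n, 1, 1 / (1 + exp(-X[, 1]))))
cluster.id <- sample(1:50, n, replace = TRUE)
# Each of the 50 clusters receives equal weight during training.
p.forest <- probability_forest(X, Y, clusters = cluster.id, equalize.cluster.weights = TRUE)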

Value

A trained probability forest object.

Examples

# Train a probability forest.
p <- 5
n <- 2000
X <- matrix(rnorm(n * p), n, p)
prob <- 1 / (1 + exp(-X[, 1] - X[, 2]))
Y <- as.factor(rbinom(n, 1, prob))
p.forest <- probability_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 10, p)
X.test[, 1] <- seq(-1.5, 1.5, length.out = 10)
p.hat <- predict(p.forest, X.test, estimate.variance = TRUE)

# Plot the estimated success probabilities with 95% confidence bands.
prob.test <- 1 / (1 + exp(-X.test[, 1] - X.test[, 2]))
p.true <- cbind(`0` = 1 - prob.test, `1` = prob.test)
plot(X.test[, 1], p.true[, "1"], col = 'red', ylim = c(0, 1))
points(X.test[, 1], p.hat$predictions[, "1"], pch = 16)
lines(X.test[, 1], (p.hat$predictions + 2 * sqrt(p.hat$variance.estimates))[, "1"])
lines(X.test[, 1], (p.hat$predictions - 2 * sqrt(p.hat$variance.estimates))[, "1"])

# Predict on out-of-bag training samples.
p.hat <- predict(p.forest)

Quantile forest

Description

Trains a regression forest that can be used to estimate quantiles of the conditional distribution of Y given X = x.

Usage

quantile_forest(
  X,
  Y,
  num.trees = 2000,
  quantiles = c(0.1, 0.5, 0.9),
  regression.splitting = FALSE,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  compute.oob.predictions = FALSE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the quantile regression.

Y

The outcome.

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

quantiles

Vector of quantiles used to calibrate the forest. Default is (0.1, 0.5, 0.9).

regression.splitting

Whether to use regression splits when growing trees instead of specialized splits based on the quantiles (the default). Setting this flag to true corresponds to the approach to quantile forests from Meinshausen (2006). Default is FALSE.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keeps the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is FALSE.

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained quantile forest object.

References

Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests". Annals of Statistics, 47(2), 2019.

Examples

# Generate data.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
Y <- X[, 1] * rnorm(n)

# Train a quantile forest.
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

# Make predictions.
q.hat <- predict(q.forest, X.test)

# Make predictions for different quantiles than those used in training.
q.hat <- predict(q.forest, X.test, quantiles = c(0.1, 0.9))

# Train a quantile forest using regression splitting instead of quantile-based
# splits, emulating the approach in Meinshausen (2006).
meins.forest <- quantile_forest(X, Y, regression.splitting = TRUE)

# Make predictions for the desired quantiles.
q.hat <- predict(meins.forest, X.test, quantiles = c(0.1, 0.5, 0.9))

Estimate a Rank-Weighted Average Treatment Effect (RATE).

Description

Consider a rule S(X_i) assigning scores to units in decreasing order of treatment prioritization. In the case of a forest with binary treatment, we provide estimates of the following, where 1/n <= q <= 1 represents the fraction of treated units:

The Targeting Operator Characteristic (TOC) is a curve comparing the benefit of treating only a certain fraction q of units (as prioritized by S(X_i)) to the overall average treatment effect. The Rank-Weighted Average Treatment Effect (RATE) is a weighted sum of this curve, and is a measure designed to identify prioritization rules that effectively target treatment (and can thus be used to test for the presence of heterogeneous treatment effects).
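The following rough sketch illustrates the idea behind these quantities using doubly robust scores; it is only an approximation of the estimator implemented here (which works on ranks and supports weights, clusters, and bootstrap standard errors). The names DR.scores and priorities are placeholders.

# TOC(q): mean score among the top-q prioritized units, minus the overall mean.
toc <- function(q, DR.scores, priorities) {
  top <- priorities >= quantile(priorities, 1 - q)
  mean(DR.scores[top]) - mean(DR.scores)
}
# AUTOC: the TOC averaged over the grid q = 0.1, ..., 1 (a uniformly weighted sum).
# autoc.hat <- mean(sapply(seq(0.1, 1, by = 0.1), toc, DR.scores, priorities))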

Usage

rank_average_treatment_effect(
  forest,
  priorities,
  target = c("AUTOC", "QINI"),
  q = seq(0.1, 1, by = 0.1),
  R = 200,
  subset = NULL,
  debiasing.weights = NULL,
  compliance.score = NULL,
  num.trees.for.weights = 500
)

Arguments

forest

The evaluation set forest.

priorities

Treatment prioritization scores S(Xi) for the units used to train the evaluation forest. Two prioritization rules can be compared by supplying a two-column array or named list of priorities (yielding paired standard errors that account for the correlation between RATE metrics estimated on the same evaluation data). WARNING: for valid statistical performance, these scores should be constructed independently from the evaluation forest training data.

target

The type of RATE estimate, options are "AUTOC" (exhibits greater power when only a small subset of the population experiences nontrivial heterogeneous treatment effects) or "QINI" (exhibits greater power when the entire population experiences diffuse or substantial heterogeneous treatment effects). Default is "AUTOC".

q

The grid q to compute the TOC curve on. Default is (10%, 20%, ..., 100%).

R

Number of bootstrap replicates for SEs. Default is 200.

subset

Specifies subset of the training examples over which we estimate the RATE. WARNING: For valid statistical performance, the subset should be defined only using features Xi, not using the treatment Wi or the outcome Yi.

debiasing.weights

A vector of length n (or the subset length) of debiasing weights. If NULL (default) these are obtained via the appropriate doubly robust score construction, e.g., in the case of causal_forests with a binary treatment, they are obtained via inverse-propensity weighting.

compliance.score

Only used with instrumental forests. An estimate of the causal effect of Z on W, i.e., Delta(X) = E[W | X, Z = 1] - E[W | X, Z = 0], which can then be used to produce debiasing.weights. If not provided, this is estimated via an auxiliary causal forest.

num.trees.for.weights

In some cases (e.g., with causal forests with a continuous treatment), we need to train auxiliary forests to learn debiasing weights. This is the number of trees used for this task. Note: this argument is only used when debiasing.weights = NULL.

Value

A list of class 'rank_average_treatment_effect' with elements 'estimate' (the RATE point estimate), 'std.err' (the bootstrap standard error), 'target' (the type of estimate), and the estimated TOC curve used by the plot and print methods.

References

Yadlowsky, Steve, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. "Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects." Journal of the American Statistical Association, 120(549), 2025.

See Also

rank_average_treatment_effect.fit for computing a RATE with user-supplied doubly robust scores.

Examples

# Simulate a simple medical example with a binary outcome and heterogeneous treatment effects.
# We're imagining that the treatment W decreases the risk of getting a stroke for some units,
# while having no effect on the other units (those with X1 < 0).
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
stroke.probability <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y.stroke <- rbinom(n, 1, stroke.probability)

# We'll label the outcome Y such that "large" values are "good" to make interpretation easier.
# With Y=1 ("no stroke") and Y=0 ("stroke"), then an average treatment effect,
# E[Y(1) - Y(0)] = P[Y(1) = 1] - P[Y(0) = 1], quantifies the counterfactual risk difference
# of being stroke-free with treatment over being stroke-free without treatment.
# This will be positive if the treatment decreases the risk of getting a stroke.
Y <- 1 - Y.stroke

# Train a CATE estimator on a training set.
train <- sample(1:n, n / 2)
cf.cate <- causal_forest(X[train, ], Y[train], W[train])

# Predict treatment effects on a held-out test set.
test <- -train
cate.hat <- predict(cf.cate, X[test, ])$predictions

# Next, use the RATE metric to assess heterogeneity.
# Fit an evaluation forest for estimating the RATE.
cf.eval <- causal_forest(X[test, ], Y[test], W[test])

# Form a doubly robust RATE estimate on the held-out test set.
rate <- rank_average_treatment_effect(cf.eval, cate.hat)

# Plot the Targeting Operator Characteristic (TOC) curve.
# In this example, the ATE among the units with high predicted CATEs
# is substantially larger than the overall ATE.
plot(rate)

# Get an estimate of the area under the TOC (AUTOC).
rate

# Construct a 95% CI for the AUTOC.
# A significant result suggests that there are HTEs and that the CATE-based prioritization rule
# is effective at stratifying the sample.
# A non-significant result would suggest that either there are no HTEs
# or that the treatment prioritization rule does not predict them effectively.
rate$estimate + 1.96 * c(-1, 1) * rate$std.err

# In some applications, we may be interested in other ways to target treatment.
# One example is baseline risk. In our example, we could estimate the probability of getting
# a stroke in the absence of treatment, and then use this as a non-causal heuristic
# to prioritize individuals with a high baseline risk.
# The hope would be that patients with a high predicted risk of getting a stroke
# also have a high treatment effect.
# We can use the RATE metric to evaluate this treatment prioritization rule.

# First, fit a baseline risk model on the training set control group (W=0).
train.control <- train[W[train] == 0]
rf.risk <- regression_forest(X[train.control, ], Y.stroke[train.control])

# Then, on the test set, predict the baseline risk of getting a stroke.
baseline.risk.hat <- predict(rf.risk, X[test, ])$predictions

# Use RATE to compare CATE and risk-based prioritization rules.
rate.diff <- rank_average_treatment_effect(cf.eval, cbind(cate.hat, baseline.risk.hat))
plot(rate.diff)

# Construct a 95% CI for the AUTOC and the difference in AUTOC.
rate.diff$estimate + data.frame(lower = -1.96 * rate.diff$std.err,
                                upper = 1.96 * rate.diff$std.err,
                                row.names = rate.diff$target)

Fitter function for Rank-Weighted Average Treatment Effect (RATE).

Description

Provides an optional interface to rank_average_treatment_effect which allows for user-supplied evaluation scores.

Usage

rank_average_treatment_effect.fit(
  DR.scores,
  priorities,
  target = c("AUTOC", "QINI"),
  q = seq(0.1, 1, by = 0.1),
  R = 200,
  sample.weights = NULL,
  clusters = NULL
)

Arguments

DR.scores

A vector with the evaluation set scores.

priorities

Treatment prioritization scores S(Xi) for the units in the evaluation set. Two prioritization rules can be compared by supplying a two-column array or named list of priorities (yielding paired standard errors that account for the correlation between RATE metrics estimated on the same evaluation data). WARNING: for valid statistical performance, these scores should be constructed independently from the evaluation dataset used to construct DR.scores.

target

The type of RATE estimate, options are "AUTOC" (exhibits greater power when only a small subset of the population experiences nontrivial heterogeneous treatment effects) or "QINI" (exhibits greater power when the entire population experiences diffuse or substantial heterogeneous treatment effects). Default is "AUTOC".

q

The grid q to compute the TOC curve on. Default is (10%, 20%, ..., 100%).

R

Number of bootstrap replicates for SEs. Default is 200.

sample.weights

Weights given to an observation in estimation. If NULL, each observation is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

Value

A list of class 'rank_average_treatment_effect' with elements 'estimate' (the RATE point estimate), 'std.err' (the bootstrap standard error), 'target' (the type of estimate), and the estimated TOC curve used by the plot and print methods.

Examples

# Estimate CATEs with a causal forest.
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
event.probability <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y <- 1 - rbinom(n, 1, event.probability)
train <- sample(1:n, n / 2)
cf.cate <- causal_forest(X[train, ], Y[train], W[train])

# Predict treatment effects on a held-out test set.
test <- -train
cate.hat <- predict(cf.cate, X[test, ])$predictions

# Estimate AIPW nuisance components on the held-out test set.
Y.forest.eval <- regression_forest(X[test, ], Y[test], num.trees = 500)
Y.hat.eval <- predict(Y.forest.eval)$predictions
W.forest.eval <- regression_forest(X[test, ], W[test], num.trees = 500)
W.hat.eval <- predict(W.forest.eval)$predictions
cf.eval <- causal_forest(X[test, ], Y[test], W[test],
                         Y.hat = Y.hat.eval,
                         W.hat = W.hat.eval)

# Form doubly robust scores.
tau.hat.eval <- predict(cf.eval)$predictions
debiasing.weights.eval <- (W[test] - W.hat.eval) / (W.hat.eval * (1 - W.hat.eval))
Y.residual.eval <- Y[test] - (Y.hat.eval + tau.hat.eval * (W[test] - W.hat.eval))
DR.scores <- tau.hat.eval + debiasing.weights.eval * Y.residual.eval
# Could equivalently be obtained by
# DR.scores <- get_scores(cf.eval)

# Form a doubly robust RATE estimate on the held-out test set.
rate <- rank_average_treatment_effect.fit(DR.scores, cate.hat)
rate
# Same as
# rate <- rank_average_treatment_effect(cf.eval, cate.hat)

# In settings where the treatment randomization probabilities W.hat are known, an
# alternative to AIPW scores is to use inverse-propensity weighting (IPW):
# 1(W=1) * Y / W.hat - 1(W=0) * Y / (1 - W.hat).
# Here, W.hat = 0.5, and an IPW-based estimate of RATE is:
IPW.scores <- ifelse(W[test] == 1, Y[test] / 0.5, -Y[test] / 0.5)
rate.ipw <- rank_average_treatment_effect.fit(IPW.scores, cate.hat)
rate.ipw

# IPW-based estimators typically have higher variance. For details on
# score constructions for other causal estimands, please see the RATE paper.

Regression forest

Description

Trains a regression forest that can be used to estimate the conditional mean function mu(x) = E[Y | X = x].

Usage

regression_forest(
  X,
  Y,
  num.trees = 2000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 5,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  imbalance.penalty = 0,
  ci.group.size = 2,
  tune.parameters = "none",
  tune.num.trees = 50,
  tune.num.reps = 100,
  tune.num.draws = 1000,
  compute.oob.predictions = TRUE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates used in the regression.

Y

The outcome.

num.trees

Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000.

sample.weights

Weights given to an observation in estimation. If NULL, each observation is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same. Note that, if this argument is FALSE, sample weights may also be directly adjusted via the sample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keeps the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. Default is 0.05.

imbalance.penalty

A tuning parameter that controls how harshly imbalanced splits are penalized. Default is 0.

ci.group.size

The forest will grow ci.group.size trees on each subsample. In order to provide confidence intervals, ci.group.size must be at least 2. Default is 2.

tune.parameters

A vector of parameter names to tune. If "all": all tunable parameters are tuned by cross-validation. The following parameters are tunable: ("sample.fraction", "mtry", "min.node.size", "honesty.fraction", "honesty.prune.leaves", "alpha", "imbalance.penalty"). If honesty is FALSE the honesty.* parameters are not tuned. Default is "none" (no parameters are tuned).

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model. Default is 50.

tune.num.reps

The number of forests used to fit the tuning model. Default is 100.

tune.num.draws

The number of random parameter values considered when using the model to select the optimal parameters. Default is 1000.

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.

Value

A trained regression forest object. If tune.parameters is enabled, then tuning information will be included through the 'tuning.output' attribute.
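A minimal sketch (illustrative; it assumes the tuning summary is exposed under the 'tuning.output' name described above):

n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)
tuned.forest <- regression_forest(X, Y, tune.parameters = c("mtry", "min.node.size"))
# Inspect the tuned parameter values and the cross-validated error summary.
print(tuned.forest[["tuning.output"]])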

Examples

# Train a standard regression forest.
n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.forest <- regression_forest(X, Y)

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
r.pred <- predict(r.forest, X.test)

# Predict on out-of-bag training samples.
r.pred <- predict(r.forest)

# Predict with confidence intervals; growing more trees is now recommended.
r.forest <- regression_forest(X, Y, num.trees = 100)
r.pred <- predict(r.forest, X.test, estimate.variance = TRUE)

Calculate which features the forest split on at each depth.

Description

Calculate which features the forest split on at each depth.

Usage

split_frequencies(forest, max.depth = 4)

Arguments

forest

The trained forest.

max.depth

Maximum depth of splits to consider.

Value

A matrix of split depth by feature index, where each value is the number of times the feature was split on at that depth.

Examples

# Train a quantile forest.
n <- 250
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

# Calculate the split frequencies for this forest.
split_frequencies(q.forest)

Survival forest

Description

Trains a forest for right-censored survival data that can be used to estimate the conditional survival function S(t, x) = P[T > t | X = x].

Usage

survival_forest(
  X,
  Y,
  D,
  failure.times = NULL,
  num.trees = 1000,
  sample.weights = NULL,
  clusters = NULL,
  equalize.cluster.weights = FALSE,
  sample.fraction = 0.5,
  mtry = min(ceiling(sqrt(ncol(X)) + 20), ncol(X)),
  min.node.size = 15,
  honesty = TRUE,
  honesty.fraction = 0.5,
  honesty.prune.leaves = TRUE,
  alpha = 0.05,
  prediction.type = c("Kaplan-Meier", "Nelson-Aalen"),
  compute.oob.predictions = TRUE,
  fast.logrank = FALSE,
  num.threads = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

Arguments

X

The covariates.

Y

The event time (must be non-negative).

D

The event type (0: censored, 1: failure/observed event).

failure.times

A vector of event times to fit the survival curve at. If NULL, then all the observed failure times are used. This speeds up forest estimation by constraining the event grid. Observed event times are rounded down to the last sorted occurrence less than or equal to the specified failure time. The time points should be in increasing order. Default is NULL.

num.trees

Number of trees grown in the forest. Default is 1000.

sample.weights

Weights given to an observation in prediction. If NULL, each observation is given the same weight. Default is NULL.

clusters

Vector of integers or factors specifying which cluster each observation corresponds to. Default is NULL (ignored).

equalize.cluster.weights

If FALSE, each unit is given the same weight (so that bigger clusters get more weight). If TRUE, each cluster is given equal weight in the forest. In this case, during training, each tree uses the same number of observations from each drawn cluster: If the smallest cluster has K units, then when we sample a cluster during training, we only give a random K elements of the cluster to the tree-growing procedure. When estimating average treatment effects, each observation is given weight 1/cluster size, so that the total weight of each cluster is the same. Note that, if this argument is FALSE, sample weights may also be directly adjusted via the sample.weights argument. If this argument is TRUE, sample.weights must be set to NULL. Default is FALSE.

sample.fraction

Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5.

mtry

Number of variables tried for each split. Default is sqrt(p) + 20 where p is the number of variables.

min.node.size

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 15.

honesty

Whether to use honest splitting (i.e., sub-sample splitting). Default is TRUE. For a detailed description of honesty, honesty.fraction, honesty.prune.leaves, and recommendations for parameter tuning, see the grf algorithm reference.

honesty.fraction

The fraction of data that will be used for determining splits if honesty = TRUE. Corresponds to set J1 in the notation of the paper. Default is 0.5 (i.e. half of the data is used for determining splits).

honesty.prune.leaves

If TRUE, prunes the estimation sample tree such that no leaves are empty. If FALSE, keeps the same tree as determined in the splits sample (if an empty leaf is encountered, that tree is skipped and does not contribute to the estimate). Setting this to FALSE may improve performance on small/marginally powered data, but requires more trees (note: tuning does not adjust the number of trees). Only applies if honesty is enabled. Default is TRUE.

alpha

A tuning parameter that controls the maximum imbalance of a split. The number of failures in each child has to be at least one or 'alpha' times the number of samples in the parent node. Default is 0.05. (On data with a very low event rate the default value may be too high for the forest to split, and lowering it may be beneficial.)

prediction.type

The type of estimate of the survival function, choices are "Kaplan-Meier" or "Nelson-Aalen". Only relevant if 'compute.oob.predictions' is TRUE. Default is "Kaplan-Meier".

compute.oob.predictions

Whether OOB predictions on training set should be precomputed. Default is TRUE.

fast.logrank

If TRUE, uses a fast approximate log-rank criterion that speeds up forest training without loss of accuracy. When enabled, there is no need to discretize or constrain the event grid to improve speed. Predictions may differ slightly from the exact method. Default is FALSE for consistency with earlier versions. (A short usage sketch follows this argument list.)

num.threads

Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency.

seed

The seed of the C++ random number generator.
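A minimal sketch of the fast splitting rule (illustrative data and variable names):

n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
failure.time <- exp(0.5 * X[, 1]) * rexp(n)
censor.time <- 2 * rexp(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)
# With fast.logrank = TRUE there is no need to round or constrain the event grid.
s.forest.fast <- survival_forest(X, Y, D, fast.logrank = TRUE, num.trees = 500)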

Value

A trained survival_forest forest object.

References

Cui, Yifan, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. "Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests." Journal of the Royal Statistical Society: Series B, 85(2), 2023.

Ishwaran, Hemant, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. "Random survival forests." The Annals of Applied Statistics 2.3 (2008): 841-860.

Sverdrup, Erik, James Yang, and Michael LeBlanc. "Efficient Log-Rank Updates for Random Survival Forests." arXiv preprint arXiv:2510.03665, 2025.

Examples

# Train a standard survival forest.
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)
failure.time <- exp(0.5 * X[, 1]) * rexp(n)
censor.time <- 2 * rexp(n)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)
# Save computation time by constraining the event grid by discretizing (rounding) continuous events.
s.forest <- survival_forest(X, round(Y, 2), D)
# Or do so more flexibly by defining your own time grid using the failure.times argument.
# grid <- seq(min(Y[D==1]), max(Y[D==1]), length.out = 150)
# s.forest <- survival_forest(X, Y, D, failure.times = grid)

# Predict using the forest.
X.test <- matrix(0, 3, p)
X.test[, 1] <- seq(-2, 2, length.out = 3)
s.pred <- predict(s.forest, X.test)

# Plot the survival curve.
plot(NA, NA, xlab = "failure time", ylab = "survival function",
     xlim = range(s.pred$failure.times),
     ylim = c(0, 1))
for(i in 1:3) {
  lines(s.pred$failure.times, s.pred$predictions[i, ], col = i)
  s.true = exp(-s.pred$failure.times / exp(0.5 * X.test[i, 1]))
  lines(s.pred$failure.times, s.true, col = i, lty = 2)
}

# Predict on out-of-bag training samples.
s.pred <- predict(s.forest)

# Compute OOB concordance based on the mortality score in Ishwaran et al. (2008).
s.pred.nelson.aalen <- predict(s.forest, prediction.type = "Nelson-Aalen")
chf.score <- rowSums(-log(s.pred.nelson.aalen$predictions))

if (require("survival", quietly = TRUE)) {
  concordance(Surv(Y, D) ~ chf.score, reverse = TRUE)
}

Omnibus evaluation of the quality of the random forest estimates via calibration.

Description

Test calibration of the forest. Computes the best linear fit of the target estimand using the forest prediction (on held-out data) as well as the mean forest prediction as the sole two regressors. A coefficient of 1 for 'mean.forest.prediction' suggests that the mean forest prediction is correct, whereas a coefficient of 1 for 'differential.forest.prediction' additionally suggests that the heterogeneity estimates from the forest are well calibrated. The p-value of the 'differential.forest.prediction' coefficient also acts as an omnibus test for the presence of heterogeneity: If the coefficient is significantly greater than 0, then we can reject the null of no heterogeneity. For another class of omnibus tests see rank_average_treatment_effect.

Usage

test_calibration(forest, vcov.type = "HC3")

Arguments

forest

The trained forest.

vcov.type

Optional covariance type for standard errors. The possible options are HC0, ..., HC3. The default is "HC3", which is recommended in small samples and corresponds to the "shortcut formula" for the jackknife (see MacKinnon & White for more discussion, and Cameron & Miller for a review). For large data sets with clusters, "HC0" or "HC1" are significantly faster to compute.

Value

A heteroskedasticity-consistent test of calibration.

References

Cameron, A. Colin, and Douglas L. Miller. "A practitioner's guide tocluster-robust inference." Journal of Human Resources 50, no. 2 (2015): 317-372.

Chernozhukov, Victor, Mert Demirer, Esther Duflo, and Ivan Fernandez-Val. "Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments." arXiv preprint arXiv:1712.04802 (2017).

MacKinnon, James G., and Halbert White. "Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties." Journal of Econometrics 29.3 (1985): 305-325.

Examples

n <- 800
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.25 + 0.5 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
forest <- causal_forest(X, Y, W)
test_calibration(forest)

Tune a forest

Description

Finds the optimal parameters to be used in training a forest.

Usage

tune_forest(
  data,
  nrow.X,
  ncol.X,
  args,
  tune.parameters,
  tune.parameters.defaults,
  tune.num.trees,
  tune.num.reps,
  tune.num.draws,
  train
)

Arguments

data

The data arguments (output from create_train_matrices) for the forest.

nrow.X

The number of observations.

ncol.X

The number of variables.

args

The remaining call arguments for the forest.

tune.parameters

The vector of parameter names to tune.

tune.parameters.defaults

The grf default values for the vector of parameter names to tune.

tune.num.trees

The number of trees in each 'mini forest' used to fit the tuning model.

tune.num.reps

The number of forests used to fit the tuning model.

tune.num.draws

The number of random parameter values considered when using the model to select the optimal parameters.

train

The grf forest training function.

Value

tuning output


Local linear forest tuning

Description

Finds the optimal ridge penalty for local linear causal prediction.

Usage

tune_ll_causal_forest(
  forest,
  linear.correction.variables = NULL,
  ll.weight.penalty = FALSE,
  num.threads = NULL,
  lambda.path = NULL
)

Arguments

forest

The forest used for prediction.

linear.correction.variables

Variables to use for local linear prediction. If left NULL, all variables are used. Default is NULL.

ll.weight.penalty

Option to standardize ridge penalty by covariance (TRUE), or penalize all covariates equally (FALSE). Defaults to FALSE.

num.threads

Number of threads used in training. If set to NULL, the software automatically selects an appropriate amount.

lambda.path

Optional list of lambdas to use for cross-validation.

Value

A list of lambdas tried, corresponding errors, and optimal ridge penalty lambda.


Local linear forest tuning

Description

Finds the optimal ridge penalty for local linear prediction.

Usage

tune_ll_regression_forest(
  forest,
  linear.correction.variables = NULL,
  ll.weight.penalty = FALSE,
  num.threads = NULL,
  lambda.path = NULL
)

Arguments

forest

The forest used for prediction.

linear.correction.variables

Variables to use for local linear prediction. If left NULL, all variables are used. Default is NULL.

ll.weight.penalty

Option to standardize ridge penalty by covariance (TRUE), or penalize all covariates equally (FALSE). Defaults to FALSE.

num.threads

Number of threads used in training. If set to NULL, the software automatically selects an appropriate amount.

lambda.path

Optional list of lambdas to use for cross-validation.

Value

A list of lambdas tried, corresponding errors, and optimal ridge penalty lambda.
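A minimal sketch of the intended workflow (illustrative; the exact name of the selected-lambda element is not spelled out here, so inspect the returned list):

n <- 500
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)
ll.forest <- regression_forest(X, Y)
tuned <- tune_ll_regression_forest(ll.forest)
names(tuned)  # lambdas tried, their errors, and the selected ridge penalty
# The selected penalty can then be passed to predict() via the ll.lambda argument.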


Calculate a simple measure of 'importance' for each feature.

Description

A simple weighted sum of how many times feature i was split on at each depth in the forest.

Usage

variable_importance(forest, decay.exponent = 2, max.depth = 4)

Arguments

forest

The trained forest.

decay.exponent

A tuning parameter that controls the importance of split depth.

max.depth

Maximum depth of splits to consider.

Value

A list specifying an 'importance value' for each feature.

Examples

# Train a quantile forest.
n <- 250
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

# Calculate the 'importance' of each feature.
variable_importance(q.forest)
