| Type: | Package |
| Title: | Targeted Inference |
| Version: | 0.7 |
| Date: | 2025-12-09 |
| Author: | Klaus K. Holst [aut, cre], Benedikt Sommer [aut], Andreas Nordland [aut], Christian B. Pipper [ctb] |
| Maintainer: | Klaus K. Holst <klaus@holst.it> |
| Description: | Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>). |
| Depends: | R (≥ 4.1) |
| Imports: | R6, Rcpp (≥ 1.0.0), abind, cli, future.apply, lava (≥ 1.8.2), methods, mets (≥ 1.3.8), quadprog, progressr, rlang, survival |
| Suggests: | SuperLearner (≥ 2.0-28), MASS, cmprsk, data.table, e1071, earth, glmnet, grf, hal9001, mgcv, nnls, optimx, polle (≥ 1.5), pracma, quarto, randomForestSRC, ranger, riskRegression, scatterplot3d, tinytest, viridisLite, xgboost (≥ 3.1.2.1) |
| BugReports: | https://github.com/kkholst/targeted/issues |
| URL: | https://kkholst.github.io/targeted/ |
| License: | Apache License (== 2.0) |
| LinkingTo: | Rcpp, RcppArmadillo |
| LazyLoad: | yes |
| NeedsCompilation: | yes |
| ByteCompile: | yes |
| RcppModules: | riskregmodel |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | quarto |
| SystemRequirements: | Quarto command line tools (https://github.com/quarto-dev/quarto-cli). |
| Packaged: | 2025-12-09 17:27:41 UTC; kkzh |
| Repository: | CRAN |
| Date/Publication: | 2025-12-10 06:20:08 UTC |
targeted: Targeted Inference
Description

Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
Author(s)
Maintainer: Klaus K. Holst <klaus@holst.it>
Authors:
Benedikt Sommer <benediktsommer92@gmail.com>
Andreas Nordland <andreasnordland@gmail.com>
Other contributors:
Christian B. Pipper <cbpipper@gmail.com> [contributor]
References
Bang & Robins (2005) Doubly Robust Estimation in Missing Data and Causal Inference Models, Biometrics.
Vansteelandt & Dukes (2022) Assumption-lean Inference for Generalised Linear Model Parameters, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Thomas S. Richardson, James M. Robins & Linbo Wang (2017) On Modeling and Estimation for the Relative Risk and Risk Difference, Journal of the American Statistical Association.
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
See Also
Useful links:
https://kkholst.github.io/targeted/
Report bugs at https://github.com/kkholst/targeted/issues
Examples
## Not run:
example(riskreg)
example(cate)
example(ate)
example(calibration)
## End(Not run)
ML model
Description
Wrapper for ml_model
Usage
ML(formula, model = "glm", ...)
Arguments
formula | formula |
model | model (sl, rf, pf, glm, ...) |
... | additional arguments to model object |
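A minimal usage sketch (hypothetical; ML() is a thin wrapper for the deprecated ml_model constructor, so new code should prefer the learner_* constructors):
lr <- ML(Sepal.Length ~ Sepal.Width + Species, model = "glm") # sketch: glm wrapper
lr$estimate(iris) # fit the model
head(lr$predict(iris)) # predictions on the training data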
Responder Average Treatment Effect
Description
Estimation of the Average Treatment Effect among Responders
Usage
RATE(
  response,
  post.treatment,
  treatment,
  data,
  family = gaussian(),
  M = 5,
  pr.treatment,
  treatment.level,
  SL.args.response = list(family = gaussian(), SL.library = c("SL.mean", "SL.glm")),
  SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")),
  preprocess = NULL,
  efficient = TRUE,
  ...
)
Arguments
response | Response formula (e.g., Y ~ D*A) |
post.treatment | Post treatment marker formula (e.g., D ~ W) |
treatment | Treatment formula (e.g., A ~ 1) |
data | data.frame |
family | Exponential family for response (default gaussian) |
M | Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment | (optional) Randomization probability of treatment. |
treatment.level | Treatment level in binary treatment (default 1) |
SL.args.response | Arguments to SuperLearner for the response model |
SL.args.post.treatment | Arguments to SuperLearner for the post-treatment indicator |
preprocess | (optional) Data preprocessing function |
efficient | If TRUE, the estimate will be efficient. If FALSE, the estimate will be a simple plug-in estimate. |
... | Additional arguments to lower level functions |
Value
estimate object
Author(s)
Andreas Nordland, Klaus K. Holst
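No example is given for RATE in this manual; the following is a hedged sketch on simulated data (variable names y, d, a, w are illustrative, and the SuperLearner package is assumed to be installed):
n <- 500
w <- rnorm(n) # baseline covariate
a <- rbinom(n, 1, 0.5) # randomized treatment
d <- rbinom(n, 1, plogis(w)) # post-treatment response marker
y <- d * a + w + rnorm(n) # clinical outcome
dat <- data.frame(y, d, a, w)
RATE(response = y ~ d * a, post.treatment = d ~ w, treatment = a ~ 1,
  data = dat, M = 2, pr.treatment = 0.5)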
Responder Average Treatment Effect
Description
Estimation of the Average Treatment Effect among Responders for Survival Outcomes
Usage
RATE.surv(
  response,
  post.treatment,
  treatment,
  censoring,
  tau,
  data,
  M = 5,
  pr.treatment,
  call.response,
  args.response = list(),
  SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")),
  call.censoring,
  args.censoring = list(),
  preprocess = NULL,
  ...
)
Arguments
response | Response formula (e.g., Surv(time, event) ~ D + W). |
post.treatment | Post treatment marker formula (e.g., D ~ W) |
treatment | Treatment formula (e.g., A ~ 1) |
censoring | Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W). |
tau | Time-point of interest, see Details. |
data | data.frame |
M | Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment | (optional) Randomization probability of treatment. |
call.response | Model call for the response model (e.g. "mets::phreg"). |
args.response | Additional arguments to the response model. |
SL.args.post.treatment | Arguments to SuperLearner for the post-treatment indicator |
call.censoring | Similar to call.response. |
args.censoring | Similar to args.response. |
preprocess | (optional) Data preprocessing function |
... | Additional arguments to lower level functions |
Details
Estimation of
\frac{P(T \leq \tau \mid A=1) - P(T \leq \tau \mid A=0)}{E[D \mid A=1]}
under right censoring based on plug-in estimates of P(T \leq \tau \mid A=a) and E[D \mid A=1].
An efficient one-step estimator of P(T \leq \tau \mid A=a) is constructed using the efficient influence function
\frac{I\{A=a\}}{P(A = a)} \Big(\frac{\Delta}{S^c_{0}(\tilde T \mid X)} I\{\tilde T \leq \tau\} + \int_0^\tau \frac{S_0(u \mid X)-S_0(\tau \mid X)}{S_0(u \mid X) S^c_0(u \mid X)} \, dM^c_0(u \mid X)\Big) + \Big(1 - \frac{I\{A=a\}}{P(A = a)}\Big) F_0(\tau \mid A=a, W) - P(T \leq \tau \mid A=a).
An efficient one-step estimator of E[D \mid A=1] is constructed using the efficient influence function
\frac{A}{P(A = 1)}\left(D - E[D \mid A=1, W]\right) + E[D \mid A=1, W] - E[D \mid A=1].
Value
estimate object
Author(s)
Andreas Nordland, Klaus K. Holst
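No example is given for RATE.surv; a hedged sketch on simulated right-censored data (all variable names are illustrative; mets is assumed installed for the phreg response and censoring models):
library(survival)
n <- 500
w <- rnorm(n)
a <- rbinom(n, 1, 0.5)
d <- rbinom(n, 1, plogis(w))
t0 <- rexp(n, rate = exp(-1 + 0.5 * a * d)) # event time
cens <- rexp(n, rate = 0.2) # censoring time
time <- pmin(t0, cens)
status <- as.numeric(t0 <= cens)
dat <- data.frame(time, status, a, d, w)
RATE.surv(response = Surv(time, status) ~ d + w,
  post.treatment = d ~ w, treatment = a ~ 1,
  censoring = Surv(time, status == 0) ~ d + a + w,
  tau = 2, data = dat, M = 2, pr.treatment = 0.5,
  call.response = "mets::phreg", call.censoring = "mets::phreg")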
SuperLearner wrapper for learner
Description
SuperLearner wrapper for learner
Usage
SL(
  formula = ~.,
  ...,
  SL.library = c("SL.mean", "SL.glm"),
  binomial = FALSE,
  data = NULL,
  info = "SuperLearner"
)
Arguments
formula | Model design |
... | Additional arguments for SuperLearner::SuperLearner |
SL.library | character vector of prediction algorithms |
binomial | boolean specifying binomial or gaussian family (default FALSE) |
data | Optional data.frame |
info | model information (optional) |
Value
learner object
Author(s)
Klaus Kähler Holst
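A brief sketch (assuming the SuperLearner package is installed):
lr <- SL(Sepal.Length ~ Sepal.Width + Petal.Length,
  SL.library = c("SL.mean", "SL.glm"))
lr$estimate(iris) # fit the ensemble
head(lr$predict(iris)) # ensemble predictions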
AIPW estimator
Description
AIPW for the mean (and linear projections of the EIF) with missing observations
Usage
aipw(response_model, propensity_model, formula = ~1, data, ...)
Arguments
response_model | Model for the response given covariates (learner or formula) |
propensity_model | Optional missing data mechanism model (propensity model) (learner or formula) |
formula | design specifying the OLS estimator with outcome given by the EIF |
data | data.frame |
... | additional arguments (see |
Examples
m <- lava::lvm(y ~ x+z, r ~ x) |>
  lava::distribution(~ r, value = lava::binomial.lvm()) |>
  transform(y0 ~ r+y, value = \(x) { x[x[,1]==0, 2] <- NA; x[,2] })
d <- lava::sim(m, 1e3, seed=1)
aipw(y0 ~ x, data=d)
Assumption Lean inference for generalized linear model parameters
Description
Assumption lean inference via cross-fitting (Double ML). See <doi:10.1111/rssb.12504>.
Usage
alean(
  response_model,
  exposure_model,
  data,
  link = "identity",
  g_model,
  nfolds = 1,
  silent = FALSE,
  mc.cores,
  ...
)
Arguments
response_model | formula or learner object (formula => glm) |
exposure_model | model for the exposure |
data | data.frame |
link | Link function (g) |
g_model | Model for |
nfolds | Number of folds |
silent | suppress all messages and progress bars |
mc.cores | Optional number of cores. parallel::mcmapply used instead of future |
... | additional arguments to future.apply::future_mapply |
Details
Let Y be the response variable, A the exposure, and W covariates. The target parameter is:
\Psi(P) = \frac{E(\operatorname{Cov}[A, g\{E(Y\mid A,W)\}\mid W])}{E\{\operatorname{Var}(A\mid W)\}}
The response_model is the model for E(Y|A,W), and exposure_model is the model for E(A|W). link specifies g.
Value
alean.targeted object
Author(s)
Klaus Kähler Holst
Examples
sim1 <- function(n, family=gaussian(), ...) {
  m <- lava::lvm() |>
    lava::distribution(~y, value=lava::binomial.lvm()) |>
    lava::regression('a', value=function(l) l) |>
    lava::regression('y', value=function(a,l) a + l)
  if (family$family=="binomial")
    lava::distribution(m, ~a) <- lava::binomial.lvm()
  lava::sim(m, n)
}
library(splines)
f <- binomial()
d <- sim1(1e4, family=f)
e <- alean(
  response_model=learner_glm(y ~ a + bs(l, df=3), family=binomial),
  exposure_model=learner_glm(a ~ bs(l, df=3), family=f),
  data=d,
  link = "logit", mc.cores=1, nfolds=1)
e
e <- alean(response_model=learner_glm(y ~ a + l, family=binomial),
  exposure_model=learner_glm(a ~ l),
  data=d,
  link = "logit", mc.cores=1, nfolds=1)
e
AIPW (doubly-robust) estimator for Average Treatment Effect
Description
Augmented Inverse Probability Weighting estimator for the Average (Causal) Treatment Effect. All nuisance models are here parametric (glm). For a more general approach see the cate implementation. In this implementation the standard errors are correct even when the nuisance models are mis-specified (the influence curve is calculated including the term coming from the parametric nuisance models). The estimate is consistent if either the propensity model or the outcome model / Q-model is correctly specified.
Usage
ate(
  formula,
  data = parent.frame(),
  weights,
  offset,
  family = stats::gaussian(identity),
  nuisance = NULL,
  propensity = nuisance,
  all,
  labels = NULL,
  adjust.nuisance = TRUE,
  adjust.propensity = TRUE,
  ...
)
Arguments
formula | formula (see details below) |
data | data.frame |
weights | optional frequency weights |
offset | optional offset (character or vector). Can also be specified in the formula. |
family | Exponential family argument for outcome model |
nuisance | outcome regression formula (Q-model) |
propensity | propensity model formula |
all | when TRUE all standard errors are calculated (default TRUE when exposure only has two levels) |
labels | optional treatment labels |
adjust.nuisance | adjust for uncertainty due to estimation of outcome regression model parameters |
adjust.propensity | adjust for uncertainty due to estimation of propensity regression model parameters |
... | additional arguments to lower level functions |
Details
The formula may either be specified as: response ~ treatment | nuisance-formula | propensity-formula
For example: ate(y~a | x+z+a | x*z, data=...)
Alternatively, as a list: ate(list(y~a, ~x+z, ~x*z), data=...)
Or using the nuisance (and propensity) arguments: ate(y~a, nuisance=~x+z, ...)
Value
An object of class 'ate.targeted' is returned. See targeted-class for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
See Also
cate
Examples
m <- lava::lvm(y ~ a+x, a~x) |>
  lava::distribution(~y, value = lava::binomial.lvm()) |>
  lava::ordinal(K=4, ~a) |>
  transform(~a, value = factor)
d <- lava::sim(m, 1e3, seed=1)
# (a <- ate(y~a|a*x|x, data=d))
(a <- ate(y~a, nuisance=~a*x, propensity=~x, data = d))
# Comparison with randomized experiment
m0 <- lava::cancel(m, a~x)
lm(y~a-1, lava::sim(m0,2e4))
# Choosing a different contrast for the association measures
summary(a, contrast=c(2,4))
Calibration (training)
Description
Calibration for multi-classification methods
Usage
calibration(
  pr,
  cl,
  weights = NULL,
  threshold = 10,
  method = "bin",
  breaks = nclass.Sturges,
  df = 3,
  ...
)
Arguments
pr | matrix with probabilities for each class |
cl | class variable |
weights | counts |
threshold | do not calibrate if there are fewer than 'threshold' events |
method | either 'isotonic' (pava), 'logistic', 'mspline' (monotone spline), 'bin' (local constant) |
breaks | optional number of bins (only for method 'bin') |
df | degrees of freedom (only for spline methods) |
... | additional arguments to lower level functions |
Details
...
Value
An object of class 'calibration' is returned. See calibration-class for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
Examples
sim1 <- function(n, beta=c(-3, rep(.5,10)), rho=.5) {
  p <- length(beta)-1
  xx <- lava::rmvn0(n, sigma=diag(nrow=p)*(1-rho)+rho)
  y <- rbinom(n, 1, lava::expit(cbind(1,xx)%*%beta))
  d <- data.frame(y=y, xx)
  names(d) <- c("y", paste0("x",1:p))
  return(d)
}
set.seed(1)
beta <- c(-2, rep(1,10))
d <- sim1(1e4, beta=beta)
a1 <- naivebayes(y ~ ., data=d)
a2 <- glm(y ~ ., data=d, family=binomial)
## a3 <- randomForest(factor(y) ~ ., data=d, family=binomial)
d0 <- sim1(1e4, beta=beta)
p1 <- predict(a1, newdata=d0)
p2 <- predict(a2, newdata=d0, type="response")
## p3 <- predict(a3, newdata=d0, type="prob")
c2 <- calibration(p2, d0$y, method="isotonic")
c1 <- calibration(p1, d0$y, breaks=100)
if (interactive()) {
  plot(c1)
  plot(c2, col="red", add=TRUE)
  abline(a=0, b=1)
  with(c1$xy[[1]], points(pred, freq, type="b", col="red"))
}
set.seed(1)
beta <- c(-2, rep(1,10))
dd <- lava::csplit(sim1(1e4, beta=beta), k=3)
mod <- naivebayes(y ~ ., data=dd[[1]])
p1 <- predict(mod, newdata=dd[[2]])
cal <- calibration(p1, dd[[2]]$y)
p2 <- predict(mod, newdata=dd[[3]])
pp <- predict(c1, p2)
cc <- calibration(pp, dd[[3]]$y)
if (interactive()) {
  plot(cal)
  plot(cc, add=TRUE, col="blue")
}
calibration class object
Description
The function calibration returns an object of the class calibration.
An object of class 'calibration' is a list with at least the following components:
- stepfun
estimated step-functions (see stepfun) for each class
- classes
the unique classes
- model
model/method type (string)
- xy
list of data.frame's with predictions (pr) and estimated probabilities of success (only for 'bin' method)
Value
objects of the S3 class 'calibration'
S3 generics
The following S3 generic functions are available for an object of class calibration:
predict: Apply calibration to new data.
plot: Plot the calibration curves (reliability plot).
print: Basic print method.
See Also
Examples
## See example(calibration) for examples
Conditional Average Treatment Effect estimation
Description
Conditional Average Treatment Effect estimation with cross-fitting.
Usage
cate(
  response.model,
  propensity.model,
  cate.model = ~1,
  calibration.model = NULL,
  data,
  contrast,
  nfolds = 1,
  rep = 1,
  silent = FALSE,
  stratify = FALSE,
  mc.cores = NULL,
  rep.type = c("nuisance", "average"),
  var.type = "IC",
  second.order = TRUE,
  response_model = deprecated,
  cate_model = deprecated,
  propensity_model = deprecated,
  treatment = deprecated,
  ...
)
Arguments
response.model | formula or learner object (formula => learner_glm) |
propensity.model | formula or learner object (formula => learner_glm) |
cate.model | formula specifying regression design for conditional average treatment effects |
calibration.model | linear calibration model. Specify covariates in addition to predicted potential outcomes to include in the calibration. |
data | data.frame |
contrast | treatment contrast (default 1 vs 0) |
nfolds | number of folds |
rep | number of replications of cross-fitting procedure |
silent | suppress all messages and progressbars |
stratify | if TRUE the response.model will be stratified by treatment |
mc.cores | (optional) number of cores. parallel::mcmapply used instead of future |
rep.type | repeated cross-fitting applied by averaging nuisance models( |
var.type | when equal to "IC" the asymptotic variance is derived from the influence function. Otherwise, based on expressions in Bannick et al. (2025) valid under different covariate-adaptive randomization schemes (only available for ATE and when |
second.order | add second order term to IF to handle misspecification of outcome models |
response_model | Deprecated. Use response.model instead. |
cate_model | Deprecated. Use cate.model instead. |
propensity_model | Deprecated. Use propensity.model instead. |
treatment | Deprecated. Use cate.model instead. |
... | additional arguments to future.apply::future_mapply |
Details
We have observed data (Y,A,W) where Y is the response variable, A the binary treatment, and W covariates. We further let V be a subset of the covariates. Define the conditional potential mean outcome
\psi_{a}(P)(V) = E_{P}[E_{P}(Y\mid A=a, W)\mid V]
and let m(V;\beta) denote a parametric working model. The target parameter is the minimizer of the mean squared error
\beta(P) = \operatorname{argmin}_{\beta} E_{P}\left[\{\psi_{1}(P)(V)-\psi_{0}(P)(V)\} - m(V; \beta)\right]^{2}
Value
cate.targeted object
Author(s)
Klaus Kähler Holst, Andreas Nordland
References
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
Examples
sim1 <- function(n=1000, ...) {
  w1 <- rnorm(n)
  w2 <- rnorm(n)
  a <- rbinom(n, 1, plogis(-1 + w1))
  y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n)
  data.frame(y, a, w1, w2)
}
d <- sim1(5000)
## ATE
cate(cate.model=~1,
  response.model=y~a*(w1+w2),
  propensity.model=a~w1+w2,
  data=d)
## CATE
cate(cate.model=~1+w2,
  response.model=y~a*(w1+w2),
  propensity.model=a~w1+w2,
  data=d)
## Not run:
## superlearner example
mod1 <- list(
  glm = learner_glm(y~w1+w2),
  gam = learner_gam(y~s(w1) + s(w2)))
s1 <- learner_sl(mod1, nfolds=5)
cate(cate.model=~1,
  response.model=s1,
  propensity.model=learner_glm(a~w1+w2, family=binomial),
  data=d, stratify=TRUE)
## End(Not run)
Conditional Relative Risk estimation
Description
Conditional average treatment effect estimation via Double Machine Learning
Usage
cate_link(
  treatment,
  link = "identity",
  response_model,
  propensity_model,
  importance_model,
  contrast = c(1, 0),
  data,
  nfolds = 5,
  type = "dml1",
  ...
)
Arguments
treatment | formula specifying treatment and variables to condition on |
link | Link function |
response_model | SL object |
propensity_model | SL object |
importance_model | SL object |
contrast | treatment contrast (default 1 vs 0) |
data | data.frame |
nfolds | Number of folds |
type | 'dml1' or 'dml2' |
... | additional arguments to SuperLearner |
Value
cate.targeted object
Author(s)
Klaus Kähler Holst & Andreas Nordland
Examples
# Example 1:
sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){
  suppressPackageStartupMessages(require("lava"))
  if (!is.null(seed)) set.seed(seed)
  m <- lava::lvm()
  distribution(m, ~x) <- gaussian.lvm()
  distribution(m, ~v) <- gaussian.lvm(mean = 10)
  distribution(m, ~a) <- binomial.lvm("logit")
  regression(m, "a") <- function(v, x){.1*v + x}
  distribution(m, "y") <- gaussian.lvm()
  regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v}
  if (return_model) return(m)
  lava::sim(m, n = n)
}
if (require("SuperLearner", quietly=TRUE)) {
  d <- sim1(n = 1e3, seed = 1)
  e <- cate_link(data=d,
    type = "dml2",
    treatment = a ~ v,
    response_model = y ~ a*(x + v + I(v^2)),
    importance_model = SL(D_ ~ v + I(v^2)),
    nfolds = 10)
  summary(e) # the true parameters are c(1,1)
}
Construct a learner
Description
Construct a learner
Arguments
info | (character) Optional information to describe the instantiated learner object. |
formula | (formula) Formula specifying response and design matrix. |
learner.args | (list) Additional arguments to learner$new(). |
Value
learner object.
cross_validated class object
Description
The function cv returns an object of the type cross_validated.
An object of class 'cross_validated' is a list with at least the following components:
- cv
An array with the model score(s) evaluated for each fold, repetition, and model estimates (see estimate.default)
- names
Names (character vector) of the models
- rep
number of repetitions of the CV
- folds
Number of folds of the CV
Value
objects of the S3 class 'cross_validated'
S3 generics
The following S3 generic functions are available for an object of class cross_validated:
coef: Extract average model scores from the cross-validation procedure.
print: Basic print method.
summary: Summary of the cross-validation procedure.
See Also
Examples
# See example(cv) for examples
Conditional Relative Risk estimation
Description
Conditional Relative Risk estimation via Double Machine Learning
Usage
crr(
  treatment,
  response_model,
  propensity_model,
  importance_model,
  contrast = c(1, 0),
  data,
  nfolds = 5,
  type = "dml1",
  ...
)
Arguments
treatment | formula specifying treatment and variables to condition on |
response_model | SL object |
propensity_model | SL object |
importance_model | SL object |
contrast | treatment contrast (default 1 vs 0) |
data | data.frame |
nfolds | Number of folds |
type | 'dml1' or 'dml2' |
... | additional arguments to SuperLearner |
Value
cate.targeted object
Author(s)
Klaus Kähler Holst & Andreas Nordland
Examples
sim1 <- function(n=1e4, seed=NULL, return_model=FALSE, ...){
  suppressPackageStartupMessages(require("lava"))
  if (!is.null(seed)) set.seed(seed)
  m <- lava::lvm()
  distribution(m, ~x) <- gaussian.lvm()
  distribution(m, ~v) <- gaussian.lvm(mean = 10)
  distribution(m, ~a) <- binomial.lvm("logit")
  regression(m, "a") <- function(v, x){.1*v + x}
  distribution(m, "y") <- gaussian.lvm()
  regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v}
  if (return_model) return(m)
  lava::sim(m, n = n)
}
d <- sim1(n = 2e3, seed = 1)
if (require("SuperLearner", quietly=TRUE)) {
  e <- crr(data=d,
    type = "dml2",
    treatment = a ~ v,
    response_model = learner_glm(y ~ a*(x + v + I(v^2))),
    importance_model = learner_glm(D_ ~ v + I(v^2)),
    propensity_model = learner_glm(a ~ x + v + I(v^2), family=binomial),
    nfolds = 2)
  summary(e) # the true parameters are c(1,1)
}
Predict the cumulative hazard/survival function for a survival model
Description
Predict the cumulative hazard/survival function for a survival model
Usage
cumhaz(
  object,
  newdata,
  times = NULL,
  individual.time = FALSE,
  extend = FALSE,
  ...
)
Arguments
object | Survival model object: phreg, coxph, rfsrc, ranger |
newdata | data.frame |
times | numeric vector: Time points at which the survival model is evaluated. If NULL, the time points associated with the survival model are used. |
individual.time | logical: If TRUE the survival object is evaluated at different time points for each row in newdata. The number of rows in newdata and the length of times must be the same. |
extend | if TRUE, prints information for all specified 'times', even if there are no subjects left at the end of the specified 'times' (see survival::summary.survfit). |
... | Additional arguments. |
Value
List with elements:
time: numeric vector
chf: cumulative hazard function. If individual.time = FALSE, matrix with dimension (nrow(newdata), length(times)). If individual.time = TRUE, vector of length length(times).
surv: survival function, exp(-chf).
dchf: t(diff(rbind(0, t(chf))))
Author(s)
Klaus K. Holst, Andreas Nordland
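A minimal sketch, assuming a coxph fit (using survival::lung, where status == 2 codes death):
library(survival)
fit <- coxph(Surv(time, status == 2) ~ age + sex, data = lung)
ch <- cumhaz(fit, newdata = head(lung), times = c(100, 300, 500))
dim(ch$chf) # 6 x 3 matrix: subjects by time points
ch$surv # survival probabilities, exp(-chf)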
Cross-validation
Description
Generic cross-validation function
Usage
## Default S3 method:
cv(
  object,
  data,
  response = NULL,
  nfolds = 5,
  rep = 1,
  weights = NULL,
  model.score = scoring,
  seed = NULL,
  shared = NULL,
  args.pred = NULL,
  args.future = list(),
  mc.cores,
  silent = FALSE,
  ...
)
Arguments
object | List of learner objects |
data | data.frame or matrix |
response | Response variable (vector or name of column in |
nfolds | Number of folds (nfolds=0 simple test/train split into two folds 1:([n]/2), ([n]+1/2):n with last part used for testing) |
rep | Number of repetitions (default 1) |
weights | Optional frequency weights |
model.score | Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
seed | Random seed (argument passed to future.apply::future_lapply) |
shared | Function applied to each fold with results sent to each model |
args.pred | Optional arguments to prediction function (see details below) |
args.future | Arguments to future.apply::future_mapply |
mc.cores | Optional number of cores. parallel::mcmapply used instead of future |
silent | suppress all messages and progress bars |
... | Additional arguments passed to elements in |
Details
object should be a list of objects of class learner. Alternatively, each element of models should be a list with a fitting function and a prediction function.
The response argument can optionally be a named list where the name is then used as the name of the response argument in models. Similarly, if data is a named list with a single data.frame/matrix then this name will be used as the name of the data/design matrix argument in models.
Value
An object of class 'cross_validated' is returned. See cross_validated-class for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
See Also
Examples
m <- list(learner_glm(Sepal.Length~1),
  learner_glm(Sepal.Length~Species),
  learner_glm(Sepal.Length~Species + Petal.Length))
x <- cv(m, rep=10, data=iris)
x
Cross-validation for learner_sl
Description
Cross-validation estimation of the generalization error of the super learner and each of the separate models in the ensemble. Both the chosen model scoring metrics and the model weights of the stacked ensemble are reported.
Usage
## S3 method for class 'learner_sl'
cv(object, data, nfolds = 5, rep = 1, model.score = scoring, ...)
Arguments
object | (learner_sl) Instantiated learner_sl object. |
data | data.frame or matrix |
nfolds | Number of folds (nfolds=0 simple test/train split into two folds 1:([n]/2), ([n]+1/2):n with last part used for testing) |
rep | Number of repetitions (default 1) |
model.score | Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
... | Additional arguments passed to elements in |
Examples
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
sl <- learner_sl(list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1),
  "glm2" = learner_glm(y ~ x1 + x2)
))
cv(sl, data = sim1(), rep = 2)
Cast warning for deprecated function argument names
Description
Cast warning for deprecated function argument names
Usage
deprecate_arg_warn(old, new, fun, vers)
Arguments
old | deprecated argument name |
new | argument that should be used instead |
fun | function name where arguments are deprecated |
vers | version when argument is deprecated |
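A hypothetical sketch of the intended call pattern inside a function that renamed old_arg to new.arg (the reassignment of the argument value is an assumption; deprecate_arg_warn itself only casts the warning):
f <- function(new.arg, old_arg) {
  if (!missing(old_arg)) {
    deprecate_arg_warn("old_arg", "new.arg", fun = "f", vers = "0.6")
    new.arg <- old_arg # assumption: caller reassigns the value
  }
  new.arg
}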
Deprecated argument names
Description
Deprecated argument names
Arguments
response_model | Deprecated. Use response.model instead. |
propensity_model | Deprecated. Use propensity.model instead. |
cate_model | Deprecated. Use cate.model instead. |
treatment | Deprecated. Use cate.model instead. |
Extract design matrix
Description
Extract design matrix from data.frame and formula
Usage
design(
  formula,
  data,
  ...,
  intercept = FALSE,
  response = TRUE,
  rm_envir = FALSE,
  specials = NULL,
  specials.call = NULL,
  levels = NULL,
  design.matrix = TRUE
)
Arguments
formula | formula |
data | data.frame |
... | additional arguments (e.g., specials such as weights, offsets, ...) |
intercept | (logical) If FALSE an intercept is not included in the design matrix |
response | (logical) if FALSE the response variable is dropped |
rm_envir | Remove environment |
specials | character vector specifying functions in the formula that should be marked as special in the terms object |
specials.call | (call) specials optionally defined as a call-type |
levels | a named list of character vectors giving the full set of levels to be assumed for each factor |
design.matrix | (logical) if FALSE then only response and specials are returned. Otherwise, the design matrix is also returned |
Value
An object of class 'design'
Author(s)
Klaus Kähler Holst
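A minimal sketch (the fields accessed below, x and y, are assumptions about the returned 'design' object):
des <- design(Sepal.Length ~ Species + Petal.Length, data = iris)
class(des) # "design"
head(des$x) # design matrix with factor coding applied (assumed field name)
head(des$y) # response vector (assumed field name)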
Estimation of mean clinical outcome truncated by event process
Description
Let Y denote the clinical outcome, A the binary treatment variable, X baseline covariates, T the failure time, and \epsilon=1,2 the cause of failure. The following are our two target parameters:
E(Y \mid T>t, A=1) - E(Y \mid T>t, A=0)
P(T<t, \epsilon=1 \mid A=1) - P(T<t, \epsilon=1 \mid A=0)
Usage
estimate_truncatedscore(
  data,
  mod.y,
  mod.r,
  mod.a,
  mod.event,
  time,
  cause = NULL,
  cens.code = 0,
  naive = FALSE,
  control = list(),
  ...
)
Arguments
data | (data.frame) |
mod.y | (formula or learner) Model for clinical outcome given T>time. Using a formula specifies a glm with an identity link (see example). |
mod.r | (formula or learner) Model for missing data mechanism for clinical outcome at T=time. Using a formula specifies a glm with a log link. |
mod.a | (formula or learner) Treatment model (in RCT should just be 'a ~ 1'). Using a formula specifies a glm with a log link. |
mod.event | (formula) Model for time-to-event process ('Event(time, status) ~ x'). |
time | (numeric) Landmark time. |
cause | (integer) Primary event (in the 'status' variable of the 'Event' statement). |
cens.code | (integer) Censoring code. |
naive | (logical) If TRUE, the unadjusted estimates ignoring baseline covariates are returned as the attribute 'naive'. |
control | (list) optimization routine parameters. |
... | Additional arguments passed to mets::binregATE. |
Value
lava::estimate.default object
Author(s)
Klaus Kähler Holst
Examples
data(truncatedscore)
mod1 <- learner_glm(y ~ a * (x1 + x2))
mod2 <- learner_glm(r ~ a * (x1 + x2), family = binomial)
a <- estimate_truncatedscore(
  data = truncatedscore,
  mod.y = mod1,
  mod.r = mod2,
  mod.a = a ~ 1,
  mod.event = mets::Event(time, status) ~ x1 + x2,
  time = 2
)
s <- summary(a, noninf.t = -0.1)
print(s)
parameter(s)
# the above is equivalent to
# a <- estimate_truncatedscore(
#   data = truncatedscore,
#   mod.y = y ~ a * (x1 + x2),
#   mod.r = r ~ a * (x1 + x2),
#   mod.a = a ~ 1,
#   mod.event = mets::Event(time, status) ~ x1 + x2,
#   time = 2
# )
Create a list from all combinations of input variables
Description
Similar to the expand.grid function, this function creates all combinations of the input arguments but returns the result as a list.
Usage
expand.list(..., INPUT = NULL, envir = NULL)
Arguments
... | input variables |
INPUT | optional list of variables |
envir | environment to evaluate formulas in |
Value
list
Author(s)
Klaus Kähler Holst
Examples
expand.list(x = 2:4, z = c("a", "b"))
Integral approximation of a time dependent function. Computes an approximation of \int_{start}^{stop} S(t) dt, where S(t) is a survival function, for a selection of start and stop time points.
Description
Integral approximation of a time dependent function. Computes an approximation of \int_{start}^{stop} S(t) dt, where S(t) is a survival function, for a selection of start and stop time points.
Usage
int_surv(times, surv, start = 0, stop = max(times), extend = FALSE)
Arguments
times | Numeric vector, sorted time points. |
surv | Numeric vector, values of a survival function evaluated at time points given by |
start | Numeric vector, start of the integral. |
stop | Numeric vector, end of the integral. |
extend | (logical) If TRUE, integral is extended beyond the last observed time point |
Value
Numeric vector, value of the integral.
Author(s)
Andreas Nordland
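A minimal sketch with a closed-form check: for the rate-1 exponential survival curve S(t) = exp(-t), the integral from 0 to 2 is 1 - exp(-2), approximately 0.865:
tt <- seq(0, 5, length.out = 501)
ss <- exp(-tt) # S(t) for a rate-1 exponential
int_surv(tt, ss, start = 0, stop = 2) # should be close to 0.865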
R6 class for prediction models
Description
Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.
The following list provides an overview of constructors for many commonly used models.
Regression and classification: learner_glm, learner_gam, learner_grf, learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost, learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl
Public fields
info: Optional information/name of the model
Active bindings
clear: Remove fitted model from the learner object
fit: Return estimated model object.
formula: Return model formula. Use learner$update() to update the formula.
Methods
Public methods
Method new()
Create a new prediction model object
Usage
learner$new(
  formula = NULL,
  estimate,
  predict = stats::predict,
  predict.args = NULL,
  estimate.args = NULL,
  info = NULL,
  specials = c(),
  formula.keep.specials = FALSE,
  intercept = FALSE
)
Arguments
formula: formula specifying outcome and design matrix
estimate: function for fitting the model. This must be a function with response, 'y', and design matrix, 'x'. Alternatively, a function with a formula and data argument. See the examples section.
predict: prediction function (must be a function of model object, 'object', and new design matrix, 'newdata')
predict.args: optional arguments to prediction function
estimate.args: optional arguments to estimate function
info: optional description of the model
specials: optional specials terms (weights, offset, id, subset, ...) passed on to design
formula.keep.specials: if TRUE then special terms defined by specials will be removed from the formula before it is being passed to the estimate function
intercept: (logical) include intercept in design matrix
Method estimate()
Estimation method
Usage
learner$estimate(data, ..., store = TRUE)
Arguments
data: data.frame
...: Additional arguments to estimation method
store: Logical determining if estimated model should be stored inside the class.
Method predict()
Prediction method
Usage
learner$predict(newdata, ..., object = NULL)
Arguments
newdata: data.frame
...: Additional arguments to prediction method
object: Optional model fit object
Method update()
Update formula
Usage
learner$update(formula)
Arguments
formula: formula or character which defines the new response
Method print()
Print method
Usage
learner$print()
Method summary()
Summary method to provide more extensive information than learner$print().
Usage
learner$summary()
Returns
summarized_learner object, which is a list with the following elements:
- info
description of the learner
- formula
formula specifying outcome and design matrix
- estimate
function for fitting the model
- estimate.args
arguments to estimate function
- predict
function for making predictions from fitted model
- predict.args
arguments to predict function
- specials
provided special terms
- intercept
include intercept in design matrix
Examples
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
Method response()
Extract response from data
Usage
learner$response(data, eval = TRUE, ...)
Arguments
data: data.frame
eval: when FALSE return the untransformed outcome (i.e., return 'a' if formula defined as I(a==1) ~ ...)
...: additional arguments to design
Method design()
Generate design object (design matrix and response) from data
Usage
learner$design(data, ...)
Arguments
data: data.frame
...: additional arguments to design
Method opt()
Get options
Usage
learner$opt(arg)
Arguments
arg: name of option to get value of
Method clone()
The objects of this class are cloneable with this method.
Usage
learner$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Author(s)
Klaus Kähler Holst, Benedikt Sommer
Examples
data(iris)
rf <- function(formula, ...) {
  learner$new(formula,
    info = "grf::probability_forest",
    estimate = function(x, y, ...) {
      grf::probability_forest(X = x, Y = y, ...)
    },
    predict = function(object, newdata) {
      predict(object, newdata)$predictions
    },
    estimate.args = list(...)
  )
}
args <- expand.list(
  num.trees = c(100, 200), mtry = 1:3,
  formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width)
)
models <- lapply(args, function(par) do.call(rf, par))
x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata = head(iris))
# Reduce Ex. timing
a <- targeted::cv(models, data = iris)
cbind(coef(a), attr(args, "table"))
# defining learner via function with arguments y (response)
# and x (design matrix)
f1 <- learner$new(
  estimate = function(y, x) lm.fit(x = x, y = y),
  predict = function(object, newdata) newdata %*% object$coefficients
)
# defining the learner via arguments formula and data
f2 <- learner$new(
  estimate = function(formula, data, ...) glm(formula, data, ...)
)
# generic learner defined from function (predict method derived per default
# from stats::predict)
f3 <- learner$new(
  estimate = function(dt, ...) {
    lm(y ~ x, data = dt)
  }
)
## ------------------------------------------------
## Method `learner$summary`
## ------------------------------------------------
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
Construct learners from a grid of parameters
Description
Construct learners from a grid of parameters
Usage
learner_expand_grid(fun, args, names = TRUE, params = FALSE)
Arguments
fun | (function) A function that returns alearner. |
args | (list) Parameters that generate a grid of parameters with expand.list, where the set of parameters are then passed on to |
names | (logical or character) If FALSE, then return a list without names. If TRUE, a named list is returned (see details). |
params | (logical) If FALSE, then no information about the parameters defined by |
Value
list
Examples
lrs <- learner_expand_grid(
  learner_xgboost,
  list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3))
)
lrs # use info of constructed learner as names
lrs <- learner_expand_grid(
  learner_xgboost,
  list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
  names = "xgboost"
)
names(lrs) # use xgboost instead of info field for names
lrs <- learner_expand_grid(
  learner_xgboost,
  list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
  names = "xgboost",
  params = TRUE
)
names(lrs) # also add parameters to names
lrs <- learner_expand_grid(
  learner_xgboost,
  list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
  names = FALSE
)
names(lrs) # unnamed list since names = FALSE
Construct a learner
Description
Constructs a learner class object for fitting generalized additive models with mgcv::gam.
Usage
learner_gam(
  formula,
  info = "mgcv::gam",
  family = gaussian(),
  select = FALSE,
  gamma = 1,
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
family | This is a family object specifying the distribution and link to use in fitting etc (see |
select | If this is |
gamma | Increase this beyond 1 to produce smoother models. |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to mgcv::gam. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
d0 <- data.frame(y, x1, x2)
lr <- learner_gam(y ~ s(x1) + x2)
lr$estimate(d0)
if (interactive()) {
  plot(lr$fit)
}
Construct a learner
Description
Constructs a learner class object for fitting generalized linear models with stats::glm and MASS::glm.nb. Negative binomial regression is supported with family = "nb" (or alternatively family = "negbin").
Usage
learner_glm(
  formula,
  info = "glm",
  family = gaussian(),
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
family | a description of the error distribution and link function to be used in the model. For |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to stats::glm or MASS::glm.nb. |
Value
learner object.
Examples
n <- 5e2
x <- rnorm(n)
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x, w)
lr <- learner_glm(y ~ x) # linear Gaussian model
lr$estimate(d0)
coef(lr$fit)
# negative binomial regression model with offset (using MASS::glm.nb)
lr <- learner_glm(y ~ x + offset(log(w)), family = "nb")
lr$estimate(d0)
coef(lr$fit)
lr$predict(data.frame(x = 1, w = c(1, 5))) # response scale
lr$predict(data.frame(x = 1, w = c(1, 5)), type = "link") # link scale
Construct a learner
Description
Constructs a learner class object for fitting entire lasso or elastic-net regularization paths for various linear and non-linear regression models with glmnet::cv.glmnet. Predictions are returned for the value of lambda that gives minimum cvm. That is, glmnet::predict.cv.glmnet is called with s = "lambda.min".
Usage
learner_glmnet_cv(
  formula,
  info = "glmnet::cv.glmnet",
  family = gaussian(),
  lambda = NULL,
  alpha = 1,
  nfolds = 10,
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
family | Either a character string representing one of the built-in families, or else a |
lambda | Optional user-supplied lambda sequence; default is |
alpha | The elasticnet mixing parameter, with
|
nfolds | number of folds - default is 10. Although |
learner.args | (list) Additional arguments to learner$new(). |
... | Other arguments that can be passed to glmnet, for example |
Value
learner object.
Examples
# continuous outcome
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x1 + x2*x1 + cos(x1)
y <- rnorm(n, lp, sd = 2)
d0 <- data.frame(y, x1, x2)
lr <- learner_glmnet_cv(y ~ x1 + x2)
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = c(0, 1), x2 = 1))
# count outcome with different exposure time
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(0.5 * x1 - 1 * x2 + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x1, x2, w)
lr <- learner_glmnet_cv(y ~ x1 + x2 + offset(log(w)), family = "poisson")
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = 1, x2 = 1, w = c(1, 5)))
Construct a learner
Description
Constructs a learner class object for fitting generalized random forest models with grf::regression_forest or grf::probability_forest. As shown in the examples, the constructed learner returns predicted class probabilities of class 2 in case of binary classification. An n times p matrix, with n being the number of observations and p the number of classes, is returned for multi-class classification.
Usage
learner_grf(
  formula,
  num.trees = 2000,
  min.node.size = 5,
  alpha = 0.05,
  sample.fraction = 0.5,
  num.threads = 1,
  model = "grf::regression_forest",
  info = model,
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
num.trees | Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000. |
min.node.size | A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5. |
alpha | A tuning parameter that controls the maximum imbalance of a split. Default is 0.05. |
sample.fraction | Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5. |
num.threads | Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency. |
model | (character) grf model to estimate. Usually regression_forest (grf::regression_forest) or probability_forest (grf::probability_forest). |
info | (character) Optional information to describe the instantiated learner object. |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_grf(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# binary classification
lr <- learner_grf(as.factor(yb) ~ x1 + x2, model = "probability_forest")
lr$estimate(d)
lr$predict(head(d)) # predict class probabilities of class 2
# multi-class classification
lr <- learner_grf(Species ~ ., model = "probability_forest")
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for fitting a highly adaptive lasso model with hal9001::fit_hal.
Usage
learner_hal(
  formula,
  info = "hal9001::fit_hal",
  smoothness_orders = 0,
  reduce_basis = NULL,
  family = "gaussian",
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
smoothness_orders | An |
reduce_basis | An optional |
family | A |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to hal9001::fit_hal. |
Value
learner object.
Examples
## Not run:
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, x1, x2)
lr <- learner_hal(y ~ x1 + x2, smoothness_orders = 0.5, reduce_basis = 1)
lr$estimate(d)
lr$predict(data.frame(x1 = 0, x2 = c(-1, 1)))
## End(Not run)
Construct a learner
Description
Constructs a learner class object for isotonic regression with isoregw.
Usage
learner_isoreg(formula, info = "targeted::isoregw", learner.args = NULL, ...)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to isoregw. |
Value
learner object.
Examples
x <- runif(5e3, -5, 5)
pr <- lava::expit(-1 + x)
y <- rbinom(length(pr), 1, pr)
d <- data.frame(y, x)
lr <- learner_isoreg(y ~ x)
lr$estimate(d)
pr_iso <- lr$predict(d)
if (interactive()) {
  plot(pr ~ x, cex=0.3)
  lines(sort(x), pr_iso[order(x)], col="red", type="s")
}
Construct a learner
Description
Constructs a learner class object for fitting multivariate adaptive regression splines with earth::earth.
Usage
learner_mars(
  formula,
  info = "earth::earth",
  degree = 1,
  nprune = NULL,
  glm = NULL,
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
degree | Maximum degree of interaction (Friedman's |
nprune | Maximum number of terms (including intercept) in the pruned model. Default is NULL, meaning all terms created by the forward pass (but typically not all terms will remain after pruning). Use this to enforce an upper bound on the model size (that is less than |
glm | NULL (default) or a list of arguments to pass on to |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to earth::earth. |
Value
learner object.
Examples
# poisson regression
n <- 5e2
x <- rnorm(n)
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x, w)
lr <- learner_mars(y ~ x + offset(log(w)), degree = 2, glm = list(family = poisson()))
lr$estimate(d0)
lr$predict(data.frame(x = 0, w = c(1, 2)))
Construct a learner
Description
Constructs a learner class object for fitting a naive Bayes classifier with naivebayes. As shown in the examples, the constructed learner returns predicted class probabilities of class 2 in case of binary classification. An n times p matrix, with n being the number of observations and p the number of classes, is returned for multi-class classification.
Usage
learner_naivebayes(
  formula,
  info = "Naive Bayes",
  laplace.smooth = 0,
  kernel = FALSE,
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
laplace.smooth | Laplace smoothing |
kernel | If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to naivebayes. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- rbinom(n, 1, lava::expit(x2*x1 + cos(x1)))
d <- data.frame(y, x1, x2)
# binary classification
lr <- learner_naivebayes(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# multi-class classification
lr <- learner_naivebayes(Species ~ .)
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for fitting a superlearner.
Usage
learner_sl(
  learners,
  info = NULL,
  nfolds = 5L,
  meta.learner = metalearner_nnls,
  model.score = mse,
  learner.args = NULL,
  ...
)
Arguments
learners | (list) List of learner objects (e.g., learner_glm) |
info | (character) Optional information to describe the instantiated learner object. |
nfolds | (integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner | (function) Algorithm to learn the ensemble weights (default non-negative least squares). Must be a function of the response (nx1 vector), |
model.score | (function) Model scoring method (see learner) |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to superlearner |
Value
learner object.
See Also
Examples
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
d <- sim1()
m <- list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1 + x2),
  "iso" = learner_isoreg(y ~ x1)
)
s <- learner_sl(m, nfolds = 10)
s$estimate(d)
pr <- s$predict(d)
if (interactive()) {
  plot(y ~ x1, data = d)
  points(d$x1, pr, col = 2, cex = 0.5)
  lines(cos(x1) + x1 ~ x1, data = d[order(d$x1), ],
    lwd = 4, col = lava::Col("darkblue", 0.3))
}
print(s)
# weights(s$fit)
# score(s$fit)
cvres <- cv(s, data = d, nfolds = 3, rep = 2)
cvres
# coef(cvres)
# score(cvres)
Construct stratified learner
Description
This function creates a stratified learner from an existing learner wrapper function such as learner_glm or learner_xgboost. The stratification variable can be specified either using the stratify argument (which can be given as a string "a" or a formula, for example ~I(a==0)), or it can be defined as a special term directly in the formula, y ~ ... + stratify(a). The formula will subsequently be passed to the learner_ wrapper without the stratify special term.
Usage
learner_stratify(
  formula,
  learner,
  stratify = NULL,
  info = NULL,
  learner.args = list(),
  ...
)
Arguments
formula | formula specifying outcome and design matrix |
learner | (learner) learner object |
stratify | (character, formula) variables to stratify by |
info | optional description of the model |
learner.args | (list) optional arguments to the learner constructor |
... | additional arguments passed to the learner constructor |
Value
learner object
Examples
simdata <- function(n=1000) {
  a <- rbinom(n, 1, 0.5)
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-1 + a + a * x))
  data.frame(y, a, x)
}
d <- simdata()
lr <- learner_stratify(
  y ~ x + stratify(a),
  learner_glm,
  family=binomial()
)
lr$estimate(d)
lr$predict(head(d))
Construct a learner
Description
Constructs a learner class object for fitting support vector machines with e1071::svm. As shown in the examples, the constructed learner returns predicted class probabilities of class 2 in case of binary classification. An n times p matrix, with n being the number of observations and p the number of classes, is returned for multi-class classification.
Usage
learner_svm(
  formula,
  info = "e1071::svm",
  cost = 1,
  epsilon = 0.1,
  kernel = "radial",
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
info | (character) Optional information to describe the instantiated learner object. |
cost | cost of constraints violation (default: 1); it is the 'C'-constant of the regularization term in the Lagrange formulation. |
epsilon | epsilon in the insensitive-loss function (default: 0.1) |
kernel | the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
|
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to e1071::svm. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_svm(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# binary classification
lr <- learner_svm(as.factor(yb) ~ x1 + x2)
# alternative to transforming response variable to factor
# lr <- learner_svm(yb ~ x1 + x2, type = "C-classification")
lr$estimate(d)
lr$predict(head(d)) # predict class probabilities of class 2
lr$predict(head(d), probability = FALSE) # predict labels
# multi-class classification
lr <- learner_svm(Species ~ .)
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for xgboost::xgboost.
Usage
learner_xgboost(
  formula,
  max_depth = 2L,
  learning_rate = 1,
  nrounds = 2L,
  subsample = 1,
  reg_lambda = 1,
  objective = "reg:squarederror",
  info = paste("xgboost", objective),
  learner.args = NULL,
  ...
)
Arguments
formula | (formula) Formula specifying response and design matrix. |
max_depth | (integer) Maximum depth of a tree. |
learning_rate | (numeric) Learning rate. |
nrounds | Number of boosting iterations / rounds. Note that the number of default boosting rounds here is not automatically tuned, and different problems will have vastly different optimal numbers of boosting rounds. |
subsample | (numeric) Subsample ratio of the training instance. |
reg_lambda | (numeric) L2 regularization term on weights. |
objective | (character) Specify the learning task and the corresponding learning objective. See xgboost::xgboost for all available options. |
info | (character) Optional information to describe the instantiated learner object. |
learner.args | (list) Additional arguments to learner$new(). |
... | Additional arguments to xgboost::xgboost. |
Value
learner object.
Examples
n <- 1e3
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d0 <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_xgboost(y ~ x1 + x2, nrounds = 5)
lr$estimate(d0)
lr$predict(head(d0))
# binary classification
lr <- learner_xgboost(yb ~ x1 + x2, nrounds = 5, objective = "binary:logistic")
lr$estimate(d0)
lr$predict(head(d0))
# multi-class classification
d0 <- iris
d0$y <- as.numeric(d0$Species) - 1
lr <- learner_xgboost(y ~ ., objective = "multi:softprob", num_class = 3)
lr$estimate(d0)
lr$predict(head(d0))
R6 class for prediction models
Description
Replaced by learner
Super class
targeted::learner -> ml_model
Methods
Public methods
Inherited methods
Method new()
Create a new prediction model object
Usage
ml_model$new(...)
Arguments
...: deprecated
Method clone()
The objects of this class are cloneable with this method.
Usage
ml_model$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Naive Bayes classifier
Description
Naive Bayes Classifier
Usage
naivebayes(
  formula,
  data,
  weights = NULL,
  kernel = FALSE,
  laplace.smooth = 0,
  prior = NULL,
  ...
)
Arguments
formula | Formula with syntax: response ~ predictors | weights |
data | data.frame |
weights | optional frequency weights |
kernel | If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
laplace.smooth | Laplace smoothing |
prior | optional prior probabilities (default estimated from data) |
... | additional arguments to lower level functions |
Value
An object of class 'naivebayes' is returned. See naivebayes-class for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
Examples
library(data.table)
data(iris)
m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris)
pr <- predict(m, newdata = iris)
# using weights to reduce the size of the dataset
n <- 5e2
x <- rnorm(n, sd = 2) > 0
y <- rbinom(n, 1, lava::expit(x))
# full data set
d1 <- data.frame(y, x = as.factor(x > 0))
m1 <- naivebayes(y ~ x, data = d1)
# reduced data set
d2 <- data.table(d1)[, .(.N), by = .(y, x)]
m2 <- naivebayes(y ~ x, data = d2, weights = d2$N)
all(predict(m1, d1) == predict(m2, d1))
naivebayes class object
Description
The function naivebayes returns an object of the type naivebayes.
An object of class 'naivebayes' is a list with at least the following components:
- prior
Matrix with prior probabilities, i.e. marginal class probabilities Pr(class)
- pcond
list of matrices with conditional probabilities of the features given the classes (one list element per class), Pr(x|class)
- classes
Names (character vector) of the classes
- xvar
Names of predictors
- xmodel
Conditional model for each predictor
- design
Model design object
- call
The function call which instantiated the object
Value
objects of the S3 class 'naivebayes'
S3 generics
The following S3 generic functions are available for an object of class naivebayes:
predict Predict class probabilities for new feature data.
print Basic print method.
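For illustration, a minimal sketch of inspecting these components on a fitted model (component names as listed above):
m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris)
m$prior    # marginal class probabilities Pr(class)
m$classes  # names of the classes
m$xvar     # names of the predictors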
See Also
Examples
## See example(naivebayes) for examples
Find non-dominated points of a set
Description
Find the non-dominated points of a set (minima of a point set).
Usage
nondom(x, ...)
Arguments
x | matrix |
... | additional arguments to lower level functions |
Details
A point x dominates y if it is never worse and in at least one case strictly better. Formally, let f_i denote the i-th coordinate of the objective function; then x dominates y if for all i: f_i(x) \leq f_i(y), and there exists j: f_j(x) < f_j(y).
Based on the algorithm of Kung et al. (1975).
Value
matrix
Author(s)
Klaus Kähler Holst
Examples
rbind(
  c(1.0, 0.5),
  c(0.0, 1.0),
  c(1.0, 0.0),
  c(0.5, 1.0),
  c(1.0, 1.0),
  c(0.8, 0.8)
) |> nondom()
Pooled Adjacent Violators Algorithm
Description
Pooled Adjacent Violators Algorithm
Usage
pava(y, x = numeric(0), weights = numeric(0))Arguments
y | response variable |
x | (optional) predictor vector (otherwise y is assumed to be a priori sorted according to the relevant predictor) |
weights | (optional) weights |
Value
List with index (index) of jump points and values (value) at each jump point.
Author(s)
Klaus K. Holst
Examples
x <- runif(5e3, -5, 5)
pr <- lava::expit(-1 + x)
y <- rbinom(length(pr), 1, pr)
pv <- pava(y, x)
plot(pr ~ x, cex = 0.3)
with(pv, lines(sort(x)[index], value, col = "red", type = "s"))
Prediction for kernel density estimates
Description
Kernel density estimator predictions
Usage
## S3 method for class 'density'
predict(object, xnew, ...)
Arguments
object | density object |
xnew | New data on which to make predictions |
... | additional arguments to lower level functions |
Author(s)
Klaus K. Holst
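The original entry lists no examples; a minimal sketch, assuming xnew simply holds the points at which the density estimate is evaluated:
d <- density(rnorm(1e3))        # kernel density estimate (stats::density)
predict(d, xnew = c(-1, 0, 1))  # density predictions at the new points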
Predictions for Naive Bayes Classifier
Description
Naive Bayes Classifier predictions
Usage
## S3 method for class 'naivebayes'
predict(object, newdata, expectation = NULL, threshold = c(0.001, 0.001), ...)
Arguments
object | naivebayes object |
newdata | new data on which to make predictions |
expectation | Variable to calculate conditional expectation with respect to probabilities from the naivebayes classifier |
threshold | Threshold parameters. First element defines the threshold on the probabilities and the second element the value to set those truncated probabilities to. |
... | Additional arguments to lower level functions |
Author(s)
Klaus K. Holst
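A minimal sketch, reusing the model from example(naivebayes):
m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris)
head(predict(m, newdata = iris))  # class probabilities for the first observations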
Predict Method for superlearner Fits
Description
Obtains predictions for ensemble model or individual learners.
Usage
## S3 method for class 'superlearner'
predict(object, newdata, all.learners = FALSE, ...)
Arguments
object | (superlearner) Fitted superlearner object. |
newdata | (data.frame) Data in which to look for variables with which to predict. |
all.learners | (logical) If FALSE (default), then return the predictions from the ensemble model. Otherwise, return the predictions from all individual learners. |
... | Not used. |
Value
numeric (all.learners = FALSE) or matrix (all.learners = TRUE)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Risk regression
Description
Risk regression with binary exposure and a nuisance model for the odds-product.
Let A be the binary exposure, V the set of covariates, and Y the binary response variable, and define p_a(v) = P(Y=1 \mid A=a, V=v), a \in \{0,1\}.
The target parameter is either the relative risk
\mathrm{RR}(v) = \frac{p_1(v)}{p_0(v)}
or the risk difference
\mathrm{RD}(v) = p_1(v)-p_0(v)
We assume a target parameter model given by either
\log\{\mathrm{RR}(v)\} = \alpha^t v
or
\mathrm{arctanh}\{\mathrm{RD}(v)\} = \alpha^t v
and similarly a working linear nuisance model for the odds-product
\phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) = \beta^t v.
A propensity model for E(A=1 \mid V) is also fitted using a logistic regression working model
\mathrm{logit}\{E(A=1 \mid V=v)\} = \gamma^t v.
If both the odds-product model and the propensity model are correct the estimator is efficient. Further, the estimator is consistent in the union model, i.e., the estimator is double-robust in the sense that only one of the two models needs to be correctly specified to get a consistent estimate.
Usage
riskreg(
  formula,
  nuisance = ~1,
  propensity = ~1,
  target = ~1,
  data,
  weights,
  type = "rr",
  optimal = TRUE,
  std.err = TRUE,
  start = NULL,
  mle = FALSE,
  ...
)
Arguments
formula | formula (see details below) |
nuisance | nuisance model (formula) |
propensity | propensity model (formula) |
target | (optional) target model (formula) |
data | data.frame |
weights | optional weights |
type | type of association measure (rd or rr) |
optimal | If TRUE optimal weights are calculated |
std.err | If TRUE standard errors are calculated |
start | optional starting values |
mle | Semi-parametric (double-robust) estimate or MLE (TRUE gives MLE) |
... | additional arguments to unconstrained optimization routine (nlminb) |
Details
The 'formula' argument should be given as response ~ exposure | target-formula | nuisance-formula or response ~ exposure | target | nuisance | propensity
E.g., riskreg(y ~ a | 1 | x+z | x+z, data=...)
Alternatively, the model can be specified using the target, nuisance and propensity arguments: riskreg(y ~ a, target=~1, nuisance=~x+z, ...)
The riskreg_fit function can be used with matrix inputs rather than formulas.
Value
An object of class 'riskreg.targeted' is returned. See targeted-class for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
References
Richardson, T. S., Robins, J. M., & Wang, L. (2017). On modeling and estimation for the relative risk and risk difference. Journal of the American Statistical Association, 112(519), 1121–1130. http://dx.doi.org/10.1080/01621459.2016.1192546
Examples
m <- lava::lvm(a[-2] ~ x,
               z ~ 1,
               lp.target[1] ~ 1,
               lp.nuisance[-1] ~ 2*x) |>
  lava::distribution(~a, value = lava::binomial.lvm("logit")) |>
  lava::binomial.rr("y", "a", "lp.target", "lp.nuisance")
d <- sim(m, 5e2, seed = 1)

I <- model.matrix(~1, d)
X <- model.matrix(~1 + x, d)
with(d, riskreg_mle(y, a, I, X, type = "rr"))
with(d, riskreg_fit(y, a, nuisance = X, propensity = I, type = "rr"))
riskreg(y ~ a | 1, nuisance = ~x, data = d, type = "rr")

## Model with same design matrix for nuisance and propensity model:
with(d, riskreg_fit(y, a, nuisance = X, type = "rr"))

## a <- riskreg(y ~ a, target = ~z, nuisance = ~x,
##              propensity = ~x, data = d, type = "rr")
a <- riskreg(y ~ a | z, nuisance = ~x, propensity = ~x, data = d, type = "rr")
a
predict(a, d[1:5, ])

riskreg(y ~ a, nuisance = ~x, data = d, type = "rr", mle = TRUE)
Binary regression models with right censored outcomes
Description
Binary regression models with right censored outcomes
Usage
riskreg_cens(
  response,
  censoring,
  treatment = NULL,
  prediction = NULL,
  data,
  newdata,
  tau,
  type = "risk",
  M = 1,
  call.response = "phreg",
  args.response = list(),
  call.censoring = "phreg",
  args.censoring = list(),
  preprocess = NULL,
  efficient = TRUE,
  control = list(),
  ...
)
Arguments
response | Response formula (e.g., Surv(time, event) ~ D + W). |
censoring | Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W). |
treatment | Optional treatment model (learner) |
prediction | Optional prediction model (learner) |
data | data.frame. |
newdata | Optional data.frame. In this case the uncentered influence function evaluated in 'newdata' is returned with nuisance parameters obtained from 'data'. |
tau | Time-point of interest, see Details. |
type | "risk", "treatment", "brier" |
M | Number of folds in cross-fitting (M=1 is no cross-fitting). |
call.response | Model call for the response model (e.g. "mets::phreg"). |
args.response | Additional arguments to the response model. |
call.censoring | Similar to call.response. |
args.censoring | Similar to args.response. |
preprocess | (optional) Data pre-processing function. |
efficient | If FALSE an IPCW estimator is returned |
control | See details |
... | Additional arguments to lower level data pre-processing functions. |
Details
The one-step estimator depends on the calculation of an integral with respect to the martingale process corresponding to the counting process N(t) = I(C > \min(T, \tau)). This can be decomposed into an integral with respect to the counting process dN_c(t) and the compensator d\Lambda_c(t), where the latter term can be computationally intensive to calculate. Rather than calculating this integral at all observed time points, a coarser evaluation can be requested by setting control=list(sample=N). With N=0 the (computationally intensive) standard evaluation is used.
Value
estimate object
Author(s)
Klaus K. Holst, Andreas Nordland
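The original entry lists no examples; the following is a minimal sketch with simulated right-censored data (the data-generating mechanism and variable names are hypothetical, chosen only to match the formula syntax described above):
set.seed(1)
n <- 500
a <- rbinom(n, 1, 0.5)                   # binary treatment
w <- rnorm(n)                            # baseline covariate
t0 <- rexp(n, rate = exp(-1 + 0.5 * a))  # event times
c0 <- rexp(n, rate = 0.2)                # censoring times
d <- data.frame(time = pmin(t0, c0), event = as.numeric(t0 <= c0), a, w)
## risk of an event before tau = 1
fit <- riskreg_cens(
  response = survival::Surv(time, event) ~ a + w,
  censoring = survival::Surv(time, event == 0) ~ a + w,
  data = d, tau = 1
)
fit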
Extract average cross-validated score of individual learners
Description
Extract average cross-validated score of individual learners
Usage
## S3 method for class 'superlearner'
score(x, ...)
Arguments
x | (superlearner) Fitted model. |
... | Not used. |
Predictive model scoring
Description
Predictive model scoring
Usage
scoring(
  response,
  ...,
  type = "quantitative",
  levels = NULL,
  metrics = NULL,
  weights = NULL,
  names = NULL,
  object = NULL,
  newdata = NULL,
  messages = 1
)
Arguments
response | Observed response |
... | model predictions (continuous predictions or class probabilities (matrices)) |
type | continuous or categorical response (the latter is automatically chosen if response is a factor, otherwise a continuous response is assumed) |
levels | (optional) unique levels in response variable |
metrics | which metrics to report |
weights | optional frequency weights |
names | (optional) character vector of the model names in the output. If omitted these will be taken from the names of the ellipsis argument (...) |
object | optional model object |
newdata | (optional) data.frame on which to evaluate the model performance |
messages | controls amount of messages/warnings (0: none) |
Value
Numeric matrix of dimension m x p, where m is the number of different models and p is the number of model metrics
Examples
data(iris)
set.seed(1)
dat <- lava::csplit(iris, 2)
g1 <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = dat[[1]])
g2 <- naivebayes(Species ~ Sepal.Width, data = dat[[1]])
pr1 <- predict(g1, newdata = dat[[2]], wide = TRUE)
pr2 <- predict(g2, newdata = dat[[2]], wide = TRUE)
table(colnames(pr1)[apply(pr1, 1, which.max)], dat[[2]]$Species)
table(colnames(pr2)[apply(pr2, 1, which.max)], dat[[2]]$Species)
scoring(dat[[2]]$Species, pr1 = pr1, pr2 = pr2)
## quantitative response:
scoring(response = 1:10, prediction = rnorm(1:10))
Softmax transformation
Description
Softmax transformation
Usage
softmax(x, log = FALSE, ref = TRUE, ...)
Arguments
x | Input matrix (e.g., linear predictors of multinomial logistic model) |
log | Return on log-scale (default FALSE) |
ref | Add reference level (add 0 column to x) |
... | Additional arguments to lower level functions |
Value
Numeric matrix of dimension n x p, where n = nrow(x) and p = ncol(x) + (ref == TRUE)
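For illustration, a minimal sketch of the dimension convention:
lp <- matrix(rnorm(10), ncol = 2)  # linear predictors for two non-reference classes
pr <- softmax(lp)                  # ref = TRUE adds a reference (zero) column
dim(pr)                            # 5 x 3
rowSums(pr)                        # each row sums to one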
Solve ODE
Description
Solve ODE with Runge-Kutta method (RK4)
Usage
solve_ode(ode_ptr, input, init, par = 0)Arguments
ode_ptr | pointer (externalptr) to C++ function or an R function |
input | Input matrix. 1st column specifies the time points |
init | Initial conditions |
par | Parameters defining the ODE (passed to ode_ptr) |
Details
The external pointer should be created with the function targeted::specify_ode.
Value
Matrix with solution
Author(s)
Klaus Kähler Holst
See Also
specify_ode
Examples
example(specify_ode)
Specify Ordinary Differential Equation (ODE)
Description
Define compiled code for ordinary differential equation.
Usage
specify_ode(code, fname = NULL, pname = c("dy", "x", "y", "p"))Arguments
code | string with the body of the function definition (see details) |
fname | Optional name of the exported C++ function |
pname | Vector of variable names (results, inputs, states, parameters) |
Details
The model (code) should be specified as the body of a C++ function. The following variables are defined by default (see the pname argument):
- dy
Vector with derivatives, i.e. the rhs of the ODE (the result).
- x
Vector with the first element being the time, and the following elements additional exogenous input variables,
- y
Vector with the dependent variable
- p
Parameter vector
y'(t) = f_{p}(x(t), y(t))
All variables are treated as Armadillo (http://arma.sourceforge.net/) vectors/matrices.
As an example consider the Lorenz Equations:
\frac{dx_{t}}{dt} = \sigma(y_{t}-x_{t})
\frac{dy_{t}}{dt} = x_{t}(\rho-z_{t})-y_{t}
\frac{dz_{t}}{dt} = x_{t}y_{t}-\beta z_{t}
We can specify this model as
ode <- 'dy(0) = p(0)*(y(1)-y(0));
        dy(1) = y(0)*(p(1)-y(2));
        dy(2) = y(0)*y(1)-p(2)*y(2);'
dy <- specify_ode(ode)
As an example of a model with exogenous inputs consider the following ODE:
y'(t) = \beta_{0} + \beta_{1}y(t) + \beta_{2}y(t)x(t) + \beta_{3}x(t)\cdot t
This could be specified as
mod <- 'double t = x(0);
        dy = p(0) + p(1)*y + p(2)*x(1)*y + p(3)*x(1)*t;'
dy <- specify_ode(mod)
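A sketch of solving this model with solve_ode (the constant exogenous input x(t) = 1, the initial condition, and the parameter values are illustrative only):
tt <- seq(0, 10, length.out = 1e3)
input <- cbind(tt, 1)  # 1st column: time points; 2nd column: exogenous input x(t)
yy <- solve_ode(dy, input = input, init = 1, par = c(0.1, -0.5, 0.1, 0.05))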
Value
pointer (externalptr) to C++ function
Author(s)
Klaus Kähler Holst
See Also
solve_ode
Examples
ode <- paste0(
  "dy(0) = p(0)*(y(1)-y(0));",
  "dy(1) = y(0)*(p(1)-y(2));",
  "dy(2) = y(0)*y(1)-p(2)*y(2);",
  collapse = "\n"
)  # Reduce test time
dy <- specify_ode(ode)
tt <- seq(0, 100, length.out = 2e4)
yy <- solve_ode(dy, input = tt, init = c(1, 1, 1), par = c(10, 28, 8/3))
Identify Stratification Variables
Description
This is a special function that identifies stratification variables when they appear on the right hand side of a formula.
Usage
stratify(..., na.group = FALSE, shortlabel, sep = ", ")Arguments
... | any number of variables. All must be the same length. |
na.group | a logical variable; if TRUE, missing values are treated as a distinct level |
shortlabel | if TRUE, omit variable names from the resulting factor labels |
sep | the character used to separate groups, in the created label |
Details
When used outside of a coxph formula the result of the function is essentially identical to the interaction function, though the labels from strata are often more verbose.
Value
a new factor, whose levels are all possible combinations of the factors supplied as arguments.
See Also
survival::strata, learner_stratify, interaction
Examples
a <- factor(rep(1:3, 4), labels = c("low", "medium", "high"))
b <- factor(rep(1:4, 3))
levels(stratify(b))
levels(stratify(a, b, shortlabel = TRUE))
Superlearner (stacked/ensemble learner)
Description
This function creates a predictor object (class learner) from a list of existing learner objects. When estimating this model a stacked prediction will be created by weighting together the predictions of each of the initial learners. The weights are learned using cross-validation.
Usage
superlearner(
  learners,
  data,
  nfolds = 10,
  meta.learner = metalearner_nnls,
  model.score = mse,
  mc.cores = NULL,
  future.seed = TRUE,
  silent = TRUE,
  name.prefix = NULL,
  ...
)
Arguments
learners | (list) List of learner objects (e.g., learner_glm) |
data | (data.frame) Data containing the response variable and covariates. |
nfolds | (integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner | (function) Algorithm to learn the ensemble weights (default non-negative least squares). Must be a function of the response (nx1 vector) and the predictions (nxp matrix). |
model.score | (function) Model scoring method (see learner) |
mc.cores | (integer) If not NULL, then parallel::mcmapply is used with mc.cores number of cores |
future.seed | (logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly. |
silent | (logical) Suppress all messages and progress bars |
name.prefix | (character) Prefix used to name the learner objects in the output |
... | Additional arguments to parallel::mclapply or future.apply::future_lapply. |
References
Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.
See Also
predict.superlearner, weights.superlearner, score.superlearner
Examples
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
m <- list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1 + x2)
)
sl <- superlearner(m, data = sim1(), nfolds = 2)
predict(sl, newdata = sim1(n = 5))
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
targeted class object
Description
The functions riskreg and ate return an object of type targeted.
An object of class 'targeted' is a list with at least the following components:
- estimate
An estimate object with the target parameter estimates (see estimate.default)
- opt
Object returned from the applied optimization routine
- npar
Number of parameters of the model (target and nuisance)
- type
String describing the model
Value
objects of the S3 class 'targeted'
S3 generics
The following S3 generic functions are available for an object of class targeted:
coef Extract target coefficients of the estimated model.
vcov Extract the variance-covariance matrix of the target parameters.
IC Extract the estimated influence function.
print Print estimates of the target parameters.
summary Extract information on both target parameters and the estimated nuisance model.
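For illustration, a minimal sketch (with d simulated as in example(riskreg)):
a <- riskreg(y ~ a | 1, nuisance = ~x, data = d, type = "rr")
coef(a)      # target parameter estimates
vcov(a)      # variance-covariance matrix of the target parameters
head(IC(a))  # estimated influence function
summary(a)   # target parameters and estimated nuisance models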
See Also
Examples
## See example(riskreg) for examples
Extract model component from design object
Description
Extract model component from design object
Usage
## S3 method for class 'design'
terms(x, specials, ...)
Arguments
x | design object |
specials | extract variables marked as special (e.g., "offset", "weights", ...) |
... | Additional arguments to lower level functions |
Signed Wald intersection test
Description
Calculating test statistics and p-values for the signed Wald intersection test given by
SW = \inf_{\theta \in \cap_{i=1}^n H_i} \{(\widehat{\theta}-\theta)^\top W\widehat{\Sigma}W (\widehat{\theta}-\theta)\}
with individual hypotheses for each coordinate of \theta given by H_j: \theta_j < \delta_j for some non-inferiority margin \delta_j, j=1,\ldots,n.
Usage
test_intersection_sw(
  par,
  vcov,
  noninf = 0,
  weights = 1,
  nsim.null = 10000,
  index = NULL,
  control = list(),
  par.name = "theta"
)
Arguments
par | (numeric) parameter estimates or an estimate object |
vcov | (matrix) asymptotic variance estimate |
noninf | (numeric) non-inferiority margins |
weights | (numeric) optional weights |
nsim.null | (integer) number of samples used in the Monte Carlo simulation |
index | (integer) subset of parameters to test |
control | (list) arguments to the alternating projection algorithm. See details section. |
par.name | (character) parameter names in output |
Details
The constrained least squares problem is solved using Dykstra's algorithm. The following parameters for the optimization can be controlled via the control list argument: dykstra_niter sets the maximum number of iterations (default 500), dykstra_tol the convergence tolerance of the alternating projection algorithm (default 1e-7), and pinv_tol the tolerance for calculating the pseudo-inverse matrix (default length(par) * .Machine$double.eps * max(eigenvalue)).
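For illustration, a sketch of adjusting these settings (parameter estimates as in the Examples section below):
S <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
thetahat <- c(0.5, -0.2)
test_intersection_sw(
  thetahat, S,
  control = list(dykstra_niter = 1000, dykstra_tol = 1e-9)
)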
Value
htest object
Author(s)
Klaus Kähler Holst, Christian Bressen Pipper
References
Christian Bressen Pipper, Andreas Nordland & Klaus Kähler Holst (2025) A general approach to construct powerful tests for intersections of one-sided null-hypotheses based on influence functions. arXiv: https://arxiv.org/abs/2511.07096.
See Also
test_zmax_onesided, lava::test_wald, lava::closed_testing
Examples
S <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
thetahat <- c(0.5, -0.2)
test_intersection_sw(thetahat, S, nsim.null = 1e5)
test_intersection_sw(thetahat, S, weights = NULL)

## Not run: # only on 'lava' >= 1.8.2
e <- estimate(coef = thetahat, vcov = S, labels = c("p1", "p2"))
lava::closed_testing(e, test_intersection_sw, noninf = c(-0.1, -0.1)) |> summary()
## End(Not run)
One-sided Zmax test
Description
Calculating test statistics and p-values for the one-sided Zmax / minP test.
Given parameter estimates (\widehat{\theta}_1, \ldots, \widehat{\theta}_p)^\top with approximate asymptotic covariance matrix \widehat{S}, let Z_i = \frac{\widehat{\theta}_i - \delta_i}{\operatorname{SE}(\widehat{\theta}_i)}, where \operatorname{SE}(\widehat{\theta}_i) = \widehat{S}_{ii}^{0.5}. The Zmax test statistic is then Z_{max} = \max\{Z_1, \ldots, Z_p\}, and the null hypothesis is H_0: \theta_i \leq \delta_i, i=1,\ldots,p, with non-inferiority margins \delta_i, i=1,\ldots,p, for which the p-value is calculated as 1 - \Phi_R(Z_{max}), where \Phi_R is the CDF of the multivariate normal distribution with mean zero and correlation matrix R = \operatorname{diag}(S_{11}^{-0.5}, \ldots, S_{pp}^{-0.5}) S \operatorname{diag}(S_{11}^{-0.5}, \ldots, S_{pp}^{-0.5}).
Usage
test_zmax_onesided(par, vcov, noninf = 0, index = NULL, par.name = "theta")Arguments
par | (numeric) parameter estimates or an estimate object |
vcov | (matrix) asymptotic variance estimate |
noninf | (numeric) non-inferiority margins |
index | (integer) subset of parameters to test |
par.name | (character) parameter names in output |
Value
htest object
Author(s)
Christian Bressen Pipper, Klaus Kähler Holst
See Also
test_intersection_sw(), lava::test_wald(), lava::closed_testing()
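The original entry lists no examples; a minimal sketch reusing the setup from the test_intersection_sw examples:
S <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
thetahat <- c(0.5, -0.2)
test_zmax_onesided(thetahat, S, noninf = c(-0.1, -0.1))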
Scores truncated by death
Description
Simulated data inspired by the FLOW study (Perkovic 2024). The following variables are considered in this simulated dataset:
- time
Time of first event in years (first major irreversible kidney event or non-related death)
- status
Event type at first major irreversible kidney event (=1), non-related death (=2), or right censoring (=0)
- y
Clinical outcome measurement (eGFR) at landmark time (:=2)
- r
Missing indicator for y (1 if observed, 0 if either t<2 or if the outcome was not measured for other reasons)
- a
Binary treatment (1 := active, 0 := placebo)
- x1
Covariate, clinical outcome at baseline (eGFR)
- x2
Covariate, binary treatment usage indicator (1: SGLT2 treatment, 0: none)
The actual failure times and censoring times are also included (failure.time, cens.time), and the full-data outcome (y0) given t>2.
Source
Simulated data
References
Perkovic, V., Tuttle, K. R., Rossing, P., Mahaffey, K. W., Mann, J. F., Bakris, G., Baeres, F. M., Idorn, T., Bosch-Traberg, H., Lausvig, N. L., and Pratley, R. (2024). Effects of semaglutide on chronic kidney disease in patients with type 2 diabetes. New England Journal of Medicine, 391(2):109–121.
Examples
data(truncatedscore)
Extract ensemble weights
Description
Extract ensemble weights
Usage
## S3 method for class 'superlearner'
weights(object, ...)
Arguments
object | (superlearner) Fitted model. |
... | Not used. |
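The original entry lists no examples; a minimal sketch, with sl fitted as in example(superlearner):
m <- list("mean" = learner_glm(y ~ 1), "glm" = learner_glm(y ~ x1 + x2))
sl <- superlearner(m, data = sim1(), nfolds = 2)  # sim1() as defined in example(superlearner)
weights(sl)  # estimated ensemble weights of the individual learners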