Title: The Scalable Highly Adaptive Lasso
Version: 0.4.6
Description: A scalable implementation of the highly adaptive lasso algorithm, including routines for constructing sparse matrices of basis functions of the observed data, as well as a custom implementation of Lasso regression tailored to enhance efficiency when the matrix of predictors is composed exclusively of indicator functions. For ease of use and increased flexibility, the Lasso fitting routines invoke code from the 'glmnet' package by default. The highly adaptive lasso was first formulated and described by MJ van der Laan (2017) <doi:10.1515/ijb-2015-0097>, with practical demonstrations of its performance given by Benkeser and van der Laan (2016) <doi:10.1109/DSAA.2016.93>. This implementation of the highly adaptive lasso algorithm was described by Hejazi, Coyle, and van der Laan (2020) <doi:10.21105/joss.02526>.
Depends: R (≥ 3.1.0), Rcpp
License: GPL-3
URL: https://github.com/tlverse/hal9001
BugReports: https://github.com/tlverse/hal9001/issues
Encoding: UTF-8
LazyData: true
Imports: Matrix, stats, utils, methods, assertthat, origami (≥ 1.0.3), glmnet, data.table, stringr
Suggests: testthat, knitr, rmarkdown, microbenchmark, future, ggplot2, dplyr, tidyr, survival, SuperLearner
LinkingTo: Rcpp, RcppEigen
VignetteBuilder: knitr
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2023-11-13 21:27:19 UTC; jrcoyle
Author: Jeremy Coyle [aut, cre], Nima Hejazi [aut], Rachael Phillips [aut], Lars van der Laan [aut], David Benkeser [ctb], Oleg Sofrygin [ctb], Weixin Cai [ctb], Mark van der Laan [aut, cph, ths]
Maintainer: Jeremy Coyle <jeremyrcoyle@gmail.com>
Repository: CRAN
Date/Publication: 2023-11-14 15:00:02 UTC

HAL Formula addition: Adding formula term objects together into a single formula object term.

Description

HAL Formula addition: Adding formula term objects together into a single formula object term.

Usage

## S3 method for class 'formula_hal9001'
x + y

Arguments

x

A formula_hal9001 object as output by h.

y

A formula_hal9001 object as output by h.


Wrapper for Classic SuperLearner

Description

Wrapper for SuperLearner for objects of class hal9001

Usage

SL.hal9001(
  Y,
  X,
  newX,
  family,
  obsWeights,
  id,
  max_degree = 2,
  smoothness_orders = 1,
  num_knots = 5,
  ...
)

Arguments

Y

A numeric vector of observations of the outcome variable.

X

An input matrix with dimensions number of observations -by- number of covariates that will be used to derive the design matrix of basis functions.

newX

A matrix of new observations on which to obtain predictions. The default of NULL computes predictions on training inputs X.

family

A family object (one that is supported by glmnet) specifying the error/link family for a generalized linear model.

obsWeights

A numeric vector of observational-level weights.

id

A numeric vector of IDs.

max_degree

The highest order of interaction terms for which basis functions ought to be generated.

smoothness_orders

An integer vector of length 1 or greater, specifying the smoothness of the basis functions. See the argument smoothness_orders of fit_hal for more information.

num_knots

An integer vector of length 1 or max_degree, specifying the maximum number of knot points (i.e., bins) for each covariate for generating basis functions. See the num_knots argument in fit_hal for more information.

...

Additional arguments to fit_hal.

Value

An object of class SL.hal9001 with a fitted hal9001 object and corresponding predictions based on the input data.


Apply copy map

Description

Combine duplicate training set columns with a logical OR, so that each set of duplicated columns is collapsed into a single column.

Usage

apply_copy_map(X, copy_map)

Arguments

X

Sparse matrix containing columns of indicator functions.

copy_map

The copy map, as produced by make_copy_map.

Value

A dgCMatrix sparse matrix corresponding to the design matrix for a zero-th order highly adaptive lasso, but with all duplicated columns (basis functions) removed.

Examples

gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)
copy_map <- make_copy_map(x_basis)
x_basis_uniq <- apply_copy_map(x_basis, copy_map)

Fast Coercion to Sparse Matrix

Description

Fast and efficient coercion of standard matrix objects to sparse matrices. Borrowed from http://gallery.rcpp.org/articles/sparse-matrix-coercion/. INTERNAL USE ONLY.

Usage

as_dgCMatrix(XX_)

Arguments

XX_

An object of class Matrix that has a sparse structure suitable for coercion to a sparse matrix format of dgCMatrix.

Value

An object of class dgCMatrix, coerced from input XX_.


List Basis Functions

Description

Build a list of basis functions from a set of columns

Usage

basis_list_cols(
  cols,
  x,
  smoothness_orders,
  include_zero_order,
  include_lower_order = FALSE
)

Arguments

cols

Index or indices (as numeric) of covariates (columns) of interest in the data matrix x for which basis functions ought to be generated. Note that basis functions for interactions of these columns are computed automatically.

x

A matrix containing observations in the rows and covariates in the columns. Basis functions are computed for these covariates.

smoothness_orders

An integer vector of length ncol(x) specifying the desired smoothness of the function in each covariate. k = 0 is no smoothness (indicator basis), k = 1 is first order smoothness, and so on. For an additive model, the component function for each covariate will have the degree of smoothness as specified by smoothness_orders. For non-additive components (tensor products of univariate basis functions), the univariate basis functions in each tensor product have smoothness degree as specified by smoothness_orders.

include_zero_order

A logical, indicating whether the zeroth order basis functions are included for each covariate (if TRUE), in addition to the smooth basis functions given by smoothness_orders. This allows the algorithm to data-adaptively choose the appropriate degree of smoothness.

include_lower_order

A logical, like include_zero_order, except including all basis functions of lower smoothness degrees than specified via smoothness_orders.

Value

A list containing the basis functions generated from a set of input columns.


Compute Degree of Basis Functions

Description

Find the full list of basis functions up to a particular degree

Usage

basis_of_degree(
  x,
  degree,
  smoothness_orders,
  include_zero_order,
  include_lower_order
)

Arguments

x

An input matrix containing observations and covariates following standard conventions in problems of statistical learning.

degree

The highest order of interaction terms for which the basis functions ought to be generated. The default (NULL) corresponds to generating basis functions for the full dimensionality of the input matrix.

smoothness_orders

An integer vector of length ncol(x) specifying the desired smoothness of the function in each covariate. k = 0 is no smoothness (indicator basis), k = 1 is first order smoothness, and so on. For an additive model, the component function for each covariate will have the degree of smoothness as specified by smoothness_orders. For non-additive components (tensor products of univariate basis functions), the univariate basis functions in each tensor product have smoothness degree as specified by smoothness_orders.

include_zero_order

A logical, indicating whether the zeroth order basis functions are included for each covariate (if TRUE), in addition to the smooth basis functions given by smoothness_orders. This allows the algorithm to data-adaptively choose the appropriate degree of smoothness.

include_lower_order

A logical, like include_zero_order, except including all basis functions of lower smoothness degrees than specified via smoothness_orders.

Value

A list containing basis functions and cutoffs generated from a set of input columns up to a particular pre-specified degree.


Calculate Proportion of Nonzero Entries

Description

Calculate Proportion of Nonzero Entries

Usage

calc_pnz(X)
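
No further detail is documented for this function. As a rough illustration of the quantity named in the title, the proportion of nonzero entries of a matrix can be computed in base R as sketched below. The helper name pnz_sketch is hypothetical, and this is not the package's own routine.

```r
# Hypothetical helper, not part of hal9001: share of entries that are nonzero.
pnz_sketch <- function(X) {
  mean(as.matrix(X) != 0)
}

# 2 x 3 matrix with two nonzero entries out of six.
X <- matrix(c(1, 0, 0, 1, 0, 0), nrow = 2)
pnz_sketch(X)
```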

Calculating Centered and Scaled Matrices

Description

Calculating Centered and Scaled Matrices

Usage

calc_xscale(X, xcenter)

Arguments

X

A sparse matrix, to be centered.

xcenter

A vector of column means to be used for centering X.


Enumerate Basis Functions

Description

Generate basis functions for all covariates and interaction terms thereof up to a specified order/degree.

Usage

enumerate_basis(
  x,
  max_degree = NULL,
  smoothness_orders = rep(0, ncol(x)),
  include_zero_order = FALSE,
  include_lower_order = FALSE,
  num_knots = NULL
)

Arguments

x

An input matrix containing observations and covariates following standard conventions in problems of statistical learning.

max_degree

The highest order of interaction terms for which the basis functions ought to be generated. The default (NULL) corresponds to generating basis functions for the full dimensionality of the input matrix.

smoothness_orders

An integer vector of length ncol(x) specifying the desired smoothness of the function in each covariate. k = 0 is no smoothness (indicator basis), k = 1 is first order smoothness, and so on. For an additive model, the component function for each covariate will have the degree of smoothness as specified by smoothness_orders. For non-additive components (tensor products of univariate basis functions), the univariate basis functions in each tensor product have smoothness degree as specified by smoothness_orders.

include_zero_order

A logical, indicating whether the zeroth order basis functions are included for each covariate (if TRUE), in addition to the smooth basis functions given by smoothness_orders. This allows the algorithm to data-adaptively choose the appropriate degree of smoothness.

include_lower_order

A logical, like include_zero_order, except including all basis functions of lower smoothness degrees than specified via smoothness_orders.

num_knots

A vector of length max_degree, which determines how granular the knot points to generate basis functions should be for each degree of basis function. The first entry of num_knots determines the number of knot points to be used for each univariate basis function. More generally, the kth entry of num_knots determines the number of knot points to be used for the kth degree basis functions. Specifically, for a kth degree basis function, which is the tensor product of k univariate basis functions, this determines the number of knot points to be used for each univariate basis function in the tensor product.

Value

A list of basis functions generated for all covariates and interactions thereof up to a pre-specified degree.

Examples

gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)

Enumerate Basis Functions at Generalized Edges

Description

For degrees of smoothness greater than 1, we must generate the lower order smoothness basis functions using the knot points at the "edge" of the hypercube. For example, consider f(x) = x^2 + x, which is second-order smooth, but will not be generated by purely quadratic basis functions. We also need to include the y = x function (which corresponds to first-order HAL basis functions at the left most value/edge of x).
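
The f(x) = x^2 + x example can be checked numerically. The base-R sketch below assumes truncated-power basis functions (the exact form and scaling used inside hal9001 may differ) and compares a least-squares fit on quadratic terms alone with a fit that also includes the first-order term at the left edge.

```r
# Truncated-power bases; the exact scaling used by hal9001 may differ (assumption).
x <- seq(0, 1, length.out = 50)
f <- x^2 + x
knots <- c(0, 0.5)  # left edge and one interior knot

# Second-order (quadratic) terms only, one column per knot.
quad <- sapply(knots, function(u) pmax(x - u, 0)^2)
# First-order term anchored at the left-most value of x.
edge <- pmax(x - min(x), 0)

fit_quad <- lm(f ~ quad)         # quadratic bases alone
fit_edge <- lm(f ~ quad + edge)  # quadratic bases plus the edge term

max(abs(residuals(fit_quad)))  # clearly nonzero: the linear part is missed
max(abs(residuals(fit_edge)))  # numerically zero: f is reproduced
```

The second fit recovers f up to floating-point error because x^2 + x is the sum of the quadratic term at the left-edge knot and the edge term.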

Usage

enumerate_edge_basis(
  x,
  max_degree = 3,
  smoothness_orders = rep(0, ncol(x)),
  include_zero_order = FALSE,
  include_lower_order = FALSE
)

Arguments

x

An input matrix containing observations and covariates following standard conventions in problems of statistical learning.

max_degree

The highest order of interaction terms for which the basis functions ought to be generated. The default (NULL) corresponds to generating basis functions for the full dimensionality of the input matrix.

smoothness_orders

An integer vector of length ncol(x) specifying the desired smoothness of the function in each covariate. k = 0 is no smoothness (indicator basis), k = 1 is first order smoothness, and so on. For an additive model, the component function for each covariate will have the degree of smoothness as specified by smoothness_orders. For non-additive components (tensor products of univariate basis functions), the univariate basis functions in each tensor product have smoothness degree as specified by smoothness_orders.

include_zero_order

A logical, indicating whether the zeroth order basis functions are included for each covariate (if TRUE), in addition to the smooth basis functions given by smoothness_orders. This allows the algorithm to data-adaptively choose the appropriate degree of smoothness.

include_lower_order

A logical, like include_zero_order, except including all basis functions of lower smoothness degrees than specified via smoothness_orders.


Generate Basis Functions

Description

Populates a column (indexed by basis_col) of x_basis with basis indicators.

Usage

evaluate_basis(basis, X, x_basis, basis_col)

Arguments

basis

The basis function.

X

The design matrix, containing the original data.

x_basis

The HAL design matrix, containing indicator functions.

basis_col

Numeric indicating which column to populate.


HAL: The Highly Adaptive Lasso

Description

Estimation procedure for HAL, the Highly Adaptive Lasso

Usage

fit_hal(
  X,
  Y,
  formula = NULL,
  X_unpenalized = NULL,
  max_degree = ifelse(ncol(X) >= 20, 2, 3),
  smoothness_orders = 1,
  num_knots = num_knots_generator(max_degree = max_degree, smoothness_orders =
    smoothness_orders, base_num_knots_0 = 200, base_num_knots_1 = 50),
  reduce_basis = NULL,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian"),
  lambda = NULL,
  id = NULL,
  weights = NULL,
  offset = NULL,
  fit_control = list(cv_select = TRUE, use_min = TRUE, lambda.min.ratio = 1e-04,
    prediction_bounds = "default"),
  basis_list = NULL,
  return_lasso = TRUE,
  return_x_basis = FALSE,
  yolo = FALSE
)

Arguments

X

An input matrix with dimensions number of observations -by- number of covariates that will be used to derive the design matrix of basis functions.

Y

A numeric vector of observations of the outcome variable. For family = "mgaussian", Y is a matrix of observations of the outcome variables.

formula

A character string formula to be used in formula_hal. See its documentation for details.

X_unpenalized

An input matrix with the same number of rows as X, for which no L1 penalization will be performed. Note that X_unpenalized is directly appended to the design matrix; no basis expansion is performed on X_unpenalized.

max_degree

The highest order of interaction terms for which basis functions ought to be generated.

smoothness_orders

An integer, specifying the smoothness of the basis functions. See the details on smoothness_orders for more information.

num_knots

An integer vector of length 1 or max_degree, specifying the maximum number of knot points (i.e., bins) for any covariate for generating basis functions. If num_knots is a unit-length vector, then the same num_knots are used for each degree (this is not recommended). The default settings for num_knots are recommended, and these defaults decrease num_knots with increasing max_degree and smoothness_orders, which prevents (expensive) combinatorial explosions in the number of higher-degree and higher-order basis functions generated. This allows the complexity of the optimization problem to grow scalably. See the details on num_knots for more information.

reduce_basis

An optional numeric value bounded in the open unit interval indicating the minimum proportion of 1's in a basis function column needed for the basis function to be included in the procedure to fit the lasso. Any basis functions with a lower proportion of 1's than the cutoff will be removed. Defaults to 1 over the square root of the number of observations. Only applicable for models fit with zero-order splines, i.e. smoothness_orders = 0.

family

A character or a family object (supported by glmnet) specifying the error/link family for a generalized linear model. character options are limited to "gaussian" for fitting a standard penalized linear model, "binomial" for penalized logistic regression, "poisson" for penalized Poisson regression, "cox" for a penalized proportional hazards model, and "mgaussian" for a multivariate penalized linear model. Note that passing in family objects leads to slower performance relative to passing in a character family (if supported). For example, one should set family = "binomial" instead of family = binomial() when calling fit_hal.

lambda

User-specified sequence of values of the regularization parameter for the lasso L1 regression. If NULL, the default sequence in cv.glmnet will be used. The cross-validated optimal value of this regularization parameter will be selected with cv.glmnet. If fit_control's cv_select argument is set to FALSE, then the lasso model will be fit via glmnet, and regularized coefficient values for each lambda in the input array will be returned.

id

A vector of ID values that is used to generate cross-validation folds for cv.glmnet. This argument is ignored when fit_control's cv_select argument is FALSE.

weights

Observation weights; defaults to 1 per observation.

offset

A vector of offset values, used in fitting.

fit_control

List of arguments, including the following, and any others to be passed to cv.glmnet or glmnet.

  • cv_select: A logical specifying if the sequence of specified lambda values should be passed to cv.glmnet in order for a single, optimal value of lambda to be selected according to cross-validation. When cv_select = FALSE, a glmnet model will be used to fit the sequence of (or single) lambda.

  • use_min: Specify the choice of lambda to be selected by cv.glmnet. When TRUE, "lambda.min" is used; otherwise, "lambda.1se". Only used when cv_select = TRUE.

  • lambda.min.ratio: A glmnet argument specifying the smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). We've seen that not setting lambda.min.ratio can lead to no lambda values that fit the data sufficiently well.

  • prediction_bounds: An optional vector of size two that provides the lower and upper bounds for predictions; not used when family = "cox". When prediction_bounds = "default", the predictions are bounded between min(Y) - sd(Y) and max(Y) + sd(Y) for each outcome (when family = "mgaussian", each outcome can have different bounds). Bounding ensures that there is no extrapolation.

basis_list

The full set of basis functions generated from X.

return_lasso

A logical indicating whether or not to return the glmnet fit object of the lasso model.

return_x_basis

A logical indicating whether or not to return the matrix of (possibly reduced) basis functions used in fit_hal.

yolo

A logical indicating whether to print one of a curated selection of quotes from the HAL9000 computer, from the critically acclaimed epic science-fiction film "2001: A Space Odyssey" (1968).

Details

The procedure uses a custom C++ implementation to generate a design matrix of spline basis functions of covariates and interactions of covariates. The lasso regression is fit to this design matrix via cv.glmnet or a custom implementation derived from origami. The maximum dimension of the design matrix is n -by- (n * 2^(d-1)), where n is the number of observations and d is the number of covariates.
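
To get a feel for how quickly the stated bound grows with the number of covariates, the arithmetic can be written out directly. The helper name max_basis_cols is hypothetical; it only restates the n * 2^(d-1) expression given above.

```r
# Upper bound on the number of basis-function columns as stated above.
max_basis_cols <- function(n, d) n * 2^(d - 1)

max_basis_cols(n = 100, d = 3)   # 400
max_basis_cols(n = 100, d = 10)  # 51200
```

This growth is the reason the defaults cap max_degree and shrink num_knots for higher interaction degrees.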

For smoothness_orders = 0, only zero-order splines (piece-wise constant) are generated, which assume the true regression function has no smoothness or continuity. When smoothness_orders = 1, first-order splines (piece-wise linear) are generated, which assume continuity of the true regression function. When smoothness_orders = 2, second-order splines (piece-wise quadratic and linear terms) are generated, which assume the true regression function has a single order of differentiability.
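
As a minimal sketch of what these orders mean for a single covariate and a single knot u, the function below evaluates one basis function per order. The helper basis_at_knot is hypothetical; orders 0 and 1 follow the description above, while the order-2 form is an assumed truncated-power form that may be scaled differently inside hal9001.

```r
# Univariate basis function at one knot u, by smoothness order.
# order 0: indicator (piece-wise constant); order 1: hinge (piece-wise linear);
# order 2: squared hinge (assumed form, scaling may differ in the package).
basis_at_knot <- function(x, u, order) {
  if (order == 0) as.numeric(x >= u) else pmax(x - u, 0)^order
}

x <- c(-1, 0, 0.5, 2)
basis_at_knot(x, u = 0, order = 0)  # 0 1 1 1
basis_at_knot(x, u = 0, order = 1)  # 0.0 0.0 0.5 2.0
basis_at_knot(x, u = 0, order = 2)  # 0.00 0.00 0.25 4.00
```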

The num_knots argument specifies the number of knot points for each covariate and for each max_degree. Fewer knot points can significantly decrease runtime, but might be overly simplistic. When considering smoothness_orders = 0, too few knot points (e.g., < 50) can significantly reduce performance. When smoothness_orders = 1 or higher, then fewer knot points (e.g., 10-30) is actually better for performance. We recommend specifying num_knots with respect to smoothness_orders, and as a vector of length max_degree with values decreasing exponentially. This prevents combinatorial explosions in the number of higher-degree basis functions generated. The default behavior of num_knots follows this logic: with the base values that fit_hal passes to num_knots_generator in the Usage above, num_knots is set to 200 / 2^{j-1} for smoothness_orders = 0, and to 50 / 2^{j-1} for smoothness_orders = 1 or higher, where j is the interaction degree. We also include some other suitable settings for num_knots below, all of which are less complex than the default num_knots and will thus result in a faster runtime:

Value

Object of class hal9001, containing a list of basis functions, a copy map, coefficients estimated for basis functions, and timing results (for assessing computational efficiency).

Examples

n <- 100
p <- 3
x <- xmat <- matrix(rnorm(n * p), n, p)
y_prob <- plogis(3 * sin(x[, 1]) + sin(x[, 2]))
y <- rbinom(n = n, size = 1, prob = y_prob)
hal_fit <- fit_hal(X = x, Y = y, family = "binomial")
preds <- predict(hal_fit, new_data = x)
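
The "default" setting of prediction_bounds described under fit_control can be pictured as a simple clamp. The sketch below is base R only; the helper name bound_predictions is hypothetical and this is not the package's code, just the documented rule min(Y) - sd(Y) to max(Y) + sd(Y) applied to a vector of raw predictions.

```r
# Hypothetical helper illustrating the "default" bounds; not hal9001 code.
bound_predictions <- function(preds, Y) {
  lower <- min(Y) - sd(Y)
  upper <- max(Y) + sd(Y)
  # Values outside [lower, upper] are pulled back to the nearest bound.
  pmin(pmax(preds, lower), upper)
}

Y <- c(0, 1, 2, 3, 4)
bounded <- bound_predictions(c(-10, 2.5, 10), Y)
bounded
```

Predictions already inside the interval pass through unchanged; only extreme values are truncated.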

HAL Formula: Convert formula or string to a formula_HAL object.

Description

HAL Formula: Convert formula or string to a formula_HAL object.

Usage

formula_hal(formula, smoothness_orders, num_knots, X = NULL)

Arguments

formula

A formula_hal9001 object as output by h.

smoothness_orders

A default value for s if not provided explicitly to the function h.

num_knots

A default value for k if not provided explicitly to the function h.

X

Controls inheritance of the variable X from the parent environment. When NULL (the default), such a variable is inherited.


Generates rules based on knot points of the fitted HAL basis functions with non-zero coefficients.

Description

Generates rules based on knot points of the fitted HAL basis functions with non-zero coefficients.

Usage

generate_all_rules(basis_list, coefs, X_colnames)

HAL Formula term: Generate a single term of the HAL basis

Description

HAL Formula term: Generate a single term of the HAL basis

Usage

h(
  ...,
  k = NULL,
  s = NULL,
  pf = 1,
  monotone = c("none", "i", "d"),
  . = NULL,
  dot_args_as_string = FALSE,
  X = NULL
)

Arguments

...

Variables for which to generate multivariate interaction basis functions, where the variables can be found in a matrix X in a parent environment/frame. Note that, just like with standard formula objects, the variables should not be characters (e.g. do h(W1,W2) not h("W1", "W2")). h(W1,W2,W3) will generate three-way HAL basis functions between W1, W2, and W3. It will not generate the lower dimensional basis functions.

k

The number of knots for each univariate basis function used to generate the tensor product basis functions. If a single value then this value is used for the univariate basis functions for each variable. Otherwise, this should be a variable named list that specifies for each variable how many knot points should be used. h(W1,W2,W3, k = list(W1 = 3, W2 = 2, W3=1)) is equivalent to first binning the variables W1, W2 and W3 into 3, 2 and 1 unique values and then calling h(W1,W2,W3). This coarsening of the data ensures that fewer basis functions are generated, which can lead to substantial computational speed-ups. If not provided and the variable num_knots is in the parent environment, then k will be set to num_knots.

s

The smoothness_orders for the basis functions. The possible values are 0 for piece-wise constant zero-order splines or 1 for piece-wise linear first-order splines. If not provided and the variable smoothness_orders is in the parent environment, then s will be set to smoothness_orders.

pf

A penalty.factor value for the generated basis functions that is used by glmnet in the LASSO penalization procedure. pf = 1 (default) is the standard penalization factor used by glmnet and pf = 0 means the generated basis functions are unpenalized.

monotone

Whether the basis functions should enforce monotonicity of the interaction term. If s = 0, this is monotonicity of the function, and, if s = 1, this is monotonicity of its derivative (e.g., enforcing a convex fit). Set "none" for no constraints, "i" for a monotone increasing constraint, and "d" for a monotone decreasing constraint. Using "i" constrains the basis functions to have positive coefficients in the fit, and "d" constrains the basis functions to have negative coefficients.

.

Just like with formula, . as in h(.) or h(.,.) is treated as a wildcard variable that generates terms using all variables in the data. The argument . should be a character vector of variable names that . iterates over. Specifically, h(., k=1, . = c("W1", "W2", "W3")) is equivalent to h(W1, k=1) + h(W2, k=1) + h(W3, k=1), and h(., ., k=1, . = c("W1", "W2", "W3")) is equivalent to h(W1,W2, k=1) + h(W2,W3, k=1) + h(W1, W3, k=1).

dot_args_as_string

Whether the arguments ... are characters or character vectors and should thus be evaluated directly. When TRUE, the expression h("W1", "W2") can be used.

X

An optional design matrix where the variables given in ... can be found. Otherwise, X is taken from the parent environment.
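
The wildcard behaviour described for the . argument amounts to enumerating all combinations of the listed variables. The base-R sketch below only builds the equivalent formula text; expand_dot_terms is a hypothetical helper for illustration and does not call h or any other hal9001 function.

```r
# Hypothetical helper: writes out the terms that a wildcard h() call stands for.
expand_dot_terms <- function(vars, degree, k = 1) {
  # All unordered combinations of `degree` variables.
  combos <- combn(vars, degree, simplify = FALSE)
  terms <- vapply(
    combos,
    function(v) sprintf("h(%s, k=%d)", paste(v, collapse = ", "), k),
    character(1)
  )
  paste(terms, collapse = " + ")
}

one_way <- expand_dot_terms(c("W1", "W2", "W3"), degree = 1)
two_way <- expand_dot_terms(c("W1", "W2", "W3"), degree = 2)
one_way
two_way
```

The two-way terms come out in combn's order (W1,W2; W1,W3; W2,W3), which differs from the order written in the argument description but describes the same sum of terms.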


HAL 9000 Quotes

Description

Prints a quote from the HAL 9000 robot from 2001: A Space Odyssey

Usage

hal9000()

hal9001

Description

Package for fitting the Highly Adaptive LASSO (HAL) estimator


HAL9000 Quotes from "2001: A Space Odyssey"

Description

Curated selection of quotes from the HAL9000 computer, from the critically acclaimed epic science-fiction film "2001: A Space Odyssey" (1968).

Usage

hal_quotes

Format

A vector of quotes.


Find Copies of Columns

Description

Index vector that, for each column in X, indicates the index of the first copy of that column

Usage

index_first_copy(X)

Arguments

X

Sparse matrix containing columns of indicator functions.


Sort Basis Functions

Description

Build a sorted list of unique basis functions based on columns, where each basis function is a list

Usage

make_basis_list(X_sub, cols, order_map)

Arguments

X_sub

A subset of the columns of X, the original design matrix.

cols

An index of the columns that were reduced to by sub-setting.

order_map

A vector with length equal to the number of columns of the original, unsubsetted matrix X, which specifies the smoothness of the function in each covariate.

Details

Note that sorting of columns is performed such that the basis order equals cols.length() and each basis function is a list(cols, cutoffs).


Build Copy Maps

Description

Build Copy Maps

Usage

make_copy_map(x_basis)

Arguments

x_basis

A design matrix consisting of basis (indicator) functions for covariates (X) and terms for interactions thereof.

Value

A list of numeric vectors indicating indices of basis functions that are identical in the training set.

Examples

gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)
copy_map <- make_copy_map(x_basis)
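
To make the idea of a copy map concrete without the package, the sketch below groups identical columns of a small dense indicator matrix in base R. The names copy_map_sketch and first_copies are hypothetical, and the internal structure of the object returned by make_copy_map may differ in detail; the sketch only mirrors the documented return value (a list of index vectors of identical columns).

```r
# Base-R sketch on a dense matrix; hal9001 works on sparse matrices instead.
x_basis <- cbind(c(1, 0, 1), c(0, 1, 1), c(1, 0, 1), c(0, 1, 1), c(1, 1, 1))

# Key each column by its contents, then group column indices with equal keys.
keys <- apply(x_basis, 2, paste, collapse = "")
copy_map_sketch <- unname(split(seq_len(ncol(x_basis)), match(keys, keys)))
copy_map_sketch  # groups of identical columns: {1, 3}, {2, 4}, {5}

# Keeping the first column of each group removes the duplicates.
first_copies <- vapply(copy_map_sketch, function(idx) idx[1], integer(1))
x_basis_uniq <- x_basis[, first_copies, drop = FALSE]
```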

Build HAL Design Matrix

Description

Make a HAL design matrix based on original design matrix X and a list of basis functions in argument blist

Usage

make_design_matrix(X, blist, p_reserve = 0.5)

Arguments

X

Matrix of covariates containing observed data in the columns.

blist

List of basis functions with which to build HAL design matrix.

p_reserve

Sparse matrix pre-allocation proportion. Default value is 0.5. If one expects a dense HAL design matrix, it is useful to set p_reserve to a higher value.

Value

A dgCMatrix sparse matrix of indicator basis functions corresponding to the design matrix in a zero-order highly adaptive lasso.

Examples

gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)

Mass-based reduction of basis functions

Description

A helper function that finds which basis functions to keep (and equivalently which to discard) based on the proportion of 1's (observations, i.e., "mass") included in a given basis function.

Usage

make_reduced_basis_map(x_basis, reduce_basis_crit)

Arguments

x_basis

A matrix of basis functions with all redundant basis functions already removed.

reduce_basis_crit

A scalar numeric value bounded in the open interval (0,1) indicating the minimum proportion of 1's in a basis function column needed for the basis function to be included in the procedure to fit the Lasso. Any basis functions with a lower proportion of 1's than the specified cutoff will be removed. This argument defaults to NULL, in which case all basis functions are used in the lasso-fitting stage of the HAL algorithm.

Value

A binary numeric vector indicating which columns of the matrix of basis functions to keep (given a one) and which to discard (given a zero).
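
The rule described above can be sketched in a few lines of base R. The function name reduced_basis_map_sketch is hypothetical and operates on a dense matrix for simplicity; it is an illustration of the documented criterion, not the package's implementation.

```r
# Hypothetical stand-in for the rule described above (not the package code).
reduced_basis_map_sketch <- function(x_basis, reduce_basis_crit) {
  prop_ones <- colMeans(x_basis)  # share of 1's in each indicator column
  # Keep (1) columns at or above the cutoff, discard (0) the rest.
  as.numeric(prop_ones >= reduce_basis_crit)
}

x_basis <- cbind(c(1, 1, 1, 1), c(1, 0, 0, 0), c(1, 1, 0, 0))
crit <- 1 / sqrt(nrow(x_basis))  # the default cutoff documented for fit_hal
keep <- reduced_basis_map_sketch(x_basis, crit)
keep  # 1 0 1
```

With four observations the cutoff is 0.5, so the column with a single 1 (proportion 0.25) is dropped.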


Compute Values of Basis Functions

Description

Computes and returns the indicator value for the basis described by cols and cutoffs for a given row of X

Usage

meets_basis(X, row_num, cols, cutoffs, orders)

Arguments

X

The design matrix, containing the original data.

row_num

Numeric for a row index over which to evaluate.

cols

Numeric for the column indices of the basis function.

cutoffs

Numeric providing thresholds.

orders

Numeric providing smoothness orders
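
For the zero-order case, the value of a basis function for one row can be sketched as a product of indicators. The direction of the comparison (greater than or equal to the cutoff) is an assumption, and the role of orders is not specified above, so it is left out; meets_basis_sketch is a hypothetical name.

```r
# Zero-order case only: the basis is 1 when every selected covariate in the row
# is at or above its cutoff (assumed direction), and 0 otherwise.
meets_basis_sketch <- function(X, row_num, cols, cutoffs) {
  as.numeric(all(X[row_num, cols] >= cutoffs))
}

X <- matrix(c(0.2, 0.9,
              0.7, 0.4), nrow = 2, byrow = TRUE)
meets_basis_sketch(X, row_num = 1, cols = c(1, 2), cutoffs = c(0.5, 0.5))  # 0
meets_basis_sketch(X, row_num = 2, cols = c(1, 2), cutoffs = c(0.1, 0.3))  # 1
```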


A default generator for the num_knots argument for each degree of interactions and the smoothness orders.

Description

A default generator for the num_knots argument for each degree of interactions and the smoothness orders.

Usage

num_knots_generator(
  max_degree,
  smoothness_orders,
  base_num_knots_0 = 500,
  base_num_knots_1 = 200
)

Arguments

max_degree

interaction degree.

smoothness_orders

See fit_hal.

base_num_knots_0

The base number of knots for zeroth-order smoothness basis functions. The number of knots by degree interaction decays as base_num_knots_0/2^(d-1) where d is the interaction degree of the basis function.

base_num_knots_1

The base number of knots for 1 or greater order smoothness basis functions. The number of knots by degree interaction decays as base_num_knots_1/2^(d-1) where d is the interaction degree of the basis function.
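
The decay rule base/2^(d-1) is easy to tabulate. The sketch below is a hypothetical re-statement in base R (knots_by_degree is not a package function), and the rounding to whole numbers is an assumption.

```r
# Hypothetical re-statement of the decay rule; rounding is an assumption.
knots_by_degree <- function(base_num_knots, max_degree) {
  d <- seq_len(max_degree)
  round(base_num_knots / 2^(d - 1))
}

knots_by_degree(500, max_degree = 3)  # 500 250 125
knots_by_degree(200, max_degree = 3)  # 200 100  50
```

Each additional interaction degree halves the number of knots per covariate, which keeps the count of higher-degree tensor-product basis functions in check.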


predict.SL.hal9001

Description

Predict method for objects of class SL.hal9001

Usage

## S3 method for class 'SL.hal9001'
predict(object, newdata, ...)

Arguments

object

A fitted object of class hal9001.

newdata

A matrix of new observations on which to obtain predictions.

...

Not used.

Value

A numeric vector of predictions from a SL.hal9001 object based on the provided newdata.


Prediction from HAL fits

Description

Prediction from HAL fits

Usage

## S3 method for class 'hal9001'
predict(
  object,
  new_data,
  new_X_unpenalized = NULL,
  offset = NULL,
  type = c("response", "link"),
  ...
)

Arguments

object

An object of class hal9001, containing the results of fitting the Highly Adaptive Lasso, as produced by fit_hal.

new_data

A matrix or data.frame containing new data (i.e., observations not used for fitting the hal9001 object that's passed in via the object argument) for which the hal9001 object will compute predicted values.

new_X_unpenalized

If the user supplied X_unpenalized during training, then the user should also supply this matrix with the same number of observations as new_data.

offset

A vector of offsets. Must be provided if an offset was provided at training.

type

Either "response" for predictions of the response, or "link" for un-transformed predictions (on the scale of the link function).

...

Additional arguments passed to predict as necessary.

Details

Method for computing and extracting predictions from fits of the Highly Adaptive Lasso estimator, returned as a single S3 object of class hal9001.

Value

A numeric vector of predictions from a hal9001 object.

Note

This prediction method does not function similarly to the equivalent method from glmnet. In particular, this procedure will neither return predictions for a subset of the lambdas originally specified in calling fit_hal nor result in re-fitting. Instead, when fit_control's cv_select is set to FALSE, it will return predictions for all of the lambdas specified in the call to fit_hal that constructs object. When fit_control's cv_select is set to TRUE, predictions will only be returned for the value of lambda selected by cross-validation.


Print formula_hal9001 object

Description

Print formula_hal9001 object

Usage

## S3 method for class 'formula_hal9001'
print(x, ...)

Arguments

x

A formula_hal9001 object.

...

Other arguments (ignored).


Print Method for Summary Class of HAL fits

Description

Print Method for Summary Class of HAL fits

Usage

## S3 method for class 'summary.hal9001'
print(x, length = NULL, ...)

Arguments

x

An object of class summary.hal9001.

length

The number of ranked coefficients to be summarized.

...

Other arguments (ignored).


Discretize Variables into Number of Bins by Unique Values

Description

Discretize Variables into Number of Bins by Unique Values

Usage

quantizer(X, bins)

Arguments

X

A numeric vector to be discretized.

bins

A numeric scalar indicating the number of bins into which X should be discretized.
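
One way to picture this kind of discretization is to snap each value down to a grid of empirical quantiles, as in the base R sketch below. This is an illustrative assumption about the general technique, not the package's implementation; the helper name quantize_sketch and the choice to leave vectors with few unique values unchanged are both hypothetical.

```r
# Hypothetical helper (assumption): discretizes x onto at most `bins`
# distinct values by snapping each value down to an empirical quantile.
quantize_sketch <- function(x, bins) {
  # Nothing to do if x already has no more than `bins` unique values.
  if (length(unique(x)) <= bins) {
    return(x)
  }
  # Grid of `bins` equally spaced empirical quantiles (duplicates dropped).
  grid <- unique(stats::quantile(
    x, probs = seq(0, 1, length.out = bins), names = FALSE
  ))
  # findInterval() gives, for each x, the index of the largest grid point <= x.
  grid[findInterval(x, grid)]
}

set.seed(1)
x <- rnorm(200)
x_binned <- quantize_sketch(x, bins = 10)

# Number of distinct values after discretization (at most 10).
print(length(unique(x_binned)))
```

Reducing the number of unique values in each covariate reduces the number of candidate knot points, and hence the number of basis functions that have to be constructed.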


Squash HAL objects

Description

Reduce the memory footprint of a fitted object by dropping basis functions with coefficients of zero.

Usage

squash_hal_fit(object)

Arguments

object

An object of class hal9001, containing the results of fitting the Highly Adaptive Lasso, as produced by a call to fit_hal.

Value

Object of class hal9001, similar to the input object but reduced such that bases with coefficients equal to zero are removed.

Examples

# generate simple test data
n <- 100
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- sin(x[, 1]) * sin(x[, 2]) + rnorm(n, mean = 0, sd = 0.2)

# fit HAL model and squash resulting object to reduce footprint
hal_fit <- fit_hal(X = x, Y = y, yolo = FALSE)
squashed <- squash_hal_fit(hal_fit)

Summary Method for HAL fit objects

Description

Summary Method for HAL fit objects

Usage

## S3 method for class 'hal9001'
summary(
  object,
  lambda = NULL,
  only_nonzero_coefs = TRUE,
  include_redundant_terms = FALSE,
  round_cutoffs = 3,
  ...
)

Arguments

object

An object of class hal9001, containing the results of fitting the Highly Adaptive Lasso, as produced by fit_hal.

lambda

Optional numeric value of the lambda tuning parameter, for which corresponding coefficient values will be summarized. Defaults to fit_hal's optimal value, lambda_star, or the minimum value of lambda_star.

only_nonzero_coefs

A logical specifying whether the summary should include only terms with non-zero coefficients.

include_redundant_terms

A logical specifying whether the summary should retain so-called "redundant terms". We define a redundant term (say x1) as a term (1) whose basis function corresponds to an existing basis function, i.e., a duplicate; and (2) for which the duplicate contains the x1 term as part of its term, so that including the x1 term would be redundant. For example, say the same coefficient corresponds to these three terms: (1) "I(age >= 50)*I(bmi >= 18)", (2) "I(age >= 50)", and (3) "I(education >= 16)". When include_redundant_terms is FALSE (default), the second basis function is omitted.

round_cutoffs

An integer indicating the number of decimal places to be used for rounding cutoff values in the term. For example, if "bmi" was a numeric variable rounded to the third decimal, then in the example above we would have needed to specify round_cutoffs = 0 in order to yield a term like "I(bmi >= 18)" as opposed to something like "I(bmi >= 18.111)". This rounding is intended to simplify the term-wise part of the output and only rounds the basis cutoffs; the hal9001 model's coefficients are not rounded.

...

Additional arguments passed to summary, not supported.

Details

Method for summarizing the coefficients of the Highly Adaptive Lasso estimator in terms of the basis functions corresponding to covariates and interactions of covariates, returned as a single S3 object of class hal9001.

Due to the nature of the basis function terms, the summary tables can be extremely wide. The R environment might not be the optimal location to view the summary. Tables can be exported from R to LaTeX with the xtable package (or similar). Here's an example: print(xtable(summary(fit)$table, type = "latex"), file = "dt.tex").

Value

A list summarizing a hal9001 object's coefficients.
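
The effect of round_cutoffs on a term label, as described in the arguments above, can be mimicked in base R. The sketch below only illustrates the rounding of a cutoff inside a label; the variable names and the use of paste0 are assumptions for illustration and do not reflect how the package builds its summary table.

```r
# Illustrative cutoff for a hypothetical "bmi" covariate (assumption).
cutoff <- 18.111

# Label with three decimal places, matching the default round_cutoffs = 3.
term_default <- paste0("I(bmi >= ", round(cutoff, 3), ")")

# Label with zero decimal places, mimicking round_cutoffs = 0.
term_rounded <- paste0("I(bmi >= ", round(cutoff, 0), ")")

print(term_default)  # "I(bmi >= 18.111)"
print(term_rounded)  # "I(bmi >= 18)"
```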

