| Title: | The Scalable Highly Adaptive Lasso |
| Version: | 0.4.6 |
| Description: | A scalable implementation of the highly adaptive lasso algorithm, including routines for constructing sparse matrices of basis functions of the observed data, as well as a custom implementation of Lasso regression tailored to enhance efficiency when the matrix of predictors is composed exclusively of indicator functions. For ease of use and increased flexibility, the Lasso fitting routines invoke code from the 'glmnet' package by default. The highly adaptive lasso was first formulated and described by MJ van der Laan (2017) <doi:10.1515/ijb-2015-0097>, with practical demonstrations of its performance given by Benkeser and van der Laan (2016) <doi:10.1109/DSAA.2016.93>. This implementation of the highly adaptive lasso algorithm was described by Hejazi, Coyle, and van der Laan (2020) <doi:10.21105/joss.02526>. |
| Depends: | R (≥ 3.1.0), Rcpp |
| License: | GPL-3 |
| URL: | https://github.com/tlverse/hal9001 |
| BugReports: | https://github.com/tlverse/hal9001/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | Matrix, stats, utils, methods, assertthat, origami (≥ 1.0.3), glmnet, data.table, stringr |
| Suggests: | testthat, knitr, rmarkdown, microbenchmark, future, ggplot2, dplyr, tidyr, survival, SuperLearner |
| LinkingTo: | Rcpp, RcppEigen |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.2.3 |
| NeedsCompilation: | yes |
| Packaged: | 2023-11-13 21:27:19 UTC; jrcoyle |
| Author: | Jeremy Coyle |
| Maintainer: | Jeremy Coyle <jeremyrcoyle@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2023-11-14 15:00:02 UTC |
HAL Formula addition: Add formula term objects together into a single formula object term.
Description
HAL Formula addition: Add formula term objects together into a single formula object term.
Usage
## S3 method for class 'formula_hal9001'
x + y
Arguments
x | A |
y | A |
Wrapper for Classic SuperLearner
Description
Wrapper for SuperLearner for objects of class hal9001
Usage
SL.hal9001(
  Y,
  X,
  newX,
  family,
  obsWeights,
  id,
  max_degree = 2,
  smoothness_orders = 1,
  num_knots = 5,
  ...
)
Arguments
Y | A |
X | An input |
newX | A matrix of new observations on which to obtain predictions. The default of |
family | A |
obsWeights | A |
id | A |
max_degree | The highest order of interaction terms for which basis functions ought to be generated. |
smoothness_orders | An |
num_knots | An |
... | Additional arguments to |
Value
An object of class SL.hal9001 with a fitted hal9001 object and corresponding predictions based on the input data.
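A minimal sketch of how the wrapper might be used inside a SuperLearner library, assuming both the SuperLearner and hal9001 packages are installed and attached. The simulated data and the choice of SL.mean as a benchmark learner are illustrative assumptions, not part of the package documentation.

```r
library(SuperLearner)
library(hal9001)

set.seed(1234)
n <- 100
# two simulated covariates (illustrative data, not from the package)
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- sin(X$x1) + 0.5 * X$x2 + rnorm(n, sd = 0.2)

# SL.hal9001 is referenced by name in the learner library;
# SL.mean serves as a simple benchmark learner
sl_fit <- SuperLearner(
  Y = Y, X = X, family = gaussian(),
  SL.library = c("SL.mean", "SL.hal9001")
)

# cross-validated risk and ensemble weight for each learner
print(sl_fit)
```

Arguments such as max_degree, smoothness_orders, and num_knots keep the wrapper defaults shown in the usage above unless a customized wrapper is defined.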
Apply copy map
Description
OR duplicate training set columns together
Usage
apply_copy_map(X, copy_map)
Arguments
X | Sparse matrix containing columns of indicator functions. |
copy_map | the copy map |
Value
A dgCMatrix sparse matrix corresponding to the design matrix for a zero-th order highly adaptive lasso, but with all duplicated columns (basis functions) removed.
Examples
gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)
copy_map <- make_copy_map(x_basis)
x_basis_uniq <- apply_copy_map(x_basis, copy_map)
Fast Coercion to Sparse Matrix
Description
Fast and efficient coercion of standard matrix objects to sparse matrices. Borrowed from http://gallery.rcpp.org/articles/sparse-matrix-coercion/. INTERNAL USE ONLY.
Usage
as_dgCMatrix(XX_)
Arguments
XX_ | An object of class |
Value
An object of class dgCMatrix, coerced from input XX_.
List Basis Functions
Description
Build a list of basis functions from a set of columns
Usage
basis_list_cols(
  cols,
  x,
  smoothness_orders,
  include_zero_order,
  include_lower_order = FALSE
)
Arguments
cols | Index or indices (as |
x | A |
smoothness_orders | An integer vector of length |
include_zero_order | A |
include_lower_order | A |
Value
A list containing the basis functions generated from a set of input columns.
Compute Degree of Basis Functions
Description
Find the full list of basis functions up to a particular degree
Usage
basis_of_degree(
  x,
  degree,
  smoothness_orders,
  include_zero_order,
  include_lower_order
)
Arguments
x | An input |
degree | The highest order of interaction terms for which the basis functions ought to be generated. The default ( |
smoothness_orders | An integer vector of length |
include_zero_order | A |
include_lower_order | A |
Value
A list containing basis functions and cutoffs generated from a set of input columns up to a particular pre-specified degree.
Calculate Proportion of Nonzero Entries
Description
Calculate Proportion of Nonzero Entries
Usage
calc_pnz(X)
Calculating Centered and Scaled Matrices
Description
Calculating Centered and Scaled Matrices
Usage
calc_xscale(X, xcenter)
Arguments
X | A sparse matrix, to be centered. |
xcenter | A vector of column means to be used for centering X. |
Enumerate Basis Functions
Description
Generate basis functions for all covariates and interaction terms thereof up to a specified order/degree.
Usage
enumerate_basis(
  x,
  max_degree = NULL,
  smoothness_orders = rep(0, ncol(x)),
  include_zero_order = FALSE,
  include_lower_order = FALSE,
  num_knots = NULL
)
Arguments
x | An input |
max_degree | The highest order of interaction terms for which the basis functions ought to be generated. The default ( |
smoothness_orders | An integer vector of length |
include_zero_order | A |
include_lower_order | A |
num_knots | A vector of length |
Value
A list of basis functions generated for all covariates and interactions thereof up to a pre-specified degree.
Examples
gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
Enumerate Basis Functions at Generalized Edges
Description
For degrees of smoothness greater than 1, we must generate the lower order smoothness basis functions using the knot points at the "edge" of the hypercube. For example, consider f(x) = x^2 + x, which is second-order smooth, but will not be generated by purely quadratic basis functions. We also need to include the y = x function (which corresponds to first-order HAL basis functions at the left-most value/edge of x).
Usage
enumerate_edge_basis(
  x,
  max_degree = 3,
  smoothness_orders = rep(0, ncol(x)),
  include_zero_order = FALSE,
  include_lower_order = FALSE
)
Arguments
x | An input |
max_degree | The highest order of interaction terms for which the basis functions ought to be generated. The default ( |
smoothness_orders | An integer vector of length |
include_zero_order | A |
include_lower_order | A |
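The motivation in the description can be checked numerically with base R alone. The sketch below is a simplified, single-knot illustration (not the package's implementation): it fits f(x) = x^2 + x using only a quadratic term anchored at the left edge of x, then again after adding the first-order edge term. The grid of x values and the use of lm are assumptions made for the demonstration.

```r
# Simplified illustration with a single knot at the left edge of x.
x <- seq(1, 2, length.out = 50)
u0 <- min(x)   # left-most value ("edge") of x
y <- x^2 + x   # target function from the description

# purely quadratic term anchored at the edge, plus an intercept
fit_quad <- lm(y ~ I((x - u0)^2))
# the same model with the first-order (linear) edge term added
fit_edge <- lm(y ~ I((x - u0)^2) + I(x - u0))

max(abs(residuals(fit_quad)))  # clearly non-zero: the linear part is missed
max(abs(residuals(fit_edge)))  # numerically zero: the fit is exact
```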
Generate Basis Functions
Description
Populates a column (indexed by basis_col) of x_basis with basis indicators.
Usage
evaluate_basis(basis, X, x_basis, basis_col)
Arguments
basis | The basis function. |
X | The design matrix, containing the original data. |
x_basis | The HAL design matrix, containing indicator functions. |
basis_col | Numeric indicating which column to populate. |
HAL: The Highly Adaptive Lasso
Description
Estimation procedure for HAL, the Highly Adaptive Lasso
Usage
fit_hal(
  X,
  Y,
  formula = NULL,
  X_unpenalized = NULL,
  max_degree = ifelse(ncol(X) >= 20, 2, 3),
  smoothness_orders = 1,
  num_knots = num_knots_generator(max_degree = max_degree, smoothness_orders = smoothness_orders, base_num_knots_0 = 200, base_num_knots_1 = 50),
  reduce_basis = NULL,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian"),
  lambda = NULL,
  id = NULL,
  weights = NULL,
  offset = NULL,
  fit_control = list(cv_select = TRUE, use_min = TRUE, lambda.min.ratio = 1e-04, prediction_bounds = "default"),
  basis_list = NULL,
  return_lasso = TRUE,
  return_x_basis = FALSE,
  yolo = FALSE
)
Arguments
X | An input |
Y | A |
formula | A character string formula to be used in |
X_unpenalized | An input |
max_degree | The highest order of interaction terms for which basis functions ought to be generated. |
smoothness_orders | An |
num_knots | An |
reduce_basis | An optional |
family | A |
lambda | User-specified sequence of values of the regularization parameter for the lasso L1 regression. If |
id | A vector of ID values that is used to generate cross-validationfolds for |
weights | observation weights; defaults to 1 per observation. |
offset | a vector of offset values, used in fitting. |
fit_control | List of arguments, including the following, and any others to be passed to |
basis_list | The full set of basis functions generated from |
return_lasso | A |
return_x_basis | A |
yolo | A |
Details
The procedure uses a custom C++ implementation to generate a design matrix of spline basis functions of covariates and interactions of covariates. The lasso regression is fit to this design matrix via cv.glmnet or a custom implementation derived from origami. The maximum dimension of the design matrix is n-by-(n * 2^(d-1)), where n is the number of observations and d is the number of covariates.
For smoothness_orders = 0, only zero-order splines (piece-wise constant) are generated, which assume the true regression function has no smoothness or continuity. When smoothness_orders = 1, first-order splines (piece-wise linear) are generated, which assume continuity of the true regression function. When smoothness_orders = 2, second-order splines (piece-wise quadratic and linear terms) are generated, which assume that the true regression function has a single order of differentiability.
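To make the three smoothness orders concrete, here is a minimal base R sketch of a univariate spline basis function with a single knot. The helper name spline_basis and the 1 / factorial(k) scaling are assumptions made for illustration; the package's internal C++ routines may scale or evaluate these terms differently.

```r
# Hypothetical helper: k-th order spline basis function with knot u.
# The 1 / factorial(k) scaling is an assumption made for illustration.
spline_basis <- function(x, u, k = 0) {
  ifelse(x >= u, (x - u)^k / factorial(k), 0)
}

x <- c(-1, 0, 0.5, 2)
spline_basis(x, u = 0, k = 0)  # piece-wise constant:  0 1 1 1
spline_basis(x, u = 0, k = 1)  # piece-wise linear:    0 0 0.5 2
spline_basis(x, u = 0, k = 2)  # piece-wise quadratic: 0 0 0.125 2
```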
The num_knots argument specifies the number of knot points for each covariate and for each max_degree. Fewer knot points can significantly decrease runtime, but might be overly simplistic. When considering smoothness_orders = 0, too few knot points (e.g., < 50) can significantly reduce performance. When smoothness_orders = 1 or higher, fewer knot points (e.g., 10-30) are actually better for performance. We recommend specifying num_knots with respect to smoothness_orders, and as a vector of length max_degree with values decreasing exponentially. This prevents combinatorial explosions in the number of higher-degree basis functions generated. The default behavior of num_knots follows this logic: for smoothness_orders = 0, num_knots is set to 500 / 2^{j-1}, and for smoothness_orders = 1 or higher, num_knots is set to 200 / 2^{j-1}, where j is the interaction degree. We also include some other suitable settings for num_knots below, all of which are less complex than the default num_knots and will thus result in a faster runtime:
Some good settings for little to no cost in performance:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(400, 200, 100).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(100, 75, 50).
Recommended settings for fairly fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(200, 100, 50).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(50, 25, 15).
Recommended settings for fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(100, 50, 25).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(40, 15, 10).
Recommended settings for very fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(50, 25, 10).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(25, 10, 5).
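The exponential decay rule described in these details, base / 2^(j-1) for interaction degree j, can be sketched in a few lines of base R. The helper knots_by_degree is hypothetical and only mirrors the arithmetic; rounding to whole knots is an assumption. The base values 500 and 200 are the ones quoted in the text, whereas the fit_hal usage above passes base_num_knots_0 = 200 and base_num_knots_1 = 50 to num_knots_generator.

```r
# Hypothetical helper mirroring the decay rule base / 2^(j - 1);
# rounding to whole knots is an assumption of this sketch.
knots_by_degree <- function(base, max_degree) {
  j <- seq_len(max_degree)
  round(base / 2^(j - 1))
}

knots_by_degree(500, 3)  # 500 250 125
knots_by_degree(200, 3)  # 200 100 50
```

A vector produced this way has length max_degree, which is the form the text recommends for num_knots.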
Value
Object of class hal9001, containing a list of basis functions, a copy map, coefficients estimated for basis functions, and timing results (for assessing computational efficiency).
Examples
n <- 100
p <- 3
x <- xmat <- matrix(rnorm(n * p), n, p)
y_prob <- plogis(3 * sin(x[, 1]) + sin(x[, 2]))
y <- rbinom(n = n, size = 1, prob = y_prob)
hal_fit <- fit_hal(X = x, Y = y, family = "binomial")
preds <- predict(hal_fit, new_data = x)
HAL Formula: Convert formula or string to formula_HAL object.
Description
HAL Formula: Convert formula or string to formula_HAL object.
Usage
formula_hal(formula, smoothness_orders, num_knots, X = NULL)
Arguments
formula | A |
smoothness_orders | A default value for |
num_knots | A default value for |
X | Controls inheritance of the variable |
Generates rules based on knot points of the fitted HAL basis functions with non-zero coefficients.
Description
Generates rules based on knot points of the fitted HAL basis functions with non-zero coefficients.
Usage
generate_all_rules(basis_list, coefs, X_colnames)
HAL Formula term: Generate a single term of the HAL basis
Description
HAL Formula term: Generate a single term of the HAL basis
Usage
h(
  ...,
  k = NULL,
  s = NULL,
  pf = 1,
  monotone = c("none", "i", "d"),
  . = NULL,
  dot_args_as_string = FALSE,
  X = NULL
)
Arguments
... | Variables for which to generate multivariate interaction basis function where the variables can be found in a matrix |
k | The number of knots for each univariate basis function used to generate the tensor product basis functions. If a single value is given, it is used for the univariate basis functions of every variable. Otherwise, this should be a variable-named list that specifies, for each variable, how many knot points should be used. |
s | The |
pf | A |
monotone | Whether the basis functions should enforce monotonicity of the interaction term. If |
. | Just like with |
dot_args_as_string | Whether the arguments |
X | An optional design matrix where the variables given in |
HAL 9000 Quotes
Description
Prints a quote from the HAL 9000 robot from 2001: A Space Odyssey
Usage
hal9000()
hal9001
Description
Package for fitting the Highly Adaptive LASSO (HAL) estimator
HAL9000 Quotes from "2001: A Space Odyssey"
Description
Curated selection of quotes from the HAL9000 computer, from the critically acclaimed epic science-fiction film "2001: A Space Odyssey" (1968).
Usage
hal_quotes
Format
A vector of quotes.
Find Copies of Columns
Description
Index vector that, for each column in X, indicates the index of the first copy of that column
Usage
index_first_copy(X)
Arguments
X | Sparse matrix containing columns of indicator functions. |
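The idea of a "first copy" index can be illustrated in base R on a small dense matrix. The helper first_copy_index below is a hypothetical re-implementation for illustration only; the package routine operates on sparse matrices and its internals may differ.

```r
# Hypothetical base R re-implementation for illustration only.
first_copy_index <- function(X) {
  # collapse each column into a string key, then match every key to the
  # position of its first occurrence
  keys <- apply(X, 2, paste, collapse = "")
  match(keys, keys)
}

# columns 3 and 4 duplicate columns 1 and 2, respectively
X <- cbind(c(1, 0, 1), c(0, 1, 1), c(1, 0, 1), c(0, 1, 1))
first_copy_index(X)  # 1 2 1 2
```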
Sort Basis Functions
Description
Build a sorted list of unique basis functions based on columns, where each basis function is a list
Usage
make_basis_list(X_sub, cols, order_map)
Arguments
X_sub | A subset of the columns of X, the original design matrix. |
cols | An index of the columns that were reduced to by sub-setting. |
order_map | A vector with length equal to the number of columns of the original, unsubsetted matrix X, which specifies the smoothness of the function in each covariate. |
Details
Note that sorting of columns is performed such that the basis order equals cols.length() and each basis function is a list(cols, cutoffs).
Build Copy Maps
Description
Build Copy Maps
Usage
make_copy_map(x_basis)Arguments
x_basis | A design matrix consisting of basis (indicator) functions for covariates (X) and terms for interactions thereof. |
Value
A list of numeric vectors indicating indices of basis functions that are identical in the training set.
Examples
gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)
copy_map <- make_copy_map(x_basis)
Build HAL Design Matrix
Description
Make a HAL design matrix based on original design matrix X and a list of basis functions in argument blist
Usage
make_design_matrix(X, blist, p_reserve = 0.5)
Arguments
X | Matrix of covariates containing observed data in the columns. |
blist | List of basis functions with which to build HAL design matrix. |
p_reserve | Sparse matrix pre-allocation proportion. Default value is 0.5. If one expects a dense HAL design matrix, it is useful to set p_reserve to a higher value. |
Value
A dgCMatrix sparse matrix of indicator basis functions corresponding to the design matrix in a zero-order highly adaptive lasso.
Examples
gendata <- function(n) {
  W1 <- runif(n, -3, 3)
  W2 <- rnorm(n)
  W3 <- runif(n)
  W4 <- rnorm(n)
  g0 <- plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * W4))
  A <- rbinom(n, 1, g0)
  Q0 <- plogis(0.15 * (2 * A + 2 * A * W1 + 6 * A * W3 * W4 - 3))
  Y <- rbinom(n, 1, Q0)
  data.frame(A, W1, W2, W3, W4, Y)
}
set.seed(1234)
data <- gendata(100)
covars <- setdiff(names(data), "Y")
X <- as.matrix(data[, covars, drop = FALSE])
basis_list <- enumerate_basis(X)
x_basis <- make_design_matrix(X, basis_list)
Mass-based reduction of basis functions
Description
A helper function that finds which basis functions to keep (and equivalently which to discard) based on the proportion of 1's (observations, i.e., "mass") included in a given basis function.
Usage
make_reduced_basis_map(x_basis, reduce_basis_crit)
Arguments
x_basis | A matrix of basis functions with all redundant basis functions already removed. |
reduce_basis_crit | A scalar |
Value
A binary numeric vector indicating which columns of the matrix of basis functions to keep (given a one) and which to discard (given a zero).
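As a rough illustration of mass-based reduction, the base R sketch below computes the proportion of 1's in each column of a small indicator matrix and flags the columns that reach a threshold. The helper reduced_basis_map is hypothetical, and treating the criterion as an inclusive lower bound is an assumption; the package's exact rule may differ.

```r
# Hypothetical base R sketch; treating the criterion as an inclusive lower
# bound on the proportion of 1's is an assumption.
reduced_basis_map <- function(x_basis, reduce_basis_crit) {
  mass <- colMeans(x_basis)              # proportion of 1's per column
  as.numeric(mass >= reduce_basis_crit)  # 1 = keep, 0 = discard
}

x_basis <- cbind(
  c(1, 1, 1, 0),  # mass 0.75
  c(1, 0, 0, 0),  # mass 0.25
  c(0, 1, 1, 0)   # mass 0.50
)
reduced_basis_map(x_basis, reduce_basis_crit = 0.5)  # 1 0 1
```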
Compute Values of Basis Functions
Description
Computes and returns the indicator value for the basis described by cols and cutoffs for a given row of X
Usage
meets_basis(X, row_num, cols, cutoffs, orders)
Arguments
X | The design matrix, containing the original data. |
row_num | Numeric for a row index over which to evaluate. |
cols | Numeric for the column indices of the basis function. |
cutoffs | Numeric providing thresholds. |
orders | Numeric providing smoothness orders |
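A zero-order version of this check can be sketched in base R: a row "meets" a basis function when every selected covariate is at or above its cutoff. The helper meets_basis_sketch is hypothetical; the use of >= rather than > at the cutoff is an assumption, and the orders argument is ignored in this simplified form.

```r
# Hypothetical zero-order version; the use of >= rather than > at the
# cutoff is an assumption, and smoothness orders are ignored here.
meets_basis_sketch <- function(X, row_num, cols, cutoffs) {
  as.numeric(all(X[row_num, cols] >= cutoffs))
}

X <- rbind(c(0.2, 1.5, -0.3),
           c(0.9, 0.1,  0.4))
# basis over columns 1 and 3 with cutoffs 0.5 and 0
meets_basis_sketch(X, row_num = 1, cols = c(1, 3), cutoffs = c(0.5, 0))  # 0
meets_basis_sketch(X, row_num = 2, cols = c(1, 3), cutoffs = c(0.5, 0))  # 1
```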
A default generator for the num_knots argument for each degree of interactions and the smoothness orders.
Description
A default generator for the num_knots argument for each degree of interactions and the smoothness orders.
Usage
num_knots_generator(
  max_degree,
  smoothness_orders,
  base_num_knots_0 = 500,
  base_num_knots_1 = 200
)
Arguments
max_degree | interaction degree. |
smoothness_orders | see |
base_num_knots_0 | The base number of knots for zeroth-order smoothness basis functions. The number of knots by degree interaction decays as |
base_num_knots_1 | The base number of knots for 1 or greater order smoothness basis functions. The number of knots by degree interaction decays as |
predict.SL.hal9001
Description
Predict method for objects of class SL.hal9001
Usage
## S3 method for class 'SL.hal9001'
predict(object, newdata, ...)
Arguments
object | A fitted object of class |
newdata | A matrix of new observations on which to obtain predictions. |
... | Not used. |
Value
A numeric vector of predictions from a SL.hal9001 object based on the provided newdata.
Prediction from HAL fits
Description
Prediction from HAL fits
Usage
## S3 method for class 'hal9001'
predict(
  object,
  new_data,
  new_X_unpenalized = NULL,
  offset = NULL,
  type = c("response", "link"),
  ...
)
Arguments
object | An object of class |
new_data | A |
new_X_unpenalized | If the user supplied |
offset | A vector of offsets. Must be provided if provided at training. |
type | Either "response" for predictions of the response, or "link" for un-transformed predictions (on the scale of the link function). |
... | Additional arguments passed to |
Details
Method for computing and extracting predictions from fits of the Highly Adaptive Lasso estimator, returned as a single S3 object of class hal9001.
Value
A numeric vector of predictions from a hal9001 object.
Note
This prediction method does not function similarly to the equivalent method from glmnet. In particular, this procedure will not return a subset of lambdas originally specified in calling fit_hal nor result in re-fitting. Instead, it will return predictions for all of the lambdas specified in the call to fit_hal that constructs object, when fit_control's cv_select is set to FALSE. When fit_control's cv_select is set to TRUE, predictions will only be returned for the value of lambda selected by cross-validation.
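The difference described in this note can be seen by fitting the same data twice, once with the default cross-validated selection and once with cv_select set to FALSE. This is a sketch under stated assumptions: it requires the hal9001 package to be installed, uses simulated data that are not part of the documentation, and relies on the unspecified fit_control entries keeping the defaults shown in the fit_hal usage.

```r
library(hal9001)

set.seed(1234)
n <- 100
x <- matrix(rnorm(n * 2), n, 2)
y <- sin(x[, 1]) + rnorm(n, sd = 0.2)

# default: lambda is selected by cross-validation
fit_cv <- fit_hal(X = x, Y = y)
preds_cv <- predict(fit_cv, new_data = x)

# no cross-validation: the whole lambda sequence is kept
fit_all <- fit_hal(X = x, Y = y, fit_control = list(cv_select = FALSE))
preds_all <- predict(fit_all, new_data = x)

length(preds_cv)  # one prediction per row of new_data
dim(preds_all)    # per the note: one column of predictions per lambda
```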
Print formula_hal9001 object
Description
Print formula_hal9001 object
Usage
## S3 method for class 'formula_hal9001'
print(x, ...)
Arguments
x | A formula_hal9001 object. |
... | Other arguments (ignored). |
Print Method for Summary Class of HAL fits
Description
Print Method for Summary Class of HAL fits
Usage
## S3 method for class 'summary.hal9001'
print(x, length = NULL, ...)
Arguments
x | An object of class |
length | The number of ranked coefficients to be summarized. |
... | Other arguments (ignored). |
Discretize Variables into Number of Bins by Unique Values
Description
Discretize Variables into Number of Bins by Unique Values
Usage
quantizer(X, bins)
Arguments
X | A |
bins | A |
Squash HAL objects
Description
Reduce footprint by dropping basis functions with coefficients of zero
Usage
squash_hal_fit(object)
Arguments
object | An object of class |
Value
Object of class hal9001, similar to the input object but reduced such that coefficients belonging to bases with coefficients equal to zero are removed.
Examples
# generate simple test data
n <- 100
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- sin(x[, 1]) * sin(x[, 2]) + rnorm(n, mean = 0, sd = 0.2)
# fit HAL model and squash resulting object to reduce footprint
hal_fit <- fit_hal(X = x, Y = y, yolo = FALSE)
squashed <- squash_hal_fit(hal_fit)
Summary Method for HAL fit objects
Description
Summary Method for HAL fit objects
Usage
## S3 method for class 'hal9001'
summary(
  object,
  lambda = NULL,
  only_nonzero_coefs = TRUE,
  include_redundant_terms = FALSE,
  round_cutoffs = 3,
  ...
)
Arguments
object | An object of class |
lambda | Optional |
only_nonzero_coefs | A |
include_redundant_terms | A |
round_cutoffs | An |
... | Additional arguments passed to |
Details
Method for summarizing the coefficients of the Highly Adaptive Lasso estimator in terms of the basis functions corresponding to covariates and interactions of covariates, returned as a single S3 object of class hal9001.
Due to the nature of the basis function terms, the summary tables can be extremely wide. The R environment might not be the optimal location to view the summary. Tables can be exported from R to LaTeX with the xtable package (or similar). Here's an example: print(xtable(summary(fit)$table), type = "latex", file = "dt.tex").
Value
A list summarizing a hal9001 object's coefficients.