
tlars


Title: The T-LARS Algorithm: Early-Terminated Forward Variable Selection

Description: It computes the solution path of the Terminating-LARS (T-LARS) algorithm. The T-LARS algorithm appends dummy predictors to the original predictor matrix and terminates the forward-selection process after a pre-defined number of dummy variables have been selected.
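The stopping rule described above can be illustrated in isolation. The following is a minimal sketch in base R, not the package's implementation: `terminate_path()` is a hypothetical helper and the entry order is made up. It only shows how a path is cut off once `T_stop` dummies have entered, assuming the dummies occupy the columns after the first `p` original predictors.

```r
# Hypothetical helper (not part of 'tlars'): cut an entry order off at the
# step where the T_stop-th dummy enters. Columns 1..p are assumed to be
# original predictors; every index larger than p is a dummy.
terminate_path <- function(entry_order, p, T_stop) {
  is_dummy <- entry_order > p
  dummy_count <- cumsum(is_dummy)             # dummies included after each step
  stop_at <- which(dummy_count >= T_stop)[1]  # step where the T_stop-th dummy enters
  if (is.na(stop_at)) stop_at <- length(entry_order)  # fewer than T_stop dummies entered
  kept <- entry_order[seq_len(stop_at)]
  list(
    selected = kept[kept <= p],  # original predictors included before stopping
    dummies = kept[kept > p],    # dummies included before stopping
    steps = stop_at
  )
}

# Made-up entry order for p = 5 original predictors and 5 dummies (indices 6..10)
entry_order <- c(2, 3, 7, 1, 9, 4, 10, 5)
res <- terminate_path(entry_order, p = 5, T_stop = 2)
print(res$selected) # 2 3 1
print(res$dummies)  # 7 9
print(res$steps)    # 5
```

In the package itself the entry order comes from the LARS path on the augmented predictor matrix; the sketch only mimics the termination criterion.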

Paper: The package is based on the paper

J. Machkour, M. Muma, and D. P. Palomar, “The terminating-random experiments selector: Fast high-dimensional variable selection with false discovery rate control,” arXiv preprint arXiv:2110.06048, 2022. (https://doi.org/10.48550/arXiv.2110.06048)

Note: The T-LARS algorithm is a major building block of the T-Rex selector (Paper and R package). The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm and fuses the selected active sets of all random experiments to obtain a final set of selected variables. The T-Rex selector provably controls the false discovery rate (FDR), i.e., the expected fraction of selected false positives among all selected variables, at the user-defined target level while maximizing the number of selected variables.
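For reference, the verbal FDR definition above is commonly written as follows. The symbols are notation introduced here for illustration: V is the number of selected false positives and R is the total number of selected variables; the maximum in the denominator is the usual convention that makes the fraction zero when nothing is selected.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[ \frac{V}{\max(R,\, 1)} \right]
```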

In the following, we show how to use the package and give you an idea of why terminating the solution path early is a reasonable approach in high-dimensional and sparse variable selection: In many applications, most active variables enter the solution path early!

Installation

You can install the ‘tlars’ package (stable version) from CRAN with

    install.packages("tlars")
    library(tlars)

You can install the ‘tlars’ package (developer version) from GitHub with

    install.packages("devtools")
    devtools::install_github("jasinmachkour/tlars")

You can open the help pages with

    library(tlars)
    help(package = "tlars")
    ?tlars
    ?tlars_model
    ?tlars_cpp
    ?plot.Rcpp_tlars_cpp
    ?print.Rcpp_tlars_cpp
    ?Gauss_data

To cite the package ‘tlars’ in publications use:

citation("tlars")

Quick Start

In the following, we illustrate the basic usage of the ‘tlars’ package to perform variable selection in sparse and high-dimensional regression settings using the T-LARS algorithm.

First, we generate a high-dimensional Gaussian dataset with sparse support:

    library(tlars)

    # Setup
    n <- 75 # Number of observations
    p <- 150 # Number of variables
    num_act <- 3 # Number of true active variables
    beta <- c(rep(1, times = num_act), rep(0, times = p - num_act)) # Coefficient vector
    true_actives <- which(beta > 0) # Indices of true active variables
    num_dummies <- p # Number of dummy predictors (or dummies)

    # Generate Gaussian data
    set.seed(123)
    X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
    y <- X %*% beta + stats::rnorm(n)

Second, we generate a dummy matrix with n rows and num_dummies columns, where the dummy predictors are sampled from the standard normal distribution, and append it to the original predictor matrix:

    set.seed(1234)
    dummies <- matrix(stats::rnorm(n * num_dummies), nrow = n, ncol = num_dummies)
    XD <- cbind(X, dummies)
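Because the model object is later told only how many dummies there are, the column layout of the augmented matrix matters: the dummies must be the last columns. The following base-R check is a small sketch that regenerates the matrices with the same seeds as above (the response y is omitted) and confirms the layout; `dummy_cols` is a name introduced here for illustration.

```r
# Same dimensions and seeds as in the Quick Start
n <- 75
p <- 150
num_dummies <- p

set.seed(123)
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)

set.seed(1234)
dummies <- matrix(stats::rnorm(n * num_dummies), nrow = n, ncol = num_dummies)
XD <- cbind(X, dummies)

# The augmented matrix has n rows and p + num_dummies columns
print(dim(XD)) # 75 300

# Original predictors come first, the dummies occupy the last num_dummies columns
dummy_cols <- (p + 1):(p + num_dummies)
stopifnot(identical(XD[, seq_len(p)], X))
stopifnot(identical(XD[, dummy_cols], dummies))
```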

Third, we generate an object of the C++ class ‘tlars_cpp’ and supply the information that the last num_dummies predictors in XD are dummy predictors:

    mod_tlars <- tlars_model(X = XD, y = y, num_dummies = num_dummies)
    #> Created an object of class tlars_cpp...
    #>       The first p = 150 predictors in 'XD' are the original predictors and
    #>       the last num_dummies = 150 predictors are dummies

Finally, we perform three T-LARS steps on ‘mod_tlars’, i.e., the T-LARS algorithm runs until T_stop = 3 dummies have entered the solution path and then stops. For comparison, we also compute the whole solution path by setting early_stop = FALSE:

4.1. Perform three T-LARS steps on object ‘mod_tlars’:

    tlars(model = mod_tlars, T_stop = 3, early_stop = TRUE) # Perform three T-LARS steps on object "mod_tlars"
    #> Executing T-LARS step by reference...
    #>       Finished T-LARS step(s)...
    #>           - The results are stored in the C++ object 'mod_tlars'.
    #>           - New value of T_stop: 3.
    #>           - Time elapsed: 0.001 sec.

    print(mod_tlars) # Print information about the results of the performed T-LARS steps
    #> 'mod_tlars' is a C++ object of class 'tlars_cpp' ...
    #>   - Number of dummies: 150.
    #>   - Number of included dummies: 3.
    #>   - Selected variables: 2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123.

    plot(mod_tlars) # Plot the terminated solution path
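To put the printed result in context, the false discovery proportion (FDP) and the true positive proportion (TPP) of this single terminated run can be computed by hand. The sketch below uses only base R; the selected indices are copied from the printed output above, and the true active set {1, 2, 3} follows from the setup. The variable names are introduced here for illustration.

```r
# Selected variables as printed after three T-LARS steps (T_stop = 3)
selected <- c(2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123)
true_actives <- 1:3 # beta is nonzero only for the first num_act = 3 variables

is_true <- selected %in% true_actives
num_false <- sum(!is_true)

fdp <- num_false / max(length(selected), 1) # fraction of false positives among selections
tpp <- sum(is_true) / length(true_actives)  # fraction of true actives that were found

print(num_false) # 9
print(fdp)       # 0.75
print(tpp)       # 1
```

All three true actives are found, but a single run with T_stop = 3 also picks up nine null variables. This is the raw output of one terminated path, not an FDR-controlled selection; the fusion of many such runs is what the T-Rex selector adds.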

4.2. Compute the whole solution path:

    tlars(model = mod_tlars, early_stop = FALSE) # Compute the whole solution path
    #> 'T_stop' is ignored. Computing the entire solution path...
    #> Executing T-LARS step by reference...
    #>       Finished T-LARS step(s). No early stopping!
    #>           - The results are stored in the C++ object 'mod_tlars'.
    #>           - Time elapsed: 0.004 sec.

    print(mod_tlars) # Print information about the results
    #> 'mod_tlars' is a C++ object of class 'tlars_cpp' ...
    #>   - Number of dummies: 150.
    #>   - Number of included dummies: 30.
    #>   - Selected variables: 2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123, 122, 65, 47, 107, 79, 54, 39, 62, 46, 116, 86, 102, 81, 129, 24, 41, 26, 83, 20, 28, 72, 63, 60, 94, 135, 6, 27, 7, 66, 98, 112, 111, 74.

    plot(mod_tlars) # Plot the whole solution path
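The printed output of the full path illustrates the claim that active variables tend to enter early. The following base-R sketch assumes that the printed "Selected variables" list is ordered by entry into the solution path (the output does not state this explicitly) and looks up where the three true actives appear in it. `path` and `entry_pos` are names introduced here.

```r
# Selected variables as printed for the whole solution path (early_stop = FALSE)
path <- c(2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123, 122, 65, 47,
          107, 79, 54, 39, 62, 46, 116, 86, 102, 81, 129, 24, 41, 26, 83,
          20, 28, 72, 63, 60, 94, 135, 6, 27, 7, 66, 98, 112, 111, 74)
true_actives <- 1:3

# Position of each true active variable in the printed list
entry_pos <- match(true_actives, path)
print(entry_pos)      # 3 1 2
print(length(path))   # 45
print(max(entry_pos)) # 3
```

Under that ordering assumption, the three true actives occupy the first three positions of the 45 selected original variables, so everything the full path adds beyond the terminated one is a null variable in this example.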

Outlook

The T-LARS algorithm is a major building block of the T-Rex selector (Paper and R package). The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm and fuses the selected active sets of all random experiments to obtain a final set of selected variables. The T-Rex selector provably controls the FDR at the user-defined target level while maximizing the number of selected variables. If you are working in genomics, financial engineering, or any other field that requires a fast and FDR-controlling variable/feature selection method for large-scale high-dimensional settings, then this is for you. Check it out!

Documentation

For more information and some examples, please check the GitHub-vignette.

Links

tlars package (stable version): CRAN-tlars.

tlars package (developer version): GitHub-tlars.

README file: GitHub-readme.

Vignette: GitHub-vignette.

TRexSelector package: CRAN-TRexSelector.

T-Rex paper: https://arxiv.org/abs/2110.06048
