Type: Package
Title: Best Orthogonalized Subset Selection (BOSS)
Version: 0.2.0
Date: 2021-3-6
Maintainer: Sen Tian <stian@stern.nyu.edu>
Description: Best Orthogonalized Subset Selection (BOSS) is a least-squares (LS)-based subset selection method that performs best subset selection upon an orthogonalized basis of ordered predictors, with the computational effort of a single ordinary LS fit. This package provides a highly optimized implementation of BOSS and estimates heuristic degrees of freedom for BOSS, which can be plugged into an information criterion (IC) such as AICc in order to select the subset from candidates. It provides various choices of IC, including AIC, BIC, AICc, Cp and GCV. It also implements forward stepwise selection (FS) with no additional computational cost, where the subset of FS is selected via cross-validation (CV). CV is also an option for BOSS. For details see: Tian, Hurvich and Simonoff (2021), "On the Use of Information Criteria for Subset Selection in Least Squares Regression", <doi:10.48550/arXiv.1911.10191>.
Depends: R (≥ 3.5.0)
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
LazyData: true
Imports: glmnet, Matrix, Rcpp, stats
RoxygenNote: 7.1.1
Suggests: devtools, ISLR, kableExtra, knitr, MASS, rmarkdown, sparsenet
VignetteBuilder: knitr
LinkingTo: Rcpp, RcppArmadillo
URL: https://github.com/sentian/BOSSreg
BugReports: https://github.com/sentian/BOSSreg/issues
NeedsCompilation: yes
Packaged: 2021-03-06 18:56:57 UTC; sentian
Author: Sen Tian [aut, cre], Clifford Hurvich [aut], Jeffrey Simonoff [aut]
Repository: CRAN
Date/Publication: 2021-03-06 19:20:02 UTC
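To make the idea in the Description concrete, here is a minimal base-R sketch of best subset selection on an orthogonalized basis. It is an illustration under stated assumptions, not the package's optimized implementation: the helper name boss_path_sketch is hypothetical, the columns of x are assumed to be already ordered (for example by forward stepwise, which we understand to be the ordering used by Tian et al. (2021)), x is assumed to have full column rank, and x and y are assumed centered so no intercept is fitted. Because the orthogonalized columns Q are orthonormal, the RSS of a subset S in that basis is sum(y^2) minus the sum of gamma_j^2 over j in S, so the best subset of size k simply keeps the k largest |gamma_j| = |Q'y|. One QR decomposition therefore yields every candidate.

```r
# Sketch: best subset selection on an orthogonalized basis (hypothetical helper).
# Assumes x (n x p) is centered, has full column rank, and its columns are
# already ordered; y is centered, so no intercept is fitted.
boss_path_sketch <- function(x, y) {
  p <- ncol(x)
  qr_x <- qr(x)
  # require full rank and no column pivoting, so R matches the column order of x
  stopifnot(qr_x$rank == p, identical(qr_x$pivot, seq_len(p)))
  Q <- qr.Q(qr_x)                        # n x p, orthonormal columns
  R <- qr.R(qr_x)                        # p x p, upper triangular, x = Q %*% R
  gamma <- drop(crossprod(Q, y))         # LS coefficients of y on Q
  ord <- order(abs(gamma), decreasing = TRUE)
  # Column k + 1 holds the candidate whose orthogonal-basis subset has size k.
  beta <- matrix(0, nrow = p, ncol = p + 1)
  for (k in seq_len(p)) {
    gamma_k <- numeric(p)
    keep <- ord[seq_len(k)]
    gamma_k[keep] <- gamma[keep]         # keep the k largest |gamma_j|
    beta[, k + 1] <- backsolve(R, gamma_k)  # back to the x scale: R^{-1} gamma_k
  }
  beta
}
```

The last column reproduces the full LS fit. Note that the coefficients mapped back to the scale of x need not be k-sparse; the subset of size k lives in the orthogonal basis.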

Best Orthogonalized Subset Selection (BOSS).

Description

Usage

boss(
  x,
  y,
  maxstep = min(nrow(x) - intercept - 1, ncol(x)),
  intercept = TRUE,
  hdf.ic.boss = TRUE,
  mu = NULL,
  sigma = NULL,
  ...
)

Arguments

x

A matrix of predictors, with nrow(x) = length(y) = n observations and ncol(x) = p predictors. The intercept should NOT be included.

y

A response vector, with length(y) = n.

maxstep

Maximum number of steps performed. Default is min(n-1, p) if intercept = FALSE, and min(n-2, p) otherwise.

intercept

Logical, whether to include an intercept term. Default is TRUE.

hdf.ic.boss

Logical, whether to calculate the heuristic degrees of freedom (hdf) and information criteria (IC) for BOSS. The IC include AIC, BIC, AICc, BICc, GCV and Cp. Default is TRUE.

mu

True mean vector, used in the calculation of hdf. Default is NULL, in which case it is estimated via least-squares (LS) regression of y upon x for n > p, and via a 10-fold cross-validated lasso estimate for n <= p.

sigma

True standard deviation of the error, used in the calculation of hdf. Default is NULL, in which case it is estimated via least-squares (LS) regression of y upon x for n > p, and via a 10-fold cross-validated lasso for n <= p.

...

Extra parameters to allow flexibility. Currently none is allowed or required; this exists only for the convenience of calls from parent functions such as cv.boss.
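For the n > p case, the defaults for mu and sigma come from an LS regression of y upon x. The following is a minimal sketch of what such estimates could look like. The helper name ls_mu_sigma_sketch is hypothetical, and dividing the residual sum of squares by the residual degrees of freedom n - p - intercept is our assumption, not something stated above; the n <= p lasso-based defaults are not sketched.

```r
# Hypothetical helper illustrating LS-based estimates of mu and sigma (n > p).
# Assumption: sigma uses residual degrees of freedom n - p - intercept.
ls_mu_sigma_sketch <- function(x, y, intercept = TRUE) {
  n <- nrow(x)
  design <- if (intercept) cbind(1, x) else x
  stopifnot(n > ncol(design))                  # requires n > p (+ 1 with intercept)
  fit <- lm.fit(design, as.vector(y))
  rss <- sum(fit$residuals^2)
  list(mu = unname(fit$fitted.values),         # estimated mean vector
       sigma = sqrt(rss / (n - ncol(design)))) # estimated error standard deviation
}
```

Under these assumptions the estimates coincide with fitted(lm(y ~ x)) and sigma(lm(y ~ x)).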

Details

This function computes the full solution paths given by BOSS and FS on a given dataset (x, y) with n observations and p predictors. It also calculates the heuristic degrees of freedom for BOSS and various information criteria, which can further be used to select the subset from the candidates. Please refer to the vignette for implementation details and to Tian et al. (2021) for methodology details.
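The forward stepwise (FS) path mentioned above adds, at each step, the predictor that most reduces the residual sum of squares. A deliberately naive base-R sketch of that ordering follows. The name fs_order_sketch is hypothetical, x and y are assumed centered (no intercept), and each candidate is refit from scratch with lm.fit, which is far slower than the package, which obtains FS at no additional cost beyond BOSS.

```r
# Naive forward stepwise ordering (hypothetical helper, for illustration only).
# Assumes x and y are centered, so no intercept column is used.
fs_order_sketch <- function(x, y, maxstep = ncol(x)) {
  p <- ncol(x)
  active <- integer(0)
  for (step in seq_len(maxstep)) {
    candidates <- setdiff(seq_len(p), active)
    # RSS after adding each remaining candidate to the current active set
    rss <- vapply(candidates, function(j) {
      sum(lm.fit(x[, c(active, j), drop = FALSE], y)$residuals^2)
    }, numeric(1))
    active <- c(active, candidates[which.min(rss)])
  }
  active   # predictor indices in the order FS adds them
}
```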

Value

Author(s)

Sen Tian

References

See Also

The predict and coef methods for "boss" objects, and the cv.boss function.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Fit the model
boss_result = boss(x, y)
## Get the coefficient vector selected by AICc-hdf (S3 method for boss)
beta_boss_aicc = coef(boss_result)
# the above is equivalent to the following
beta_boss_aicc = boss_result$beta_boss[, which.min(boss_result$IC_boss$aicc), drop=FALSE]
## Get the fitted values of BOSS-AICc-hdf (S3 method for boss)
mu_boss_aicc = predict(boss_result, newx=x)
# the above is equivalent to the following
mu_boss_aicc = cbind(1,x) %*% beta_boss_aicc
## Repeat the above process, but using Cp-hdf instead of AICc-hdf
## coefficient vector
beta_boss_cp = coef(boss_result, ic='cp')
# the above is equivalent to the following
beta_boss_cp = boss_result$beta_boss[, which.min(boss_result$IC_boss$cp), drop=FALSE]
## fitted values of BOSS-Cp-hdf
mu_boss_cp = predict(boss_result, newx=x, ic='cp')
# the above is equivalent to the following
mu_boss_cp = cbind(1,x) %*% beta_boss_cp

Calculate an information criterion.

Description

Calculate a specified information criterion (IC) for an estimate or a group of estimates. The choices of IC include AIC, BIC, AICc, BICc, GCV and Mallows' Cp.

Usage

calc.ic(
  y_hat,
  y,
  ic = c("aicc", "bicc", "aic", "bic", "gcv", "cp"),
  df,
  sigma = NULL
)

Arguments

y_hat

A vector of fitted values with length(y_hat) = length(y) = n, or a matrix with nrow(y_hat) = length(y) = n and ncol(y_hat) = m, containing m different fits.

y

A response vector, with length(y) = n.

ic

A specified IC to calculate. Default is AICc ('aicc'). Other choices include AIC ('aic'),BIC ('bic'), BICc ('bicc'), GCV ('gcv') and Mallows' Cp ('cp').

df

A number if y_hat is a vector, or a vector with length(df) = ncol(y_hat) = m if y_hat is a matrix. df represents the degrees of freedom for each fit.

sigma

Standard deviation of the error term. It only needs to be specified if ic = 'cp'.

Details

This function enables the computation of various common IC for model fits, which can further be used to choose the optimal fit. This allows the user to compare the effects of different IC. In order to calculate an IC, the degrees of freedom (df) need to be specified. The formulas used to calculate each IC are as follows, where RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 is the residual sum of squares and n is the number of observations:

AIC = \log(\frac{RSS}{n}) + 2\frac{df}{n}

BIC = \log(\frac{RSS}{n}) + \log(n)\frac{df}{n}

AICc = \log(\frac{RSS}{n}) + 2\frac{df+1}{n-df-2}

BICc = \log(\frac{RSS}{n}) + \log(n)\frac{df+1}{n-df-2}

GCV = \frac{RSS}{(n-df)^2}

Mallows' Cp = RSS + 2\times \sigma^2 \times df
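The formulas above translate directly into vectorized R. The sketch below is a hypothetical helper (ic_sketch), not the package's calc.ic; it treats each column of y_hat as one fit and follows the definitions exactly as written, including the GCV form RSS / (n - df)^2, which differs from the more common n * RSS / (n - df)^2 only by the constant factor n and therefore selects the same fit.

```r
# Hypothetical helper implementing the IC formulas listed above.
# y_hat: vector (one fit) or n x m matrix (m fits); df: length-m vector.
ic_sketch <- function(y_hat, y, ic = c("aicc", "bicc", "aic", "bic", "gcv", "cp"),
                      df, sigma = NULL) {
  ic <- match.arg(ic)
  y <- as.vector(y)
  y_hat <- as.matrix(y_hat)                  # one column per fit
  n <- length(y)
  stopifnot(nrow(y_hat) == n, length(df) == ncol(y_hat))
  rss <- unname(colSums((y_hat - y)^2))      # y is recycled down each column
  switch(ic,
    aic  = log(rss / n) + 2 * df / n,
    bic  = log(rss / n) + log(n) * df / n,
    aicc = log(rss / n) + 2 * (df + 1) / (n - df - 2),
    bicc = log(rss / n) + log(n) * (df + 1) / (n - df - 2),
    gcv  = rss / (n - df)^2,
    cp   = {
      if (is.null(sigma)) stop("sigma must be supplied when ic = 'cp'")
      rss + 2 * sigma^2 * df
    }
  )
}
```

For example, with y = 1:6 and the constant fit 3.5 (RSS = 17.5, df = 1), GCV is 17.5 / 25 = 0.7.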

Value

The value(s) of the specified IC for each fit.

Author(s)

Sen Tian

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Fit the model
boss_result = boss(x, y)
## Print the values of AICc-hdf for all subsets given by BOSS
print(boss_result$IC_boss$aicc)
## Calculate them manually using the calc.ic function
y_hat = cbind(rep(1,n),x)%*%boss_result$beta_boss
print(calc.ic(y_hat, y, df=boss_result$hdf_boss))

Select coefficient vector(s) for BOSS.

Description

This function returns the optimal coefficient vector of BOSS selected by AICc (by default) or another information criterion.

Usage

## S3 method for class 'boss'
coef(
  object,
  ic = c("aicc", "bicc", "aic", "bic", "gcv", "cp"),
  select.boss = NULL,
  ...
)

Arguments

object

The boss object, returned from calling the boss function.

ic

Which information criterion is used to select the optimal coefficient vector for BOSS. The default is AICc-hdf.

select.boss

The index (or indices) of the columns in the coefficient matrix to select. By default (NULL), the column is selected by the information criterion specified in 'ic'.

...

Extra arguments (unused for now)

Details

If select.boss is specified, the function returns the corresponding column(s) in the coefficient matrix.

If select.boss is unspecified, the function returns the optimal coefficient vector selected by AICc-hdf (another IC can be specified in the argument ic).

Value

The chosen coefficient vector(s) for BOSS.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Fit the model
boss_result = boss(x, y)
## Get the coefficient vector selected by AICc-hdf (S3 method for boss)
beta_boss_aicc = coef(boss_result)
# the above is equivalent to the following
beta_boss_aicc = boss_result$beta_boss[, which.min(boss_result$IC_boss$aicc), drop=FALSE]
## Get the fitted values of BOSS-AICc-hdf (S3 method for boss)
mu_boss_aicc = predict(boss_result, newx=x)
# the above is equivalent to the following
mu_boss_aicc = cbind(1,x) %*% beta_boss_aicc
## Repeat the above process, but using Cp-hdf instead of AICc-hdf
## coefficient vector
beta_boss_cp = coef(boss_result, ic='cp')
# the above is equivalent to the following
beta_boss_cp = boss_result$beta_boss[, which.min(boss_result$IC_boss$cp), drop=FALSE]
## fitted values of BOSS-Cp-hdf
mu_boss_cp = predict(boss_result, newx=x, ic='cp')
# the above is equivalent to the following
mu_boss_cp = cbind(1,x) %*% beta_boss_cp

Select coefficient vector based on cross-validation for BOSS or FS.

Description

This function returns the coefficient vector that minimizes the out-of-sample (OOS) cross-validation score.

Usage

## S3 method for class 'cv.boss'
coef(object, method = c("boss", "fs"), ...)

Arguments

object

The cv.boss object, returned from calling the cv.boss function.

method

It can either be 'fs' or 'boss'. The default is 'boss'.

...

Extra arguments (unused for now).

Value

The chosen coefficient vector for BOSS or FS.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Perform 10-fold CV without replication
boss_cv_result = cv.boss(x, y)
## Get the coefficient vector of BOSS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_boss_cv = coef(boss_cv_result)
# the above is equivalent to
boss_result = boss_cv_result$boss
beta_boss_cv = boss_result$beta_boss[, boss_cv_result$i.min.boss, drop=FALSE]
## Get the fitted values of BOSS-CV (S3 method for cv.boss)
mu_boss_cv = predict(boss_cv_result, newx=x)
# the above is equivalent to
mu_boss_cv = cbind(1,x) %*% beta_boss_cv
## Get the coefficient vector of FS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_fs_cv = coef(boss_cv_result, method='fs')
## Get the fitted values of FS-CV (S3 method for cv.boss)
mu_fs_cv = predict(boss_cv_result, newx=x, method='fs')

Cross-validation for Best Orthogonalized Subset Selection (BOSS) and Forward Stepwise Selection (FS).

Description

Cross-validation for Best Orthogonalized Subset Selection (BOSS) and Forward Stepwise Selection (FS).

Usage

cv.boss(
  x,
  y,
  maxstep = min(nrow(x) - intercept - 1, ncol(x)),
  intercept = TRUE,
  n.folds = 10,
  n.rep = 1,
  show.warning = TRUE,
  ...
)

Arguments

x

A matrix of predictors; see boss.

y

A response vector; see boss.

maxstep

Maximum number of steps performed. Default is min(n-1, p) if intercept = FALSE, and min(n-2, p) otherwise.

intercept

Logical, whether to fit an intercept term. Default is TRUE.

n.folds

The number of cross-validation folds. Default is 10.

n.rep

The number of replications of cross-validation. Default is 1.

show.warning

Whether to display a warning if CV is only performed for a subset of candidates, e.g. when n < p and 10-fold CV is used. Default is TRUE.

...

Arguments to boss, such as hdf.ic.boss.
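The situation flagged by show.warning can be seen with a little arithmetic: each training fold has fewer observations than the full data, so the default maxstep can be smaller within folds. A small sketch, assuming each fold applies the same default rule shown in Usage (the helper default_maxstep is hypothetical; in R the logical intercept counts as 1):

```r
# Hypothetical helper reproducing the Usage default; TRUE is coerced to 1.
default_maxstep <- function(n, p, intercept = TRUE) min(n - intercept - 1, p)

# n < p case: 20 observations, 50 predictors, 10 folds.
n <- 20; p <- 50; n.folds <- 10
n_train <- n - ceiling(n / n.folds)        # smallest training-fold size: 18
full_steps <- default_maxstep(n, p)        # 18 steps on the full data
fold_steps <- default_maxstep(n_train, p)  # 16 steps within each fold
# Under this assumption, candidates at steps 17 and 18 get no CV score.

# The n > p setting used in the examples: 20 observations, 5 predictors.
small_full <- default_maxstep(20, 5)                     # 5
small_fold <- default_maxstep(20 - ceiling(20 / 10), 5)  # 5, all candidates covered
```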

Details

This function fits BOSS and FS (boss) on the full dataset and performs n.folds-fold cross-validation. The cross-validation process can be repeated n.rep times to evaluate the out-of-sample (OOS) performance of the candidate subsets given by both methods.
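The cross-validation scheme described above can be sketched in base R as follows. This is a hedged illustration only: the helper names are hypothetical, the candidate path is a simple nested LS sequence (intercept plus the first k columns of x) standing in for the BOSS and FS paths, and the random fold assignment and mean-squared-error score are our assumptions about details the text does not specify. As with i.min.boss in the examples, the returned index is a column index, so index k + 1 corresponds to the candidate with k predictors.

```r
# Hypothetical stand-in path: candidate k = intercept + first k columns of x,
# fitted by LS. Returns a (p + 1) x (maxstep + 1) coefficient matrix.
nested_ls_path <- function(x, y, maxstep) {
  p <- ncol(x)
  sapply(0:maxstep, function(k) {
    design <- cbind(rep(1, nrow(x)), x[, seq_len(k), drop = FALSE])
    beta <- numeric(p + 1)
    beta[seq_len(k + 1)] <- qr.coef(qr(design), y)
    beta
  })
}

# K-fold CV, optionally replicated, scoring every candidate on held-out data.
cv_path_sketch <- function(x, y, maxstep, n.folds = 10, n.rep = 1) {
  n <- nrow(x)
  y <- as.vector(y)
  sse <- matrix(0, nrow = n.rep, ncol = maxstep + 1)  # OOS error per replication
  for (r in seq_len(n.rep)) {
    folds <- sample(rep_len(seq_len(n.folds), n))      # random, nearly equal folds
    for (f in seq_len(n.folds)) {
      test <- folds == f
      beta <- nested_ls_path(x[!test, , drop = FALSE], y[!test], maxstep)
      pred <- cbind(1, x[test, , drop = FALSE]) %*% beta  # one column per candidate
      sse[r, ] <- sse[r, ] + colSums((pred - y[test])^2)
    }
  }
  cv_score <- colMeans(sse) / n                         # OOS MSE averaged over reps
  list(cv = cv_score, i.min = which.min(cv_score))
}
```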

Value

Author(s)

Sen Tian

References

See Also

The predict and coef methods for cv.boss objects, and the boss function.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Perform 10-fold CV without replication
boss_cv_result = cv.boss(x, y)
## Get the coefficient vector of BOSS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_boss_cv = coef(boss_cv_result)
# the above is equivalent to
boss_result = boss_cv_result$boss
beta_boss_cv = boss_result$beta_boss[, boss_cv_result$i.min.boss, drop=FALSE]
## Get the fitted values of BOSS-CV (S3 method for cv.boss)
mu_boss_cv = predict(boss_cv_result, newx=x)
# the above is equivalent to
mu_boss_cv = cbind(1,x) %*% beta_boss_cv
## Get the coefficient vector of FS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_fs_cv = coef(boss_cv_result, method='fs')
## Get the fitted values of FS-CV (S3 method for cv.boss)
mu_fs_cv = predict(boss_cv_result, newx=x, method='fs')

Prediction given new data entries.

Description

This function returns the prediction(s) for new observation(s) for BOSS, where the optimal coefficient vector is chosen via a selection rule.

Usage

## S3 method for class 'boss'
predict(object, newx, ...)

Arguments

object

The boss object, returned from calling the 'boss' function.

newx

A new data entry or several entries. It can be a vector, or a matrix with nrow(newx) being the number of new entries and ncol(newx) = p being the number of predictors. The function takes care of the intercept; there is NO need to add 1 to newx.

...

Extra arguments to be plugged into coef, such as select.boss; see the description of coef.boss for more details.

Details

The function essentially calculates newx %*% coef (with the intercept column prepended when an intercept is fitted), where coef is a coefficient vector chosen by a selection rule. See the description of coef.boss for the default and the available choices of the selection rule.
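A minimal sketch of that computation, assuming (as the examples do with cbind(1, x)) that the intercept is the first entry of the coefficient vector. The name predict_sketch is hypothetical, and treating a plain vector as a single new entry follows the description of newx above.

```r
# Hypothetical helper: prepend the intercept column, then multiply.
predict_sketch <- function(newx, coef) {
  if (is.null(dim(newx))) newx <- matrix(newx, nrow = 1)  # a vector is one new entry
  stopifnot(ncol(newx) + 1 == nrow(as.matrix(coef)))      # intercept + p coefficients
  cbind(1, newx) %*% coef                                  # one row per new entry
}

b <- matrix(c(0.5, 1, -1), ncol = 1)   # intercept 0.5, then beta_1 = 1, beta_2 = -1
predict_sketch(c(2, 1), b)             # 0.5 + 2 - 1 = 1.5
```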

Value

The prediction(s) for BOSS.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Fit the model
boss_result = boss(x, y)
## Get the coefficient vector selected by AICc-hdf (S3 method for boss)
beta_boss_aicc = coef(boss_result)
# the above is equivalent to the following
beta_boss_aicc = boss_result$beta_boss[, which.min(boss_result$IC_boss$aicc), drop=FALSE]
## Get the fitted values of BOSS-AICc-hdf (S3 method for boss)
mu_boss_aicc = predict(boss_result, newx=x)
# the above is equivalent to the following
mu_boss_aicc = cbind(1,x) %*% beta_boss_aicc
## Repeat the above process, but using Cp-hdf instead of AICc-hdf
## coefficient vector
beta_boss_cp = coef(boss_result, ic='cp')
# the above is equivalent to the following
beta_boss_cp = boss_result$beta_boss[, which.min(boss_result$IC_boss$cp), drop=FALSE]
## fitted values of BOSS-Cp-hdf
mu_boss_cp = predict(boss_result, newx=x, ic='cp')
# the above is equivalent to the following
mu_boss_cp = cbind(1,x) %*% beta_boss_cp

Prediction given new data entries.

Description

This function returns the prediction(s) for new observation(s) for BOSS or FS, where the optimal coefficient vector is chosen via cross-validation.

Usage

## S3 method for class 'cv.boss'
predict(object, newx, ...)

Arguments

object

The cv.boss object, returned from calling the cv.boss function.

newx

A new data entry or several entries. It can be a vector, or a matrix with nrow(newx) being the number of new entries and ncol(newx) = p being the number of predictors. The function takes care of the intercept; there is NO need to add 1 to newx.

...

Extra arguments to be plugged into coef, such as method; see the description of coef.cv.boss for more details.

Value

The prediction for BOSS or FS.

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Perform 10-fold CV without replication
boss_cv_result = cv.boss(x, y)
## Get the coefficient vector of BOSS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_boss_cv = coef(boss_cv_result)
# the above is equivalent to
boss_result = boss_cv_result$boss
beta_boss_cv = boss_result$beta_boss[, boss_cv_result$i.min.boss, drop=FALSE]
## Get the fitted values of BOSS-CV (S3 method for cv.boss)
mu_boss_cv = predict(boss_cv_result, newx=x)
# the above is equivalent to
mu_boss_cv = cbind(1,x) %*% beta_boss_cv
## Get the coefficient vector of FS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_fs_cv = coef(boss_cv_result, method='fs')
## Get the fitted values of FS-CV (S3 method for cv.boss)
mu_fs_cv = predict(boss_cv_result, newx=x, method='fs')
