A quick tour through UBayFS

Anna Jenul, Stefan Schrunner

2023-03-07

Introduction

The UBayFS package implements the framework proposed in (Jenul et al. 2022), together with an interactive Shiny dashboard, which makes UBayFS applicable to R users with different levels of expertise. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. In particular, the user can define a maximum number of selected features and must-link constraints (features must be selected together) or cannot-link constraints (features must not be selected together). A parameter \(\rho\) regulates the shape of a penalty term accounting for side constraints, where feature sets that violate constraints lead to a lower target value.

In this vignette, we use the Breast Cancer Wisconsin dataset (Wolberg and Mangasarian 1990) for demonstration. Specifically, the dataset consists of 569 samples and 30 features and can be downloaded as a demo dataset by calling data(bcw). The dataset describes a classification problem, where the aim is to distinguish between malignant and benign cancer based on image data. Features are derived from 10 image characteristics, where each characteristic is represented by three features (summary statistics) in the dataset. For instance, the characteristic radius is represented by the features radius mean, radius standard deviation, and radius worst.

UBayFS is implemented via a core S3 class UBaymodel, along with helper functions. An overview of the UBaymodel class and its main generic functions is shown in the following diagram:

Requirements and dependencies

In addition to the base installation, some functionality of the package (in particular, the interactive Shiny interface) requires further dependencies.

Like other R packages, UBayFS is loaded using the library(UBayFS) command. The sample dataset is accessed via data(bcw).

library(UBayFS)
data(bcw)

Background

This section summarizes the core components of UBayFS. At its center lies Bayes' Theorem for two random variables \(\boldsymbol{\theta}\) and \(\boldsymbol{y}\): \[p(\boldsymbol{\theta}|\boldsymbol{y})\propto p(\boldsymbol{y}|\boldsymbol{\theta})\cdot p(\boldsymbol{\theta}),\] where \(\boldsymbol{\theta}\) represents an importance parameter of single features and \(\boldsymbol{y}\) collects evidence about \(\boldsymbol{\theta}\) from an ensemble of elementary feature selectors. In the following, the concept will be outlined.

Ensemble feature selection as likelihood

The first step in UBayFS is to build an ensemble of \(M\) elementary feature selectors. Each elementary feature selector \(m=1,\dots,M\) selects features, denoted by a binary membership vector \(\boldsymbol{\delta}^{(m)} \in \{0,1\}^N\), based on a randomly selected training dataset, where \(N\) denotes the total number of features in the dataset. In the binary membership vector \(\boldsymbol{\delta}^{(m)}\), a component \(\delta_i^{(m)}=1\) indicates that feature \(i\in\{1,\dots,N\}\) is selected, and \(\delta_i^{(m)}=0\) otherwise. Statistically, we interpret the result from each elementary feature selector as a realization from a multinomial distribution with parameters \(\boldsymbol{\theta}\) and \(l\), where \(\boldsymbol{\theta}\in[0,1]^N\) defines the success probabilities of sampling each feature in an individual feature selection and \(l\) corresponds to the number of features selected in \(\boldsymbol{\delta}^{(m)}\). Therefore, the joint probability density of the observed data \(\boldsymbol{y} = \sum\limits_{m=1}^{M}\boldsymbol{\delta}^{(m)}\in\{0,\dots,M\}^N\) — the likelihood function — has the form \[p(\boldsymbol{y}|\boldsymbol{\theta}) = \prod\limits_{m=1}^{M} f_{\text{mult}}(\boldsymbol{\delta}^{(m)};\boldsymbol{\theta},l),\] where \(f_{\text{mult}}\) is the probability density function of the multinomial distribution.
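To make the data structure concrete, the following minimal sketch (plain R, not part of the UBayFS API; all variable names are illustrative) simulates \(M\) elementary selections and aggregates them into the count vector \(\boldsymbol{y}\):

# Illustrative simulation of the likelihood data: M elementary feature
# selectors, each selecting l out of N features at random.
set.seed(1)
N <- 30; M <- 100; l <- 10
delta <- t(replicate(M, {
  d <- rep(0, N)
  d[sample(N, l)] <- 1   # one elementary selection delta^(m)
  d
}))
y <- colSums(delta)      # observed data y in {0,...,M}^N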

Expert knowledge as prior

UBayFS includes two types of expert knowledge: prior feature weightsand feature set constraints.

Prior feature weights

To introduce expert knowledge about the importance of features, the user may define a vector \(\boldsymbol{\alpha} = (\alpha_1,\dots,\alpha_N)\), \(\alpha_i>0\) for all \(i=1,\dots,N\), assigning a weight to each feature. High weights indicate that a feature is important. By default, if all features are equally important or no prior weighting is used, \(\boldsymbol{\alpha}\) is set to the 1-vector of length \(N\). With the weighting in place, we assume that the a-priori feature importance parameter \(\boldsymbol{\theta}\) follows a Dirichlet distribution (Maier 2020): \[p(\boldsymbol{\theta}) = f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}),\] where the probability density function of the Dirichlet distribution is given as \[f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}) = \frac{1}{\text{B}(\boldsymbol{\alpha})} \prod\limits_{n=1}^N \theta_n^{\alpha_n-1},\] with \(\text{B}(\cdot)\) denoting the multivariate Beta function. Generalizations of the Dirichlet distribution (Wong 1998; Hankin 2010) are also implemented in UBayFS.

Since the Dirichlet distribution is the conjugate prior with respect to the multinomial likelihood, the posterior density is given as \[p(\boldsymbol{\theta}|\boldsymbol{y}) \propto f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}^\circ),\] with \[\boldsymbol{\alpha}^\circ = \left(\alpha_1 + \sum\limits_{m=1}^M \delta_1^{(m)}, \dots, \alpha_N + \sum\limits_{m=1}^M \delta_N^{(m)} \right)\] representing the posterior parameter vector \(\boldsymbol{\alpha}^\circ\).
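In code, the conjugate update amounts to a single vector addition (a sketch continuing the simulation above; theta_hat is an illustrative name for the posterior mean):

alpha <- rep(1, N)                         # default prior weights (1-vector)
alpha_post <- alpha + y                    # posterior parameter vector alpha^circ
theta_hat <- alpha_post / sum(alpha_post)  # posterior expected feature importance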

Feature set constraints

In addition to the prior weighting of features, the UBayFS user can also add different types of constraints to the feature selection:

  • max-size constraint: Maximum number of features that shall be selected.
  • must-link constraint: For a pair of features, either both or none is selected (defined as pairwise constraints, one for each pair of features).
  • cannot-link constraint: Used if a pair of features must not be selected jointly.

All constraints can be defined block-wise between feature blocks (instead of individual features). Constraints are represented as a system of linear inequalities \(\boldsymbol{A}\boldsymbol{\delta}-\boldsymbol{b}\leq\boldsymbol{0}\), where \(\boldsymbol{A}\in\mathbb{R}^{K\times N}\) and \(\boldsymbol{b}\in\mathbb{R}^K\), and \(K\) denotes the total number of constraints. For constraint \(k \in 1,\dots,K\), a feature set \(\boldsymbol{\delta}\) is admissible only if \(\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} - b^{(k)} \leq 0\), leading to the inadmissibility function (penalty term)

\[\kappa_{k,\rho}(\boldsymbol{\delta}) = \left\{ \begin{array}{ll} 0 & \text{if}~\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta}\leq b^{(k)},\\ 1 & \text{if}~\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} > b^{(k)} \land \rho =\infty,\\ \frac{1-\xi_{k,\rho}}{1 + \xi_{k,\rho}} & \text{otherwise,} \end{array} \right.\]

where \(\rho\in\mathbb{R}^+ \cup \{\infty\}\) denotes a relaxation parameter and \(\xi_{k,\rho} = \exp\left(-\rho \left(\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} - b^{(k)}\right)\right)\) defines the exponential term of a logistic function. To handle \(K\) different constraints for one feature selection problem, the joint inadmissibility function is given as \[\kappa(\boldsymbol{\delta}) = 1 - \prod\limits_{k=1}^{K} \left(1-\kappa_{k,\rho}(\boldsymbol{\delta})\right),\] which originates from the idea that \(\kappa = 1\) (maximum penalization) if at least one \(\kappa_{k,\rho}=1\), while \(\kappa=0\) (no penalization) if all \(\kappa_{k,\rho}=0\).
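The penalty terms are straightforward to express in plain R; the following sketch implements the formulas above (illustrative only, not the package-internal implementation):

# Single-constraint penalty kappa_{k,rho} for a linear constraint a^T delta <= b.
kappa_k <- function(delta, a, b, rho){
  z <- sum(a * delta) - b
  if(z <= 0) return(0)             # constraint satisfied: no penalty
  if(is.infinite(rho)) return(1)   # hard constraint violated: full penalty
  xi <- exp(-rho * z)
  (1 - xi) / (1 + xi)              # relaxed (soft) penalty in (0,1)
}

# Joint inadmissibility over all K constraints (rows of A).
kappa <- function(delta, A, b, rho){
  1 - prod(sapply(seq_len(nrow(A)), function(k) 1 - kappa_k(delta, A[k, ], b[k], rho[k])))
}

# Example: N = 3 features, hard max-size <= 2 plus a soft must-link between
# features 1 and 2 (expressed as two inequalities).
A   <- rbind(c( 1,  1, 1),
             c( 1, -1, 0),
             c(-1,  1, 0))
b   <- c(2, 0, 0)
rho <- c(Inf, 1, 1)
kappa(c(1, 0, 1), A, b, rho)  # must-link violated: penalty of about 0.46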

To obtain an optimal feature set \(\boldsymbol{\delta}^\star\), we use a target function \(U(\boldsymbol{\delta},\boldsymbol{\theta})\), which represents the posterior expected utility of a feature set \(\boldsymbol{\delta}\) given the posterior feature importance parameter \(\boldsymbol{\theta}\), regularized by the inadmissibility function \(\kappa(\cdot)\):

\[\mathbb{E}_{\boldsymbol{\theta}|\boldsymbol{y}}[U(\boldsymbol{\delta},\boldsymbol{\theta}(\boldsymbol{y}))] = \boldsymbol{\delta}^T\mathbb{E}_{\boldsymbol{\theta}|\boldsymbol{y}}[\boldsymbol{\theta}(\boldsymbol{y})]-\lambda\kappa(\boldsymbol{\delta})\longrightarrow\underset{\boldsymbol{\delta}\in\{0,1\}^N}{\text{max}}\]

Since an exact optimization is impossible due to the non-linear function \(\kappa\), we use a genetic algorithm to find an appropriate feature set. In detail, the genetic algorithm is initialized via a greedy algorithm and, in each iteration, recombines the given feature sets with respect to a fitness function.
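For illustration, the regularized target from the expression above can be written as a one-line fitness function (a sketch reusing theta_hat and kappa() from the snippets above; a genetic algorithm then maximizes such a function over binary vectors \(\boldsymbol{\delta}\)):

# delta^T E[theta|y] - lambda * kappa(delta), cf. the utility above.
utility <- function(delta, theta_hat, A, b, rho, lambda = 1){
  sum(delta * theta_hat) - lambda * kappa(delta, A, b, rho)
}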

Application of UBayFS

Ensemble Training

The function build.UBaymodel() initializes the UBayFS model and trains an ensemble of elementary feature selectors. The training dataset and target are provided via the arguments data and target. Although the UBayFS concept permits unsupervised, multiclass, or regression setups, the current implementation supports binary target variables only. While M defines the ensemble size (number of elementary feature selectors), the type of the elementary feature selectors is set via method. Three feature selectors (mRMR, Fisher score, and Laplace score) are implemented as baselines. In general, the method argument also accepts any self-implemented feature selection function with the arguments X (the data), y (the target), n (the number of features to select), and name (the name of the method); the function must return the indices of the selected features and the input name. An example with classification trees is shown below. Each ensemble model is trained on a random subset comprising tt_split \(\cdot 100\) percent of the training data. The helper function buildConstraints() provides an easy way to define side constraints for the model. Using the argument prior_model, the user specifies whether the standard Dirichlet distribution or a generalized variant should be used as the prior model. Furthermore, the number of features selected in each elementary model can be controlled by the parameter nr_features.

For the standard UBayFS initialization, all prior feature weights are set to 1, and no feature constraints are included yet. The summary() function provides an overview of the dataset, the prior weights, and the likelihood — ensemble counts indicate how often a feature was selected across the ensemble feature selections.

model = build.UBaymodel(data = bcw$data,
                        target = bcw$labels,
                        M = 100,
                        tt_split = 0.75,
                        nr_features = 10,
                        method = 'mRMR',
                        prior_model = 'dirichlet',
                        weights = 0.01,
                        lambda = 1,
                        constraints = buildConstraints(constraint_types = c('max_size'),
                                                       constraint_vars = list(3),
                                                       num_elements = dim(bcw$data)[2],
                                                       rho = 1),
                        optim_method = 'GA',
                        popsize = 100,
                        maxiter = 100,
                        shiny = FALSE)
summary(model)
#>  UBayFS model summary
#>   data:  569x30
#>   labels:  B: 357 M: 212
#>
#>   === constraints ===
#>   - - - - - - - - - -  group  1   - - - - - - - - - -
#>  constraint 1: (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) x <= 3; rho = 1
#>
#>   === prior weights ===
#>   weights: ( 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01 )
#>
#>   === likelihood ===
#>   ensemble counts: ( 0,16,75,0,0,0,100,100,0,0,0,0,3,100,0,0,4,0,0,0,100,83,100,100,1,18,100,100,0,0 )
#>
#>   === feature selection results ===
#>  no output produced yet

The prior constraints are shown as a system of linear inequalities together with the relaxation parameter \(\rho\). Further, the current prior weights and the ensemble feature counts (likelihood) for each feature are printed. As the model is not trained yet, the final feature selection result is empty.

In addition to mRMR, we add a function decision_tree() that selects features based on decision tree importances.

library(rpart)
decision_tree <- function(X, y, n, name = 'tree'){
  rf_data = as.data.frame(cbind(y, X))
  colnames(rf_data) <- make.names(colnames(rf_data))
  tree = rpart::rpart(y ~ ., data = rf_data)
  return(list(ranks = which(colnames(X) %in% names(tree$variable.importance)[1:n]),
              name = name))
}

model = build.UBaymodel(data = bcw$data,
                        target = bcw$labels,
                        M = 100,
                        tt_split = 0.75,
                        nr_features = 10,
                        method = c('mRMR', decision_tree),
                        prior_model = 'dirichlet',
                        weights = 0.01,
                        lambda = 1,
                        constraints = buildConstraints(constraint_types = c('max_size'),
                                                       constraint_vars = list(3),
                                                       num_elements = dim(bcw$data)[2],
                                                       rho = 1),
                        optim_method = 'GA',
                        popsize = 100,
                        maxiter = 100,
                        shiny = FALSE)

Further examples of feature selection methods (recursive feature elimination, Lasso, and HSIC Lasso) are:

# recursive feature elimination
library(caret)
rec_fe <- function(X, y, n, name = 'rfe'){
  if(is.factor(y)){
    control <- rfeControl(functions = rfFuncs, method = 'cv', number = 2)
  }
  else{
    control <- rfeControl(functions = lmFuncs, method = 'cv', number = 2)
  }
  results <- caret::rfe(X, y, sizes = n, rfeControl = control)
  return(list(ranks = which(colnames(X) %in% results$optVariables),
              name = name))
}

# Lasso
library(glmnet)
lasso <- function(X, y, n = NULL, name = 'lasso'){
  family = ifelse(is.factor(y), 'binomial', 'gaussian')
  cv.lasso <- cv.glmnet(as.matrix(X), y, intercept = FALSE, alpha = 1,
                        family = family, nfolds = 3)
  model <- glmnet(as.matrix(X), y, intercept = FALSE, alpha = 1,
                  family = family, lambda = cv.lasso$lambda.min)
  return(list(ranks = which(as.vector(model$beta) != 0), name = name))
}

# HSIC Lasso
library(GSelection)
hsic_lasso <- function(X, y, n, name = 'hsic'){
  ifelse(is.factor(y), {tl = as.numeric(as.integer(y) - 1)}, {tl = y})
  results = feature.selection(X, tl, n)
  return(list(ranks = results$hsic_selected_feature_index, name = name))
}

User knowledge

Using the function setWeights(), the user is able to change the feature weights from the standard initialization to desired values. In our example, we assign equal weights to features originating from the same image characteristic. Weights can be on an arbitrary scale. As it is difficult to specify prior weights in real-life applications, we suggest defining them on a normalized scale.

weights = rep(c(10, 15, 20, 16, 15, 10, 12, 17, 21, 14), 3)
strength = 1
weights = weights * strength / sum(weights)
print(weights)
#>  [1] 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111
model = setWeights(model = model, weights = weights)

In addition to prior weights, feature set constraints may be specified. Internally, constraints are implemented via an S3 class UBayconstraint, depicted in the following diagram:

Rather than calling the constructor method directly, the helper function buildConstraints() may be used to facilitate the definition of a set of constraints: the input constraint_types consists of a vector in which all constraint types are defined. Then, with constraint_vars, the user specifies details about each constraint: for max-size, the number of features to select is provided, while for must-link and cannot-link, the set of feature indices to be linked must be provided. Each list entry corresponds to one constraint in constraint_types. In addition, num_elements denotes the total number of features in the dataset (or the total number of blocks if the constraint is block-wise), and rho corresponds to the relaxation parameter of the inadmissibility function. For block constraints, information about the block structure is included via either block_list or block_matrix; if both arguments are NULL, feature-wise constraints are generated.
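As an illustration, a block-wise max-size constraint for the bcw data could be defined as follows (a hypothetical sketch assuming the bcw block layout, where image characteristic \(b\) is represented by features \(b\), \(b+10\), and \(b+20\)):

# Hypothetical block constraint: select features from at most 3 of the
# 10 image characteristics (each block groups 3 features).
block_list = lapply(1:10, function(b) c(b, b + 10, b + 20))
block_constraints = buildConstraints(constraint_types = c('max_size'),
                                     constraint_vars = list(3),
                                     num_elements = length(block_list), # number of blocks
                                     rho = 1,
                                     block_list = block_list)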

Applying print(constraints) shows that the matrix A has nine rows to represent four constraints. While max-size and cannot-link can be expressed in one inequality each, must-link is a pairwise constraint. Specifically, a must-link constraint between \(n\) features produces \(\frac{n!}{(n-2)!}\) elementary constraints; hence, six inequalities represent the must-link constraint between three features. The function setConstraints() integrates the constraints into the UBayFS model.

constraints = buildConstraints(constraint_types = c('max_size', 'must_link', rep('cannot_link', 2)),
                               constraint_vars = list(10,            # max-size (maximal 10 features)
                                                      c(1, 11, 21),  # must-link between features 1, 11, and 21
                                                      c(1, 10),      # cannot-link between features 1 and 10
                                                      c(20, 23, 24)),# cannot-link between features 20, 23, and 24
                               num_elements = ncol(model$data),
                               rho = c(Inf, # max-size
                                       0.1, # rho for must-link
                                       1,   # rho for first cannot-link
                                       1))  # rho for second cannot-link
print(constraints)
#>  A
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
#>  [1,]    1    1    1    1    1    1    1    1    1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
#>  [2,]   -1    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [3,]   -1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0     0     0     0     0     0
#>  [4,]    1    0    0    0    0    0    0    0    0     0    -1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [5,]    0    0    0    0    0    0    0    0    0     0    -1     0     0     0     0     0     0     0     0     0     1     0     0     0     0     0     0     0     0     0
#>  [6,]    1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     0    -1     0     0     0     0     0     0     0     0     0
#>  [7,]    0    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     0     0     0     0    -1     0     0     0     0     0     0     0     0     0
#>  [8,]    1    0    0    0    0    0    0    0    0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [9,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     1     0     0     1     1     0     0     0     0     0     0
#>  b
#> [1] 10  0  0  0  0  0  0  1  1
#>  rho
#> [1] Inf 0.1 0.1 0.1 0.1 0.1 0.1 1.0 1.0
#>  block_matrix
#>  (30 x 30 identity matrix; output truncated)
model = setConstraints(model = model, constraints = constraints)

Optimization and evaluation

A genetic algorithm, described by (Givens and Hoeting 2012) and implemented in (Scrucca 2013), searches for the optimal feature set in the UBayFS framework. Using setOptim(), we initialize the genetic algorithm. Furthermore, popsize indicates the number of candidate feature sets created in each iteration, and maxiter is the number of iterations.

model = setOptim(model = model,
                 popsize = 100,
                 maxiter = 200)

At this point, we have initialized prior weights, constraints, and the optimization procedure — we can now train the UBayFS model using the generic function train(), relying on a genetic algorithm. The summary() function provides an overview of all components of UBayFS. The plot() function shows the prior feature information as bar charts, with the selected features marked with red borders. In addition, the constraints and the regularization parameter \(\rho\) are presented.

model = UBayFS::train(x = model)
#> Running Genetic Algorithm
summary(model)
#>  UBayFS model summary
#>   data:  569x30
#>   labels:  B: 357 M: 212
#>
#>   === constraints ===
#>   - - - - - - - - - -  group  1   - - - - - - - - - -
#>  constraint 1: (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) x <= 10; rho = Inf
#>  constraint 2: (-1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 3: (-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 4: (1,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 5: (0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 6: (1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 7: (0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 8: (1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 1; rho = 1
#>  constraint 9: (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0) x <= 1; rho = 1
#>
#>   === prior weights ===
#>   weights: ( 0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111,0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111,0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111 )
#>
#>   === likelihood ===
#>   ensemble counts: ( 0,16,75,0,0,0,100,100,0,0,0,0,3,100,0,0,4,0,0,0,100,83,100,100,1,18,100,100,0,0 )
#>
#>   === feature selection results ===
#>   ( 2,3,7,8,14,22,23,26,27,28 )
plot(model)

After training the model, we receive a feature selection result. More than one optimal feature set with the same MAP score is possible. The plot shows the selected features (framed in red) and their selection distribution between ensemble feature selection and prior weights. The constraints are shown at the top, where a connecting line is drawn between the features of one constraint. The final feature set and its additional properties can be evaluated with evaluateFS():

# evaluate the feature set
evaluateMultiple(state = model$output$feature_set, model = model)
#>                                  [,1]
#> cardinality                    10.000
#> log total utility              -0.234
#> log posterior feature utility  -0.234
#> log admissibility               0.000
#> number of violated constraints  0.000
#> avg feature correlation         0.614

The output contains the following information: the cardinality of the selected feature set, its log total utility (composed of the log posterior feature utility and the log admissibility), the number of violated constraints, and the average pairwise correlation between the selected features.

Shiny dashboard

UBayFS provides an interactive R Shiny dashboard as a GUI. With its intuitive user interface, the user can load data, set likelihood parameters, and even control the admissibility regularization strength of each constraint. With the command runInteractive(), the Shiny dashboard opens, given that the required dependencies are available (see above). Histograms and plots help to get an overview of the user's settings. The interactive dashboard offers save and load buttons to save or load UBayFS models as RData files. Due to computational limitations, it is not recommended to use the HTML interface for larger datasets (\(>100\) features or \(>1000\) samples).

runInteractive()

The dashboard is organized into multiple tabs.

Conclusion

With the methodology in place, UBayFS is applicable to a large range of feature selection problems with multiple sources of information. The likelihood parameters, steering the number of elementary models, mainly affect the stability and runtime of the result — the latter increases linearly with the number of models. The Shiny dashboard, in particular, delivers insight into the individual UBayFS steps. Nevertheless, the dashboard is suited only to smaller datasets, while larger ones should be processed via the console.

References

Givens, G. H., and J. A. Hoeting. 2012. Computational Statistics. Vol. 703. John Wiley & Sons.
Hankin, Robin K. S. 2010. “A Generalization of the Dirichlet Distribution.” Journal of Statistical Software 33 (11): 1–18.
Jenul, Anna, Stefan Schrunner, Jürgen Pilz, and Oliver Tomic. 2022. “A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS).” Machine Learning, 1–27.
Maier, M. J. 2020. DirichletReg: Dirichlet Regression in R. https://dirichletreg.r-forge.r-project.org/.
Scrucca, L. 2013. “GA: A Package for Genetic Algorithms in R.” Journal of Statistical Software 53 (4): 1–37. https://www.jstatsoft.org/v53/i04/.
Seijo-Pardo, B., I. Porto-Díaz, V. Bolón-Canedo, and A. Alonso-Betanzos. 2017. “Ensemble Feature Selection: Homogeneous and Heterogeneous Approaches.” Knowledge-Based Systems 118: 124–39. https://www.sciencedirect.com/science/article/pii/S0950705116304749.
Wolberg, W. H., and O. L. Mangasarian. 1990. “Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology.” Proceedings of the National Academy of Sciences 87 (23): 9193–96.
Wong, Tzu-Tsung. 1998. “Generalized Dirichlet Distribution in Bayesian Analysis.” Applied Mathematics and Computation 97 (2): 165–81.
