A quick tour through UBayFS

Anna Jenul, Stefan Schrunner

2023-03-07

Introduction

The UBayFS package implements the framework proposed in (Jenul et al. 2022), together with an interactive Shiny dashboard, which makes UBayFS applicable to R users with different levels of expertise. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can control the feature selection by assigning prior weights to features and penalizing specific feature combinations. In particular, the user can define a maximum number of selected features and must-link constraints (features must be selected together) or cannot-link constraints (features must not be selected together). A parameter \(\rho\) regulates the shape of a penalty term accounting for side constraints, where feature sets that violate constraints lead to a lower target value.

In this vignette, we use the Breast Cancer Wisconsin dataset (Wolberg and Mangasarian 1990) for demonstration. Specifically, the dataset consists of 569 samples and 30 features and can be downloaded as a demo dataset by calling data(bcw). The dataset describes a classification problem, where the aim is to distinguish between malignant and benign cancer based on image data. Features are derived from 10 image characteristics, where each characteristic is represented by three features (summary statistics) in the dataset. For instance, the characteristic radius is represented by the features radius mean, radius standard deviation, and radius worst.

UBayFS is implemented via a core S3 class UBaymodel, along with helper functions. An overview of the UBaymodel class and its main generic functions is shown in the following diagram:

Requirements and dependencies

In addition to the base installation, some functionality of the package (in particular, the interactive Shiny interface) requires further dependencies.

Like other R packages, UBayFS is loaded using the library(UBayFS) command. The sample dataset is accessed via data(bcw).

library(UBayFS)
data(bcw)

Background

This section summarizes the core components of UBayFS. At its center lies Bayes' Theorem for two random variables \(\boldsymbol{\theta}\) and \(\boldsymbol{y}\): \[p(\boldsymbol{\theta}|\boldsymbol{y})\propto p(\boldsymbol{y}|\boldsymbol{\theta})\cdot p(\boldsymbol{\theta}),\] where \(\boldsymbol{\theta}\) represents an importance parameter of single features and \(\boldsymbol{y}\) collects evidence about \(\boldsymbol{\theta}\) from an ensemble of elementary feature selectors. In the following, the concept will be outlined.

Ensemble feature selection as likelihood

The first step in UBayFS is to build an ensemble of \(M\) elementary feature selectors. Each elementary feature selector \(m=1,\dots,M\) selects features, denoted by a binary membership vector \(\boldsymbol{\delta}^{(m)} \in \{0,1\}^N\), based on a randomly selected training dataset, where \(N\) denotes the total number of features in the dataset. In the binary membership vector \(\boldsymbol{\delta}^{(m)}\), a component \(\delta_i^{(m)}=1\) indicates that feature \(i\in\{1,\dots,N\}\) is selected, and \(\delta_i^{(m)}=0\) otherwise. Statistically, we interpret the result from each elementary feature selector as a realization from a multinomial distribution with parameters \(\boldsymbol{\theta}\) and \(l\), where \(\boldsymbol{\theta}\in[0,1]^N\) defines the success probabilities of sampling each feature in an individual feature selection and \(l\) corresponds to the number of features selected in \(\boldsymbol{\delta}^{(m)}\). Therefore, the joint probability density of the observed data \(\boldsymbol{y} = \sum\limits_{m=1}^{M}\boldsymbol{\delta}^{(m)}\in\{0,\dots,M\}^N\) — the likelihood function — has the form \[p(\boldsymbol{y}|\boldsymbol{\theta}) = \prod\limits_{m=1}^{M} f_{\text{mult}}(\boldsymbol{\delta}^{(m)};\boldsymbol{\theta},l),\] where \(f_{\text{mult}}\) is the probability density function of the multinomial distribution.
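To make the data structure concrete, the following minimal sketch (plain R, not part of the UBayFS API; all variable names are illustrative) simulates \(M\) elementary selections and aggregates them into the count vector \(\boldsymbol{y}\):

# Illustrative simulation of the likelihood data: M elementary feature
# selectors, each selecting l out of N features at random.
set.seed(1)
N <- 30; M <- 100; l <- 10
delta <- t(replicate(M, {
  d <- rep(0, N)
  d[sample(N, l)] <- 1   # one elementary selection delta^(m)
  d
}))
y <- colSums(delta)      # observed data y in {0,...,M}^N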

Expert knowledge as prior

UBayFS includes two types of expert knowledge: prior feature weightsand feature set constraints.

Prior feature weights

To introduce expert knowledge about the importance of features, the user may define a vector \(\boldsymbol{\alpha} = (\alpha_1,\dots,\alpha_N)\), \(\alpha_i>0\) for all \(i=1,\dots,N\), assigning a weight to each feature. High weights indicate that a feature is important. By default, if all features are equally important or no prior weighting is used, \(\boldsymbol{\alpha}\) is set to the 1-vector of length \(N\). With the weighting in place, we assume that the a-priori feature importance parameter \(\boldsymbol{\theta}\) follows a Dirichlet distribution (Maier 2020): \[p(\boldsymbol{\theta}) = f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}),\] where the probability density function of the Dirichlet distribution is given as \[f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}) = \frac{1}{\text{B}(\boldsymbol{\alpha})} \prod\limits_{n=1}^N \theta_n^{\alpha_n-1},\] with \(\text{B}(\cdot)\) denoting the multivariate Beta function. Generalizations of the Dirichlet distribution (Wong 1998; Hankin 2010) are also implemented in UBayFS.

Since the Dirichlet distribution is the conjugate prior with respect to the multinomial likelihood, the posterior density is given as \[p(\boldsymbol{\theta}|\boldsymbol{y}) \propto f_{\text{Dir}}(\boldsymbol{\theta};\boldsymbol{\alpha}^\circ),\] with \[\boldsymbol{\alpha}^\circ = \left(\alpha_1 + \sum\limits_{m=1}^M \delta_1^{(m)}, \dots, \alpha_N + \sum\limits_{m=1}^M \delta_N^{(m)} \right)\] representing the posterior parameter vector \(\boldsymbol{\alpha}^\circ\).
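In code, the conjugate update amounts to a single vector addition (a sketch continuing the simulation above; theta_hat is an illustrative name for the posterior mean):

alpha <- rep(1, N)                         # default prior weights (1-vector)
alpha_post <- alpha + y                    # posterior parameter vector alpha^circ
theta_hat <- alpha_post / sum(alpha_post)  # posterior expected feature importance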

Feature set constraints

In addition to the prior weighting of features, the UBayFS user can also add different types of constraints to the feature selection:

  • max-size constraint: Maximum number of features that shall be selected.
  • must-link constraint: For a pair of features, either both or none is selected (defined as pairwise constraints, one for each pair of features).
  • cannot-link constraint: Used if a pair of features must not be selected jointly.

All constraints can be defined block-wise between feature blocks (instead of individual features). Constraints are represented as a system of linear inequalities \(\boldsymbol{A}\boldsymbol{\delta}-\boldsymbol{b}\leq\boldsymbol{0}\), where \(\boldsymbol{A}\in\mathbb{R}^{K\times N}\) and \(\boldsymbol{b}\in\mathbb{R}^K\), and \(K\) denotes the total number of constraints. For constraint \(k \in 1,\dots,K\), a feature set \(\boldsymbol{\delta}\) is admissible only if \(\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} - b^{(k)} \leq 0\), leading to the inadmissibility function (penalty term)

\[\kappa_{k,\rho}(\boldsymbol{\delta}) = \left\{ \begin{array}{ll} 0 & \text{if}~\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta}\leq b^{(k)},\\ 1 & \text{if}~\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} > b^{(k)} \land \rho =\infty,\\ \frac{1-\xi_{k,\rho}}{1 + \xi_{k,\rho}} & \text{otherwise,} \end{array} \right.\]

where \(\rho\in\mathbb{R}^+ \cup \{\infty\}\) denotes a relaxation parameter and \(\xi_{k,\rho} = \exp\left(-\rho \left(\left(\boldsymbol{a}^{(k)}\right)^T\boldsymbol{\delta} - b^{(k)}\right)\right)\) defines the exponential term of a logistic function. To handle \(K\) different constraints for one feature selection problem, the joint inadmissibility function is given as \[\kappa(\boldsymbol{\delta}) = 1 - \prod\limits_{k=1}^{K} \left(1-\kappa_{k,\rho}(\boldsymbol{\delta})\right),\] which originates from the idea that \(\kappa = 1\) (maximum penalization) if at least one \(\kappa_{k,\rho}=1\), while \(\kappa=0\) (no penalization) if all \(\kappa_{k,\rho}=0\).
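The penalty terms are straightforward to express in plain R; the following sketch implements the formulas above (illustrative only, not the package-internal implementation):

# Single-constraint penalty kappa_{k,rho} for a linear constraint a^T delta <= b.
kappa_k <- function(delta, a, b, rho){
  z <- sum(a * delta) - b
  if(z <= 0) return(0)             # constraint satisfied: no penalty
  if(is.infinite(rho)) return(1)   # hard constraint violated: full penalty
  xi <- exp(-rho * z)
  (1 - xi) / (1 + xi)              # relaxed (soft) penalty in (0,1)
}

# Joint inadmissibility over all K constraints (rows of A).
kappa <- function(delta, A, b, rho){
  1 - prod(sapply(seq_len(nrow(A)), function(k) 1 - kappa_k(delta, A[k, ], b[k], rho[k])))
}

# Example: N = 3 features, hard max-size <= 2 plus a soft must-link between
# features 1 and 2 (expressed as two inequalities).
A   <- rbind(c( 1,  1, 1),
             c( 1, -1, 0),
             c(-1,  1, 0))
b   <- c(2, 0, 0)
rho <- c(Inf, 1, 1)
kappa(c(1, 0, 1), A, b, rho)  # must-link violated: penalty of about 0.46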

To obtain an optimal feature set \(\boldsymbol{\delta}^\star\), we use a target function \(U(\boldsymbol{\delta},\boldsymbol{\theta})\), which represents the posterior expected utility of a feature set \(\boldsymbol{\delta}\) given the posterior feature importance parameter \(\boldsymbol{\theta}\), regularized by the inadmissibility function \(\kappa(\cdot)\):

\[\mathbb{E}_{\boldsymbol{\theta}|\boldsymbol{y}}[U(\boldsymbol{\delta},\boldsymbol{\theta}(\boldsymbol{y}))] = \boldsymbol{\delta}^T\mathbb{E}_{\boldsymbol{\theta}|\boldsymbol{y}}[\boldsymbol{\theta}(\boldsymbol{y})]-\lambda\kappa(\boldsymbol{\delta})\longrightarrow\underset{\boldsymbol{\delta}\in\{0,1\}^N}{\text{max}}\]

Since an exact optimization is impossible due to the non-linear function \(\kappa\), we use a genetic algorithm to find an appropriate feature set. In detail, the genetic algorithm is initialized via a greedy algorithm and, in each iteration, recombines the given feature sets with respect to a fitness function.
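For illustration, the regularized target from the expression above can be written as a one-line fitness function (a sketch reusing theta_hat and kappa() from the snippets above; a genetic algorithm then maximizes such a function over binary vectors \(\boldsymbol{\delta}\)):

# delta^T E[theta|y] - lambda * kappa(delta), cf. the utility above.
utility <- function(delta, theta_hat, A, b, rho, lambda = 1){
  sum(delta * theta_hat) - lambda * kappa(delta, A, b, rho)
}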

Application of UBayFS

Ensemble Training

The function build.UBaymodel() initializes the UBayFS model and trains an ensemble of elementary feature selectors. The training dataset and target are provided via the arguments data and target. Although the UBayFS concept permits unsupervised, multiclass, or regression setups, the current implementation supports binary target variables only. While M defines the ensemble size (number of elementary feature selectors), the type of the elementary feature selectors is set via method. Three feature selectors (mRMR, Fisher score, and Laplace score) are implemented as baselines. In general, the method argument also accepts any self-implemented feature selection function with the arguments X (the data), y (the target), n (the number of features to select), and name (the name of the method); the function must return the indices of the selected features and the input name. An example with classification trees is shown below. Each ensemble model is trained on a random subset comprising tt_split \(\cdot 100\) percent of the training data. The helper function buildConstraints() provides an easy way to define side constraints for the model. Using the argument prior_model, the user specifies whether the standard Dirichlet distribution or a generalized variant should be used as the prior model. Furthermore, the number of features selected in each elementary model can be controlled by the parameter nr_features.

For the standard UBayFS initialization, all prior feature weights are set to 1, and no feature constraints are included yet. The summary() function provides an overview of the dataset, the prior weights, and the likelihood — ensemble counts indicate how often a feature was selected across the ensemble feature selections.

model = build.UBaymodel(data = bcw$data,
                        target = bcw$labels,
                        M = 100,
                        tt_split = 0.75,
                        nr_features = 10,
                        method = 'mRMR',
                        prior_model = 'dirichlet',
                        weights = 0.01,
                        lambda = 1,
                        constraints = buildConstraints(constraint_types = c('max_size'),
                                                       constraint_vars = list(3),
                                                       num_elements = dim(bcw$data)[2],
                                                       rho = 1),
                        optim_method = 'GA',
                        popsize = 100,
                        maxiter = 100,
                        shiny = FALSE)
summary(model)
#>  UBayFS model summary
#>   data:  569x30
#>   labels:  B: 357 M: 212
#>
#>   === constraints ===
#>   - - - - - - - - - -  group  1   - - - - - - - - - -
#>  constraint 1: (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) x <= 3; rho = 1
#>
#>   === prior weights ===
#>   weights: ( 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01 )
#>
#>   === likelihood ===
#>   ensemble counts: ( 0,16,75,0,0,0,100,100,0,0,0,0,3,100,0,0,4,0,0,0,100,83,100,100,1,18,100,100,0,0 )
#>
#>   === feature selection results ===
#>  no output produced yet

The prior constraints are shown as a system of linear inequalities together with the relaxation parameter \(\rho\). Further, the current prior weights and the ensemble feature counts (likelihood) for each feature are printed. As the model is not trained yet, the final feature selection result is empty.

In addition to mRMR, we add a function decision_tree() that selects features based on decision tree importances.

library(rpart)
decision_tree <- function(X, y, n, name = 'tree'){
  rf_data = as.data.frame(cbind(y, X))
  colnames(rf_data) <- make.names(colnames(rf_data))
  tree = rpart::rpart(y ~ ., data = rf_data)
  return(list(ranks = which(colnames(X) %in% names(tree$variable.importance)[1:n]),
              name = name))
}

model = build.UBaymodel(data = bcw$data,
                        target = bcw$labels,
                        M = 100,
                        tt_split = 0.75,
                        nr_features = 10,
                        method = c('mRMR', decision_tree),
                        prior_model = 'dirichlet',
                        weights = 0.01,
                        lambda = 1,
                        constraints = buildConstraints(constraint_types = c('max_size'),
                                                       constraint_vars = list(3),
                                                       num_elements = dim(bcw$data)[2],
                                                       rho = 1),
                        optim_method = 'GA',
                        popsize = 100,
                        maxiter = 100,
                        shiny = FALSE)

Further examples of feature selection methods (recursive feature elimination, Lasso, and HSIC Lasso) are:

# recursive feature elimination
library(caret)
rec_fe <- function(X, y, n, name = 'rfe'){
  if(is.factor(y)){
    control <- rfeControl(functions = rfFuncs, method = 'cv', number = 2)
  }
  else{
    control <- rfeControl(functions = lmFuncs, method = 'cv', number = 2)
  }
  results <- caret::rfe(X, y, sizes = n, rfeControl = control)
  return(list(ranks = which(colnames(X) %in% results$optVariables),
              name = name))
}

# Lasso
library(glmnet)
lasso <- function(X, y, n = NULL, name = 'lasso'){
  family = ifelse(is.factor(y), 'binomial', 'gaussian')
  cv.lasso <- cv.glmnet(as.matrix(X), y, intercept = FALSE, alpha = 1,
                        family = family, nfolds = 3)
  model <- glmnet(as.matrix(X), y, intercept = FALSE, alpha = 1,
                  family = family, lambda = cv.lasso$lambda.min)
  return(list(ranks = which(as.vector(model$beta) != 0), name = name))
}

# HSIC Lasso
library(GSelection)
hsic_lasso <- function(X, y, n, name = 'hsic'){
  ifelse(is.factor(y), {tl = as.numeric(as.integer(y) - 1)}, {tl = y})
  results = feature.selection(X, tl, n)
  return(list(ranks = results$hsic_selected_feature_index, name = name))
}

User knowledge

Using the function setWeights(), the user is able to change the feature weights from the standard initialization to desired values. In our example, we assign equal weights to features originating from the same image characteristic. Weights can be on an arbitrary scale. As it is difficult to specify prior weights in real-life applications, we suggest defining them on a normalized scale.

weights = rep(c(10, 15, 20, 16, 15, 10, 12, 17, 21, 14), 3)
strength = 1
weights = weights * strength / sum(weights)
print(weights)
#>  [1] 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111 0.02222222 0.03333333 0.04444444 0.03555556 0.03333333 0.02222222 0.02666667 0.03777778 0.04666667 0.03111111
model = setWeights(model = model, weights = weights)

In addition to prior weights, feature set constraints may be specified. Internally, constraints are implemented via an S3 class UBayconstraint, depicted in the following diagram:

Rather than calling the constructor method directly, the helper function buildConstraints() may be used to facilitate the definition of a set of constraints: the input constraint_types consists of a vector in which all constraint types are defined. Then, with constraint_vars, the user specifies details about each constraint: for max-size, the number of features to select is provided, while for must-link and cannot-link, the set of feature indices to be linked must be provided. Each list entry corresponds to one constraint in constraint_types. In addition, num_elements denotes the total number of features in the dataset (or the total number of blocks if the constraint is block-wise), and rho corresponds to the relaxation parameter of the inadmissibility function. For block constraints, information about the block structure is included via either block_list or block_matrix; if both arguments are NULL, feature-wise constraints are generated.
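As an illustration, a block-wise max-size constraint for the bcw data could be defined as follows (a hypothetical sketch assuming the bcw block layout, where image characteristic \(b\) is represented by features \(b\), \(b+10\), and \(b+20\)):

# Hypothetical block constraint: select features from at most 3 of the
# 10 image characteristics (each block groups 3 features).
block_list = lapply(1:10, function(b) c(b, b + 10, b + 20))
block_constraints = buildConstraints(constraint_types = c('max_size'),
                                     constraint_vars = list(3),
                                     num_elements = length(block_list), # number of blocks
                                     rho = 1,
                                     block_list = block_list)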

Applying print(constraints) shows that the matrix A has nine rows to represent four constraints. While max-size and cannot-link can be expressed in one inequality each, must-link is a pairwise constraint. Specifically, a must-link constraint between \(n\) features produces \(\frac{n!}{(n-2)!}\) elementary constraints; hence, six inequalities represent the must-link constraint between three features. The function setConstraints() integrates the constraints into the UBayFS model.

constraints = buildConstraints(constraint_types = c('max_size', 'must_link', rep('cannot_link', 2)),
                               constraint_vars = list(10,            # max-size (maximal 10 features)
                                                      c(1, 11, 21),  # must-link between features 1, 11, and 21
                                                      c(1, 10),      # cannot-link between features 1 and 10
                                                      c(20, 23, 24)),# cannot-link between features 20, 23, and 24
                               num_elements = ncol(model$data),
                               rho = c(Inf, # max-size
                                       0.1, # rho for must-link
                                       1,   # rho for first cannot-link
                                       1))  # rho for second cannot-link
print(constraints)
#>  A
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
#>  [1,]    1    1    1    1    1    1    1    1    1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
#>  [2,]   -1    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [3,]   -1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0     0     0     0     0     0
#>  [4,]    1    0    0    0    0    0    0    0    0     0    -1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [5,]    0    0    0    0    0    0    0    0    0     0    -1     0     0     0     0     0     0     0     0     0     1     0     0     0     0     0     0     0     0     0
#>  [6,]    1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     0    -1     0     0     0     0     0     0     0     0     0
#>  [7,]    0    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     0     0     0     0    -1     0     0     0     0     0     0     0     0     0
#>  [8,]    1    0    0    0    0    0    0    0    0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
#>  [9,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0     0     1     0     0     1     1     0     0     0     0     0     0
#>  b
#> [1] 10  0  0  0  0  0  0  1  1
#>  rho
#> [1] Inf 0.1 0.1 0.1 0.1 0.1 0.1 1.0 1.0
#>  block_matrix
#>  (30 x 30 identity matrix; output truncated)
model = setConstraints(model = model, constraints = constraints)

Optimization and evaluation

A genetic algorithm, described by (Givens and Hoeting 2012) and implemented in (Scrucca 2013), searches for the optimal feature set in the UBayFS framework. Using setOptim(), we initialize the genetic algorithm. Furthermore, popsize indicates the number of candidate feature sets created in each iteration, and maxiter is the number of iterations.

model = setOptim(model = model,
                 popsize = 100,
                 maxiter = 200)

At this point, we have initialized prior weights, constraints, and the optimization procedure — we can now train the UBayFS model using the generic function train(), relying on a genetic algorithm. The summary() function provides an overview of all components of UBayFS. The plot() function shows the prior feature information as bar charts, with the selected features marked with red borders. In addition, the constraints and the regularization parameter \(\rho\) are presented.

model = UBayFS::train(x = model)
#> Running Genetic Algorithm
summary(model)
#>  UBayFS model summary
#>   data:  569x30
#>   labels:  B: 357 M: 212
#>
#>   === constraints ===
#>   - - - - - - - - - -  group  1   - - - - - - - - - -
#>  constraint 1: (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) x <= 10; rho = Inf
#>  constraint 2: (-1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 3: (-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 4: (1,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 5: (0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 6: (1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 7: (0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0) x <= 0; rho = 0.1
#>  constraint 8: (1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) x <= 1; rho = 1
#>  constraint 9: (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0) x <= 1; rho = 1
#>
#>   === prior weights ===
#>   weights: ( 0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111,0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111,0.0222222222222222,0.0333333333333333,0.0444444444444444,0.0355555555555556,0.0333333333333333,0.0222222222222222,0.0266666666666667,0.0377777777777778,0.0466666666666667,0.0311111111111111 )
#>
#>   === likelihood ===
#>   ensemble counts: ( 0,16,75,0,0,0,100,100,0,0,0,0,3,100,0,0,4,0,0,0,100,83,100,100,1,18,100,100,0,0 )
#>
#>   === feature selection results ===
#>   ( 2,3,7,8,14,22,23,26,27,28 )
plot(model)

After training the model, we receive a feature selection result. More than one optimal feature set with the same MAP score is possible. The plot shows the selected features (framed in red) and their selection distribution between ensemble feature selection and prior weights. The constraints are shown at the top, where a connecting line is drawn between the features of one constraint. The final feature set and its additional properties can be evaluated with evaluateFS():

# evaluate the feature set
evaluateMultiple(state = model$output$feature_set, model = model)
#>                                  [,1]
#> cardinality                    10.000
#> log total utility              -0.234
#> log posterior feature utility  -0.234
#> log admissibility               0.000
#> number of violated constraints  0.000
#> avg feature correlation         0.614

The output contains the following information: the cardinality of the selected feature set, its log total utility (composed of the log posterior feature utility and the log admissibility), the number of violated constraints, and the average pairwise correlation between the selected features.

Shiny dashboard

UBayFS provides an interactive R Shiny dashboard as a GUI. With its intuitive user interface, the user can load data, set likelihood parameters, and even control the admissibility regularization strength of each constraint. With the command runInteractive(), the Shiny dashboard opens, given that the required dependencies are available (see above). Histograms and plots help to get an overview of the user's settings. The interactive dashboard offers save and load buttons to save or load UBayFS models as RData files. Due to computational limitations, it is not recommended to use the HTML interface for larger datasets (\(>100\) features or \(>1000\) samples).

runInteractive()

The dashboard is organized into multiple tabs.

Conclusion

With the methodology in place, UBayFS is applicable to a large range of feature selection problems with multiple sources of information. The likelihood parameters, steering the number of elementary models, mainly affect the stability and runtime of the result — the latter increases linearly with the number of models. The Shiny dashboard, in particular, delivers insight into the individual UBayFS steps. Nevertheless, the dashboard is suited only to smaller datasets, while larger ones should be processed via the console.

References

Givens, G. H., and J. A. Hoeting. 2012. Computational Statistics. Vol. 703. John Wiley & Sons.
Hankin, Robin K. S. 2010. “A Generalization of the Dirichlet Distribution.” Journal of Statistical Software 33 (11): 1–18.
Jenul, Anna, Stefan Schrunner, Jürgen Pilz, and Oliver Tomic. 2022. “A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS).” Machine Learning, 1–27.
Maier, M. J. 2020. DirichletReg: Dirichlet Regression in R. https://dirichletreg.r-forge.r-project.org/.
Scrucca, L. 2013. “GA: A Package for Genetic Algorithms in R.” Journal of Statistical Software 53 (4): 1–37. https://www.jstatsoft.org/v53/i04/.
Seijo-Pardo, B., I. Porto-Díaz, V. Bolón-Canedo, and A. Alonso-Betanzos. 2017. “Ensemble Feature Selection: Homogeneous and Heterogeneous Approaches.” Knowledge-Based Systems 118: 124–39. https://www.sciencedirect.com/science/article/pii/S0950705116304749.
Wolberg, W. H., and O. L. Mangasarian. 1990. “Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology.” Proceedings of the National Academy of Sciences 87 (23): 9193–96.
Wong, Tzu-Tsung. 1998. “Generalized Dirichlet Distribution in Bayesian Analysis.” Applied Mathematics and Computation 97 (2): 165–81.
