Type: Package
Title: Leveraging Learning to Automatically Manage Algorithms
Version: 0.10.1
Date: 2021-03-16
Author: Lars Kotthoff [aut, cre], Bernd Bischl [aut], Barry Hurley [ctb], Talal Rahwan [ctb], Damir Pulatov [ctb]
Maintainer: Lars Kotthoff <larsko@uwyo.edu>
Description: Provides functionality to train and evaluate algorithm selection models for portfolios.
Depends: R (≥ 4.0), mlr (≥ 2.5)
Imports: rJava, parallelMap, ggplot2, checkmate, BBmisc, plyr, data.table
Suggests: testthat, ParamHelpers
License: BSD_3_clause + file LICENSE
URL: https://bitbucket.org/lkotthoff/llama
NeedsCompilation: no
Packaged: 2021-03-16 18:47:39 UTC; larsko
Repository: CRAN
Date/Publication: 2021-03-16 22:40:12 UTC

Leveraging Learning to Automatically Manage Algorithms

Description

Leveraging Learning to Automatically Manage Algorithms provides functionality to read and process performance data for algorithms, facilitates building models that predict which algorithm to use in which scenario, and provides ways of evaluating them.

Details

The package provides functions to read performance data, build performance models that enable selection of algorithms (using external machine learning functions) and evaluate those models.

Data is input using input and can then be used to learn performance models. There are currently five main ways to create models. Classification (classify) creates a single machine learning model that predicts the algorithm to use as a label. Classification of pairs of algorithms (classifyPairs) creates a classification model for each pair of algorithms that predicts which one is better and aggregates these predictions to determine the best overall algorithm. Clustering (cluster) clusters the problems to solve and assigns the best algorithm to each cluster. Regression (regression) trains a separate or single model (depending on the types of features available) for all algorithms, predicts the performance on a problem independently and chooses the algorithm with the best predicted performance. Regression of pairs of algorithms (regressionPairs) is similar to classifyPairs, but predicts the performance difference between each pair of algorithms. Similar to regression, regressionPairs can also build a single model for all pairs of algorithms, depending on the types of features available to the function.

Various functions to split the data into training and test set(s) and toevaluate the performance of the learned models are provided.

LLAMA uses the mlr package to access the implementation of machine learningalgorithms in R.

The model-building functions use the parallelMap package to parallelize across the data partitions (e.g. cross-validation folds) with level "llama.fold"; tuning uses level "llama.tune". By default, everything is run sequentially. By loading a suitable backend (e.g. through parallelStartSocket(2) for parallelization across 2 CPUs using sockets), the model building will be parallelized automatically and transparently. Note that this does not mean that all machine learning algorithms used for building models can be parallelized safely. For functions that are not thread-safe, use parallelStartSocket to run in separate processes.
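
For example, a minimal sketch of a parallelized run (assuming the bundled satsolvers data set and the mlr J48 learner, which requires the RWeka package):

library(llama)
library(mlr)
library(parallelMap)
data(satsolvers)
folds = cvFolds(satsolvers)
parallelStartSocket(2, level = "llama.fold")  # two worker processes
model = classify(classifier = makeLearner("classif.J48"), data = folds)
parallelStop()                                # shut the workers down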

Author(s)

Lars Kotthoff, Bernd Bischl

with contributions by Barry Hurley, Talal Rahwan, Damir Pulatov

Maintainer: Lars Kotthoff <larsko@uwyo.edu>

References

Kotthoff, L. (2013) LLAMA: Leveraging Learning to Automatically Manage Algorithms. arXiv:1306.1031.

Kotthoff, L. (2014) Algorithm Selection for Combinatorial Search Problems: A Survey. AI Magazine.

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = classify(classifier=makeLearner("classif.J48"), data=folds)
# print the total number of successes
print(sum(successes(folds, model)))
# print the total misclassification penalty
print(sum(misclassificationPenalties(folds, model)))
# print the total PAR10 score
print(sum(parscores(folds, model)))
# number of total successes for virtual best solver for comparison
print(sum(successes(satsolvers, vbs, addCosts = FALSE)))
# print predictions on the entire data set
print(model$predictor(subset(satsolvers$data, TRUE, satsolvers$features)))
# train a regression model
model = regression(regressor=makeLearner("regr.lm"), data=folds)
# print the total number of successes
print(sum(successes(folds, model)))
}

Analysis functions

Description

Functions for analysing portfolios.

Usage

contributions(data = NULL)

Arguments

data

the data to use. The structure returned by input.

Details

contributions analyses the marginal contributions of the algorithms in the portfolio to its overall performance. More specifically, the Shapley value for a specific algorithm is computed as the "value" of the portfolio with the algorithm minus the "value" without the algorithm. This is done over all possible portfolio compositions.
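
As an illustration, the following is a minimal sketch of this computation (a hypothetical helper, not the package's implementation; value is any user-supplied function mapping a sub-portfolio to its performance, e.g. the number of instances solved by its virtual best solver):

shapleySketch = function(algos, value) {
    n = length(algos)
    sapply(algos, function(a) {
        others = setdiff(algos, a)
        # all sub-portfolios that do not contain algorithm a
        subsets = list(character(0))
        for (k in seq_along(others))
            subsets = c(subsets, combn(others, k, simplify = FALSE))
        # weighted marginal contribution of a over all sub-portfolios
        sum(sapply(subsets, function(S) {
            w = factorial(length(S)) * factorial(n - length(S) - 1) / factorial(n)
            w * (value(c(S, a)) - value(S))
        }))
    })
}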

It is automatically determined whether the performance value is to be minimisedor maximised.

Value

A table listing the Shapley values for each algorithm in the portfolio. The higher the value, the more the respective algorithm contributes to the overall performance of the portfolio.

Author(s)

Lars Kotthoff

References

Rahwan, T., Michalak, T. (2013) A Game Theoretic Approach to Measure Contributions in Algorithm Portfolios. Technical Report RR-13-11, University of Oxford.

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
contributions(satsolvers)
}

Bootstrapping folds

Description

Take data produced by input and amend it with (optionally) stratified folds determined through bootstrapping.

Usage

bsFolds(data, nfolds = 10L, stratify = FALSE)

Arguments

data

the data to use. The structure returned by input.

nfolds

the number of folds. Defaults to 10.

stratify

whether to stratify the folds. Only really makes sense for classification models. Defaults to FALSE.

Details

Partitions the data set into folds. Stratification, if requested, is done by the best algorithm, i.e. the one with the best performance. The distribution of the best algorithms in each fold will be approximately the same. For each fold, the training index set is assembled through .632 bootstrap. The remaining indices are used for testing. There is no guarantee on the sizes of either set. The sets of indices are added to the original data set and returned.
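
A minimal sketch of the .632 bootstrap idea behind each fold (a hypothetical helper, not the package's internal code):

bootstrapSplit = function(n) {
    train = sample(n, n, replace = TRUE)       # draw n indices with replacement
    test = setdiff(seq_len(n), unique(train))  # never-drawn indices form the test set
    list(train = train, test = test)           # on average ~63.2% of indices train
}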

If the data set has train and test partitions already, they are overwritten.

Value

train

a list of index sets for training.

test

a list of index sets for testing.

...

the original members of data. See input.

Author(s)

Lars Kotthoff

See Also

cvFolds, trainTest

Examples

data(satsolvers)
folds = bsFolds(satsolvers)
# use 5 folds instead of the default 10
folds5 = bsFolds(satsolvers, 5L)
# stratify
foldsU = bsFolds(satsolvers, stratify=TRUE)

Classification model

Description

Build a classification model that predicts the algorithm to use based on thefeatures of the problem.

Usage

classify(classifier = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) },
    save.models = NA, use.weights = TRUE)

Arguments

classifier

the mlr classifier to use. See examples.

The argument can also be a list of such classifiers.

data

the data to use with training and test sets. The structure returned byone of the partitioning functions.

pre

a function to preprocess the data. Currently only normalize. Optional. Does nothing by default.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

use.weights

Whether to use instance weights if supported. Default TRUE.

Details

classify takes the training and test sets in data and processes them using pre (if supplied). classifier is called to induce a classifier. The learned model is used to make predictions on the test set(s).

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".

If the given classifier supports case weights and use.weights is TRUE, the performance difference between the best and the worst algorithm is passed as a weight for each instance.

If a list of classifiers is supplied in classifier, ensemble classification is performed. That is, the models are trained and used to make predictions independently. For each instance, the final prediction is determined by majority vote of the predictions of the individual models – the class that occurs most often is chosen. If the list given as classifier contains a member .combine that is a function, it is assumed to be a classifier with the same properties as the other ones and will be used to combine the ensemble predictions instead of majority voting. This classifier is passed the original features and the predictions of the classifiers in the ensemble.
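
A minimal sketch of the majority-vote idea (a hypothetical helper, not the package's internal code):

majorityVote = function(preds) {
    # preds: list of predicted algorithm names, one per ensemble member
    names(which.max(table(unlist(preds))))
}
majorityVote(list("alg1", "alg2", "alg1"))  # returns "alg1"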

If the prediction of a stacked learner is NA, the reported score will be NA as well.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the number of classifiers that predicted the respective algorithm, or the sum of probabilities that this classifier was the best. If stacking is used, the score corresponds to the output of the stacked classifier.

predictor

a function that encapsulates the classifier learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Author(s)

Lars Kotthoff

References

Kotthoff, L., Miguel, I., Nightingale, P. (2010) Ensemble Classification for Constraint Solver Configuration. 16th International Conference on Principles and Practices of Constraint Programming, 321–329.

See Also

classifyPairs, cluster, regression, regressionPairs

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
res = classify(classifier=makeLearner("classif.J48"), data=folds)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])
res = classify(classifier=makeLearner("classif.svm"), data=folds)
# use probabilities instead of labels
res = classify(classifier=makeLearner("classif.randomForest", predict.type = "prob"), data=folds)
# ensemble classification
rese = classify(classifier=list(makeLearner("classif.J48"),
                                makeLearner("classif.IBk"),
                                makeLearner("classif.svm")),
                data=folds)
# ensemble classification with a classifier to combine predictions
rese = classify(classifier=list(makeLearner("classif.J48"),
                                makeLearner("classif.IBk"),
                                makeLearner("classif.svm"),
                                .combine=makeLearner("classif.J48")),
                data=folds)
}

Classification model for pairs of algorithms

Description

Build a classification model for each pair of algorithms that predicts which one is better based on the features of the problem. Predictions are aggregated to determine the best overall algorithm.

Usage

classifyPairs(classifier = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) }, combine = NULL,
    save.models = NA, use.weights = TRUE)

Arguments

classifier

the mlr classifier to use. See examples.

data

the data to use with training and test sets. The structure returned byone of the partitioning functions.

pre

a function to preprocess the data. Currently only normalize. Optional. Does nothing by default.

combine

The classifier function to predict the overall best algorithm given the predictions for pairs of algorithms. Optional. By default, the overall best algorithm is determined by majority vote.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

use.weights

Whether to use instance weights if supported. Default TRUE.

Details

classifyPairs takes the training and test sets in data and processes them using pre (if supplied). classifier is called to induce a classifier for each pair of algorithms to predict which one is better. If combine is not supplied, the best overall algorithm is determined by majority vote. If it is supplied, it is assumed to be a classifier with the same properties as the other one. This classifier is passed the original features and the predictions for each pair of algorithms.

Which algorithm of a pair is better is determined by comparing their performance scores. Whether a lower performance number is better or not is determined by what was specified when the LLAMA data frame was created.

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".

If the given classifier supports case weights and use.weights is TRUE, the performance difference between the best and the worst algorithm is passed as a weight for each instance.

If all predictions of an underlying machine learning model are NA, it will count as 0 towards the score.

Training this model can take a very long time. Given n algorithms, choose(n, 2) models are trained and evaluated. This is significantly slower than the other approaches that train a single model or one for each algorithm.
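
For instance, with the 19 solvers of the bundled satsolvers data set:

choose(19, 2)  # 171 pairwise models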

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the number of times the respective algorithm was predicted to be better. If stacking is used, only the best algorithm for each algorithm-instance pair is predicted with a score of 1.

predictor

a function that encapsulates the classifier learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the models for each pair of algorithms trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Author(s)

Lars Kotthoff

References

Xu, L., Hutter, F., Hoos, H. H., Leyton-Brown, K. (2011) Hydra-MIP: Automated Algorithm Configuration and Selection for Mixed Integer Programming. RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, 16–30.

See Also

classify, cluster, regression, regressionPairs

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
res = classifyPairs(classifier=makeLearner("classif.J48"), data=folds)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])
# use probabilities instead of labels
res = classifyPairs(classifier=makeLearner("classif.randomForest",
                                predict.type = "prob"), data=folds)
# combine predictions using J48 induced classifier instead of majority vote
res = classifyPairs(classifier=makeLearner("classif.J48"),
                    data=folds,
                    combine=makeLearner("classif.J48"))
}

Cluster model

Description

Build a cluster model that predicts the algorithm to use based on the features of the problem.

Usage

cluster(clusterer = NULL, data = NULL,
    bestBy = "performance",
    pre = function(x, y=NULL) { list(features=x) },
    save.models = NA)

Arguments

clusterer

the mlr clustering function to use. See examples.

The argument can also be a list of such functions.

data

the data to use with training and test sets. The structure returned byone of the partitioning functions.

bestBy

the criteria by which to determine the best algorithm in a cluster. Can be one of "performance", "count", "successes". Optional. Defaults to "performance".

pre

a function to preprocess the data. Currently only normalize. Optional. Does nothing by default.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

Details

cluster takes data and processes it using pre (if supplied). clusterer is called to cluster the data. For each cluster, the best algorithm is identified according to the criteria given in bestBy. If bestBy is "performance", the best algorithm is the one with the best overall performance across all instances in the cluster. If it is "count", the best algorithm is the one that has the best performance most often. If it is "successes", the best algorithm is the one with the highest number of successes across all instances in the cluster. The learned model is used to cluster the test data and predict algorithms accordingly.

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".

If a list of clusterers is supplied in clusterer, ensemble clustering is performed. That is, the models are trained and used to make predictions independently. For each instance, the final prediction is determined by majority vote of the predictions of the individual models – the class that occurs most often is chosen. If the list given as clusterer contains a member .combine that is a function, it is assumed to be a classifier with the same properties as classifiers given to classify and will be used to combine the ensemble predictions instead of majority voting. This classifier is passed the original features and the predictions of the classifiers in the ensemble.

If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and -Inf for the score if the performance value is to be maximised, Inf otherwise.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the cumulative performance value for the algorithm of the cluster the instance was assigned to. That is, if bestBy is "performance", it is the sum of the performance over all training instances. If bestBy is "count", the score corresponds to the number of training instances that the respective algorithm was the best on, and if it is "successes" it corresponds to the number of training instances where the algorithm was successful. If more than one clustering algorithm is used, the score corresponds to the sum of all instances across all clusterers. If stacking is used, the prediction is simply the best algorithm with a score of 1.

predictor

a function that encapsulates the model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Author(s)

Lars Kotthoff

See Also

classify, classifyPairs, regression, regressionPairs

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
res = cluster(clusterer=makeLearner("cluster.XMeans"), data=folds, pre=normalize)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])
# determine best by number of successes
res = cluster(clusterer=makeLearner("cluster.XMeans"), data=folds,
    bestBy="successes", pre=normalize)
sum(successes(folds, res))
# ensemble clustering
rese = cluster(clusterer=list(makeLearner("cluster.XMeans"),
    makeLearner("cluster.SimpleKMeans"), makeLearner("cluster.EM")),
    data=folds, pre=normalize)
# ensemble clustering with a classifier to combine predictions
rese = cluster(clusterer=list(makeLearner("cluster.XMeans"),
    makeLearner("cluster.SimpleKMeans"), makeLearner("cluster.EM"),
    .combine=makeLearner("classif.J48")), data=folds, pre=normalize)
}

Cross-validation folds

Description

Take data produced by input and amend it with (optionally) stratified folds for cross-validation.

Usage

cvFolds(data, nfolds = 10L, stratify = FALSE)

Arguments

data

the data to use. The structure returned by input.

nfolds

the number of folds. Defaults to 10. If -1 is given, leave-one-out cross-validation folds are produced.

stratify

whether to stratify the folds. Only really makes sense for classification models. Defaults to FALSE.

Details

Partitions the data set into folds. Stratification, if requested, is done by the best algorithm, i.e. the one with the best performance. The distribution of the best algorithms in each fold will be approximately the same. The folds are assembled into training and test sets by combining n-1 folds for training and using the remaining fold for testing. The sets of indices are added to the original data set and returned.
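
A minimal sketch of how folds become train/test index sets (a hypothetical helper, not the package's internal code):

foldIndices = function(n, nfolds = 10L) {
    fold = sample(rep(seq_len(nfolds), length.out = n))  # random fold assignment
    lapply(seq_len(nfolds), function(i)
        list(train = which(fold != i), test = which(fold == i)))
}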

If the data set has train and test partitions already, they are overwritten.

Value

train

a list of index sets for training.

test

a list of index sets for testing.

...

the original members of data. See input.

Author(s)

Lars Kotthoff

See Also

bsFolds, trainTest

Examples

data(satsolvers)
folds = cvFolds(satsolvers)
# use 5 folds instead of the default 10
folds5 = cvFolds(satsolvers, 5L)
# stratify
foldsU = cvFolds(satsolvers, stratify=TRUE)

Helpers

Description

S3 helper methods.

Usage

## S3 method for class 'llama.data'
print(x, ...)
## S3 method for class 'llama.model'
print(x, ...)
## S3 method for class 'classif.constant'
makeRLearner()
## S3 method for class 'classif.constant'
predictLearner(.learner, .model, .newdata, ...)
## S3 method for class 'classif.constant'
trainLearner(.learner, .task, .subset, .weights, ...)

Arguments

x

the object to print.

.learner

learner.

.model

model.

.newdata

new data.

.task

task.

.subset

subset.

.weights

weights.

...

ignored.

Author(s)

Lars Kotthoff


Impute censored values

Description

Impute the performance values that are censored, i.e. for which the respective algorithm was not successful.

Usage

imputeCensored(data = NULL, estimator = makeLearner("regr.lm"),
    epsilon = 0.1, maxit = 1000)

Arguments

data

the data to check for censored values to impute. The structure returned by input.

estimator

the mlr regressor to use to impute the censored values.

epsilon

the convergence criterion. Default 0.1.

maxit

the maximum number of iterations. Default 1000.

Details

The function checks for each algorithm if there are censored values by checking for which problem instances the algorithm was not successful. It trains a model to predict the performance value for those instances using the given estimator, based on the performance values of the instances where the algorithm was successful and the problem features. It then uses the results of this initial prediction to train a new model on the entire data and re-predict the performance values for the censored instances. This process is repeated until the maximum difference between predictions in two successive iterations is less than epsilon or more than maxit iterations have been performed.
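
A minimal sketch of the iterative scheme (hypothetical code using a plain lm instead of an mlr learner):

imputeSketch = function(feats, perf, censored, epsilon = 0.1, maxit = 1000) {
    # initial model trained on the uncensored instances only
    fit = lm(perf[!censored] ~ ., data = feats[!censored, , drop = FALSE])
    perf[censored] = predict(fit, newdata = feats[censored, , drop = FALSE])
    for (i in seq_len(maxit)) {
        # refit on all data, re-predict the censored values
        fit = lm(perf ~ ., data = feats)
        new = predict(fit, newdata = feats[censored, , drop = FALSE])
        delta = max(abs(new - perf[censored]))
        perf[censored] = new
        if (delta < epsilon) break  # converged
    }
    perf
}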

It is up to the user to check whether the imputed values make sense. In particular, for solver runtime data and timeouts one would expect that the imputed values are above the timeout threshold, indicating at what time the algorithms that have timed out would have solved the problem. No effort is made to enforce such application-specific constraints.

Value

The data structure with imputed censored values. The original data is saved in the original_data member.

Author(s)

Lars Kotthoff

References

Schmee, J., Hahn, G. J. (1979) A Simple Method for Regression Analysis with Censored Data. Technometrics 21(4), 417–432.

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
imputed = imputeCensored(satsolvers)
}

Read data

Description

Reads performance data that can be used to train and evaluate models.

Usage

input(features, performances, algorithmFeatures = NULL, successes = NULL,
    costs = NULL, extra = NULL, minimize = T, perfcol = "performance")

Arguments

features

data frame that contains the feature values for each problem instance and a non-empty set of ID columns.

algorithmFeatures

data frame that contains the feature values for each algorithm and a non-empty set of algorithm ID columns. Optional.

performances

data frame that contains the performance values for each problem instance and a non-empty set of ID columns.

successes

data frame that contains the success values (TRUE/FALSE) for each algorithm on each problem instance and a non-empty set of ID columns. The names of the columns in this data set should be the same as the names of the columns in performances. Optional.

costs

either a single number, a data frame or a list that specifies the cost of the features. If a number is specified, it is assumed to denote the cost for all problem instances (i.e. the cost is always the same). If a data frame is given, it is assumed to have one column for each feature with the same name as the feature where each value gives the cost, and a non-empty set of ID columns. If a list is specified, it is assumed to have a member groups that specifies which features belong to which group and a member values that is a data frame in the same format as before. Optional.

extra

data frame containing any extra information about the instances and a non-empty set of ID columns. This is not used in modelling, but can be used e.g. for visualisation. Optional.

minimize

whether the minimum performance value is best. Default true.

perfcol

name of the column that stores performance values when algorithm features are provided. Default "performance".

Details

input takes a list of data frames and processes them as follows. The feature and performance data are joined by looking for common column names in the two data frames (usually an ID of the problem instance). For each problem, the best algorithm according to the given performance data is computed. If more than one algorithm has the best performance, all of them are returned.

The data frame for algorithmic features is optional. When it is provided, the existing data is joined by algorithm names. The final data frame is reshaped into 'long' format.

The data frame that describes whether an algorithm was successful on a problem is optional. If parscores or successes are to be used to evaluate the learned models, however, this argument is required and omitting it will lead to error messages.

Similarly, feature costs are optional.

If successes is given, it is used to determine the best algorithm on each problem instance. That is, an algorithm can only be best if it was successful. If no algorithm was successful, the value will be NA. Special care should be taken when preparing the performance values for unsuccessful algorithms. For example, if the performance measure is runtime and success is determined by whether the algorithm was able to find a solution within a timeout, the performance value for unsuccessful algorithms should be the timeout value. If the algorithm failed because of some other reason in a short amount of time, specifying this small amount of time may confuse some of the algorithm selection model learners.

Value

data

the combined data (features, performance, successes).

best

a list of the best algorithms.

ids

a list of names denoting the instance ID columns.

features

a list of names denoting problem features.

algorithmFeatures

a list of names denoting algorithm features. 'NULL' if no algorithm features are provided.

algorithmNames

a list of algorithm names. 'NULL' if no algorithm features are provided. See 'performance' field in that case.

algos

a column that stores names of algorithms. 'NULL' if no algorithm features are provided.

performance

a list of names denoting algorithm performances. If algorithm features are provided, a column name that stores algorithm performances.

success

a list of names denoting algorithm successes. If algorithm features are provided, a column name that stores algorithm successes.

minimize

TRUE if smaller performance values are better, else FALSE.

cost

a list of names denoting feature costs.

costGroups

a list of lists of names denoting which features belong to which group. Only returned if cost groups are given as input.

Author(s)

Lars Kotthoff

Examples

# features.csv looks something like
# ID,width,height
# 0,1.2,3
# ...
# performance.csv:
# ID,alg1,alg2
# 0,2,5
# ...
# success.csv:
# ID,alg1,alg2
# 0,T,F
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    read.csv("success.csv"), costs=10)
# costs.csv:
# ID,width,height
# 0,3,4.5
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    read.csv("success.csv"), costs=read.csv("costs.csv"))
# costGroups.csv:
# ID,group1,group2
# 0,3,4.5
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    read.csv("success.csv"),
#    costs=list(groups=list(group1=c("height"), group2=c("width")),
#               values=read.csv("costGroups.csv")))

Convenience functions

Description

Convenience functions for computing and working with predictions.

Usage

vbs(data = NULL)
singleBest(data = NULL)
singleBestByCount(data = NULL)
singleBestByPar(data = NULL, factor = 10)
singleBestBySuccesses(data = NULL)
predTable(predictions = NULL, bestOnly = TRUE)

Arguments

data

the data to use. The structure returned by input.

factor

the penalization factor to use for non-successful choices. Default 10.

predictions

the list of predictions.

bestOnly

whether to tabulate only the respective best algorithm for each instance. Default TRUE.

Details

vbs and singleBest take a data frame of input data and return predictions that correspond to the virtual best and the single best algorithm, respectively. The virtual best picks the best algorithm for each instance. If no algorithm solved the instance, NA is returned. The single best picks the algorithm that has the best cumulative performance over the entire data set.

singleBestByCount returns the algorithm that has the best performance the highest number of times over the entire data set. Only whether or not an algorithm is the best matters for this, not the difference to other algorithms.

singleBestByPar aggregates the PAR score over the entire data set and returns the algorithm with the lowest overall PAR score. singleBestBySuccesses counts the number of successes over the data set and returns the algorithm with the highest overall number of successes.

predTable tabulates the predicted algorithms in the same way that table does. If bestOnly is FALSE, all algorithms are considered – for example for regression models, predictions are made for all algorithms, so the table will simply show the number of instances for each algorithm. Set bestOnly to TRUE to tabulate only the best algorithm for each instance.

Value

A data frame with the predictions for each instance. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (always 1). The score is 1 if the respective algorithm is chosen for the instance, 0 otherwise. More than one prediction may be made for each instance and iteration.

For predTable, a table.

Author(s)

Lars Kotthoff

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
# number of total successes for virtual best solver
print(sum(successes(satsolvers, vbs)))
# number of total successes for single best solver by count
print(sum(successes(satsolvers, singleBestByCount)))
# sum of PAR10 scores for single best solver by PAR10 score
print(sum(parscores(satsolvers, singleBestByPar)))
# number of total successes for single best solver by successes
print(sum(successes(satsolvers, singleBestBySuccesses)))
# print a table of the best solvers per instance
print(predTable(vbs(satsolvers)))
}

Misclassification penalty

Description

Calculates the penalty incurred because of making incorrect decisions, i.e. choosing suboptimal algorithms.

Usage

misclassificationPenalties(data, model, addCosts = NULL)

Arguments

data

the data used to induce the model. The same as given to classify, classifyPairs, cluster or regression.

model

the algorithm selection model. Can be either a model returned by one of the model-building functions or a function that returns predictions such as vbs or the predictor function of a trained model.

addCosts

does nothing. Only here for compatibility with the other evaluation functions.

Details

Compares the performance of the respective chosen algorithm to the performance of the best algorithm for each datum. Returns the absolute difference. This denotes the penalty for choosing a suboptimal algorithm, e.g. the additional time required to solve a problem or the reduction in solution quality incurred. The misclassification penalty of the virtual best is always zero.

If the model returns NA (e.g. because no algorithm solved the instance), 0 is returned as misclassification penalty.

data may contain a train/test partition or not. This makes a difference when computing the misclassification penalties for the single best algorithm. If no train/test split is present, the single best algorithm is determined on the entire data. If it is present, the single best algorithm is determined on each test partition. That is, the single best is local to the partition and may vary across partitions.

Value

A list of the misclassification penalties.

Author(s)

Lars Kotthoff

See Also

parscores, successes

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = classify(classifier=makeLearner("classif.J48"), data=folds)
sum(misclassificationPenalties(folds, model))
}

Normalize features

Description

Normalize input data so that the values for all features cover the same range, -1 to 1.

Usage

normalize(rawfeatures, meta = NULL)

Arguments

rawfeatures

data frame with the feature values to normalize.

meta

meta data to use for the normalization. If supplied, should be a list with members minValues that contains the minimum values for all features and maxValues that contains the maximum values for all features. Will be computed if not supplied.

Details

normalize subtracts the minimum (supplied or computed) from all values of a feature, divides by the difference between maximum and minimum, multiplies by 2 and subtracts 1. The range of the values for all features will be -1 to 1.
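
The transformation for a single feature column, as a minimal sketch (a hypothetical helper, not the package's implementation):

normalizeColumn = function(x, lo = min(x), hi = max(x)) {
    2 * (x - lo) / (hi - lo) - 1  # maps [lo, hi] to [-1, 1]
}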

Value

features

the normalized feature vectors.

meta

the minimum and maximum values for each feature before normalization. Can be used in subsequent calls to normalize for new data.

Author(s)

Lars Kotthoff

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
cluster(clusterer=makeLearner("cluster.XMeans"), data=folds, pre=normalize)
}

Penalized average runtime score

Description

Calculates the penalized average runtime score which is commonly used for evaluating satisfiability solvers on a set of problems.

Usage

parscores(data, model, factor = 10, timeout, addCosts = NULL)

Arguments

data

the data used to induce the model. The same as given to classify, classifyPairs, cluster or regression.

model

the algorithm selection model. Can be either a model returned by one of the model-building functions or a function that returns predictions such as vbs or the predictor function of a trained model.

factor

the penalization factor to use for non-successful choices. Default 10.

timeout

the timeout value to be multiplied by the penalization factor. If not specified, the maximum performance value of all algorithms on the entire data is used.

addCosts

whether to add feature costs. You should not need to set this manually; the default of NULL will have LLAMA figure out automatically, depending on the model, whether to add costs or not. This should always be true (the default) except for comparison algorithms (i.e. single best and virtual best).

Details

Returns the penalized average runtime performances of the respective chosen algorithm on each problem instance.

If feature costs have been given and addCosts is TRUE, the cost of the used features or feature groups is added to the performance of the chosen algorithm. The used features are determined by examining the features member of data, not the model. If after that the performance value is above the timeout value, the timeout value multiplied by the factor is assumed.

If the model returns NA (e.g. because no algorithm solved the instance), timeout * factor is returned as PAR score.
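
A minimal sketch of the PAR idea for a single instance (a hypothetical helper, not the package's internal code):

parScore = function(time, solved, timeout, factor = 10) {
    if (solved && time <= timeout) time else factor * timeout
}
parScore(42, TRUE, 3600)     # solved: returns 42
parScore(3600, FALSE, 3600)  # timed out: returns 36000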

data may contain a train/test partition or not. This makes a difference when computing the PAR scores for the single best algorithm. If no train/test split is present, the single best algorithm is determined on the entire data. If it is present, the single best algorithm is determined on each test partition. That is, the single best is local to the partition and may vary across partitions.

Value

A list of the penalized average runtimes.

Author(s)

Lars Kotthoff

See Also

misclassificationPenalties, successes

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = classify(classifier=makeLearner("classif.J48"), data=folds)
sum(parscores(folds, model))
# use factor of 5 instead of 10.
sum(parscores(folds, model, 5))
# explicitly specify timeout.
sum(parscores(folds, model, timeout = 3600))
}

Plot convenience functions to visualise selectors

Description

Functions to plot the performance of selectors and compare them to others.

Usage

perfScatterPlot(metric, modelx, modely, datax, datay=datax,
    addCostsx=NULL, addCostsy=NULL, pargs=NULL, ...)

Arguments

metric

the metric used to evaluate the model. Can be one of misclassificationPenalties, parscores or successes.

modelx

the algorithm selection model to be plotted on the x axis. Can be either a model returned by one of the model-building functions or a function that returns predictions such as vbs or the predictor function of a trained model.

modely

the algorithm selection model to be plotted on the y axis. Can be either a model returned by one of the model-building functions or a function that returns predictions such as vbs or the predictor function of a trained model.

datax

the data used to evaluate modelx. Will be passed to the metric function.

datay

the data used to evaluate modely. Can be omitted if the same as for modelx. Will be passed to the metric function.

addCostsx

whether to add feature costs for modelx. You should not need to set this manually; the default of NULL will have LLAMA figure out automatically, depending on the model, whether to add costs or not. This should always be true (the default) except for comparison algorithms (i.e. single best and virtual best).

addCostsy

whether to add feature costs for modely. You should not need to set this manually; the default of NULL will have LLAMA figure out automatically, depending on the model, whether to add costs or not. This should always be true (the default) except for comparison algorithms (i.e. single best and virtual best).

pargs

any arguments to be passed to geom_point.

...

any additional arguments to be passed to the metrics. For example the penalisation factor for parscores.

Details

perfScatterPlot creates a scatter plot that compares the performances of two algorithm selectors. It plots the performance on each instance in the data set for modelx on the x axis versus modely on the y axis. In addition, a diagonal line is drawn to denote the line of equal performance for both selectors.

Value

A ggplot object.

Author(s)

Lars Kotthoff

See Also

misclassificationPenalties, parscores, successes

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = classify(classifier=makeLearner("classif.J48"), data=folds)
# Simple plot to compare our selector to the single best in terms of PAR10 score
library(ggplot2)
perfScatterPlot(parscores,
        model, singleBest,
        folds, satsolvers) +
    scale_x_log10() + scale_y_log10() +
    xlab("J48") + ylab("single best")
# additional aesthetics for points
perfScatterPlot(parscores,
        model, singleBest,
        folds, satsolvers,
        pargs=aes(colour = scorex)) +
    scale_x_log10() + scale_y_log10() +
    xlab("J48") + ylab("single best")
}

Regression model

Description

Build a regression model that predicts the algorithm to use based on the features of the problem and optionally features of the algorithms.

Usage

regression(regressor = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) },
    combine = NULL, expand = identity, save.models = NA,
    use.weights = TRUE)

Arguments

regressor

the mlr regressor to use. See examples.

data

the data to use with training and test sets. The structure returned byone of the partitioning functions.

pre

a function to preprocess the data. Currently only normalize. Optional. Does nothing by default.

combine

the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

expand

a function that takes a matrix of performance predictions (columns are algorithms, rows problem instances) and transforms it into a matrix with the same number of rows. Only meaningful if combine is not null. Default is the identity function, which will leave the matrix unchanged. See examples.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

use.weights

Whether to use instance weights if supported. Default TRUE.

Details

regression takes data and processes it using pre (if supplied). If no algorithm features are provided, regressor is called to induce separate regression models for each of the algorithms to predict its performance. When algorithm features are present, regressor is called to induce one regression model for all algorithms to predict their performance. The best algorithm is determined from the predicted performances by examining whether performance is to be minimized or not, as specified when creating the data structure through input.
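
A minimal sketch of the selection step (a hypothetical helper, not the package's internal code):

selectBest = function(pred, minimize = TRUE) {
    # pred: named vector of predicted performance values for one instance
    names(pred)[if (minimize) which.min(pred) else which.max(pred)]
}
selectBest(c(alg1 = 12.3, alg2 = 4.7))  # returns "alg2"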

The evaluation across the training and test sets will be parallelizedautomatically if a suitable backend for parallel computation is loaded.TheparallelMap level is "llama.fold".

If combine is not null, it is assumed to be an mlr classifier and will be used to learn a model to predict the best algorithm given the original features and the performance predictions for the individual algorithms. The combine option is currently not supported with algorithm features. If this classifier supports weights and use.weights is TRUE, they will be passed as the difference between the best and the worst algorithm. Optionally, expand can be used to supply a function that will modify the predictions before giving them to the classifier, e.g. augment the performance predictions with the pairwise differences (see examples).

If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and -Inf for the score if the performance value is to be maximised, Inf otherwise.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the predicted performance value. If stacking is used, each prediction is simply the best algorithm with a score of 1.

predictor

a function that encapsulates the regression model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Author(s)

Lars Kotthoff

References

Kotthoff, L. (2012) Hybrid Regression-Classification Models for Algorithm Selection. 20th European Conference on Artificial Intelligence, 480–485.

See Also

classify, classifyPairs, cluster, regressionPairs

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
res = regression(regressor=makeLearner("regr.lm"), data=folds)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])
res = regression(regressor=makeLearner("regr.ksvm"), data=folds)
# combine performance predictions using classifier
ress = regression(regressor=makeLearner("regr.ksvm"),
                  data=folds,
                  combine=makeLearner("classif.J48"))
# add pairwise differences to performance predictions before running classifier
ress = regression(regressor=makeLearner("regr.ksvm"),
                  data=folds,
                  combine=makeLearner("classif.J48"),
                  expand=function(x) { cbind(x, combn(c(1:ncol(x)), 2,
                         function(y) { abs(x[,y[1]] - x[,y[2]]) })) })
}

Regression model for pairs of algorithms

Description

Builds regression models for each pair of algorithms that predict the performance difference based on the features of the problem and optionally features of the algorithms. The predicted differences over all pairs involving a particular algorithm are summed to give the score of that algorithm.

Usage

regressionPairs(regressor = NULL, data = NULL,    pre = function(x, y=NULL) { list(features=x) }, combine = NULL,    save.models = NA, use.weights = TRUE)

Arguments

regressor

the regression function to use. Must accept a formula of the values to predict and a data frame with features. Return value should be a structure that can be given to predict along with new data. See examples.

data

the data to use with training and test sets. The structure returned byone of the partitioning functions.

pre

a function to preprocess the data. Currently only normalize. Optional. Does nothing by default.

combine

the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

use.weights

Whether to use instance weights if supported. Default TRUE.

Details

regressionPairs takes the training and test sets in data and processes them using pre (if supplied). If no algorithm features are provided, regressor is called to induce a regression model for each pair of algorithms to predict the performance difference between them. When algorithm features are present, regressor is called to induce one regression model for all pairs of algorithms to predict the performance difference between them. If combine is not supplied, the best overall algorithm is determined by summing the performance differences over all pairs for each algorithm and ranking them by this sum. The algorithm with the largest value is chosen. If it is supplied, it is assumed to be an mlr classifier. This classifier is passed the original features and the predictions for each pair of algorithms. The combine option is currently not supported with algorithm features. If the classifier supports weights and use.weights is TRUE, the performance difference between the best and the worst algorithm is passed as weight.

The aggregated score for each algorithm quantifies how much better it is than the other algorithms, where bigger values are better. Positive numbers denote that the respective algorithm usually exhibits better performance than most of the other algorithms, while negative numbers denote that it is usually worse.
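
A minimal sketch of this aggregation (a hypothetical helper, not the package's internal code):

aggregateScores = function(d) {
    # d[i, j]: predicted performance advantage of algorithm i over algorithm j
    diag(d) = 0
    rowSums(d)  # the largest value marks the predicted best algorithm
}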

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".

Training this model can take a very long time. Given n algorithms, choose(n, 2) models are trained and evaluated. This is significantly slower than the other approaches that train a single model or one for each algorithm. Even with algorithm features present, when only a single model is trained, the process still takes a long time due to the amount of data.

If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and -Inf for the score if the performance value is to be maximised, Inf otherwise.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to how much better performance the algorithm delivers compared to the other algorithms in the portfolio. If stacking is used, each prediction is simply the best algorithm with a score of 1.

predictor

a function that encapsulates the classifier learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the models for each pair of algorithms trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Author(s)

Lars Kotthoff

See Also

classify, classifyPairs, cluster, regression

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = regressionPairs(regressor=makeLearner("regr.lm"), data=folds)
# the total number of successes
sum(successes(folds, model))
# predictions on the entire data set
model$predictor(satsolvers$data[satsolvers$features])
# combine predictions using J48 induced classifier
model = regressionPairs(regressor=makeLearner("regr.lm"), data=folds,
    combine=makeLearner("classif.J48"))
}

Example data for Leveraging Learning to Automatically Manage Algorithms

Description

Performance data for 19 SAT solvers on 2433 SAT instances.

Usage

data(satsolvers)

Format

satsolvers is a list in the format returned by input and expected by the other functions of LLAMA. The list has the following components.

data:

The original input data merged. That is, the data frames processed by input in a single data frame with the following additional columns.

best:

The algorithm(s) with the best performance for each row.

*_success:

For each algorithm, whether it was successful on the respective row.

features:

The names of the columns that contain feature values.

performance:

The names of the columns that contain performance data.

success:

The names of the columns indicating whether an algorithm was successful.

minimize:

Whether the performance is to be minimized.

cost:

The names of the columns that contain the feature group computation cost for each instance.

costGroups:

A list that maps the name of each feature group to the list of feature names it contains.

Details

Performance data for 19 SAT solvers on 2433 SAT instances. For each instance, 36 features were measured. In addition to the performance (time) on each instance, data on whether a solver timed out on an instance is included. The cost to compute all features is included as well.

Source

Hurley, B., Kotthoff, L., Malitsky, Y., O'Sullivan, B. (2014) Proteus: A Hierarchical Portfolio of Solvers and Transformations. Eleventh International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming.

See Also

input

Examples

data(satsolvers)

Success

Description

Was the problem solved successfully using the chosen algorithm?

Usage

successes(data, model, timeout, addCosts = NULL)

Arguments

data

the data used to induce the model. The same as given to classify, classifyPairs, cluster or regression.

model

the algorithm selection model. Can be either a model returned by one of the model-building functions or a function that returns predictions such as vbs or the predictor function of a trained model.

timeout

the timeout value to be multiplied by the penalization factor. If not specified, the maximum performance value of all algorithms on the entire data is used.

addCosts

whether to add feature costs. You should not need to set this manually; the default of NULL will have LLAMA figure out automatically, depending on the model, whether to add costs or not. This should always be true (the default) except for comparison algorithms (i.e. single best and virtual best).

Details

Returns, for each problem instance, TRUE if the chosen algorithm solved it successfully and FALSE otherwise.

If feature costs have been given and addCosts is TRUE, the cost of the used features or feature groups is added to the performance of the chosen algorithm. The used features are determined by examining the features member of data, not the model. If after that the performance value is above the timeout value, FALSE is assumed. If whether an algorithm was successful is not determined by performance and feature costs, don't pass costs when creating the LLAMA data frame.

If the model returns NA (e.g. because no algorithm solved the instance), FALSE is returned as success.

data may contain a train/test partition or not. This makes a difference when computing the successes for the single best algorithm. If no train/test split is present, the single best algorithm is determined on the entire data. If it is present, the single best algorithm is determined on each test partition. That is, the single best is local to the partition and may vary across partitions.

Value

A list of the success values.

Author(s)

Lars Kotthoff

See Also

misclassificationPenalties, parscores

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
model = classify(classifier=makeLearner("classif.J48"), data=folds)
sum(successes(folds, model))
}

Train / test split

Description

Split a data set into train and test set.

Usage

trainTest(data, trainpart = 0.6, stratify = FALSE)

Arguments

data

the data to use. The structure returned by input.

trainpart

the fraction of the data to use for training. Default 0.6.

stratify

whether to stratify the folds. Only really makes sense for classification models. Defaults to FALSE.

Details

Partitions the data set into training and test set according to the specified fraction. The training and test index sets are added to the original data and returned. If requested, the distribution of the best algorithms in training and test set is approximately the same, i.e. the sets are stratified.

If the data set has train and test partitions already, they are overwritten.

Value

train

a (one-element) list of index sets for training.

test

a (one-element) list of index sets for testing.

...

the original members of data. See input.

Author(s)

Lars Kotthoff

See Also

bsFolds, cvFolds

Examples

data(satsolvers)
trainTest = trainTest(satsolvers)
# use 50-50 split instead of 60-40
trainTest1 = trainTest(satsolvers, 0.5)
# stratify
trainTestU = trainTest(satsolvers, stratify=TRUE)

Tune the hyperparameters of the machine learning algorithm underlying a model

Description

Functions to tune the hyperparameters of the machine learning algorithmunderlying a model with respect to a performance measure.

Usage

tuneModel(ldf, llama.fun, learner, design, metric = parscores, nfolds = 10L,
    quiet = FALSE)

Arguments

ldf

the LLAMA data to use. The structure returned by input.

llama.fun

the LLAMA model building function.

learner

the mlr learner to use.

design

the data frame denoting the parameter values to try. Can be produced with the ParamHelpers package. See examples.

metric

the metric used to evaluate the model. Can be one of misclassificationPenalties, parscores or successes.

nfolds

the number of folds. Defaults to 10. If -1 is given, leave-one-out cross-validation folds are produced.

quiet

whether to output information on the intermediate values and progress during tuning.

Details

tuneModel finds the hyperparameters from the set denoted by design of the machine learning algorithm learner that give the best performance with respect to the measure metric for the LLAMA model type llama.fun on data ldf. It uses a nested cross-validation internally; the number of inner folds is given through nfolds, the number of outer folds is either determined by any existing partitions of ldf or, if none are present, by nfolds as well.

During each iteration of the inner cross-validation, all parameter sets specified in design are evaluated and the one with the best performance value chosen. The mean performance over all instances in the data is logged for all evaluations. This parameter set is then used to build and evaluate a model in the outer cross-validation. The predictions made by this model along with the parameter values used to train it are returned.
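
A minimal sketch of the nested-resampling idea (schematic; innerScore and outerScore are hypothetical stand-ins for the inner cross-validation and the outer-fold evaluation):

nestedTune = function(design, nOuter, innerScore, outerScore) {
    sapply(seq_len(nOuter), function(i) {
        # evaluate every candidate parameter set by inner cross-validation
        scores = sapply(seq_len(nrow(design)), function(j)
            innerScore(design[j, , drop = FALSE], i))
        best = which.min(scores)  # e.g. PAR10: smaller is better
        # evaluate the winning parameter set on the outer test fold
        outerScore(design[best, , drop = FALSE], i)
    })
}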

Finally, a normal (non-nested) cross-validation is performed to find the best parameter values on the entire data set. The predictor of this model along with the parameter values used to train it is returned. The interface corresponds to the normal LLAMA model-building functions in that respect – the returned data structure is the same with a few additional values.

The evaluation across the folds will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.tune".

Value

predictions

a data frame with the predictions for each instance and test set. The structure is the same as for the underlying model building function and the predictions are the ones made by the models trained with the best parameter values for the respective fold.

predictor

a function that encapsulates the classifier learned on the entire data set with the best parameter values determined on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes.

parvals

the best parameter values on the entire data set used for training the predictor model.

inner.parvals

the best parameter values during each iteration of the outer cross-validation. These parameters were used to train the models that made the predictions in predictions.

Author(s)

Bernd Bischl, Lars Kotthoff

Examples

if(Sys.getenv("RUN_EXPENSIVE") == "true") {
library(ParamHelpers)
data(satsolvers)
learner = makeLearner("classif.J48")
# parameter set for J48
ps = makeParamSet(makeIntegerParam("M", lower = 1, upper = 100))
# generate 10 random parameter sets
design = generateRandomDesign(10, ps)
# tune with respect to PAR10 score (default) with 10 outer and inner folds
# (default)
res = tuneModel(satsolvers, classify, learner, design)
}
