
FAMoS provides an automated and unbiased model selection algorithm that aims at determining the most appropriate subset of model parameters to describe a specific data set. Due to its flexibility with respect to the cost/optimisation function, FAMoS can handle many different mathematical structures, including, for example, regression models and ODEs.

Installation

# install.packages("devtools")
devtools::install_github("GabelHub/FAMoS")
# alternative installation command
devtools::install_git("https://github.com/GabelHub/FAMoS.git", branch = "master")

Features

Adaptive methods of model selection

FAMoS keeps track of the methods used in the previous iterations and dynamically changes them according to the outcome of each iteration.

Flexibility

FAMoS is designed to allow for maximum flexibility regarding the fitting procedures and model types in R. While it fits the cost function via optim by default, it also allows users to specify their own optimisation routines, making it possible to perform model selection based on various other R packages.

Easy parallelisation

FAMoS makes use of the future package, which allows for easy parallelisation: many different models can be tested simultaneously if the required computational resources are available.

Smart testing procedures

FAMoS keeps track of previously tested models and also checks that each model fulfils all user-specified restrictions, so that only relevant models are tested and computational resources are saved.

Example

As a simple example, we generate a data set from two parameters and apply FAMoS to a global model consisting of five different parameters.

library(FAMoS)
# setting data
true.p2 <- 3
true.p5 <- 2
sim.data <- cbind.data.frame(range = 1:10,
                             y = true.p2^2 * (1:10)^2 - exp(true.p5 * (1:10)))
# define initial parameter values and corresponding test function
inits <- c(p1 = 3, p2 = 4, p3 = -2, p4 = 2, p5 = 0)
cost_function <- function(parms, binary, data) {
  if (max(abs(parms)) > 5) {
    return(NA)
  }
  with(as.list(c(parms)), {
    res <- p1 * 4 + p2^2 * data$range^2 + p3 * sin(data$range) + p4 * data$range - exp(p5 * data$range)
    diff <- sum((res - data$y)^2)
    # calculate AICC
    nr.par <- length(which(binary == 1))
    nr.data <- nrow(data)
    AICC <- diff + 2 * nr.par + 2 * nr.par * (nr.par + 1) / (nr.data - nr.par - 1)
    return(AICC)
  })
}
# set swap set
swaps <- list(c("p1", "p5"))
# perform model selection
res <- famos(init.par = inits,
             fit.fn = cost_function,
             homedir = tempdir(),
             method = "swap",
             swap.parameters = swaps,
             init.model.type = c("p1", "p3"),
             optim.runs = 1,
             data = sim.data)

FAMoS returns a lot of verbose output, telling the user what is currently happening (note: the output can be turned on and off with the option verbose). In the beginning, the overall settings are defined and the corresponding directories are created (if they do not exist yet).

#> Initializing...
#> Create FAMoS directory...
#>
#> Algorithm run: 001
#> Refitting disabled.
#> Starting algorithm with method 'swap'

In each iteration, FAMoS identifies new models to be tested based on the current search method:

#> FAMoS iteration #3 - method: forward
#> Add parameter p1
#> Add parameter p2
#> Add parameter p4
#> Time passed since start: 00:00:00
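Proposals like these follow a simple pattern in the plain stepwise reading of the method names. A forward step tries adding each parameter that is not yet in the model. A backward step tries removing each parameter that is. The sketch below reproduces this candidate generation in base R, assuming models are stored as character vectors of parameter names. forward_candidates() and backward_candidates() are hypothetical helpers, not part of the FAMoS API, and they ignore do.not.fit and critical conditions.

```r
# Hypothetical helpers (not part of the FAMoS API): candidate models of a
# forward or backward step, ignoring do.not.fit and critical parameter sets.
all.pars <- c("p1", "p2", "p3", "p4", "p5")

forward_candidates <- function(model, all.pars) {
  # add each parameter that is not yet part of the model
  lapply(setdiff(all.pars, model), function(p) sort(c(model, p)))
}

backward_candidates <- function(model) {
  # remove each parameter of the model in turn
  lapply(model, function(p) setdiff(model, p))
}

# starting from a model with p3 and p5, a forward step proposes adding p1, p2 and p4
fwd <- forward_candidates(c("p3", "p5"), all.pars)
bwd <- backward_candidates(c("p3", "p5"))
```

For example, starting from a model containing p3 and p5, the forward step yields exactly three candidates, one each for adding p1, p2 and p4.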

Each model is then submitted and tested. Since FAMoS uses futures for evaluation, the search process can easily be parallelised by setting the corresponding future plan. Every model is evaluated by performing (multiple) optimisation routines based either on the default fitting routine optim or on a user-specified fitting routine (see the vignettes for examples).

After all models have been evaluated, the algorithm reads in the results and checks whether a better model was found:

#> Evaluate results ...
#> Best selection criterion value of this run is 10
#> Parameter p2 was added
#> Time passed since start: 1.92 secs

The cycle continues until no better model is found with the currently used methods. After halting, the results are returned:

#> Best model found. Algorithm stopped.
#> FAMoS run 001
#> Selection criterion value of best model: 7
#> Best model (binary): 01001
#> Best model (vector):
#> p1 p2 p3 p4 p5
#>  0  1  0  0  1
#> Estimated parameter values:
#> p1 p2 p3 p4 p5
#>  0 -3  0  0  2
#> Time needed: 3.84 secs

FAMoS options in detail

init.par

The vector init.par is one of two mandatory variables that need to be specified. It contains the names and initial values of all model parameters that FAMoS is supposed to analyse. In our example above, we specified this vector as

# define initial parameter values
inits <- c(p1 = 3, p2 = 4, p3 = -2, p4 = 2, p5 = 0)

Depending on the starting model, FAMoS automatically extracts the corresponding values and uses them for its first iteration only. All following iterations inherit the best values from previous fits.

Additional specifications for the use of the initial parameter vector can be supplied by the options do.not.fit and default.val.

fit.fn

To allow independence from specific mathematical model structures, the user can specify any cost or optimisation function. If a cost function is used, it has to take the complete parameter vector as an input (named parms) and has to return a selection criterion value. If use.optim = TRUE, the cost function needs to return a single numeric value, which corresponds to the selection criterion value. However, if use.optim = FALSE, the cost function needs to return a list containing the selection criterion value as its first entry and the named vector of the fitted parameter values as its second entry (non-fitted parameters are handled internally).

Additionally, the cost and optimisation functions can use the optional input binary, which contains the binary information of the current model, i.e. which parameters are currently fitted. This is useful for extracting the to-be-fitted parameters if a custom optimisation function is used.
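For instance, the example cost function uses binary only to count the fitted parameters for its AICc-type penalty. The sketch below re-creates the data and cost function from the example and calls the function directly, outside of FAMoS, at the data-generating parameters with only p2 and p5 flagged as fitted. The residual sum of squares is then exactly zero, so the returned value is just the penalty 2*2 + 2*2*3/(10 - 2 - 1) = 4 + 12/7, about 5.71.

```r
# Data and cost function from the example above, evaluated directly
# (outside of FAMoS) at the data-generating parameters p2 = 3 and p5 = 2.
true.p2 <- 3
true.p5 <- 2
sim.data <- cbind.data.frame(range = 1:10,
                             y = true.p2^2 * (1:10)^2 - exp(true.p5 * (1:10)))

cost_function <- function(parms, binary, data) {
  if (max(abs(parms)) > 5) {
    return(NA)
  }
  with(as.list(c(parms)), {
    res <- p1 * 4 + p2^2 * data$range^2 + p3 * sin(data$range) +
      p4 * data$range - exp(p5 * data$range)
    diff <- sum((res - data$y)^2)
    # the penalty depends on how many entries of 'binary' are 1
    nr.par <- length(which(binary == 1))
    nr.data <- nrow(data)
    diff + 2 * nr.par + 2 * nr.par * (nr.par + 1) / (nr.data - nr.par - 1)
  })
}

# binary marks p2 and p5 as fitted (model "01001")
value <- cost_function(parms = c(p1 = 0, p2 = 3, p3 = 0, p4 = 0, p5 = 2),
                       binary = c(0, 1, 0, 0, 1),
                       data = sim.data)
value  # 4 + 12/7, i.e. about 5.71
```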

Due to this flexible structure, FAMoS is able to tackle many different problems, e.g. modelling approaches like linear regression, ODEs or PDEs.

homedir

FAMoS generates and saves many different files in order to make results available over time as well as to other FAMoS runs executing simultaneously. homedir specifies the folder in which all results are going to be stored. The default is the current working directory.

do.not.fit

In order to exclude some parameters from the fitting procedures, their names can be specified in the do.not.fit option. This allows testing different model restrictions without needing to change either init.par or fit.fn. For example, if we wanted to exclude the parameter p4 from our analysis, we would initially specify

# define initial parameter values
inits <- c(p1 = 3, p2 = 4, p3 = -2, p4 = 2, p5 = 0)
no.fit <- c("p4")

and pass this option on to FAMoS. Note that excluded parameters are automatically removed from the initial model if init.model.type = "random", init.model.type = "global" or init.model.type = "most.distant" is used. If the user-specified initial model contains an excluded parameter, an error will be returned:

The specified initial model violates critical conditions or the do.not.fit specifications

method

FAMoS can use three different methods to search for models to test: forward search, backward elimination and swap search. As the algorithm dynamically changes these methods from one iteration to the next, the option method only specifies the starting method.
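The rules for choosing the next method after an unsuccessful iteration, as summarised in the scheme in this section, can be condensed into a small lookup function. This is a sketch in plain R, not FAMoS's internal code. next_method() and its swap.available flag, which stands for "critical or swap sets were specified", are hypothetical names.

```r
# Sketch of the method-switching scheme, applied when an iteration did NOT
# find a better model. 'swap.available' is a hypothetical flag standing for
# "critical or swap sets were specified".
next_method <- function(current, previous, swap.available = TRUE) {
  swap.or.stop <- if (swap.available) "swap" else "terminate"
  if (current == "forward") {
    # forward after backward -> swap; forward after forward or swap -> backward
    if (previous == "backward") swap.or.stop else "backward"
  } else if (current == "backward") {
    # backward after backward -> forward; backward after forward -> swap
    if (previous == "backward") "forward" else swap.or.stop
  } else {
    # an unsuccessful swap search ends the run
    "terminate"
  }
}

next_method("forward", "backward")                         # "swap"
next_method("backward", "forward", swap.available = FALSE) # "terminate"
```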

If the algorithm is able to find a better model, the current method will be used in the next iteration as well (except for the swap method, which is always followed by a forward search if the algorithm does not terminate in that step). If no better model is found, the algorithm will change the method according to the following scheme:

current method	previous method	next method
forward	backward	swap (or terminate)
forward	forward or swap	backward
backward	backward	forward
backward	forward	swap (or terminate)
swap	forward or backward	terminate

If the swap method is not used (due to unspecified critical or swap sets), the algorithm will terminate after an unsuccessful forward search and an unsuccessful backward search in succession.

init.model.type

To verify that FAMoS results are consistent, it is important to run the algorithm with different starting models. To set the initial model, the user can either use one of the built-in options random (which generates a random model), global (which uses the complete model as a starting point) or most.distant (which uses the model most dissimilar to all previously tested models). Alternatively, the user can specify a model by supplying a character vector containing the names of the parameters in the initial model.

# Four options for the starting model
init.model1 <- "random"        # generates a random starting model
init.model2 <- "global"        # uses all available parameters
init.model3 <- "most.distant"  # uses the most dissimilar model
init.model4 <- c("p1", "p4")   # a user-specified model
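FAMoS reports models in binary form (e.g. "Best model (binary): 01001" in the example output, i.e. a model with p2 and p5). The sketch below shows one way to obtain this encoding from a vector of parameter names. It also illustrates one possible reading of "most dissimilar": choosing the candidate whose smallest Hamming distance to all tested models is largest. model_to_binary() and most_distant() are hypothetical helpers, and FAMoS's actual criterion for most.distant may differ.

```r
# Hypothetical helpers, not FAMoS internals.
all.pars <- c("p1", "p2", "p3", "p4", "p5")

# encode a model (vector of parameter names) as a 0/1 vector over all.pars
model_to_binary <- function(model, all.pars) {
  as.integer(all.pars %in% model)
}

# Assumption: dissimilarity is measured by the Hamming distance, and the
# candidate with the largest minimal distance to the tested models is chosen.
most_distant <- function(candidates, tested) {
  min.dist <- sapply(candidates, function(cand) {
    min(sapply(tested, function(t) sum(cand != t)))
  })
  candidates[[which.max(min.dist)]]
}

paste(model_to_binary(c("p1", "p4"), all.pars), collapse = "")  # "10010"

tested.models <- list(model_to_binary(c("p1", "p2"), all.pars))
candidates <- list(model_to_binary("p1", all.pars),
                   model_to_binary(c("p3", "p4", "p5"), all.pars))
most_distant(candidates, tested.models)  # 0 0 1 1 1
```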

If random, global or most.distant is chosen, FAMoS automatically applies critical conditions and removes excluded parameters (see options critical.parameters and do.not.fit).

refit

Before testing a model, FAMoS checks whether this model has been tested before. If refit = FALSE (default), the model will not be tested again. If refitting is set to TRUE, FAMoS will try to optimise the model again. If the new run returns a better fit, the old results will be overwritten; otherwise, the new run will be discarded.

Refitting makes sense if the model optimisation depends on the initial parameter combination (see also optim.runs). If a model is re-encountered, the new parameter set it is tested with may well be much more appropriate than the previous one, especially if the re-encounter happens within the same FAMoS run.
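The bookkeeping behind refit can be sketched with a named list of results keyed by a model's binary string. should_fit() and store_result() are hypothetical helpers that illustrate the rules described above. They are not FAMoS internals, and the criterion values are toy numbers.

```r
# Hypothetical helpers illustrating the refit rules, not FAMoS internals.
should_fit <- function(tested, key, refit = FALSE) {
  # fit new models always; fit known models only if refit = TRUE
  is.null(tested[[key]]) || refit
}

store_result <- function(tested, key, value, pars) {
  old <- tested[[key]]
  # keep the new result only if the model is new or the fit is better
  if (is.null(old) || value < old$value) {
    tested[[key]] <- list(value = value, pars = pars)
  }
  tested
}

tested <- list()
tested <- store_result(tested, "01001", 10, c(p2 = -3, p5 = 2))
should_fit(tested, "01001")                # FALSE: already tested
should_fit(tested, "01001", refit = TRUE)  # TRUE
tested <- store_result(tested, "01001", 12, c(p2 = 2.5, p5 = 2))  # worse, discarded
tested[["01001"]]$value                    # 10
```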

use.optim

The default fitting routine that FAMoS relies on is the built-in function optim. However, by setting use.optim = FALSE, the user can use any other suitable optimisation routine. The optimisation routine then has to be included in the cost function fit.fn, which needs to return a list containing the current selection criterion value as well as the parameter values used. See the vignettes for an example.

optim.runs

Finding the best fit for each model is crucial to guarantee a correct model selection procedure. Often, fitting a model once is enough, and repeating the fitting procedure with different initial conditions does not lead to new results. Sometimes, however, one wants to run multiple fits for each model, e.g. if the parameter space is very large. To do so, the user can specify optim.runs, which gives the number of fitting attempts. For each optimisation run, a different starting condition is used: the first fitting attempt takes the parameter vector inherited from previous runs, while all following fitting attempts use randomly sampled parameter vectors (see also random.borders).

If multiple optimisation runs are performed, FAMoS will return the best of these runs.

In each optimisation run, fitting in FAMoS is performed either with the built-in function optim, which is repeatedly evaluated until convergence, or with a custom optimisation routine, which is evaluated only once. As the default optimisation method is based on the Nelder-Mead approach, which often does not give reliable results if only one optimisation is performed, the optimisation for each fitting attempt is wrapped into a while-loop, in which the fitting procedure is repeatedly halted and restarted (based on the options in control.optim) until the relative convergence tolerance in con.tol is reached.

for (i in 1:optim.runs) { # number of fitting attempts specified by optim.runs
  # start.parameters <- the inherited set (i = 1) or a randomly sampled set (i > 1)
  if (use.optim == TRUE) {
    # If use.optim = TRUE, optim is evaluated repeatedly in a while loop
    # until the relative change of the optimised value falls below con.tol
    while (abs((old.optim.value - new.optim.value) / old.optim.value) > con.tol) {
      # ... run optim with start.parameters ...
      # start.parameters <- new parameters estimated by optim
    }
  } else {
    # If use.optim = FALSE, the custom optimisation routine is evaluated
    # only once in each optimisation run
    # ... run custom optimisation with start.parameters ...
  }
}
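The restart logic of the pseudocode above can be tried out in isolation with base R. The following is a sketch under stated assumptions, not FAMoS's source code. fit_until_converged() is a hypothetical helper, the Rosenbrock function stands in for a model's cost function, and con.tol = 0.01 is only an illustrative value.

```r
# Sketch of one fitting attempt with use.optim = TRUE: Nelder-Mead is
# restarted from its own estimate until the relative change of the
# objective value drops below con.tol (hypothetical helper, not FAMoS code).
rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

fit_until_converged <- function(fn, start, con.tol = 0.01, max.restarts = 50) {
  old.value <- fn(start)
  for (k in seq_len(max.restarts)) {
    fit <- optim(start, fn, method = "Nelder-Mead")
    start <- fit$par  # restart from the latest estimate
    # relative change of the objective, guarded against division by zero
    rel.change <- abs(old.value - fit$value) / max(abs(old.value), .Machine$double.eps)
    if (rel.change < con.tol) break
    old.value <- fit$value
  }
  list(par = fit$par, value = fit$value, restarts = k)
}

fit.res <- fit_until_converged(rosenbrock, c(-1.2, 1))
fit.res$par       # close to the optimum c(1, 1)
fit.res$restarts  # number of optim calls needed
```

The max.restarts cap is an addition of this sketch. It prevents an endless loop when the objective keeps changing by more than con.tol in relative terms near a minimum of zero.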

default.val

By default, FAMoS sets the parameters that are not fitted to zero. However, this might not be appropriate if, for example, a parameter describes an initial condition or a baseline turnover. In such cases, default.val allows the user to specify the value that a parameter assumes if it is not fitted. default.val needs to be given as a named list, which can store either numerical values or the name of the parameter from which the value should be inherited. For example:

# define initial parameter values
inits <- c(p1 = 3, p2 = 4, p3 = -2, p4 = 2, p5 = 0)
# set default values
def.val <- list(p1 = 2, p2 = -5, p3 = "p1", p4 = 0, p5 = "p4")

Here, the values of p1, p2 and p4 are set to their respective values, whereas p3 and p5 inherit their values from p1 and p4, respectively. This feature is useful if two rates describe similar processes and one wants to test whether the difference between them is significant enough to warrant fitting an additional parameter. Here is a short example:

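cost.function <- function(parms, data) {
  with(as.list(parms), {
    # two similar processes; 'data$time' is a hypothetical time column
    x <- p1 + p2 * data$time
    y <- p3 + p4 * data$time  # uses the values of p1 and p2 if p3 and p4 are not fitted
    # ... compare x and y with the data and return the selection criterion ...
  })
}
def.val <- list(p1 = 0, p2 = 0, p3 = "p1", p4 = "p2")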

Note that parameter inheritance cannot be chained: an entry that points to another parameter must point to a parameter with a numeric default value.

# INCORRECT use of default.val
def.val <- list(p1 = 1, p2 = "p1", p3 = "p2", p4 = "p3")
# CORRECT use of default.val
def.val <- list(p1 = 1, p2 = "p1", p3 = "p1", p4 = "p1")
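One way to resolve default.val for a given fit, following the rules above, is sketched below. Fitted parameters keep their estimates. Non-fitted parameters take their numeric default or the value of the parameter they point to. Chained references are rejected. resolve_defaults() is a hypothetical helper. The sketch also assumes that the referenced parameter's current value (fitted or default) is inherited, which may differ from FAMoS's exact behaviour.

```r
# Hypothetical helper, not FAMoS code: full parameter vector for a fit,
# given the fitted values and a default.val list.
resolve_defaults <- function(fitted, def.val) {
  pars <- names(def.val)
  out <- setNames(numeric(length(pars)), pars)
  # 1) fitted parameters keep their estimates, others get numeric defaults
  for (p in pars) {
    if (p %in% names(fitted)) {
      out[p] <- fitted[[p]]
    } else if (is.numeric(def.val[[p]])) {
      out[p] <- def.val[[p]]
    }
  }
  # 2) non-fitted parameters pointing to another parameter inherit its value;
  #    a reference to another reference (a chain) is rejected
  for (p in setdiff(pars, names(fitted))) {
    ref <- def.val[[p]]
    if (is.character(ref)) {
      if (is.character(def.val[[ref]])) stop("default.val cannot be chained: ", p)
      out[p] <- out[[ref]]
    }
  }
  out
}

def.val <- list(p1 = 2, p2 = -5, p3 = "p1", p4 = 0, p5 = "p4")
resolve_defaults(fitted = c(p2 = 4.5), def.val = def.val)
# p1 = 2, p2 = 4.5, p3 = 2, p4 = 0, p5 = 0
```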

swap.parameters

The swap search that FAMoS can perform relies on sets specifying parameters that can be swapped for one another. For example, if we wanted to allow parameters p1, p2 and p3, as well as p4 and p5, to be replaceable by each other, we would specify:

swap.set<-list(c("p1","p2","p3"),c("p4","p5"))
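Given such sets, a swap step proposes models in which one parameter of the current model is replaced by another member of the same set that is not yet in the model. A sketch of this candidate generation follows. swap_candidates() is hypothetical and ignores any filtering by critical conditions or do.not.fit. Applied to the example at the top (swap set p1/p5, initial model p1 and p3), it yields a single candidate.

```r
# Hypothetical helper, not FAMoS code: all models reachable by one swap.
swap_candidates <- function(model, swap.set) {
  out <- list()
  for (s in swap.set) {
    for (p.out in intersect(model, s)) {   # parameter leaving the model
      for (p.in in setdiff(s, model)) {    # replacement from the same set
        out[[length(out) + 1]] <- sort(c(setdiff(model, p.out), p.in))
      }
    }
  }
  unique(out)
}

swap_candidates(c("p1", "p3"), list(c("p1", "p5")))
# a single candidate: c("p3", "p5")
swap_candidates(c("p1", "p4"), list(c("p1", "p2", "p3"), c("p4", "p5")))
# three candidates: {p2, p4}, {p3, p4} and {p1, p5}
```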

critical.parameters

In some cases, it does not make sense to fit certain submodels of the global model to the data, as they might lack crucial parameters. FAMoS can incorporate these restrictions through the specification of critical parameter sets. For example, if at least one of the first three parameters needs to be present in the model, and all models that do not feature p4 are invalid, we can specify:

crit.set<-list(c("p1","p2","p3"),c("p4"))
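The rule behind crit.set, namely that every critical set must contribute at least one parameter to the model, fits in a one-line check. satisfies_critical() is a hypothetical helper, not FAMoS's implementation:

```r
# Hypothetical check: a model is admissible only if it contains at least
# one parameter from every critical set.
satisfies_critical <- function(model, crit.set) {
  all(vapply(crit.set, function(s) any(s %in% model), logical(1)))
}

crit.set <- list(c("p1", "p2", "p3"), c("p4"))
satisfies_critical(c("p2", "p4"), crit.set)  # TRUE
satisfies_critical(c("p1", "p2"), crit.set)  # FALSE: p4 is missing
```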

random.borders

Since the starting parameters of all optimisation runs after the first are sampled randomly (by default from a uniform distribution), it might be important to set appropriate sampling intervals. By default, FAMoS samples parameters within a 100% deviation of the inherited parameter values (for example, if a model contains two parameters and the currently best values are p1 = 0.1 and p2 = -1000, the sampled values will lie in the intervals [0, 0.2] and [-2000, 0], respectively). Alternatively, the user can specify relative or absolute sampling intervals, or a sampling function. For relative intervals, a numeric value denoting the relative deviation has to be given, either a single value for all parameters or one value per parameter. For absolute sampling intervals, a matrix containing the lower and upper borders has to be specified. Here is an example:

# relative sampling ranges
random.bord1 <- 0.3               # deviates all parameters by 30%
random.bord2 <- c(0.1, 0.5, 0.2)  # deviates the parameters by 10%, 50% and 20%, respectively
# absolute sampling ranges
random.bord3 <- matrix(c(1, 2), nrow = 1)  # uses the interval [1, 2] for all parameter samples
random.bord4 <- cbind(c(0, -10, 0.3), c(5, -9, 0.7))  # uses the intervals [0, 5], [-10, -9] and [0.3, 0.7] to sample the respective parameters
# use a function to sample the parameters
random.bord5 <- rnorm  # note that in this case, 'mean' and 'sd' need to be passed to famos as well if values other than the default settings should be used
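The default 100% deviation and the absolute borders can be illustrated with runif(). The following sketch uses the hypothetical helpers sample_relative() and sample_absolute(). It mirrors the intervals described above but is not FAMoS's sampling code.

```r
# Hypothetical helpers, not FAMoS code: uniform sampling of starting values.
sample_relative <- function(pars, rel = 1) {
  # rel = 1 corresponds to a 100% deviation, e.g. 0.1 -> [0, 0.2]
  lower <- pars - abs(pars) * rel
  upper <- pars + abs(pars) * rel
  setNames(runif(length(pars), lower, upper), names(pars))
}

sample_absolute <- function(pars, borders) {
  # 'borders' holds lower bounds in column 1 and upper bounds in column 2;
  # a single row is recycled for all parameters
  borders <- borders[rep_len(seq_len(nrow(borders)), length(pars)), , drop = FALSE]
  setNames(runif(length(pars), borders[, 1], borders[, 2]), names(pars))
}

set.seed(1)
best <- c(p1 = 0.1, p2 = -1000)
sample_relative(best)                             # within [0, 0.2] and [-2000, 0]
sample_absolute(best, matrix(c(1, 2), nrow = 1))  # both within [1, 2]
```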

control.optim

parscale.pars

If parameter values span several orders of magnitude, using the built-in option parscale in optim can reduce the number of evaluations needed. Setting parscale.pars = TRUE automatically adjusts the scaling procedure in optim. In our experience, using parscale.pars = TRUE is usually beneficial if a large number of parameters with different orders of magnitude need to be fitted. However, the actual performance is very problem-specific, and we therefore recommend initially testing both approaches to see which one performs better for the problem at hand. Also, one needs to make sure that the other options given in control.optim and con.tol are specified appropriately.
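To get a feeling for what parscale does, the base-R sketch below fits two parameters that differ by six orders of magnitude. It fits them once without and once with a parscale matching their magnitudes. How FAMoS chooses the scaling when parscale.pars = TRUE is not shown here. The objective obj() and the chosen scales are illustrative assumptions.

```r
# Illustration of optim's 'parscale' control (base R, not FAMoS code):
# the optimum is at p1 = 1e-3 and p2 = 1e3.
obj <- function(p) ((p[1] - 1e-3) / 1e-3)^2 + ((p[2] - 1e3) / 1e3)^2
start <- c(2e-3, 2e3)

plain  <- optim(start, obj)
scaled <- optim(start, obj, control = list(parscale = c(1e-3, 1e3)))

# estimates and number of function evaluations for both settings
rbind(plain  = c(plain$par,  evals = plain$counts[["function"]]),
      scaled = c(scaled$par, evals = scaled$counts[["function"]]))
```

With parscale, optim works internally on par/parscale, so both parameters are of order one during the search. As stated above, whether this pays off is problem-specific.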

con.tol

Specifies the relative convergence tolerance and determines when the repeated optim fits will be terminated (see optim.runs for more details).

save.performance

If TRUE, a plot of the current FAMoS performance is stored in the folder "FAMoS-Results/Figures/", which will be updated during each iteration.

use.futures

To allow for parallelisation, FAMoS uses the future package by Henrik Bengtsson (see https://github.com/futureverse/future). To use futures, the option needs to be set to TRUE and a future plan needs to be specified.

reattempt

By default, FAMoS terminates once all search methods are exhausted. However, if reattempt is set to TRUE, FAMoS will instead jump to a distant model and continue to search the model space from there. The algorithm is then terminated if the best model is re-encountered (or if no other models are available to test).

log.interval

If futures are used during a FAMoS run, a message is printed every X seconds, informing the user which model fits are still running. log.interval allows the user to specify the interval X. The default is 10 minutes (600 seconds).

interactive.session

As FAMoS allows the use of previously generated results in later runs, it performs a consistency check to see whether the previous results were generated by the same cost function. If this is not the case, FAMoS requires user input to decide what to do next. However, if FAMoS is run non-locally, supplying user input might not be possible. Therefore, interactive.session can be set to FALSE, in which case FAMoS issues a warning instead of an interactive prompt.

verbose

The verbose output of FAMoS can be turned on and off. If verbose = FALSE, only a minimum of information is shown.