Command-line parsing for an R world
optigrab simplifies the creation of command-line interfaces. It favors easy-to-use, straightforward conventions that cover 99% of use cases over more complex, configuration-heavy command-line parsing solutions, without sacrificing features when needed.
From GitHub:

    devtools::install_github('decisionpatterns/optigrab')

From CRAN:

    install.packages('optigrab')

Getting a command-line option is easy:

    opt_get('foo')          # -OR-
    foo <- opt_get('foo')   # -OR-

Or, for the truly lazy:

    opt_assign('foo')

Other examples:

    name  <- opt_get( 'name' )
    dates <- as.Date( opt_get( 'dates', n=2 ) )   # SAME
    yesNo <- opt_get( c( 'yes', 'y' ), n=0 )      # LOGICAL

Generate auto-help:

    opt_help()

Get verb command:

    opt_get_verb()

Set option style:

    opt_style(ms_style)
    opt_style(java_style)
    opt_style(gnu_style)   # The default

optigrab is designed with R in mind. Other packages are derived from packages written for other languages. This ignores several aspects of the R language, such as R’s inherent vectorization.
It eschews complex and messy configurations that often clutter the head of programs. optigrab favors conventions over configurations (cf. [CoC](https://en.wikipedia.org/wiki/Convention_over_configuration)). This design choice allows for a simple, terse and comprehensible syntax.
--dates 2014-01-01 2015-12-31
--help, -? for usage information
These are things that are not currently supported, but will be coming soon if requests are made:
option bundling, e.g. -xvzf ==> -x -v -z -f
auto coercions: this is less likely with the popularity of pipe libraries (e.g. magrittr, pipeR). Coercions are straightforward.
opt_get('count') %>% as.integer
Simple syntax for specifying both short and long option variants.
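Since option bundling is listed as unsupported, a script that needs it could pre-process its arguments before handing them to a parser. The following is a minimal base R sketch of that idea; `expand_bundles` is a hypothetical helper written for illustration and is not part of optigrab.

```r
# Hypothetical helper (not part of optigrab): expand bundled short flags,
# e.g. "-xvzf" becomes "-x" "-v" "-z" "-f". Everything else passes through.
expand_bundles <- function(args) {
  out <- character(0)
  for (a in args) {
    if (grepl("^-[A-Za-z]{2,}$", a)) {
      # A single hyphen followed by two or more letters is treated as a bundle
      chars <- strsplit(substring(a, 2), "")[[1]]
      out <- c(out, paste0("-", chars))
    } else {
      out <- c(out, a)
    }
  }
  out
}

expanded <- expand_bundles(c("-xvzf", "archive.tar", "--verbose"))
```

Long flags such as `--verbose` are left alone because their second character is a hyphen rather than a letter.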
To start, clearing up some nomenclature will be beneficial.
Command-line options are often referred to as both ‘options’ and ‘arguments’. For this document, the term ‘option(s)’ is preferred. ‘Arguments’ refers to function or method arguments used within the R language, or to other information on the command line that does not have an option flag. This distinction makes clear the difference between values provided on the command line (“options”) and those provided to functions and methods (“arguments”).
There are already at least three command-line option (“CLO”) processing solutions for R:
commandArgs() from the base package returns the command-line arguments from when the R program was invoked. It can be used as a rudimentary method for option retrieval but lacks the features of a full-featured command-line parsing package.
The optparse package closely follows Python’s optparse semantics and syntax. It provides a getopt that emulates C-like behaviors. Both of these are designed for languages significantly different from R.
The argparse package has lots of configurations; if you like writing configuration or need the extra features, this package is for you.
The getopt package invites users to use optparse and argparse; enough said.
The main differences between optigrab and the alternatives are:
Handling CL options in R is tricky. R variables are not single scalar values, but are vectors that can assume many values. It is not unreasonable to assume that command-line options should accommodate vectors by default.
Common programming practice is to assign one variable at a time, each assignment on its own line. The getopt, optparse and argparse packages require the user to write a specification that parses the command line and assigns values all at once. For an application that supports many options, the specification quickly becomes complex and hard to read, follow and debug. These packages assign values to a list, which is then referenced and validated as needed. This means that the logic for parsing the command line and the logic for using those values, e.g. to build objects, are often distant in the program, making debugging doubly hard.
There are good reasons for the specification of options all at once. With all specifications in one place:
* an automatic help file can be provided
The all-at-once specification does not gracefully handle applications whose arguments are indeterminate or not known at execution time. This may be typical of certain applications.
While the define-all-at-once syntax works, a better approach is to have the ability to specify each option when it is needed.
The optigrab package provides a solution to both of these problems. Supplied flags can read and parse command-line options as vectors, and option parsing can occur incrementally, allowing the programmer to deal with each option one at a time, leading to a more readable syntax.
There are a number of idioms for specifying program inputs. A fairly typical call will look something like:

    prog --name=val opt1 opt2 target
    prog [flag[(=| )value [value][value…]]]… [command] [arg1 [arg2 [argn]]]

Generically, the GNU-style command-line syntax looks like this:

    prog [[-n [val1]]|[--name [val1 val2 …]]] command [arg1 …]
The various components:
prog : the name of the program/script
-n : short-form option
valN : one or more values
--name : long-form option
command : the (sub)command to the program, e.g. programs like git
argN : unnamed arguments, often targets
Though options and arguments both appear on the command line, they are different. Options are denoted with flags and have names that are assigned values. Arguments, on the other hand, are unnamed. This difference is analogous to named and positional arguments in a function call. Unnamed arguments are simpler. They are useful when the supplied values all mean the same thing. Options are better for complex situations. A good CLO processing package provides access to both options and unnamed arguments.
If each option is assumed to take a scalar value, the example is:

    prog --name w --name2 x y z

The problem becomes difficult in R when we consider that variables are vectors and not simple scalars. Variables assume multiple values. Consider the previous example. Is the value of --name2 x? Or is it (x, y, z)? It is ambiguous.
A good deal of the time, it doesn’t matter. Most often only one value is needed. One solution often deployed is to always specify the number of values needed by the options.
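To make the ambiguity concrete, here is a small base R sketch of the "always specify the number of values" approach. `take_n` is a hypothetical helper written for this illustration; it is not optigrab's implementation, only one way the idea could be expressed.

```r
# Hypothetical helper: return exactly n values following a flag.
# Returns NA if the flag is absent; errors if too few values follow it.
take_n <- function(args, flag, n = 1) {
  pos <- match(flag, args)
  if (is.na(pos)) return(NA_character_)
  if (pos + n > length(args)) stop("not enough values for ", flag)
  args[pos + seq_len(n)]
}

args <- c("--name", "w", "--name2", "x", "y", "z")

one   <- take_n(args, "--name2", n = 1)   # reads only the first value
three <- take_n(args, "--name2", n = 3)   # reads all three values
```

The same command line yields a scalar or a vector depending only on `n`, which is why stating the count removes the ambiguity.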
An option is one or more values provided to the program from the command line. Options:
* can be optional or required
* have 0 or more values
* have a default value
* may be coerced into various types or classes
In addition to options, the command line may also contain unbound arguments, such as one or more file paths. The distinction between options and arguments is not always clear. Both occur on the command line and both supply values to the program. The main difference is that options provide a name and a value and can appear in any order. This is nice since it requires the user to remember the names of options rather than their order. This is cognitively much simpler and is analogous to the difference between calling a function with named arguments rather than positional ones.
Arguments, on the other hand, follow two patterns. They are either all the same type, such as a list of filenames, or they are dependent upon the ordering, such as x,y,z coordinates. Many programs use both arguments and options. In this situation, it is good practice to have all options precede all arguments. Some programs allow arguments and options to be interspersed. When interspersed, it becomes cumbersome to separate options from arguments. In fact, it can be impossible to distinguish them if the number of values supplied for each option is unknown. Thus, it is important to always specify the number of values required by the option.
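The point about interspersed options and arguments can be sketched in base R. Assuming each flag's value count is known up front, a single pass can separate the two. `split_cli` and its `n_values` parameter are hypothetical names chosen for this illustration and do not come from optigrab.

```r
# Hypothetical helper: split a command line into named options and
# unnamed arguments, given how many values each known flag consumes.
split_cli <- function(args, n_values) {
  opts <- list()
  rest <- character(0)
  i <- 1
  while (i <= length(args)) {
    a <- args[i]
    if (a %in% names(n_values)) {
      n <- n_values[[a]]
      opts[[a]] <- args[i + seq_len(n)]   # the next n tokens belong to this flag
      i <- i + n + 1
    } else {
      rest <- c(rest, a)                  # anything else is an unnamed argument
      i <- i + 1
    }
  }
  list(options = opts, arguments = rest)
}

res <- split_cli(
  c("--dates", "2014-01-01", "2015-12-31", "in.csv", "--name", "w", "out.csv"),
  n_values = c("--dates" = 2, "--name" = 1)
)
```

Without the counts in `n_values`, there would be no way to tell whether `in.csv` is a third date value or the first unnamed argument.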
Option names are specified with flags. Flags should begin with “--” or “-” followed by one or more alphabetic characters. Generally,
long versions of flags begin with “--” followed by the full name for the option.
Short versions of the flags begin with a single hyphen, “-”, and usually are named with a single alphabetic character.
Many names/aliases for the same option are not a good idea.
Always have a long version flag.
Consider short versions for very common options.
Flags should be named with names understandable to the user and not the author.
All flags and aliases should be explicitly specified.
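The flag conventions above can be expressed as two regular expressions. The sketch below uses base R only; `flag_type` is a hypothetical helper for illustration and says nothing about how optigrab classifies tokens internally.

```r
# Hypothetical helper: classify each token as a long flag, a short flag,
# or a plain value, following the "--name" / "-n" conventions.
flag_type <- function(x) {
  ifelse(grepl("^--[A-Za-z]", x), "long",
         ifelse(grepl("^-[A-Za-z]", x), "short", "value"))
}

types <- flag_type(c("--name", "-n", "w", "-5"))
```

Requiring a letter after the hyphen means a negative number such as `-5` is treated as a value rather than mistaken for a short flag.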
Values are always initially interpreted as character values. Later, they may be coerced into different types.
Options may have 0, 1 or more values.
The type of value returned may be specified through a coercion function.
Only logical options can have 0 values. If present, the option is set to TRUE. Otherwise, it is FALSE. Logical options may take more than 0 values, e.g. if an array of logicals is wanted, but by convention should take 0 values for simple options.
The number of values may be deterministic or indeterministic. In the latter case, it is most common to want at least n values. These values are taken greedily. This is generally not supported.
An option may also take an indeterminate number of values.
Flags requested but not found should return the ‘NA’ value.
Required values. Values may be required. If this is the case and no value is supplied, then an error should be thrown indicating that a required value was not supplied.
If a value is neither required nor provided, then the default should be used. If no default has been specified, then NA should be returned.
Greediness. One alternative to specifying the number of values is to greedily accept values. Thus, a flag indicates that all the arguments following it should be considered values until either another flag is encountered or the end of the argument array is reached.
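The greedy rule described in the last item can be sketched in a few lines of base R. `take_greedy` is a hypothetical helper written for this illustration, under the assumption that a flag is any token starting with one or two hyphens followed by a letter.

```r
# Hypothetical helper: collect every token after `flag` until the next
# flag or the end of the argument vector. Returns NA if the flag is absent.
take_greedy <- function(args, flag) {
  pos <- match(flag, args)
  if (is.na(pos)) return(NA_character_)
  rest <- args[-seq_len(pos)]                 # everything after the flag
  is_flag <- grepl("^--?[A-Za-z]", rest)
  stop_at <- if (any(is_flag)) which(is_flag)[1] - 1 else length(rest)
  rest[seq_len(stop_at)]
}

files <- take_greedy(c("--files", "a.txt", "b.txt", "--name", "w"), "--files")
```

A flag that appears last on the command line yields an empty character vector, which a caller could treat as a zero-value option.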
Logical values present an interesting challenge to command-line processing. They are the only type that can accept 0 values. In fact, this is the default for logical values. If the flag is present, then the value is set to TRUE; otherwise it is set to FALSE or its default.
The use of default values is a nice addition to command-line processing.
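As an illustration of the zero-value rule, the base R sketch below treats the mere presence of a flag (or any of its aliases) as TRUE. `has_flag` is a hypothetical helper and not an optigrab function.

```r
# Hypothetical helper: a zero-value logical option is TRUE when any of
# its aliases appears on the command line, otherwise its default.
has_flag <- function(args, aliases, default = FALSE) {
  if (any(aliases %in% args)) TRUE else default
}

args  <- c("--name", "w", "-y")
yes   <- has_flag(args, c("--yes", "-y"))     # alias "-y" is present
force <- has_flag(args, c("--force", "-f"))   # neither alias is present
```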
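The interplay of required values, defaults and NA described earlier can be summarized in one small function. This is a base R sketch under simplifying assumptions (one value per flag); `get_value` and its parameters are hypothetical names for illustration, not optigrab's API.

```r
# Hypothetical helper: return the single value following `flag`.
# If the flag is missing: error when required, else the default (NA if none).
get_value <- function(args, flag, default = NA_character_, required = FALSE) {
  pos <- match(flag, args)
  found <- !is.na(pos) && pos < length(args)
  if (found) return(args[pos + 1])
  if (required) stop("required option not supplied: ", flag)
  default
}

args <- c("--name", "w")
name  <- get_value(args, "--name")                     # supplied on the command line
level <- get_value(args, "--level", default = "info")  # falls back to the default
other <- get_value(args, "--other")                    # no default, so NA
```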
Other command-line parsing programs require specification of the option’s type. This is needlessly verbose for three reasons:
Often the developer will need to post-process the option anyhow.
R often does coercions as needed, and explicit coercion is not needed.
Packages providing a pipe operator already allow for a concise syntax:
'foo' %>% opt_get %>% as.integer
In optigrab, flags and values are initially interpreted as strings. These may be subsequently coerced into any valid type or class through a coercion function.
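Because values arrive as strings, coercion is an ordinary function call on a character vector. The base R sketch below uses made-up raw values standing in for what a parser might return; it does not call optigrab.

```r
# Stand-ins for raw option values as they arrive from the command line
raw_dates <- c("2014-01-01", "2015-12-31")
raw_count <- "42"

# Coercion functions are vectorized, so a two-value option needs no loop
dates <- as.Date(raw_dates)
count <- as.integer(raw_count)

span_days <- as.numeric(diff(dates))   # number of days between the two dates
```

This is the same pattern as wrapping a parser call in `as.Date()` or piping it into `as.integer`.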
At present, most command-line processing libraries require full specification for all options. This requires an often very complicated specification at the program’s beginning. Most programs, however, have very simple option processing requirements. It is, therefore, desirable that options be able to be processed with a very simple and clean syntax, one that better fits into the flow of the program. An example would allow the retrieval of one option on every line. For example, to get a name and date, you might do the following:

    name <- opt_get( "--name" )
    date <- as.Date( opt_get( "--date", default=Sys.Date() ) )

In the first line, all we need to do is specify the option flag. This is because the default is to return a string value or NA if not found. On the second line, the function returns a single value coerced by as.Date. If no value is supplied it defaults to today’s date.
Since the processing of options is serialized, the processing of all options should be done prior to grabbing options. If this is done, we can use the opt_get functions to do the a priori specifications to handle the
Command-line option processing should mostly focus on batch processing since this is the most common usage scenario. Still, processing should work in interactive mode for development. In fact, in this use case, it is important that the parsing be able to handle an array of options different from the actual arguments used to start the session.
In batch processing, the preferred way to launch a session is by using Rscript. Rscript introduces several arguments to the command line. These are:

    [1] "/usr/lib/R/bin/exec/R" "--slave"
    [3] "--no-restore" "--file=./test.r"
    [5] "--args"

These are not really arguments from the script itself; thus all arguments up to and including the first --args are not considered options.
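The rule of discarding everything up to and including the first `--args` can be sketched in base R as follows. `user_args` is a hypothetical helper for illustration; base R's `commandArgs(trailingOnly = TRUE)` offers similar behavior for a live session.

```r
# Hypothetical helper: drop everything up to and including the first "--args".
# Returns an empty vector when "--args" is absent (no user options supplied).
user_args <- function(all_args) {
  pos <- match("--args", all_args)
  if (is.na(pos)) return(character(0))
  all_args[-seq_len(pos)]
}

# A made-up argument vector resembling what Rscript produces
all_args <- c("/usr/lib/R/bin/exec/R", "--slave", "--no-restore",
              "--file=./test.r", "--args", "--name", "w")
mine <- user_args(all_args)
```

Taking the argument vector as a parameter, rather than reading it from the session, also makes it easy to test parsing in interactive mode with an array of options different from the ones used to start R.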
Q. Why is this called optigrab?
A. I am a Jerk. See References below.
The Jerk. Dir. Carl Reiner. Perf. Steve Martin, Bernadette Peters, Caitlin Adams. Universal Pictures, 1979. http://www.imdb.com/title/tt0079367/