
optigrab

Command-line parsing for an R world

ABSTRACT

optigrab simplifies the creation of command-line interfaces. It favors easy-to-use, straightforward conventions that cover 99% of use cases over more complex, configuration-heavy command-line parsing solutions, without sacrificing features when needed.

INSTALLATION

From GitHub:

devtools::install_github('decisionpatterns/optigrab')

From CRAN:

install.packages('optigrab')

USAGE

Getting a command-line option is easy:

opt_get('foo')         # -OR-
foo <- opt_get('foo')  # -OR-

Or, for the truly lazy:

opt_assign('foo')

Other examples:

name  <- opt_get( 'name' )
dates <- as.Date( opt_get( 'dates', n=2 ) )  # SAME
yesNo <- opt_get( c( 'yes', 'y' ), n=0 )     # LOGICAL

Generate auto-help:

opt_help()

Get verb command:

opt_get_verb()

Set option style

opt_style(ms_style)
opt_style(java_style)
opt_style(gnu_style)  # The default

ADVANTAGES

optigrab is designed with R in mind. Other packages are derived from packages written for other languages, which ignores several aspects of the R language, such as R’s inherent vectorization.
It eschews the complex and messy configurations that often clutter the head of programs. optigrab favors convention over configuration (cf. CoC, https://en.wikipedia.org/wiki/Convention_over_configuration). This design choice allows for a simple, terse and comprehensible syntax.

DESIGN PHILOSOPHY

Simple, concise, expressive syntax, especially in a pipeline world
Convention over configuration (CoC, https://en.wikipedia.org/wiki/Convention_over_configuration)
Support common cases over complex/edge cases
Non-destructive to commandArgs array
Feature complete

FEATURES

Simple syntax
Supports the vectorized nature of the R language: --dates 2014-01-01 2015-12-31
Supplies convenience flags --help and -? for usage information
Supports verb commands, e.g. git pull
GNU-, Java- and Microsoft-style command line options

LIMITATIONS and FEATURES UNDER DEVELOPMENT

These are things that are not currently supported, but will be coming soon, if requests are made:

Option bundling, e.g. -xvzf ==> -x -v -z -f
Auto coercions: these are less likely given the popularity of pipe libraries (e.g. magrittr, pipeR). Coercions are straightforward:
opt_get('count') %>% as.integer
Simple syntax for specifying both short and long option variants.
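Until bundling is supported, a script can expand bundled short flags itself before looking options up. The following is a minimal base-R sketch of that idea; expand_bundles is a hypothetical helper written for this illustration and is not part of optigrab. It assumes GNU-style flags, since under Java-style a single hyphen followed by several letters would be a long option rather than a bundle.

```r
# Hypothetical helper (not part of optigrab): expand bundled short flags.
expand_bundles <- function(args) {
  out <- character(0)
  for (a in args) {
    # A bundle is one hyphen followed by two or more letters, e.g. "-xvzf"
    if (grepl("^-[A-Za-z]{2,}$", a)) {
      bundled <- strsplit(substring(a, 2), "")[[1]]
      out <- c(out, paste0("-", bundled))
    } else {
      out <- c(out, a)   # long flags and plain values pass through unchanged
    }
  }
  out
}

expand_bundles(c("-xvzf", "archive.tar", "--verbose"))
# "-x" "-v" "-z" "-f" "archive.tar" "--verbose"
```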

BACKGROUND

To start, clearing up some nomenclature will be beneficial.

Options vs Arguments

Command-line options are often referred to as both ‘options’ and ‘arguments’. In this document, the term ‘option(s)’ is preferred. ‘Arguments’ refers to function or method arguments used within the R language, or to other information on the command line that does not have an option flag. This distinction makes clear the difference between values provided on the command line (“options”) and those provided to functions and methods (“arguments”).

Alternatives

There are already at least three command-line option (“CLO”) processing solutions for R:

commandArgs() from the base package returns the command-line arguments from when the R program was invoked. It can be used as a rudimentary method for option retrieval but lacks the features of a full-featured command-line parsing package.
The optparse package closely follows Python’s optparse semantics and syntax. It provides a getopt that emulates C-like behaviors. Both of these are designed for languages significantly different from R.
The argparse package has lots of configuration; if you like writing configuration or need the extra features, this package is for you.
The getopt package invites users to use optparse and argparse; enough said.

Differentiation from Alternatives

The main differences between optigrab and the alternatives are:

optigrab handles the entire command line: script, verbs, options and targets
optigrab is designed for R and accommodates vector arguments

Handling command-line options in R is tricky. R variables are not single scalar values, but are vectors that can assume many values. It is not unreasonable to assume that command-line options should accommodate vectors by default.

Common programming practice is to assign one variable at a time, each assignment on its own line. The packages getopt, optparse and argparse require the user to write a specification that parses the command line and assigns values all at once. For an application that supports many options, the specification quickly becomes complex and hard to read, follow and debug. These packages assign values to a list, which is then referenced and validated as needed. This means that the logic for parsing the command line and the logic for using those values, e.g. to build objects, are often distant from each other in the program, making debugging doubly hard.

There are good reasons for specifying options all at once. With all of the specification in one place, an automatic help file can be provided.

The all-at-once specification does not gracefully handle applications whose arguments are indeterminate or not known until execution time. This may be typical of certain applications.

While the define-all-at-once syntax works, a better approach is to have the ability to specify each option when it is needed.

The optigrab package provides a solution to both of these problems. Supplied flags can read and parse command-line options as vectors, and option parsing can occur incrementally, allowing the programmer to deal with each option one at a time, leading to a more readable syntax.

COMMAND LINE

There are a number of idioms for specifying program inputs. A fairly typical call will look something like:

prog --name=val opt1 opt2 target
prog [flag[(=| )value [value][value…]]]… [command] [arg1 [arg2 [argn]]]

Generically, the GNU-style command-line syntax looks like this:

prog [[-n [val1]]|[--name [val1 val2 …]]] command [arg1 …]

The various components:

prog    : the name of the program/script
-n      : short-form option
valN    : one or more values
--name  : long-form option
command : the (sub)command to the program, e.g. programs like git
argN    : unnamed arguments, often targets

Though options and arguments both appear on the command line, they are different. Options are denoted with flags and have names that are assigned values. Arguments, on the other hand, are unnamed. This difference is analogous to named and positional arguments in a function call. Unnamed arguments are simpler. They are useful when the supplied values all mean the same thing. Options are better for complex situations. A good CLO processing package provides access to both options and unnamed arguments.

If each option is assumed to take a scalar value, the example is:

prog --name w --name2 x y z

The problem becomes difficult in R when we consider that variables are vectors and not simple scalars. Variables can assume multiple values. Consider the previous example. Is the value of the second option x? Or is it (x, y, z)? It is ambiguous.

A good deal of the time, it doesn’t matter. Most often only one value is needed. One solution often deployed is to always specify the number of values needed by each option.
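To make the ambiguity concrete, here is a small base-R sketch that resolves it by asking for an explicit number of values. The helper take_n is hypothetical and written only for this illustration; it is not how optigrab is implemented, though it mirrors the idea behind the n argument shown in USAGE.

```r
# Hypothetical helper (not optigrab's implementation): take exactly `n`
# values that follow `flag` in the argument vector `args`.
take_n <- function(args, flag, n = 1) {
  pos <- match(flag, args)               # position of the first occurrence
  if (is.na(pos)) return(NA_character_)  # flag not supplied
  if (n == 0) return(character(0))
  if (pos + n > length(args)) stop("too few values supplied for ", flag)
  args[(pos + 1):(pos + n)]
}

args <- c("--name", "w", "--name2", "x", "y", "z")

take_n(args, "--name2", n = 1)  # "x"; y and z are left as unnamed arguments
take_n(args, "--name2", n = 3)  # "x" "y" "z"
```

With n stated, the same command line has exactly one interpretation, which is why specifying the count is a common way out.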

Options

An option is one or more values provided to the program from the command line. Options:

can be optional or required
have 0 or more values
have a default value
may be coerced into various types or classes

(Unbound) Arguments

In addition to options, the command line may also contain unbound arguments, such as one or more file paths. The distinction between options and arguments is not always clear. Both occur on the command line and both supply values to the program. The main difference is that options provide a name and a value and can appear in any order. This is nice since it requires the user to remember the names of arguments rather than their order. This is cognitively much simpler and is analogous to the difference between calling a function with named arguments rather than positional ones.

Arguments, on the other hand, follow two patterns. They are either all the same type, such as a list of filenames, or they depend upon the ordering, such as x, y, z coordinates. Many programs use both arguments and options. In this situation, it is good practice to have all options precede all arguments. Some programs allow arguments and options to be interspersed. When interspersed, it becomes cumbersome to separate options from arguments. In fact, it can be impossible to distinguish them if the number of values supplied for each option is unknown. Thus, it is important to always specify the number of values required by the option.

Flags

Option names are specified with flags. Flags should begin with “--” or “-” followed by one or more alphabetic characters. Generally:
Long versions of flags begin with “--” followed by the full name for the option.
Short versions of flags begin with a single hyphen, “-”, and are usually named with a single alphabetic character.
Many names/aliases for the same option are not a good idea.
Always have a long version flag.
Consider short versions for very common arguments.
Flags should be given names understandable to the user and not just the author.
All flags and aliases should be explicitly specified.
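The GNU-style conventions above can be expressed as two regular expressions. This is only a sketch: classify is a hypothetical helper for illustration, and the patterns are an assumption about what counts as a flag, not the rules optigrab applies internally.

```r
# Hypothetical helper: label each command-line token as a long flag,
# a short flag, or a plain value, following the GNU-style conventions.
classify <- function(x) {
  ifelse(grepl("^--[A-Za-z]", x), "long",      # "--" then a letter
  ifelse(grepl("^-[A-Za-z]$", x), "short",     # "-" then a single letter
         "value"))                             # everything else
}

classify(c("--name", "-n", "-5", "file.txt"))
# "long" "short" "value" "value"
```

Requiring a letter after the hyphen keeps negative numbers such as -5 from being mistaken for flags.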

Values

Values are always initially interpreted as character values. Later, they may be coerced into different types.
Options may have 0, 1 or more values.
The type of value returned may be specified through a coercionfunction.
Only logical options can have 0 values. If present, the option is set to TRUE. Otherwise, it is FALSE. Logical options may take more than 0 values, e.g. if an array of logicals is wanted, but by convention should take 0 values for simple options.
The number of values may be determinate or indeterminate. In the latter case, it is most common to want at least n values. These values are taken greedily. This is generally not supported.
Options may also take an indeterminate number of values.
Flags requested but not found should return the NA value.
Required values. Values may be required. If this is the case and no value is supplied, then an error should be thrown indicating that a required value was not supplied.
If a value is neither required nor provided, then the default should be used. If no default has been specified, then NA should be returned.
Greediness. One alternative to specifying the number of values is to greedily accept values. Thus, a flag indicates that all the arguments following it should be considered values until either another flag is encountered or the end of the argument array is reached.
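The rules for greediness, defaults and required values can be sketched in a few lines of base R. The function take_greedy and its argument names are hypothetical, chosen for this illustration only; the sketch shows the semantics described in the list, not optigrab's actual code.

```r
# Hypothetical helper: greedily collect the values that follow `flag`,
# falling back to `default` (NA unless given) or failing if `required`.
take_greedy <- function(args, flag, default = NA, required = FALSE) {
  pos <- match(flag, args)
  if (is.na(pos)) {
    if (required) stop("required option not supplied: ", flag)
    return(default)
  }
  rest <- args[-seq_len(pos)]                # everything after the flag
  next_flag <- grep("^--?[A-Za-z]", rest)    # positions of any later flags
  end <- if (length(next_flag) > 0) next_flag[1] - 1 else length(rest)
  rest[seq_len(end)]                         # stop at next flag or at the end
}

args <- c("--dates", "2014-01-01", "2015-12-31", "--name", "w")

take_greedy(args, "--dates")                 # "2014-01-01" "2015-12-31"
take_greedy(args, "--count")                 # NA
take_greedy(args, "--count", default = "1")  # "1"
```

Calling take_greedy(args, "--count", required = TRUE) stops with an error, matching the rule for required values.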

logical values

Logical values present an interesting challenge to command-line processing. They are the only type that can accept 0 values. In fact, this is the default for logical values. If the flag is present, then the value is set to TRUE; otherwise it is set to FALSE or its default.
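A zero-value logical option reduces to a membership test on the argument vector. The sketch below uses a hypothetical helper, has_flag, and accepts several aliases for the same option, much like the c( 'yes', 'y' ) form shown in USAGE; it is an illustration rather than optigrab's implementation.

```r
# Hypothetical helper: TRUE if any alias of the flag appears in `args`.
has_flag <- function(args, flags) any(flags %in% args)

args <- c("--verbose", "--name", "w")

has_flag(args, c("--verbose", "-v"))  # TRUE
has_flag(args, c("--quiet", "-q"))    # FALSE
```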

default values

The use of default values is a nice addition to command-line processing.

coercion

Other command-line parsing programs require specification of the option’s type. This is needlessly verbose for three reasons:

Often the developer will need to post-process the option anyhow.
R often does coercions as needed, so explicit coercion is not needed.
Packages providing a pipe operator already allow for a concise syntax:
'foo' %>% opt_get %>% as.integer

In optigrab, flags and values are initially interpreted as strings. These may be subsequently coerced into any valid type or class through a coercion function.
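Because values arrive as strings, coercion is just an ordinary function call on the result. The sketch below uses literal character vectors to stand in for what a parser would hand back, so it runs without any command line; the variable names are illustrative.

```r
# Stand-ins for the raw strings a parser would return.
raw_dates <- c("2014-01-01", "2015-12-31")   # e.g. from --dates
raw_count <- "42"                            # e.g. from --count

dates <- as.Date(raw_dates)      # character -> Date, vectorized
count <- as.integer(raw_count)   # character -> integer

class(dates)   # "Date"
count + 1L     # 43
```

Any function that accepts a character vector can serve as the coercion step, which is why a separate type specification adds little.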

PROCESSING

At present, most command-line processing libraries require a full specification for all options. This requires an often very complicated specification at the program’s beginning. Most programs, however, have very simple option processing requirements. It is, therefore, desirable that options can be processed with a very simple and clean syntax, one that better fits into the flow of the program. An example would allow the retrieval of one option on every line. For example, to get a name and date, you might do the following:

name <- opt_get( "name" )
date <- as.Date( opt_get( "date", default=Sys.Date() ) )

In the first line, all we need to do is specify the option flag. This is because the default is to return a string value, or NA if not found. On the second line, the function returns a single value coerced by as.Date. If no value is supplied it defaults to today’s date.

Since the processing of options is serialized, the processing of all options should be done prior to grabbing options. If this is done, we can use the opt_get functions to do the a priori specifications to handle the

BATCH -vs- INTERACTIVE MODE

Command-line option processing should mostly focus on batch processing since this is the most common usage scenario. Still, processing should work in interactive mode for development. In fact, in this use case, it is important that the parsing be able to handle an array of options different from the actual arguments used to start the session.

In batch processing, the preferred way to launch a session is by using Rscript. Rscript introduces several arguments to the command line. These are:

[1] "/usr/lib/R/bin/exec/R" "--slave"
[3] "--no-restore"          "--file=./test.r"
[5] "--args"

These are not really arguments to the script itself; thus, all arguments up to and including the first --args are not considered options.
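The rule can be sketched in base R as follows. The helper script_args is hypothetical and written for this illustration; base R offers the same result through commandArgs(trailingOnly = TRUE). The vector full imitates what Rscript would supply, with two illustrative user options appended.

```r
# Hypothetical helper: keep only what follows the first "--args".
script_args <- function(args = commandArgs()) {
  pos <- match("--args", args)
  if (is.na(pos)) return(character(0))   # no user-supplied options
  args[-seq_len(pos)]
}

# Imitation of an Rscript invocation, plus two illustrative user options.
full <- c("/usr/lib/R/bin/exec/R", "--slave", "--no-restore",
          "--file=./test.r", "--args", "--name", "w")

script_args(full)   # "--name" "w"

# The script path can be recovered from the --file= argument.
sub("^--file=", "", grep("^--file=", full, value = TRUE))   # "./test.r"
```

Passing the vector explicitly, rather than always reading commandArgs(), is also what makes the same code usable in an interactive session with a made-up set of options.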

FAQ

Q. Why is this called optigrab?
A. I am a Jerk. See References below.

REFERENCES

The Jerk. Dir. Carl Reiner. Perf. Steve Martin, Bernadette Peters, Caitlin Adams. Universal Pictures, 1979. http://www.imdb.com/title/tt0079367/

GNU command-line standards

commandline
