optigrab

Command-line parsing for an R world

ABSTRACT

optigrab simplifies the creation of command-line interfaces. It favors easy-to-use, straightforward conventions that cover the vast majority of use cases over more complex, configuration-heavy command-line parsing solutions, without sacrificing features when they are needed.

INSTALLATION

From GitHub:

devtools::install_github('decisionpatterns/optigrab')

From CRAN:

install.packages('optigrab')

USAGE

Getting a command-line option is easy:

opt_get('foo')         # -OR-
foo <- opt_get('foo')  # -OR-

Or, for the truly lazy:

opt_assign('foo')

Other examples:

name  <- opt_get( 'name' )
dates <- as.Date( opt_get( 'dates', n=2 ) )  # two values, coerced to Date
yesNo <- opt_get( c( 'yes', 'y' ), n=0 )     # LOGICAL: aliased flag, no values

Generate auto-help:

opt_help()

Get the verb (sub-command):

opt_get_verb()

Set option style:

opt_style(ms_style)
opt_style(java_style)
opt_style(gnu_style)   # The default

ADVANTAGES

  1. optigrab is designed with R in mind. Other packages are derived from packages written for other languages, and so they ignore several aspects of the R language, such as R's inherent vectorization.

  2. It eschews the complex and messy configurations that often clutter the head of programs. optigrab favors convention over configuration (cf. Convention over configuration, https://en.wikipedia.org/wiki/Convention_over_configuration). This design choice allows for a simple, terse and comprehensible syntax.

DESIGN PHILOSOPHY

FEATURES

LIMITATIONS AND FEATURES UNDER DEVELOPMENT

These are things that are not currently supported but will be coming soon if requests are made:

BACKGROUND

To start, clearing up some nomenclature will be beneficial.

Options vs Arguments

Command-line options are often referred to as both 'options' and 'arguments'. In this document, the term 'option(s)' is preferred. 'Arguments' refers to function or method arguments used within the R language, or to other information on the command line that does not have an option flag. This distinction makes clear the difference between values provided on the command line ("options") and values provided to functions and methods ("arguments").

Alternatives

There are already at least three command-line option ("CLO") processing solutions for R: getopt, optparse and argparse.

Differentiation from Alternatives

The main differences between optigrab and the alternatives are:

  1. optigrab handles the entire command line: script, verbs, options and targets
  2. optigrab is designed for R and accommodates vector arguments

Handling command-line options in R is tricky. R variables are not single scalar values but vectors that can hold many values. It is therefore reasonable to expect command-line options to accommodate vectors by default.

Common programming practice is to assign one variable at a time, each assignment on its own line. The packages getopt, optparse and argparse require the user to write a specification that parses the command line and assigns all values at once. For an application that supports many options, the specification quickly becomes complex and hard to read, follow and debug. These packages assign values to a list, which is subsequently referenced and validated as needed. As a result, the logic that parses the command line and the logic that uses those values (e.g. to build objects) often sit far apart in the program, making debugging doubly hard.

There are good reasons for specifying options all at once. With all specifications in one place:

  * an automatic help file can be provided

The all-at-once specification does not gracefully handle applications whose options are indeterminate or not known until execution time. This may be typical of certain applications.

While the define-all-at-once syntax works, a better approach is to be able to specify each option when it is needed.

The optigrab package solves both of these problems. Supplied flags can be read and parsed as vectors, and option parsing can occur incrementally. This lets the programmer deal with each option one at a time, which leads to a more readable syntax.
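To make incremental, one-at-a-time parsing concrete, here is a minimal base-R sketch. The helper `grab_values()` is hypothetical: it is neither optigrab's implementation nor its API. It looks up a flag in a character vector of tokens and returns the `n` tokens that follow it, or `NA` if the flag is absent. Because `opts` defaults to `commandArgs(trailingOnly = TRUE)` but can be overridden, the same code can also be exercised interactively with a hand-made vector.

```r
# Hypothetical helper (not part of optigrab): return the n values that
# follow `flag` in `opts`, or NA_character_ if the flag is not present.
grab_values <- function(flag, n = 1, opts = commandArgs(trailingOnly = TRUE)) {
  pos <- match(flag, opts)                 # position of the first occurrence
  if (is.na(pos)) return(NA_character_)
  if (n == 0) return(character(0))
  idx <- seq.int(pos + 1, length.out = n)  # the n tokens after the flag
  if (max(idx) > length(opts)) stop("too few values supplied for ", flag)
  opts[idx]
}

# Each option is grabbed on its own line, right where it is needed.
opts  <- c("--name", "Ada", "--dates", "2024-01-01", "2024-12-31")
name  <- grab_values("--name", opts = opts)
dates <- as.Date(grab_values("--dates", n = 2, opts = opts))
```

Values are taken positionally after the flag, so the caller states how many to take (`n`). A vector-valued option is then no harder to grab than a scalar one.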

COMMAND LINE

There are a number of idioms for specifying program inputs. A fairly typical call will look something like:

prog --name=val opt1 opt2 target
prog [flag[(=| )value [value] [value…]]]… [command] [arg1 [arg2 [argn]]]

Generically, GNU-style command-line syntax looks like this:

prog [[-n [val1]]|[--name [val1 val2 …]]] command [arg1 …]

The various components:

prog    : the name of the program/script
-n      : short-form option
valN    : one or more values
--name  : long-form option
command : the (sub)command to the program, e.g. programs like git
argN    : unnamed arguments, often targets
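The components above can be told apart mechanically. In the base-R sketch below, the helpers `normalise_tokens()` and `classify_tokens()` are hypothetical. The first splits the `--name=val` form into `--name` and `val`. The second labels each token as a long option, a short option, or a bare word (a command, value or argument).

```r
# Hypothetical sketch: split "--name=val" at its first "=" into two tokens.
normalise_tokens <- function(tokens) {
  out <- lapply(tokens, function(tok) {
    if (grepl("^--[^=]+=", tok)) {
      c(sub("=.*$", "", tok), sub("^[^=]*=", "", tok))
    } else {
      tok
    }
  })
  unlist(out, use.names = FALSE)
}

# Label tokens: "--x" is a long option, "-x" a short option, anything else bare.
classify_tokens <- function(tokens) {
  ifelse(grepl("^--.", tokens), "long",
         ifelse(grepl("^-[^-]", tokens), "short", "bare"))
}

tokens <- normalise_tokens(c("--name=val", "-n", "3", "build", "target.txt"))
kinds  <- classify_tokens(tokens)
```

A rule this naive would treat a negative number such as `-1` as a short option. A real parser has to handle such cases explicitly.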

Though options and arguments both appear on the command line, they are different. Options are denoted with flags and have names that are assigned values. Arguments, on the other hand, are unnamed. This difference is analogous to named and positional arguments in a function call. Unnamed arguments are simpler and work well when all the supplied values mean the same thing. Options are better for complex situations. A good CLO processing package provides access to both options and unnamed arguments.

If each option is assumed to take a scalar value, the example is:

prog --name w --name2 x y z

The problem becomes difficult in R because variables are vectors, not simple scalars, and can assume multiple values. Consider the previous example. Is the value of --name2 just x, or is it (x, y, z)? It is ambiguous.

A good deal of the time it doesn't matter, because most often only one value is needed. One commonly deployed solution is to always specify the number of values each option takes.
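The ambiguity can be made concrete. In this base-R sketch, the hypothetical helper `take_n()` reads a fixed number of values after a flag. The same tokens yield either a scalar or a vector depending only on `n`, which is why the number of values has to be stated.

```r
# Tokens after the program name in: prog --name w --name2 x y z
tokens <- c("--name", "w", "--name2", "x", "y", "z")

# Hypothetical helper (not optigrab's API): take exactly n values after a flag.
take_n <- function(flag, n, opts) {
  pos <- match(flag, opts)
  if (is.na(pos)) return(NA_character_)
  opts[seq.int(pos + 1, length.out = n)]
}

as_scalar <- take_n("--name2", n = 1, opts = tokens)  # "x"; y and z stay arguments
as_vector <- take_n("--name2", n = 3, opts = tokens)  # c("x", "y", "z")
```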

Options

An option supplies values to the program from the command line. Options:

  * can be optional or required
  * have 0 or more values
  * have a default value
  * may be coerced into various types or classes

(Unbound) Arguments

In addition to options, the command line may also contain unbound arguments, such as one or more file paths. The distinction between options and arguments is not always clear. Both occur on the command line and both supply values to the program. The main difference is that options provide a name and a value and can appear in any order. This is convenient, since the user only has to remember the names of the options, not their order. This is cognitively much simpler and is analogous to calling a function with named arguments rather than positional ones.

Arguments, on the other hand, follow two patterns. Either they are all the same type, such as a list of filenames, or they depend on their order, such as x, y, z coordinates. Many programs use both arguments and options. In this situation, it is good practice to have all options precede all arguments. Some programs allow arguments and options to be interspersed, which makes separating them cumbersome. In fact, it can be impossible to tell them apart if the number of values supplied for each option is unknown. Thus, it is important to always specify the number of values required by each option.
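A minimal base-R sketch shows why known value counts make the split possible. The function `split_command_line()` is hypothetical and is not how optigrab works internally. It walks the tokens once, consumes the stated number of values after each known flag, and treats everything else as an unbound argument. A count of 0 marks a logical flag.

```r
# Hypothetical sketch: separate named options from unbound arguments, given
# the number of values each known flag takes (`nvals`).
split_command_line <- function(tokens, nvals) {
  found <- list()
  args  <- character(0)
  i <- 1
  while (i <= length(tokens)) {
    tok <- tokens[i]
    if (tok %in% names(nvals)) {
      n <- nvals[[tok]]
      if (i + n > length(tokens)) stop("too few values for ", tok)
      # n = 0 marks a logical flag: record TRUE instead of values
      found[[tok]] <- if (n == 0) TRUE else tokens[seq.int(i + 1, length.out = n)]
      i <- i + n + 1
    } else {
      args <- c(args, tok)   # not a known flag: treat as an unbound argument
      i <- i + 1
    }
  }
  list(options = found, args = args)
}

res <- split_command_line(
  c("--name", "w", "--name2", "x", "file1.txt", "file2.txt"),
  nvals = c("--name" = 1, "--name2" = 1)
)
```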

Flags

Values

logical values

Logical values present an interesting challenge to command-line processing. They are the only type that can accept 0 values; in fact, this is the default for logical values. If the flag is present, the value is set to TRUE; otherwise it is set to FALSE or to its default.
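The zero-value convention can be sketched in a few lines of base R. The helper `flag_value()` is hypothetical. It returns TRUE when any alias of the flag appears (compare `c( 'yes', 'y' )` with `n=0` in USAGE) and otherwise returns the default, FALSE unless overridden.

```r
# Hypothetical sketch: a logical flag takes zero values. It is TRUE when any
# of its aliases appears among the tokens, otherwise it takes `default`.
flag_value <- function(aliases, opts = commandArgs(trailingOnly = TRUE),
                       default = FALSE) {
  if (any(aliases %in% opts)) TRUE else default
}

opts    <- c("--verbose", "input.csv")
verbose <- flag_value(c("--verbose", "-v"), opts = opts)  # TRUE
dry_run <- flag_value(c("--dry-run", "-n"), opts = opts)  # FALSE
```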

default values

The use of default values is a nice addition to command-line processing: if an option is not supplied, its default value is used instead.
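A hedged base-R sketch of the idea follows. The helper `value_or_default()` is hypothetical. It returns the token following a flag, or the supplied default when the flag is absent or has no value after it.

```r
# Hypothetical sketch: return the value after `flag`, or `default` when the
# flag is absent (or is the last token and so has no value).
value_or_default <- function(flag, default,
                             opts = commandArgs(trailingOnly = TRUE)) {
  pos <- match(flag, opts)
  if (is.na(pos) || pos == length(opts)) return(default)
  opts[pos + 1]
}

opts  <- c("--name", "Ada")
name  <- value_or_default("--name",  default = "anonymous", opts = opts)  # "Ada"
level <- value_or_default("--level", default = "info",      opts = opts)  # "info"
```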

coercion

Other command-line parsing programs require specification of each option's type. This is needlessly verbose for three reasons:

In optigrab, flags and values are initially interpreted as strings. These may subsequently be coerced into any valid type or class through a coercion function.
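"Strings first, coerce later" needs no type declarations; ordinary R coercion functions do the work. In this sketch the token vector and the helper `after_flag()` are hypothetical, while `as.integer()`, `as.Date()` and `as.numeric()` are base R.

```r
# Values arrive as character strings. The caller coerces each one only when,
# and into whatever type, it is needed. The tokens here are hypothetical.
opts <- c("--count", "3", "--since", "2024-01-01", "--ratio", "0.25")

# Hypothetical helper: the single token following `flag`.
after_flag <- function(flag, opts) opts[match(flag, opts) + 1]

count <- as.integer(after_flag("--count", opts))   # 3L
since <- as.Date(after_flag("--since", opts))      # Date 2024-01-01
ratio <- as.numeric(after_flag("--ratio", opts))   # 0.25
```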

PROCESSING

At present, most command-line processing libraries require full specification of all options. This often means very complicated specifications at the program's beginning. Most programs, however, have very simple option-processing requirements. It is therefore desirable that options can be processed with a very simple and clean syntax, one that fits better into the flow of the program, for example by retrieving one option per line. To get a name and a date, you might do the following:

name <- opt_get( "--name" )
date <- as.Date( opt_get( "--date", default=Sys.Date() ) )

In the first line, all we need to do is specify the option flag, because the default is to return a string value, or NA if the option is not found. On the second line, the function returns a single value coerced by as.Date. If no value is supplied, it defaults to today's date.

Since options are processed serially, all options should be processed before any are grabbed. If this is done, we can use the opt_get functions to do the a priori specifications to handle the

BATCH -vs- INTERACTIVE MODE

Command-line option processing should focus mostly on batch processing, since this is the most common usage scenario. Still, processing should work in interactive mode for development. In this use case, it is important that the parser can handle an array of options different from the actual arguments used to start the session.

In batch processing, the preferred way to launch a session is with Rscript. Rscript adds several arguments to the command line. These are:

[1] "/usr/lib/R/bin/exec/R" "--slave"
[3] "--no-restore"          "--file=./test.r"
[5] "--args"

These are not really arguments to the script itself, so all arguments up to and including the first --args are not considered options.
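The rule of dropping everything up to and including the first `--args` can be sketched in base R. The function `script_tokens()` is hypothetical. The example vector mirrors the Rscript output listed above, with two hypothetical script options appended. Base R's `commandArgs(trailingOnly = TRUE)` applies the same cut-off.

```r
# Sketch: drop everything up to and including the first "--args", leaving
# only the tokens meant for the script itself.
script_tokens <- function(args = commandArgs()) {
  pos <- match("--args", args)
  if (is.na(pos)) return(character(0))   # no script tokens were supplied
  args[-seq_len(pos)]
}

raw <- c("/usr/lib/R/bin/exec/R", "--slave", "--no-restore",
         "--file=./test.r", "--args", "--name", "Ada")
script_tokens(raw)   # c("--name", "Ada")
```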

FAQ

Q. Why is this called optigrab?
A. I am a Jerk. See References below.

REFERENCES

The Jerk. Dir. Carl Reiner. Perf. Steve Martin, Bernadette Peters, Caitlin Adams. Universal Pictures, 1979. http://www.imdb.com/title/tt0079367/

GNU command-line standards

commandline

