24.3. shlex — Simple lexical analysis

Source code: Lib/shlex.py


The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages (for example, in run control files for Python applications) or for parsing quoted strings.

The shlex module defines the following functions:

shlex.split(s, comments=False, posix=True)

Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.

Note

Since the split() function instantiates a shlex instance, passing None for s will read the string to split from standard input.
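As a rough illustration of how the comments and posix arguments change the result, the following sketch splits one made-up command line three ways. The sample line and variable names are assumptions chosen for the example.

```python
import shlex

# A made-up command line with a quoted argument and a trailing comment.
line = 'cp "my file.txt" /tmp/backup  # nightly copy'

# Default: POSIX mode, comment parsing disabled, so '#' is an ordinary token.
default_tokens = shlex.split(line)

# comments=True: everything from '#' to the end of the line is dropped.
with_comments = shlex.split(line, comments=True)

# posix=False: the quotes are kept as part of the token.
non_posix = shlex.split(line, posix=False)

print(default_tokens)
print(with_comments)
print(non_posix)
```

In POSIX mode the quotes around the file name are stripped, while in non-POSIX mode they remain part of the token.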

shlex.quote(s)

Return a shell-escaped version of the string s. The returned value is a string that can safely be used as one token in a shell command line, for cases where you cannot use a list.

This idiom would be unsafe:

>>> filename = 'somefile; rm -rf ~'
>>> command = 'ls -l {}'.format(filename)
>>> print(command)  # executed by a shell: boom!
ls -l somefile; rm -rf ~

quote() lets you plug the security hole:

>>> from shlex import quote
>>> command = 'ls -l {}'.format(quote(filename))
>>> print(command)
ls -l 'somefile; rm -rf ~'
>>> remote_command = 'ssh home {}'.format(quote(command))
>>> print(remote_command)
ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

The quoting is compatible with UNIX shells and with split():

>>> from shlex import split
>>> remote_command = split(remote_command)
>>> remote_command
['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
>>> command = split(remote_command[-1])
>>> command
['ls', '-l', 'somefile; rm -rf ~']

New in version 3.3.

The shlex module defines the following class:

class shlex.shlex(instream=None, infile=None, posix=False)

A shlex instance or subclass instance is a lexical analyzer object. The initialization argument, if present, specifies where to read characters from. It must be a file-/stream-like object with read() and readline() methods, or a string. If no argument is given, input will be taken from sys.stdin. The second optional argument is a filename string, which sets the initial value of the infile attribute. If the instream argument is omitted or equal to sys.stdin, this second argument defaults to “stdin”. The posix argument defines the operational mode: when posix is not true (default), the shlex instance will operate in compatibility mode. When operating in POSIX mode, shlex will try to be as close as possible to the POSIX shell parsing rules.
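The following sketch shows two ways of constructing a lexer: from a plain string in the default compatibility mode, and from a file-like object in POSIX mode. The input text and the file name "demo.rc" are made up for illustration, and the example relies on shlex instances being iterable.

```python
import io
import shlex

# A plain string is accepted directly as the input source.
lexer = shlex.shlex("key = 'some value'")
tokens_from_string = list(lexer)

# A file-like object works too. "demo.rc" is a hypothetical name that only
# sets the infile attribute; no file of that name is opened.
stream = io.StringIO("key = 'some value'")
posix_lexer = shlex.shlex(stream, infile="demo.rc", posix=True)
tokens_from_stream = list(posix_lexer)

print(tokens_from_string)
print(tokens_from_stream)
print(posix_lexer.infile)
```

The compatibility-mode lexer keeps the quotes in the last token, whereas the POSIX-mode lexer strips them.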

See also

Module configparser
Parser for configuration files similar to the Windows .ini files.

24.3.1. shlex Objects

A shlex instance has the following methods:

shlex.get_token()

Return a token. If tokens have been stacked using push_token(), pop a token off the stack. Otherwise, read one from the input stream. If reading encounters an immediate end-of-file, eof is returned (the empty string ('') in non-POSIX mode, and None in POSIX mode).

shlex.push_token(str)

Push the argument onto the token stack.
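A small sketch of how get_token() and push_token() interact, including the end-of-file value in both modes. The input strings and variable names are arbitrary.

```python
import shlex

lexer = shlex.shlex("alpha beta")

first = lexer.get_token()    # read from the input stream
lexer.push_token(first)      # put the token back on the stack
again = lexer.get_token()    # popped from the stack, not re-read
rest = lexer.get_token()     # next token from the input stream
end = lexer.get_token()      # input exhausted: lexer.eof is returned

# In POSIX mode the end-of-file value is None rather than ''.
posix_end = shlex.shlex("", posix=True).get_token()

print(first, again, rest, repr(end), repr(posix_end))
```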

shlex.read_token()

Read a raw token. Ignore the pushback stack, and do not interpret source requests. (This is not ordinarily a useful entry point, and is documented here only for the sake of completeness.)

shlex.sourcehook(filename)

When shlex detects a source request (see source below) this method is given the following token as argument, and expected to return a tuple consisting of a filename and an open file-like object.

Normally, this method first strips any quotes off the argument. If the result is an absolute pathname, or there was no previous source request in effect, or the previous source was a stream (such as sys.stdin), the result is left alone. Otherwise, if the result is a relative pathname, the directory part of the name of the file immediately before it on the source inclusion stack is prepended (this behavior is like the way the C preprocessor handles #include "file.h").

The result of the manipulations is treated as a filename, and returned as the first component of the tuple, with open() called on it to yield the second component. (Note: this is the reverse of the order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory search paths, addition of file extensions, and other namespace hacks. There is no corresponding ‘close’ hook, but a shlex instance will call the close() method of the sourced input stream when it returns EOF.

For more explicit control of source stacking, use the push_source() and pop_source() methods.
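One possible way to use this hook is sketched below: a subclass resolves source requests against an in-memory table instead of the file system. The SNIPPETS dictionary, the SnippetLexer class and the "include" keyword are all hypothetical names invented for this example.

```python
import io
import shlex

# Hypothetical in-memory "files", standing in for a real search path.
SNIPPETS = {"common": "x y"}


class SnippetLexer(shlex.shlex):
    """Lexer whose source requests are looked up in SNIPPETS."""

    def sourcehook(self, name):
        # Strip optional double quotes, as the default hook does.
        name = name.strip('"')
        # Return (filename, open file-like object), in that order.
        return name, io.StringIO(SNIPPETS[name])


lexer = SnippetLexer("a include common b")
lexer.source = "include"   # enable source requests with this keyword
tokens = list(lexer)

print(tokens)
```

The tokens of the included snippet appear in place of the request, and lexing then resumes in the original input.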

shlex.push_source(newstream, newfile=None)

Push an input source stream onto the input stack. If the filename argument is specified it will later be available for use in error messages. This is the same method used internally by the sourcehook() method.

shlex.pop_source()

Pop the last-pushed input source from the input stack. This is the same method used internally when the lexer reaches EOF on a stacked input stream.
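The sketch below stacks a second input source by hand. It assumes an io.StringIO object as the pushed stream, and the name "inline" is a made-up label. When the pushed stream is exhausted, the lexer pops it and continues with the original input.

```python
import io
import shlex

lexer = shlex.shlex("a b")
first = lexer.get_token()            # 'a' from the original input

# Stack a new source; "inline" is a hypothetical name for error messages.
lexer.push_source(io.StringIO("x y"), "inline")
name_while_stacked = lexer.infile

# The stacked stream is read to EOF, then the original input resumes.
remaining = list(lexer)

print(first, name_while_stacked, remaining)
```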

shlex.error_leader(infile=None, lineno=None)

This method generates an error message leader in the format of a Unix C compiler error label; the format is '"%s", line %d: ', where the %s is replaced with the name of the current source file and the %d with the current input line number (the optional arguments can be used to override these).

This convenience is provided to encourage shlex users to generate error messages in the standard, parseable format understood by Emacs and other Unix tools.
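A brief sketch of the leader format. The file names "demo.conf" and "other.conf" are invented for the example and are never opened.

```python
import shlex

# "demo.conf" is a hypothetical name used only to label messages.
lexer = shlex.shlex("alpha beta", infile="demo.conf")

# Uses the lexer's current infile and lineno.
leader = lexer.error_leader()

# Both values can be overridden explicitly.
custom = lexer.error_leader("other.conf", 7)

print(leader + "unexpected token")
print(custom + "unexpected token")
```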

Instances of shlex subclasses have some public instance variables which either control lexical analysis or can be used for debugging:

shlex.commenters

The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.
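For instance, a lexer for a format that uses ';' for comments might be configured as in this sketch. The input text is made up.

```python
import shlex

text = "host = example ; trailing note\nport = 8080"

lexer = shlex.shlex(text)
lexer.commenters = ";"     # treat ';' (instead of '#') as the comment beginner
tokens = list(lexer)

print(tokens)
```

Everything from the ';' to the end of that line is skipped, and lexing continues on the next line.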

shlex.wordchars

The string of characters that will accumulate into multi-character tokens. By default, includes all ASCII alphanumerics and underscore.

shlex.whitespace

Characters that will be considered whitespace and skipped. Whitespace bounds tokens. By default, includes space, tab, linefeed and carriage-return.

shlex.escape

Characters that will be considered as escape. This will be only used in POSIX mode, and includes just '\' by default.

shlex.quotes

Characters that will be considered string quotes. The token accumulates until the same quote is encountered again (thus, different quote types protect each other as in the shell). By default, includes ASCII single and double quotes.

shlex.escapedquotes

Characters in quotes that will interpret escape characters defined in escape. This is only used in POSIX mode, and includes just '"' by default.

shlex.whitespace_split

If True, tokens will only be split on whitespace. This is useful, for example, for parsing command lines with shlex, getting tokens in a similar way to shell arguments.
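The effect is easiest to see by lexing the same made-up command line with the attribute off and on, as in this sketch using the default non-POSIX mode.

```python
import shlex

command = "ls -l /tmp/foo.txt"

# Default: characters such as '-', '/' and '.' become one-character tokens.
lexer = shlex.shlex(command)
fine_grained = list(lexer)

# With whitespace_split, only whitespace separates tokens.
lexer = shlex.shlex(command)
lexer.whitespace_split = True
shell_like = list(lexer)

print(fine_grained)
print(shell_like)
```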

shlex.infile

The name of the current input file, as initially set at class instantiation time or stacked by later source requests. It may be useful to examine this when constructing error messages.

shlex.instream

The input stream from which this shlex instance is reading characters.

shlex.source

This attribute is None by default. If you assign a string to it, that string will be recognized as a lexical-level inclusion request similar to the source keyword in various shells. That is, the immediately following token will be opened as a filename and input will be taken from that stream until EOF, at which point the close() method of that stream will be called and the input source will again become the original input stream. Source requests may be stacked any number of levels deep.
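A minimal sketch of a source request handled by the default sourcehook(), using a temporary file so that it is self-contained. The file name "extra.rc" and its contents are invented for the example. The path is wrapped in double quotes so that the non-POSIX lexer returns it as a single token.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file to be included by the source request.
    included = os.path.join(tmpdir, "extra.rc")
    with open(included, "w") as f:
        f.write("x y\n")

    # The quoted path is one token; the default sourcehook strips the quotes.
    lexer = shlex.shlex('a source "{}" b'.format(included))
    lexer.source = "source"
    tokens = list(lexer)   # the included file is closed when it hits EOF

print(tokens)
```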

shlex.debug

If this attribute is numeric and 1 or more, a shlex instance will print verbose progress output on its behavior. If you need to use this, you can read the module source code to learn the details.

shlex.lineno

Source line number (count of newlines seen so far plus one).

shlex.token

The token buffer. It may be useful to examine this when catching exceptions.

shlex.eof

Token used to determine end of file. This will be set to the empty string ('') in non-POSIX mode, and to None in POSIX mode.

24.3.2. Parsing Rules

When operating in non-POSIX mode, shlex will try to obey the following rules.

  • Quote characters are not recognized within words (Do"Not"Separate is parsed as the single word Do"Not"Separate);
  • Escape characters are not recognized;
  • Enclosing characters in quotes preserves the literal value of all characters within the quotes;
  • Closing quotes separate words ("Do"Separate is parsed as "Do" and Separate);
  • If whitespace_split is False, any character not declared to be a word character, whitespace, or a quote will be returned as a single-character token. If it is True, shlex will only split words on whitespace;
  • EOF is signaled with an empty string ('');
  • It’s not possible to parse empty strings, even if quoted.
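The non-POSIX rules above can be checked with a short sketch. The lex() helper and the sample inputs are assumptions made for illustration.

```python
import shlex


def lex(text):
    """Tokenize text with a default (non-POSIX) shlex instance."""
    return list(shlex.shlex(text))


inner_quotes = lex('Do"Not"Separate')   # quotes inside a word are kept as-is
closing_quote = lex('"Do"Separate')     # a closing quote ends the token
punctuation = lex("a+b")                # '+' is a single-character token
backslash = lex("a\\ b")                # backslash is not an escape here
empty_quoted = lex("a '' b")            # the quotes are part of the token

for result in (inner_quotes, closing_quote, punctuation,
               backslash, empty_quoted):
    print(result)
```

Note that the quoted empty string comes back as the two quote characters rather than as an empty token.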

When operating in POSIX mode, shlex will try to obey the following parsing rules.

  • Quotes are stripped out, and do not separate words ("Do"Not"Separate" is parsed as the single word DoNotSeparate);
  • Non-quoted escape characters (e.g. '\') preserve the literal value of the next character that follows;
  • Enclosing characters in quotes which are not part of escapedquotes (e.g. "'") preserves the literal value of all characters within the quotes;
  • Enclosing characters in quotes which are part of escapedquotes (e.g. '"') preserves the literal value of all characters within the quotes, with the exception of the characters mentioned in escape. The escape characters retain their special meaning only when followed by the quote in use, or the escape character itself. Otherwise the escape character will be considered a normal character;
  • EOF is signaled with a None value;
  • Quoted empty strings ('') are allowed.
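The POSIX rules above can be exercised through shlex.split(), which uses POSIX mode by default, as in this sketch. The sample inputs are made up, and raw strings are used so that the backslashes reach the lexer unchanged.

```python
import shlex

stripped = shlex.split('"Do"Not"Separate"')   # quotes removed, one word
escaped_space = shlex.split(r'a\ b')          # backslash protects the space
single_quoted = shlex.split(r"'a\b'")         # no escapes inside '...'
escaped_quote = shlex.split(r'"a\"b"')        # \" inside "..." yields "
plain_escape = shlex.split(r'"a\nb"')         # \n inside "..." stays literal
empty_quoted = shlex.split("a '' b")          # quoted empty string is kept

for result in (stripped, escaped_space, single_quoted,
               escaped_quote, plain_escape, empty_quoted):
    print(result)
```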