shlex --- Simple lexical analysis¶
Source code: Lib/shlex.py
The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages (for example, in run control files for Python applications) or for parsing quoted strings.
The shlex module defines the following functions:
- shlex.split(s, comments=False, posix=True)¶
Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
Changed in version 3.12: Passing None for the s argument now raises an exception, rather than reading sys.stdin.
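As a brief illustration of the comments flag, the sketch below splits the same string both ways. The command string is made up for the example.

```python
import shlex

# Hypothetical command line with a trailing shell-style comment.
line = "ls -l 'my file.txt' # list it"

# Default (comments=False): '#' is treated as an ordinary character.
without_comments = shlex.split(line)

# comments=True: everything from '#' to the end of the line is dropped.
with_comments = shlex.split(line, comments=True)

print(without_comments)
print(with_comments)
```

Note that in both cases the quoted file name stays together as one token and loses its quotes, because split() runs in POSIX mode by default.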
- shlex.join(split_command)¶
Concatenate the tokens of the list split_command and return a string. This function is the inverse of split().
>>> from shlex import join
>>> print(join(['echo', '-n', 'Multiple words']))
echo -n 'Multiple words'
The returned value is shell-escaped to protect against injection vulnerabilities (see quote()).
Added in version 3.8.
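The inverse relationship between join() and split() can be checked with a short round trip. The argument list below is an arbitrary example chosen to include a space, an apostrophe and an empty string.

```python
import shlex

# Arbitrary example arguments, including awkward ones.
args = ["echo", "hello world", "it's", ""]

# join() quotes each argument as needed and joins them with spaces.
command = shlex.join(args)

# split() in its default POSIX mode recovers the original list.
round_trip = shlex.split(command)

print(command)
print(round_trip == args)
```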
- shlex.quote(s)¶
Return a shell-escaped version of the string s. The returned value is a string that can safely be used as one token in a shell command line, for cases where you cannot use a list.
Warning
The shlex module is only designed for Unix shells.
The quote() function is not guaranteed to be correct on non-POSIX compliant shells or shells from other operating systems such as Windows. Executing commands quoted by this module on such shells can open up the possibility of a command injection vulnerability.
Consider using functions that pass command arguments with lists such as subprocess.run() with shell=False.
This idiom would be unsafe:
>>> filename = 'somefile; rm -rf ~'
>>> command = 'ls -l {}'.format(filename)
>>> print(command)  # executed by a shell: boom!
ls -l somefile; rm -rf ~
quote() lets you plug the security hole:
>>> from shlex import quote
>>> command = 'ls -l {}'.format(quote(filename))
>>> print(command)
ls -l 'somefile; rm -rf ~'
>>> remote_command = 'ssh home {}'.format(quote(command))
>>> print(remote_command)
ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''
The quoting is compatible with UNIX shells and with split():
>>> from shlex import split
>>> remote_command = split(remote_command)
>>> remote_command
['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
>>> command = split(remote_command[-1])
>>> command
['ls', '-l', 'somefile; rm -rf ~']
Added in version 3.3.
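The warning for quote() suggests passing arguments as a list instead of building a shell string. A minimal sketch of that alternative follows; the child process here is simply the current Python interpreter echoing its first argument, which is an assumption made so the example runs anywhere.

```python
import subprocess
import sys

filename = "somefile; rm -rf ~"

# Each list element reaches the child process as exactly one argument.
# No shell is involved, so the ';' has no special meaning.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    shell=False,
    capture_output=True,
    text=True,
    check=True,
)

echoed = result.stdout.strip()
print(echoed)
```

With this approach no quoting is needed at all, which is why it is preferable whenever a list can be used.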
The shlex module defines the following class:
- class shlex.shlex(instream=None, infile=None, posix=False, punctuation_chars=False)¶
A shlex instance or subclass instance is a lexical analyzer object. The initialization argument, if present, specifies where to read characters from. It must be a file-/stream-like object with read() and readline() methods, or a string. If no argument is given, input will be taken from sys.stdin. The second optional argument is a filename string, which sets the initial value of the infile attribute. If the instream argument is omitted or equal to sys.stdin, this second argument defaults to "stdin".
The posix argument defines the operational mode: when posix is not true (default), the shlex instance will operate in compatibility mode. When operating in POSIX mode, shlex will try to be as close as possible to the POSIX shell parsing rules.
The punctuation_chars argument provides a way to make the behaviour even closer to how real shells parse. This can take a number of values: the default value, False, preserves the behaviour seen under Python 3.5 and earlier. If set to True, then parsing of the characters ();<>|& is changed: any run of these characters (considered punctuation characters) is returned as a single token. If set to a non-empty string of characters, those characters will be used as the punctuation characters. Any characters in the wordchars attribute that appear in punctuation_chars will be removed from wordchars. See Improved Compatibility with Shells for more information. punctuation_chars can be set only upon shlex instance creation and can't be modified later.
Changed in version 3.6: The punctuation_chars parameter was added.
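To make the two operational modes concrete, the sketch below tokenizes one string with the default compatibility mode and again with posix=True. The input string is an invented example.

```python
import shlex

# Invented input resembling a line from a run control file.
text = "name = 'John Smith'"

# Compatibility (non-POSIX) mode: quotes are kept as part of the token.
compat_tokens = list(shlex.shlex(text))

# POSIX mode: quotes are stripped from the token.
posix_tokens = list(shlex.shlex(text, posix=True))

print(compat_tokens)
print(posix_tokens)
```

A shlex instance is iterable, so list() collects tokens until end of input is reached.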
See also
Module configparser
Parser for configuration files similar to the Windows .ini files.
shlex Objects¶
A shlex instance has the following methods:
- shlex.get_token()¶
Return a token. If tokens have been stacked using push_token(), pop a token off the stack. Otherwise, read one from the input stream. If reading encounters an immediate end-of-file, eof is returned (the empty string ('') in non-POSIX mode, and None in POSIX mode).
- shlex.push_token(str)¶
Push the argument onto the token stack.
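The interplay of get_token() and push_token() can be sketched as a one-token lookahead. The input string is arbitrary.

```python
import shlex

lexer = shlex.shlex("alpha beta", posix=True)

# Read a token, then push it back so that it is returned again.
first = lexer.get_token()
lexer.push_token(first)
again = lexer.get_token()

# Continue with the rest of the input.
second = lexer.get_token()

# At end of input, get_token() returns lexer.eof (None in POSIX mode).
end = lexer.get_token()

print(first, again, second, end)
```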
- shlex.read_token()¶
Read a raw token. Ignore the pushback stack, and do not interpret source requests. (This is not ordinarily a useful entry point, and is documented here only for the sake of completeness.)
- shlex.sourcehook(filename)¶
When shlex detects a source request (see source below) this method is given the following token as argument, and expected to return a tuple consisting of a filename and an open file-like object.
Normally, this method first strips any quotes off the argument. If the result is an absolute pathname, or there was no previous source request in effect, or the previous source was a stream (such as sys.stdin), the result is left alone. Otherwise, if the result is a relative pathname, the directory part of the name of the file immediately before it on the source inclusion stack is prepended (this behavior is like the way the C preprocessor handles #include "file.h").
The result of the manipulations is treated as a filename, and returned as the first component of the tuple, with open() called on it to yield the second component. (Note: this is the reverse of the order of arguments in instance initialization!)
This hook is exposed so that you can use it to implement directory search paths, addition of file extensions, and other namespace hacks. There is no corresponding 'close' hook, but a shlex instance will call the close() method of the sourced input stream when it returns EOF.
For more explicit control of source stacking, use the push_source() and pop_source() methods.
- shlex.push_source(newstream, newfile=None)¶
Push an input source stream onto the input stack. If the filename argument is specified it will later be available for use in error messages. This is the same method used internally by the sourcehook() method.
- shlex.pop_source()¶
Pop the last-pushed input source from the input stack. This is the same method used internally when the lexer reaches EOF on a stacked input stream.
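A minimal sketch of explicit source stacking follows. It pushes an in-memory stream part-way through the main input; the stream contents and the names "main" and "included" are made up for the example.

```python
import io
import shlex

lexer = shlex.shlex("main1 main2", infile="main", posix=True)

# Consume the first token of the main input.
first = lexer.get_token()

# Stack a second input source on top of the main one.
lexer.push_source(io.StringIO("inc1 inc2"), newfile="included")

# Tokens now come from the pushed stream; when it reaches EOF the lexer
# pops it and resumes reading the main input.
rest = list(lexer)

print(first, rest)
print(lexer.infile)
```

After the pushed stream is exhausted, infile refers to the original source again.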
- shlex.error_leader(infile=None, lineno=None)¶
This method generates an error message leader in the format of a Unix C compiler error label; the format is '"%s", line %d: ', where the %s is replaced with the name of the current source file and the %d with the current input line number (the optional arguments can be used to override these).
This convenience is provided to encourage shlex users to generate error messages in the standard, parseable format understood by Emacs and other Unix tools.
Instances of shlex subclasses have some public instance variables which either control lexical analysis or can be used for debugging:
- shlex.commenters¶
The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just '#' by default.
- shlex.wordchars¶
The string of characters that will accumulate into multi-character tokens. By default, includes all ASCII alphanumerics and underscore. In POSIX mode, the accented characters in the Latin-1 set are also included. If punctuation_chars is not empty, the characters ~-./*?=, which can appear in filename specifications and command line parameters, will also be included in this attribute, and any characters which appear in punctuation_chars will be removed from wordchars if they are present there. If whitespace_split is set to True, this will have no effect.
- shlex.whitespace¶
Characters that will be considered whitespace and skipped. Whitespace bounds tokens. By default, includes space, tab, linefeed and carriage-return.
- shlex.escape¶
Characters that will be considered as escape. This will be only used in POSIX mode, and includes just '\' by default.
- shlex.quotes¶
Characters that will be considered string quotes. The token accumulates until the same quote is encountered again (thus, different quote types protect each other as in the shell). By default, includes ASCII single and double quotes.
- shlex.escapedquotes¶
Characters in quotes that will interpret escape characters defined in escape. This is only used in POSIX mode, and includes just '"' by default.
- shlex.whitespace_split¶
If True, tokens will only be split on whitespace. This is useful, for example, for parsing command lines with shlex, getting tokens in a similar way to shell arguments. When used in combination with punctuation_chars, tokens will be split on whitespace in addition to those characters.
Changed in version 3.8: The punctuation_chars attribute was made compatible with the whitespace_split attribute.
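A minimal comparison of the two settings, using an invented command line in the default non-POSIX mode:

```python
import shlex

text = "ls -l /tmp/foo"

# Default: characters such as '-' and '/' are returned as separate tokens.
default_tokens = list(shlex.shlex(text))

# whitespace_split=True: only whitespace separates tokens.
lexer = shlex.shlex(text)
lexer.whitespace_split = True
split_tokens = list(lexer)

print(default_tokens)
print(split_tokens)
```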
- shlex.infile¶
The name of the current input file, as initially set at class instantiation time or stacked by later source requests. It may be useful to examine this when constructing error messages.
- shlex.source¶
This attribute is None by default. If you assign a string to it, that string will be recognized as a lexical-level inclusion request similar to the source keyword in various shells. That is, the immediately following token will be opened as a filename and input will be taken from that stream until EOF, at which point the close() method of that stream will be called and the input source will again become the original input stream. Source requests may be stacked any number of levels deep.
- shlex.debug¶
If this attribute is numeric and 1 or more, a shlex instance will print verbose progress output on its behavior. If you need to use this, you can read the module source code to learn the details.
- shlex.lineno¶
Source line number (count of newlines seen so far plus one).
- shlex.token¶
The token buffer. It may be useful to examine this when catching exceptions.
- shlex.eof¶
Token used to determine end of file. This will be set to the empty string (''), in non-POSIX mode, and to None in POSIX mode.
- shlex.punctuation_chars¶
A read-only property. Characters that will be considered punctuation. Runs of punctuation characters will be returned as a single token. However, note that no semantic validity checking will be performed: for example, '>>>' could be returned as a token, even though it may not be recognised as such by shells.
Added in version 3.6.
Parsing Rules¶
When operating in non-POSIX mode, shlex will try to obey the following rules.
- Quote characters are not recognized within words (Do"Not"Separate is parsed as the single word Do"Not"Separate);
- Escape characters are not recognized;
- Enclosing characters in quotes preserve the literal value of all characters within the quotes;
- Closing quotes separate words ("Do"Separate is parsed as "Do" and Separate);
- If whitespace_split is False, any character not declared to be a word character, whitespace, or a quote will be returned as a single-character token. If it is True, shlex will only split words on whitespace;
- EOF is signaled with an empty string ('');
- It's not possible to parse empty strings, even if quoted.
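The two quoting rules for non-POSIX mode can be checked directly with the inputs used in the list above. This is only a sketch of those two cases, not an exhaustive test of the rules.

```python
import shlex

# Quotes inside a word are not recognized: the word stays in one piece,
# quotes included.
inside_word = list(shlex.shlex('Do"Not"Separate'))

# A closing quote ends the token, so the input splits in two.
closing_quote = list(shlex.shlex('"Do"Separate'))

print(inside_word)
print(closing_quote)
```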
When operating in POSIX mode, shlex will try to obey the following parsing rules.
- Quotes are stripped out, and do not separate words ("Do"Not"Separate" is parsed as the single word DoNotSeparate);
- Non-quoted escape characters (e.g. '\') preserve the literal value of the next character that follows;
- Enclosing characters in quotes which are not part of escapedquotes (e.g. "'") preserve the literal value of all characters within the quotes;
- Enclosing characters in quotes which are part of escapedquotes (e.g. '"') preserves the literal value of all characters within the quotes, with the exception of the characters mentioned in escape. The escape characters retain their special meaning only when followed by the quote in use, or the escape character itself. Otherwise the escape character will be considered a normal character;
- EOF is signaled with a None value;
- Quoted empty strings ('') are allowed.
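The POSIX-mode rules above can be exercised through split(), which uses POSIX mode by default. The inputs are small invented strings, one per rule.

```python
import shlex

# Quotes are stripped and do not separate words.
stripped = shlex.split('"Do"Not"Separate"')

# An unquoted backslash preserves the next character (here, a space).
escaped_space = shlex.split(r'a\ b')

# Inside single quotes the backslash is an ordinary character.
single_quoted = shlex.split(r"'a\"b'")

# Inside double quotes the backslash is special only before the quote
# in use or another backslash; before 'n' it is kept literally.
double_quoted = shlex.split(r'"a\"b \n"')

# Quoted empty strings are returned as empty tokens.
empty = shlex.split("a '' b")

print(stripped, escaped_space, single_quoted, double_quoted, empty)
```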
Improved Compatibility with Shells¶
Added in version 3.6.
The shlex class provides compatibility with the parsing performed by common Unix shells like bash, dash, and sh. To take advantage of this compatibility, specify the punctuation_chars argument in the constructor. This defaults to False, which preserves pre-3.6 behaviour. However, if it is set to True, then parsing of the characters ();<>|& is changed: any run of these characters is returned as a single token. While this is short of a full parser for shells (which would be out of scope for the standard library, given the multiplicity of shells out there), it does allow you to perform processing of command lines more easily than you could otherwise. To illustrate, you can see the difference in the following snippet:
>>> import shlex
>>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
>>> s = shlex.shlex(text, posix=True)
>>> s.whitespace_split = True
>>> list(s)
['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
>>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
>>> s.whitespace_split = True
>>> list(s)
['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
'(', 'def', 'ghi', ')']
Of course, tokens will be returned which are not valid for shells, and you'll need to implement your own error checks on the returned tokens.
Instead of passing True as the value for the punctuation_chars parameter, you can pass a string with specific characters, which will be used to determine which characters constitute punctuation. For example:
>>> import shlex
>>> s = shlex.shlex("a && b || c", punctuation_chars="|")
>>> list(s)
['a', '&', '&', 'b', '||', 'c']
Note
When punctuation_chars is specified, the wordchars attribute is augmented with the characters ~-./*?=. That is because these characters can appear in file names (including wildcards) and command-line arguments (e.g. --color=auto). Hence:
>>> import shlex
>>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?',
...                 punctuation_chars=True)
>>> list(s)
['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']
However, to match the shell as closely as possible, it is recommended to always use posix and whitespace_split when using punctuation_chars, which will negate wordchars entirely.
For best effect, punctuation_chars should be set in conjunction with posix=True. (Note that posix=False is the default for shlex.)
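One possible use of the recommended combination (posix=True, whitespace_split=True and punctuation_chars=True) is to break a command line into pipeline stages. The helper name split_pipeline and the sample command are hypothetical; this is a sketch, not a shell parser.

```python
import shlex


def split_pipeline(command):
    """Split a command line into lists of tokens separated by '|'.

    Hypothetical helper for illustration only.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    stages = [[]]
    for token in lexer:
        if token == "|":
            # A lone pipe token starts a new stage.
            stages.append([])
        else:
            stages[-1].append(token)
    return stages


stages = split_pipeline("ls -l | grep 'my file' | wc -l")
print(stages)
```

Because POSIX mode strips quotes, a quoted '|' argument would be indistinguishable from a real pipe in this sketch, and tokens such as '||' are passed through unchanged. Real use would need the additional error checks mentioned earlier.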