parser — Access Python parse trees


The parser module provides an interface to Python’s internal parser and byte-code compiler. The primary purpose for this interface is to allow Python code to edit the parse tree of a Python expression and create executable code from this. This is better than trying to parse and modify an arbitrary Python code fragment as a string because parsing is performed in a manner identical to the code forming the application. It is also faster.

Note

From Python 2.5 onward, it’s much more convenient to cut in at the Abstract Syntax Tree (AST) generation and compilation stage, using the ast module.
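As a rough illustration of the AST route mentioned in the note, the sketch below edits an expression tree with the ast module and compiles the result. The AddOne transformer and the compile_edited() helper are hypothetical names introduced here, and the rewrite itself (adding one to every integer literal) is arbitrary; it assumes Python 3.8 or later, where literals are ast.Constant nodes.

```python
import ast


class AddOne(ast.NodeTransformer):
    """Hypothetical transformer: replace each integer literal n with n + 1."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            new_node = ast.Constant(value=node.value + 1)
            return ast.copy_location(new_node, node)
        return node


def compile_edited(source, filename='<edited>'):
    """Parse an expression, apply AddOne, and return a code object."""
    tree = ast.parse(source, filename=filename, mode='eval')
    tree = AddOne().visit(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, 'eval')


# 'a + 5' is rewritten to 'a + 6' before compilation.
result = eval(compile_edited('a + 5'), {'a': 5})
```

The same parse, edit, compile sequence is what the parser module exposes at the lower, concrete-syntax level.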

There are a few things to note about this module which are important to making use of the data structures created. This is not a tutorial on editing the parse trees for Python code, but some examples of using the parser module are presented.

Most importantly, a good understanding of the Python grammar processed by the internal parser is required. For full information on the language syntax, refer to The Python Language Reference. The parser itself is created from a grammar specification defined in the file Grammar/Grammar in the standard Python distribution. The parse trees stored in the ST objects created by this module are the actual output from the internal parser when created by the expr() or suite() functions, described below. The ST objects created by sequence2st() faithfully simulate those structures. Be aware that the values of the sequences which are considered “correct” will vary from one version of Python to another as the formal grammar for the language is revised. However, transporting code from one Python version to another as source text will always allow correct parse trees to be created in the target version, with the only restriction being that migrating to an older version of the interpreter will not support more recent language constructs. The parse trees are not typically compatible from one version to another, though source code has usually been forward-compatible within a major release series.

Each element of the sequences returned by st2list() or st2tuple() has a simple form. Sequences representing non-terminal elements in the grammar always have a length greater than one. The first element is an integer which identifies a production in the grammar. These integers are given symbolic names in the C header file Include/graminit.h and the Python module symbol. Each additional element of the sequence represents a component of the production as recognized in the input string: these are always sequences which have the same form as the parent. An important aspect of this structure which should be noted is that keywords used to identify the parent node type, such as the keyword if in an if_stmt, are included in the node tree without any special treatment. For example, the if keyword is represented by the tuple (1, 'if'), where 1 is the numeric value associated with all NAME tokens, including variable and function names defined by the user. In an alternate form returned when line number information is requested, the same token might be represented as (1, 'if', 12), where the 12 represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child elements and the addition of the source text which was identified. The example of the if keyword above is representative. The various types of terminal symbols are defined in the C header file Include/token.h and the Python module token.
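The tuple form described above can be walked with a few lines of Python. The following is a minimal sketch, not output from the parser: SAMPLE_TREE is written by hand, its non-terminal numbers (300 and 301) are placeholders rather than real grammar symbols, and iter_terminals() is a hypothetical helper. Only the token module is used, so the sketch does not depend on the parser module being importable.

```python
import token

# Hand-written tree in the tuple form described above. The non-terminal
# numbers 300 and 301 are placeholders, not real grammar productions.
SAMPLE_TREE = (
    300,
    (301, (token.NAME, 'if', 12), (token.NAME, 'x', 12)),
    (token.NEWLINE, '', 12),
)


def iter_terminals(node):
    """Yield (token_name, text) for every terminal below node."""
    if token.ISTERMINAL(node[0]):
        # Terminals carry their source text as the second element.
        yield token.tok_name[node[0]], node[1]
    else:
        # Non-terminals hold child sequences of the same form.
        for child in node[1:]:
            yield from iter_terminals(child)


leaves = list(iter_terminals(SAMPLE_TREE))
```

Note that the keyword if and the user-defined name x both come back as NAME tokens, which is the point made above about keywords receiving no special treatment.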

The ST objects are not required to support the functionality of this module, but are provided for three purposes: to allow an application to amortize the cost of processing complex parse trees, to provide a parse tree representation which conserves memory space when compared to the Python list or tuple representation, and to ease the creation of additional modules in C which manipulate parse trees. A simple “wrapper” class may be created in Python to hide the use of ST objects.

The parser module defines functions for a few distinct purposes. The most important purposes are to create ST objects and to convert ST objects to other representations such as parse trees and compiled code objects, but there are also functions which serve to query the type of parse tree represented by an ST object.

See also

Module symbol

Useful constants representing internal nodes of the parse tree.

Module token

Useful constants representing leaf nodes of the parse tree and functions for testing node values.

Creating ST Objects

ST objects may be created from source code or from a parse tree. When creating an ST object from source, different functions are used to create the 'eval' and 'exec' forms.

parser.expr(source)

The expr() function parses the parameter source as if it were an input to compile(source, 'file.py', 'eval'). If the parse succeeds, an ST object is created to hold the internal parse tree representation; otherwise an appropriate exception is raised.

parser.suite(source)

The suite() function parses the parameter source as if it were an input to compile(source, 'file.py', 'exec'). If the parse succeeds, an ST object is created to hold the internal parse tree representation; otherwise an appropriate exception is raised.
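Since expr() and suite() are described in terms of the corresponding compile() modes, the difference between the two forms can be explored with the built-in alone. This sketch assumes nothing about the parser module; accepted_modes() is a hypothetical helper that reports which of the two modes accept a given source string.

```python
def accepted_modes(source):
    """Return the compile() modes among 'eval' and 'exec' that accept source."""
    modes = []
    for mode in ('eval', 'exec'):
        try:
            compile(source, 'file.py', mode)
        except SyntaxError:
            continue
        modes.append(mode)
    return modes


expression_modes = accepted_modes('a + 5')   # an expression
statement_modes = accepted_modes('a = 5')    # an assignment statement
```

An expression is acceptable in both modes, while a statement such as an assignment is only acceptable as a suite.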

parser.sequence2st(sequence)

This function accepts a parse tree represented as a sequence and builds an internal representation if possible. If it can validate that the tree conforms to the Python grammar and all nodes are valid node types in the host version of Python, an ST object is created from the internal representation and returned to the caller. If there is a problem creating the internal representation, or if the tree cannot be validated, a ParserError exception is raised. An ST object created this way should not be assumed to compile correctly; normal exceptions raised by compilation may still be initiated when the ST object is passed to compilest(). This may indicate problems not related to syntax (such as a MemoryError exception), but may also be due to constructs such as the result of parsing del f(0), which escapes the Python parser but is checked by the bytecode compiler.

Sequences representing terminal tokens may be represented as either two-element lists of the form (1, 'name') or as three-element lists of the form (1, 'name', 56). If the third element is present, it is assumed to be a valid line number. The line number may be specified for any subset of the terminal symbols in the input tree.
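Because line numbers are optional on each terminal, a tree may mix the two- and three-element forms. The sketch below normalizes such a tree to the two-element form. It is illustrative only: strip_line_numbers() is a hypothetical helper, MIXED_TREE is hand-written, and 300 is a placeholder non-terminal number rather than a real grammar symbol.

```python
import token


def strip_line_numbers(node):
    """Return a copy of a tuple-form tree with terminals cut to (type, text)."""
    if token.ISTERMINAL(node[0]):
        # Drop the optional line number (and anything after it).
        return node[:2]
    children = tuple(strip_line_numbers(child) for child in node[1:])
    return (node[0],) + children


# Hand-written example: one terminal has a line number, the other does not.
MIXED_TREE = (300, (token.NAME, 'name', 56), (token.NEWLINE, ''))
stripped = strip_line_numbers(MIXED_TREE)
```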

parser.tuple2st(sequence)

This is the same function as sequence2st(). This entry point is maintained for backward compatibility.

Converting ST Objects

ST objects, regardless of the input used to create them, may be converted to parse trees represented as list- or tuple- trees, or may be compiled into executable code objects. Parse trees may be extracted with or without line numbering information.

parser.st2list(st, line_info=False, col_info=False)

This function accepts an ST object from the caller in st and returns a Python list representing the equivalent parse tree. The resulting list representation can be used for inspection or the creation of a new parse tree in list form. This function does not fail so long as memory is available to build the list representation. If the parse tree will only be used for inspection, st2tuple() should be used instead to reduce memory consumption and fragmentation. When the list representation is required, this function is significantly faster than retrieving a tuple representation and converting that to nested lists.

If line_info is true, line number information will be included for all terminal tokens as a third element of the list representing the token. Note that the line number provided specifies the line on which the token ends. This information is omitted if the flag is false or omitted.

parser.st2tuple(st, line_info=False, col_info=False)

This function accepts an ST object from the caller in st and returns a Python tuple representing the equivalent parse tree. Other than returning a tuple instead of a list, this function is identical to st2list().

If line_info is true, line number information will be included for all terminal tokens as a third element of the tuple representing the token. This information is omitted if the flag is false or omitted.

parser.compilest(st, filename='<syntax-tree>')

The Python byte compiler can be invoked on an ST object to produce code objects which can be used as part of a call to the built-in exec() or eval() functions. This function provides the interface to the compiler, passing the internal parse tree from st to the parser, using the source file name specified by the filename parameter. The default value supplied for filename indicates that the source was an ST object.

Compiling an ST object may result in exceptions related to compilation; an example would be a SyntaxError caused by the parse tree for del f(0): this statement is considered legal within the formal grammar for Python but is not a legal language construct. The SyntaxError raised for this condition is actually generated by the Python byte-compiler normally, which is why it can be raised at this point by the parser module. Most causes of compilation failure can be diagnosed programmatically by inspection of the parse tree.
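The del f(0) case can be observed with the built-in compile() function, used here as a stand-in for compilest() on the assumption that the parser module may not be importable. The sketch only shows that a SyntaxError is raised for this statement; it does not demonstrate which internal stage raises it. compile_error() is a hypothetical helper.

```python
def compile_error(source, filename='<syntax-tree>'):
    """Return the SyntaxError message from compiling source, or None."""
    try:
        compile(source, filename, 'exec')
    except SyntaxError as exc:
        return exc.msg
    return None


rejected = compile_error('del f(0)')   # deleting a call result is not legal
accepted = compile_error('del f')      # deleting a name compiles fine
```

The exact wording of the message varies between Python versions, so code should test for the exception rather than match its text.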

Queries on ST Objects

Two functions are provided which allow an application to determine if an ST was created as an expression or a suite. Neither of these functions can be used to determine if an ST was created from source code via expr() or suite() or from a parse tree via sequence2st().

parser.isexpr(st)

When st represents an 'eval' form, this function returns True; otherwise it returns False. This is useful, since code objects normally cannot be queried for this information using existing built-in functions. Note that the code objects created by compilest() cannot be queried like this either, and are identical to those created by the built-in compile() function.

parser.issuite(st)

This function mirrors isexpr() in that it reports whether an ST object represents an 'exec' form, commonly known as a “suite.” It is not safe to assume that this function is equivalent to not isexpr(st), as additional syntactic fragments may be supported in the future.

Exceptions and Error Handling

The parser module defines a single exception, but may also pass other built-in exceptions from other portions of the Python runtime environment. See each function for information about the exceptions it can raise.

exception parser.ParserError

Exception raised when a failure occurs within the parser module. This is generally produced for validation failures rather than the built-in SyntaxError raised during normal parsing. The exception argument is either a string describing the reason for the failure or a tuple containing a sequence causing the failure from a parse tree passed to sequence2st() and an explanatory string. Calls to sequence2st() need to be able to handle either form of the exception argument, while calls to other functions in the module will only need to be aware of the simple string values.

Note that the functions compilest(), expr(), and suite() may raise exceptions which are normally raised by the parsing and compilation process. These include the built-in exceptions MemoryError, OverflowError, SyntaxError, and SystemError. In these cases, these exceptions carry all the meaning normally associated with them. Refer to the descriptions of each function for detailed information.

ST Objects

Ordered and equality comparisons are supported between ST objects. Pickling of ST objects (using the pickle module) is also supported.

parser.STType

The type of the objects returned by expr(), suite() and sequence2st().

ST objects have the following methods:

ST.compile(filename='<syntax-tree>')

Same as compilest(st, filename).

ST.isexpr()

Same as isexpr(st).

ST.issuite()

Same as issuite(st).

ST.tolist(line_info=False, col_info=False)

Same as st2list(st, line_info, col_info).

ST.totuple(line_info=False, col_info=False)

Same as st2tuple(st, line_info, col_info).

Example: Emulation of compile()

While many useful operations may take place between parsing and bytecode generation, the simplest operation is to do nothing. For this purpose, using the parser module to produce an intermediate data structure is equivalent to the code

>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10

The equivalent operation using the parser module is somewhat longer, and allows the intermediate internal parse tree to be retained as an ST object:

>>> import parser
>>> st = parser.expr('a + 5')
>>> code = st.compile('file.py')
>>> a = 5
>>> eval(code)
10

An application which needs both ST and code objects can package this code into readily available functions:

import parser

def load_suite(source_string):
    st = parser.suite(source_string)
    return st, st.compile()

def load_expression(source_string):
    st = parser.expr(source_string)
    return st, st.compile()
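On interpreters where the parser module cannot be imported (it was removed in Python 3.10), a comparable pair of helpers can be sketched with the ast module. This is an analogue under that assumption, not an equivalent: the first returned value is an abstract syntax tree rather than an ST object, and the filename default '<ast>' is an arbitrary choice made here.

```python
import ast


def load_suite(source_string, filename='<ast>'):
    """Return (tree, code) for a sequence of statements."""
    tree = ast.parse(source_string, filename=filename, mode='exec')
    return tree, compile(tree, filename, 'exec')


def load_expression(source_string, filename='<ast>'):
    """Return (tree, code) for a single expression."""
    tree = ast.parse(source_string, filename=filename, mode='eval')
    return tree, compile(tree, filename, 'eval')


# Usage: keep the tree for inspection and run the code object separately.
expr_tree, expr_code = load_expression('a + 5')
value = eval(expr_code, {'a': 5})

suite_tree, suite_code = load_suite('b = 2 * 21')
namespace = {}
exec(suite_code, namespace)
```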