33.Python compiler package¶
Deprecated since version 2.6:Thecompiler package has been removed in Python 3.
The Python compiler package is a tool for analyzing Python source code andgenerating Python bytecode. The compiler contains libraries to generate anabstract syntax tree from Python source code and to generate Pythonbytecode from the tree.
Thecompiler package is a Python source to bytecode translator written inPython. It uses the built-in parser and standardparser module togenerate a concrete syntax tree. This tree is used to generate an abstractsyntax tree (AST) and then Python bytecode.
The full functionality of the package duplicates the built-in compiler providedwith the Python interpreter. It is intended to match its behavior almostexactly. Why implement another compiler that does the same thing? The packageis useful for a variety of purposes. It can be modified more easily than thebuilt-in compiler. The AST it generates is useful for analyzing Python sourcecode.
This chapter explains how the various components of thecompiler packagework. It blends reference material with a tutorial.
33.1.The basic interface¶
The top-level of the package defines four functions. If you importcompiler, you will get these functions and a collection of modulescontained in the package.
compiler.parse(buf)¶Returns an abstract syntax tree for the Python source code inbuf. Thefunction raises
SyntaxErrorif there is an error in the source code. Thereturn value is acompiler.ast.Moduleinstance that contains the tree.
compiler.parseFile(path)¶Return an abstract syntax tree for the Python source code in the file specifiedbypath. It is equivalent to
parse(open(path).read()).
compiler.walk(ast,visitor[,verbose])¶Do a pre-order walk over the abstract syntax treeast. Call the appropriatemethod on thevisitor instance for each node encountered.
compiler.compile(source,filename,mode,flags=None,dont_inherit=None)¶Compile the stringsource, a Python module, statement or expression, into acode object that can be executed by the exec statement or
eval(). Thisfunction is a replacement for the built-incompile()function.Thefilename will be used for run-time error messages.
Themode must be ‘exec’ to compile a module, ‘single’ to compile a single(interactive) statement, or ‘eval’ to compile an expression.
Theflags anddont_inherit arguments affect future-related statements, butare not supported yet.
compiler.compileFile(source)¶Compiles the filesource and generates a .pyc file.
Thecompiler package contains the following modules:ast,consts,future,misc,pyassem,pycodegen,symbols,transformer, andvisitor.
33.2.Limitations¶
There are some problems with the error checking of the compiler package. Theinterpreter detects syntax errors in two distinct phases. One set of errors isdetected by the interpreter’s parser, the other set by the compiler. Thecompiler package relies on the interpreter’s parser, so it get the first phasesof error checking for free. It implements the second phase itself, and thatimplementation is incomplete. For example, the compiler package does not raisean error if a name appears more than once in an argument list:deff(x,x):...
A future version of the compiler should fix these problems.
33.3.Python Abstract Syntax¶
Thecompiler.ast module defines an abstract syntax for Python. In theabstract syntax tree, each node represents a syntactic construct. The root ofthe tree isModule object.
The abstract syntax offers a higher level interface to parsed Python sourcecode. Theparser module and the compiler written in C for the Pythoninterpreter use a concrete syntax tree. The concrete syntax is tied closely tothe grammar description used for the Python parser. Instead of a single nodefor a construct, there are often several levels of nested nodes that areintroduced by Python’s precedence rules.
The abstract syntax tree is created by thecompiler.transformer module.The transformer relies on the built-in Python parser to generate a concretesyntax tree. It generates an abstract syntax tree from the concrete tree.
Thetransformer module was created by Greg Stein and Bill Tutt for anexperimental Python-to-C compiler. The current version contains a number ofmodifications and improvements, but the basic form of the abstract syntax and ofthe transformer are due to Stein and Tutt.
33.3.1.AST Nodes¶
Thecompiler.ast module is generated from a text file that describes eachnode type and its elements. Each node type is represented as a class thatinherits from the abstract base classcompiler.ast.Node and defines aset of named attributes for child nodes.
- class
compiler.ast.Node¶ The
Nodeinstances are created automatically by the parser generator.The recommended interface for specificNodeinstances is to use thepublic attributes to access child nodes. A public attribute may be bound to asingle node or to a sequence of nodes, depending on theNodetype. Forexample, thebasesattribute of theClassnode, is bound to alist of base class nodes, and thedocattribute is bound to a singlenode.Each
Nodeinstance has alinenoattribute which may beNone. XXX Not sure what the rules are for which nodes will have a usefullineno.All
Nodeobjects offer the following methods:getChildren()¶Returns a flattened list of the child nodes and objects in the order theyoccur. Specifically, the order of the nodes is the order in which theyappear in the Python grammar. Not all of the children are
Nodeinstances. The names of functions and classes, for example, are plainstrings.
getChildNodes()¶Returns a flattened list of the child nodes in the order they occur. Thismethod is like
getChildren(), except that it only returns thosechildren that areNodeinstances.
Two examples illustrate the general structure ofNode classes. Thewhile statement is defined by the following grammar production:
while_stmt:"while"expression":"suite["else"":"suite]
TheWhile node has three attributes:test,body, andelse_. (If the natural name for an attribute is also a Python reservedword, it can’t be used as an attribute name. An underscore is appended to theword to make it a legal identifier, henceelse_ instead ofelse.)
Theif statement is more complicated because it can include severaltests.
if_stmt:'if'test':'suite('elif'test':'suite)*['else'':'suite]
TheIf node only defines two attributes:tests andelse_. Thetests attribute is a sequence of test expression,consequent body pairs. There is one pair for eachif/elifclause. The first element of the pair is the test expression. The secondelements is aStmt node that contains the code to execute if the testis true.
ThegetChildren() method ofIf returns a flat list of childnodes. If there are threeif/elif clauses and noelse clause, thengetChildren() will return a list of sixelements: the first test expression, the firstStmt, the second textexpression, etc.
The following table lists each of theNode subclasses defined incompiler.ast and each of the public attributes available on theirinstances. The values of most of the attributes are themselvesNodeinstances or sequences of instances. When the value is something other than aninstance, the type is noted in the comment. The attributes are listed in theorder in which they are returned bygetChildren() andgetChildNodes().
Node type | Attribute | Value |
|---|---|---|
|
| left operand |
| right operand | |
|
| list of operands |
| attribute as target ofassignment | |
| expression on the left-handside of the dot | |
| the attribute name, a string | |
| XXX | |
|
| list of list elements beingassigned to |
|
| name being assigned to |
| XXX | |
|
| list of tuple elements beingassigned to |
| the expression to be tested | |
| the value of the | |
|
| a list of assignment targets,one per equal sign |
| the value being assigned | |
|
| |
| ||
| ||
|
| |
|
| |
|
| |
|
| |
| ||
|
| expression for the callee |
| a list of arguments | |
| the extended *-arg value | |
| the extended **-arg value | |
|
| the name of the class, a string |
| a list of base classes | |
| doc string, a string or | |
the body of the class statement | ||
|
| |
| ||
|
| |
| ||
|
| List of function decoratorexpressions |
|
| |
|
| |
|
| |
| ||
|
| |
|
| |
|
| |
| ||
|
| |
| ||
| ||
| ||
|
| |
| ||
|
|
|
| name used in def, a string | |
| list of argument names, asstrings | |
| list of default values | |
| xxx | |
| doc string, a string or | |
the body of the function | ||
| ||
|
| |
| ||
| ||
|
| |
| ||
|
| |
| ||
|
| |
|
| |
| ||
|
| |
|
| |
|
| |
| ||
|
| |
| ||
| ||
|
| |
| ||
|
| |
|
| |
| ||
|
| |
| ||
| ||
| ||
|
| |
| ||
|
| doc string, a string or |
| body of the module, a | |
|
| |
| ||
|
| |
|
| |
|
| |
| ||
|
| |
| ||
|
| |
| ||
|
| |
| ||
|
| |
| ||
| ||
|
| |
|
| |
| ||
|
| |
| ||
| ||
| ||
|
| list of statements |
|
| |
|
| |
| ||
|
| |
| ||
| ||
|
| |
| ||
| ||
|
| |
| ||
|
| |
|
| |
|
| |
| ||
| ||
| ||
|
| |
| ||
|
|
33.3.2.Assignment nodes¶
There is a collection of nodes used to represent assignments. Each assignmentstatement in the source code becomes a singleAssign node in the AST.Thenodes attribute is a list that contains a node for each assignmenttarget. This is necessary because assignment can be chained, e.g.a=b=2. EachNode in the list will be one of the following classes:AssAttr,AssList,AssName, orAssTuple.
Each target assignment node will describe the kind of object being assigned to:AssName for a simple name, e.g.a=1.AssAttr for anattribute assigned, e.g.a.x=1.AssList andAssTuple forlist and tuple expansion respectively, e.g.a,b,c=a_tuple.
The target assignment nodes also have aflags attribute that indicateswhether the node is being used for assignment or in a delete statement. TheAssName is also used to represent a delete statement, e.g.delx.
When an expression contains several attribute references, an assignment ordelete statement will contain only oneAssAttr node – for the finalattribute reference. The other attribute references will be represented asGetattr nodes in theexpr attribute of theAssAttrinstance.
33.3.3.Examples¶
This section shows several simple examples of ASTs for Python source code. Theexamples demonstrate how to use theparse() function, what the repr of anAST looks like, and how to access attributes of an AST node.
The first module defines a single function. Assume it is stored indoublelib.py.
"""This is an example module.This is the docstring."""defdouble(x):"Return twice the argument"returnx*2
In the interactive interpreter session below, I have reformatted the long ASTreprs for readability. The AST reprs use unqualified class names. If you wantto create an instance from a repr, you must import the class names from thecompiler.ast module.
>>>importcompiler>>>mod=compiler.parseFile("doublelib.py")>>>modModule('This is an example module.\n\nThis is the docstring.\n', Stmt([Function(None, 'double', ['x'], [], 0, 'Return twice the argument', Stmt([Return(Mul((Name('x'), Const(2))))]))]))>>>fromcompiler.astimport*>>>Module('This is an example module.\n\nThis is the docstring.\n',...Stmt([Function(None,'double',['x'],[],0,...'Return twice the argument',...Stmt([Return(Mul((Name('x'),Const(2))))]))]))Module('This is an example module.\n\nThis is the docstring.\n', Stmt([Function(None, 'double', ['x'], [], 0, 'Return twice the argument', Stmt([Return(Mul((Name('x'), Const(2))))]))]))>>>mod.doc'This is an example module.\n\nThis is the docstring.\n'>>>fornodeinmod.node.nodes:...printnode...Function(None, 'double', ['x'], [], 0, 'Return twice the argument', Stmt([Return(Mul((Name('x'), Const(2))))]))>>>func=mod.node.nodes[0]>>>func.codeStmt([Return(Mul((Name('x'), Const(2))))])
33.4.Using Visitors to Walk ASTs¶
The visitor pattern is … Thecompiler package uses a variant on thevisitor pattern that takes advantage of Python’s introspection features toeliminate the need for much of the visitor’s infrastructure.
The classes being visited do not need to be programmed to accept visitors. Thevisitor need only define visit methods for classes it is specifically interestedin; a default visit method can handle the rest.
XXX The magicvisit() method for visitors.
compiler.visitor.walk(tree,visitor[,verbose])¶
- class
compiler.visitor.ASTVisitor¶ The
ASTVisitoris responsible for walking over the tree in the correctorder. A walk begins with a call topreorder(). For each node, it checksthevisitor argument topreorder()for a method named ‘visitNodeType,’where NodeType is the name of the node’s class, e.g. for aWhilenode avisitWhile()would be called. If the method exists, it is called with thenode as its first argument.The visitor method for a particular node type can control how child nodes arevisited during the walk. The
ASTVisitormodifies the visitor argumentby adding a visit method to the visitor; this method can be used to visit aparticular child node. If no visitor is found for a particular node type, thedefault()method is called.ASTVisitorobjects have the following methods:XXX describe extra arguments
default(node[,...])¶
dispatch(node[,...])¶
preorder(tree,visitor)¶
33.5.Bytecode Generation¶
The code generator is a visitor that emits bytecodes. Each visit method cancall theemit() method to emit a new bytecode. The basic code generatoris specialized for modules, classes, and functions. An assembler converts thatemitted instructions to the low-level bytecode format. It handles things likegeneration of constant lists of code objects and calculation of jump offsets.
