33.Python compiler package

Deprecated since version 2.6:Thecompiler package has been removed in Python 3.

The Python compiler package is a tool for analyzing Python source code andgenerating Python bytecode. The compiler contains libraries to generate anabstract syntax tree from Python source code and to generate Pythonbytecode from the tree.

Thecompiler package is a Python source to bytecode translator written inPython. It uses the built-in parser and standardparser module togenerate a concrete syntax tree. This tree is used to generate an abstractsyntax tree (AST) and then Python bytecode.

The full functionality of the package duplicates the built-in compiler providedwith the Python interpreter. It is intended to match its behavior almostexactly. Why implement another compiler that does the same thing? The packageis useful for a variety of purposes. It can be modified more easily than thebuilt-in compiler. The AST it generates is useful for analyzing Python sourcecode.

This chapter explains how the various components of thecompiler packagework. It blends reference material with a tutorial.

33.1.The basic interface

The top-level of the package defines four functions. If you importcompiler, you will get these functions and a collection of modulescontained in the package.

compiler.parse(buf)

Returns an abstract syntax tree for the Python source code inbuf. Thefunction raisesSyntaxError if there is an error in the source code. Thereturn value is acompiler.ast.Module instance that contains the tree.

compiler.parseFile(path)

Return an abstract syntax tree for the Python source code in the file specifiedbypath. It is equivalent toparse(open(path).read()).

compiler.walk(ast,visitor[,verbose])

Do a pre-order walk over the abstract syntax treeast. Call the appropriatemethod on thevisitor instance for each node encountered.

compiler.compile(source,filename,mode,flags=None,dont_inherit=None)

Compile the stringsource, a Python module, statement or expression, into acode object that can be executed by the exec statement oreval(). Thisfunction is a replacement for the built-incompile() function.

Thefilename will be used for run-time error messages.

Themode must be ‘exec’ to compile a module, ‘single’ to compile a single(interactive) statement, or ‘eval’ to compile an expression.

Theflags anddont_inherit arguments affect future-related statements, butare not supported yet.

compiler.compileFile(source)

Compiles the filesource and generates a .pyc file.

Thecompiler package contains the following modules:ast,consts,future,misc,pyassem,pycodegen,symbols,transformer, andvisitor.

33.2.Limitations

There are some problems with the error checking of the compiler package. Theinterpreter detects syntax errors in two distinct phases. One set of errors isdetected by the interpreter’s parser, the other set by the compiler. Thecompiler package relies on the interpreter’s parser, so it get the first phasesof error checking for free. It implements the second phase itself, and thatimplementation is incomplete. For example, the compiler package does not raisean error if a name appears more than once in an argument list:deff(x,x):...

A future version of the compiler should fix these problems.

33.3.Python Abstract Syntax

Thecompiler.ast module defines an abstract syntax for Python. In theabstract syntax tree, each node represents a syntactic construct. The root ofthe tree isModule object.

The abstract syntax offers a higher level interface to parsed Python sourcecode. Theparser module and the compiler written in C for the Pythoninterpreter use a concrete syntax tree. The concrete syntax is tied closely tothe grammar description used for the Python parser. Instead of a single nodefor a construct, there are often several levels of nested nodes that areintroduced by Python’s precedence rules.

The abstract syntax tree is created by thecompiler.transformer module.The transformer relies on the built-in Python parser to generate a concretesyntax tree. It generates an abstract syntax tree from the concrete tree.

Thetransformer module was created by Greg Stein and Bill Tutt for anexperimental Python-to-C compiler. The current version contains a number ofmodifications and improvements, but the basic form of the abstract syntax and ofthe transformer are due to Stein and Tutt.

33.3.1.AST Nodes

Thecompiler.ast module is generated from a text file that describes eachnode type and its elements. Each node type is represented as a class thatinherits from the abstract base classcompiler.ast.Node and defines aset of named attributes for child nodes.

classcompiler.ast.Node

TheNode instances are created automatically by the parser generator.The recommended interface for specificNode instances is to use thepublic attributes to access child nodes. A public attribute may be bound to asingle node or to a sequence of nodes, depending on theNode type. Forexample, thebases attribute of theClass node, is bound to alist of base class nodes, and thedoc attribute is bound to a singlenode.

EachNode instance has alineno attribute which may beNone. XXX Not sure what the rules are for which nodes will have a usefullineno.

AllNode objects offer the following methods:

getChildren()

Returns a flattened list of the child nodes and objects in the order theyoccur. Specifically, the order of the nodes is the order in which theyappear in the Python grammar. Not all of the children areNodeinstances. The names of functions and classes, for example, are plainstrings.

getChildNodes()

Returns a flattened list of the child nodes in the order they occur. Thismethod is likegetChildren(), except that it only returns thosechildren that areNode instances.

Two examples illustrate the general structure ofNode classes. Thewhile statement is defined by the following grammar production:

while_stmt:"while"expression":"suite["else"":"suite]

TheWhile node has three attributes:test,body, andelse_. (If the natural name for an attribute is also a Python reservedword, it can’t be used as an attribute name. An underscore is appended to theword to make it a legal identifier, henceelse_ instead ofelse.)

Theif statement is more complicated because it can include severaltests.

if_stmt:'if'test':'suite('elif'test':'suite)*['else'':'suite]

TheIf node only defines two attributes:tests andelse_. Thetests attribute is a sequence of test expression,consequent body pairs. There is one pair for eachif/elifclause. The first element of the pair is the test expression. The secondelements is aStmt node that contains the code to execute if the testis true.

ThegetChildren() method ofIf returns a flat list of childnodes. If there are threeif/elif clauses and noelse clause, thengetChildren() will return a list of sixelements: the first test expression, the firstStmt, the second textexpression, etc.

The following table lists each of theNode subclasses defined incompiler.ast and each of the public attributes available on theirinstances. The values of most of the attributes are themselvesNodeinstances or sequences of instances. When the value is something other than aninstance, the type is noted in the comment. The attributes are listed in theorder in which they are returned bygetChildren() andgetChildNodes().

Node type

Attribute

Value

Add

left

left operand

right

right operand

And

nodes

list of operands

AssAttr

attribute as target ofassignment

expr

expression on the left-handside of the dot

attrname

the attribute name, a string

flags

XXX

AssList

nodes

list of list elements beingassigned to

AssName

name

name being assigned to

flags

XXX

AssTuple

nodes

list of tuple elements beingassigned to

Assert

test

the expression to be tested

fail

the value of theAssertionError

Assign

nodes

a list of assignment targets,one per equal sign

expr

the value being assigned

AugAssign

node

op

expr

Backquote

expr

Bitand

nodes

Bitor

nodes

Bitxor

nodes

Break

CallFunc

node

expression for the callee

args

a list of arguments

star_args

the extended *-arg value

dstar_args

the extended **-arg value

Class

name

the name of the class, a string

bases

a list of base classes

doc

doc string, a string orNone

code

the body of the class statement

Compare

expr

ops

Const

value

Continue

Decorators

nodes

List of function decoratorexpressions

Dict

items

Discard

expr

Div

left

right

Ellipsis

Expression

node

Exec

expr

locals

globals

FloorDiv

left

right

For

assign

list

body

else_

From

modname

names

Function

decorators

Decorators orNone

name

name used in def, a string

argnames

list of argument names, asstrings

defaults

list of default values

flags

xxx

doc

doc string, a string orNone

code

the body of the function

GenExpr

code

GenExprFor

assign

iter

ifs

GenExprIf

test

GenExprInner

expr

quals

Getattr

expr

attrname

Global

names

If

tests

else_

Import

names

Invert

expr

Keyword

name

expr

Lambda

argnames

defaults

flags

code

LeftShift

left

right

List

nodes

ListComp

expr

quals

ListCompFor

assign

list

ifs

ListCompIf

test

Mod

left

right

Module

doc

doc string, a string orNone

node

body of the module, aStmt

Mul

left

right

Name

name

Not

expr

Or

nodes

Pass

Power

left

right

Print

nodes

dest

Printnl

nodes

dest

Raise

expr1

expr2

expr3

Return

value

RightShift

left

right

Slice

expr

flags

lower

upper

Sliceobj

nodes

list of statements

Stmt

nodes

Sub

left

right

Subscript

expr

flags

subs

TryExcept

body

handlers

else_

TryFinally

body

final

Tuple

nodes

UnaryAdd

expr

UnarySub

expr

While

test

body

else_

With

expr

vars

body

Yield

value

33.3.2.Assignment nodes

There is a collection of nodes used to represent assignments. Each assignmentstatement in the source code becomes a singleAssign node in the AST.Thenodes attribute is a list that contains a node for each assignmenttarget. This is necessary because assignment can be chained, e.g.a=b=2. EachNode in the list will be one of the following classes:AssAttr,AssList,AssName, orAssTuple.

Each target assignment node will describe the kind of object being assigned to:AssName for a simple name, e.g.a=1.AssAttr for anattribute assigned, e.g.a.x=1.AssList andAssTuple forlist and tuple expansion respectively, e.g.a,b,c=a_tuple.

The target assignment nodes also have aflags attribute that indicateswhether the node is being used for assignment or in a delete statement. TheAssName is also used to represent a delete statement, e.g.delx.

When an expression contains several attribute references, an assignment ordelete statement will contain only oneAssAttr node – for the finalattribute reference. The other attribute references will be represented asGetattr nodes in theexpr attribute of theAssAttrinstance.

33.3.3.Examples

This section shows several simple examples of ASTs for Python source code. Theexamples demonstrate how to use theparse() function, what the repr of anAST looks like, and how to access attributes of an AST node.

The first module defines a single function. Assume it is stored indoublelib.py.

"""This is an example module.This is the docstring."""defdouble(x):"Return twice the argument"returnx*2

In the interactive interpreter session below, I have reformatted the long ASTreprs for readability. The AST reprs use unqualified class names. If you wantto create an instance from a repr, you must import the class names from thecompiler.ast module.

>>>importcompiler>>>mod=compiler.parseFile("doublelib.py")>>>modModule('This is an example module.\n\nThis is the docstring.\n',       Stmt([Function(None, 'double', ['x'], [], 0,                      'Return twice the argument',                      Stmt([Return(Mul((Name('x'), Const(2))))]))]))>>>fromcompiler.astimport*>>>Module('This is an example module.\n\nThis is the docstring.\n',...Stmt([Function(None,'double',['x'],[],0,...'Return twice the argument',...Stmt([Return(Mul((Name('x'),Const(2))))]))]))Module('This is an example module.\n\nThis is the docstring.\n',       Stmt([Function(None, 'double', ['x'], [], 0,                      'Return twice the argument',                      Stmt([Return(Mul((Name('x'), Const(2))))]))]))>>>mod.doc'This is an example module.\n\nThis is the docstring.\n'>>>fornodeinmod.node.nodes:...printnode...Function(None, 'double', ['x'], [], 0, 'Return twice the argument',         Stmt([Return(Mul((Name('x'), Const(2))))]))>>>func=mod.node.nodes[0]>>>func.codeStmt([Return(Mul((Name('x'), Const(2))))])

33.4.Using Visitors to Walk ASTs

The visitor pattern is … Thecompiler package uses a variant on thevisitor pattern that takes advantage of Python’s introspection features toeliminate the need for much of the visitor’s infrastructure.

The classes being visited do not need to be programmed to accept visitors. Thevisitor need only define visit methods for classes it is specifically interestedin; a default visit method can handle the rest.

XXX The magicvisit() method for visitors.

compiler.visitor.walk(tree,visitor[,verbose])
classcompiler.visitor.ASTVisitor

TheASTVisitor is responsible for walking over the tree in the correctorder. A walk begins with a call topreorder(). For each node, it checksthevisitor argument topreorder() for a method named ‘visitNodeType,’where NodeType is the name of the node’s class, e.g. for aWhile node avisitWhile() would be called. If the method exists, it is called with thenode as its first argument.

The visitor method for a particular node type can control how child nodes arevisited during the walk. TheASTVisitor modifies the visitor argumentby adding a visit method to the visitor; this method can be used to visit aparticular child node. If no visitor is found for a particular node type, thedefault() method is called.

ASTVisitor objects have the following methods:

XXX describe extra arguments

default(node[,...])
dispatch(node[,...])
preorder(tree,visitor)

33.5.Bytecode Generation

The code generator is a visitor that emits bytecodes. Each visit method cancall theemit() method to emit a new bytecode. The basic code generatoris specialized for modules, classes, and functions. An assembler converts thatemitted instructions to the low-level bytecode format. It handles things likegeneration of constant lists of code objects and calculation of jump offsets.