we-like-parsers/pegenPublic

NotificationsYou must be signed in to change notification settings
Fork37
Star182

PEG parser generator for Python

License

MIT license

182 stars 37 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
.github/workflows		.github/workflows
data		data
docs		docs
media		media
pegen		pegen
scripts		scripts
src/pegen		src/pegen
stories		stories
tests		tests
.coveragerc		.coveragerc
.flake8		.flake8
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
conftest.py		conftest.py
pyproject.toml		pyproject.toml
releasenotes.rst		releasenotes.rst
setup.py		setup.py
tox.ini		tox.ini

Repository files navigation

What is this?

Pegen is the parser generator used in CPython to produce the parser used by the interpreter. It allows toproduce PEG parsers from a description of a formal Grammar.

Installing

Install withpip or your favorite PyPi package manager.

pip install pegen

Documentation

The documentation is availablehere.

How to generate a parser

Given a grammar file compatible withpegen (you can write your own or start with one in thedata directory), youcan easily generate a parser by running:

python -m pegen <path-to-grammar-file> -o parser.py

This will generate a file calledparser.py in the current directory. This can be used to parse code using the grammar thatwe just used:

python parser.py <file-with-code-to-parse>

As a demo: generate a Python parser from data/python.gram, and use the generated parser to parse and run tests/demo.py

make demo

How to contribute

See the instructions in theCONTRIBUTING.md file.

Differences with CPython's Pegen

This repository exists to distribute a version of the Python PEG parser generator used by CPython that can be installed via PyPI,with some improvements. Although the official PEG generator included in CPython can generate both Python and C code, this distributionof the generator only allows to generate Python code. This is due to the fact that the C code generated by the generator includedin CPython includes a lot of implementation details and private headers that are not available for general usage.

The official PEG generator for Python 3.9 and later is now included in the CPython repo underTools/peg_generator/. We aim to keep this repo in sync with thePython generator from that version ofpegen.

Repository structure

Thesrc directory contains thepegen source (the package itself).
Thetests directory contains the test suite for thepegen package.
Thedata directory contains some example grammars compatible withpegen. Thisincludes apure-Python version of the Python grammar.
Thedocs directory contains the documentation for the package.
Thescripts directory contains some useful scripts that can be used for visualizinggrammars, benchmarking and other usages relevant to the development of the generator itself.
Thestories directory contains the backing files and examples forGuido's series on PEG parser.

Quick syntax overview

The grammar consists of a sequence of rules of the form:

    rule_name: expression

Optionally, a type can be included right after the rule name, whichspecifies the return type of the Python function corresponding tothe rule:

    rule_name[return_type]: expression

If the return type is omitted, thenAny is returned.

Grammar Expressions

`# comment`

Python-style comments.

`e1 e2`

Match e1, then match e2.

    rule_name: first_rule second_rule

`e1 | e2`

Match e1 or e2.

The first alternative can also appear on the line after the rule namefor formatting purposes. In that case, a | must be used before thefirst alternative, like so:

    rule_name[return_type]:        | first_alt        | second_alt

`( e )`

Match e.

    rule_name: (e)

A slightly more complex and useful example includes using the groupingoperator together with the repeat operators:

    rule_name: (e1 e2)*

`[ e ] or e?`

Optionally match e.

    rule_name: [e]

A more useful example includes defining that a trailing comma isoptional:

    rule_name: e (',' e)* [',']

`e*`

Match zero or more occurrences of e.

    rule_name: (e1 e2)*

`e+`

Match one or more occurrences of e.

    rule_name: (e1 e2)+

`s.e+`

Match one or more occurrences of e, separated by s. The generated parsetree does not include the separator. This is otherwise identical to(e (s e)*).

    rule_name: ','.e+

`&e`

Succeed if e can be parsed, without consuming any input.

`!e`

Fail if e can be parsed, without consuming any input.

An example taken from the Python grammar specifies that a primaryconsists of an atom, which is not followed by a. or a( or a[:

    primary: atom !'.' !'(' !'['

`~`

Commit to the current alternative, even if it fails to parse.

    rule_name: '(' ~ some_rule ')' | some_alt

In this example, if a left parenthesis is parsed, then the otheralternative won’t be considered, even if some_rule or ‘)’ fail to beparsed.

Left recursion

PEG parsers normally do not support left recursion but Pegen implements atechnique that allows left recursion using the memoization cache. This allowsus to write not only simple left-recursive rules but also more complicatedrules that involve indirect left-recursion like

  rule1: rule2 | 'a'  rule2: rule3 | 'b'  rule3: rule1 | 'c'

and "hidden left-recursion" like::

  rule: 'optional'? rule '@' some_other_rule

Variables in the Grammar

A sub-expression can be named by preceding it with an identifier and an= sign. The name can then be used in the action (see below), like this: ::

    rule_name[return_type]: '(' a=some_other_rule ')' { a }

Grammar actions

To avoid the intermediate steps that obscure the relationship between thegrammar and the AST generation the PEG parser allows directly generating ASTnodes for a rule via grammar actions. Grammar actions are language-specificexpressions that are evaluated when a grammar rule is successfully parsed. Theseexpressions can be written in Python. As an example of a grammar with Python actions,the piece of the parser generator that parses grammar files is bootstrapped from ameta-grammar file with Python actions that generate the grammar tree as a resultof the parsing.

In the specific case of the PEG grammar for Python, having actions allowsdirectly describing how the AST is composed in the grammar itself, making itmore clear and maintainable. This AST generation process is supported by the useof some helper functions that factor out common AST object manipulations andsome other required operations that are not directly related to the grammar.

To indicate these actions each alternative can be followed by the action codeinside curly-braces, which specifies the return value of the alternative

    rule_name[return_type]:        | first_alt1 first_alt2 { first_alt1 }        | second_alt1 second_alt2 { second_alt1 }

If the action is ommited, a default action is generated:

If there's a single name in the rule in the rule, it gets returned.
If there is more than one name in the rule, a collection with all parsedexpressions gets returned.

This default behaviour is primarily made for very simple situations and fordebugging purposes.

As an illustrative example this simple grammar file allows directlygenerating a full parser that can parse simple arithmetic expressions and thatreturns a valid Python AST:

    start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or []) }    expr_stmt: a=expr NEWLINE { ast.Expr(value=a, EXTRA) }    expr:        | l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, EXTRA) }        | l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, EXTRA) }        | term    term:        | l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, EXTRA) }        | l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, EXTRA) }        | factor    factor:        | '(' e=expr ')' { e }        | atom    atom:        | NAME        | NUMBER

About

PEG parser generator for Python

we-like-parsers.github.io/pegen/

Releases2

v0.3.0 Latest

Nov 14, 2023

+ 1 release

Packages

No packages published

Contributors15

Languages

Python98.9%
Other1.1%

Movatterモバイル変換

License

we-like-parsers/pegen

Folders and files

Latest commit

History

Repository files navigation

What is this?

Installing

Documentation

How to generate a parser

How to contribute

Differences with CPython's Pegen

Repository structure

Quick syntax overview

Grammar Expressions

# comment

e1 e2

e1 | e2

( e )

[ e ] or e?

e*

e+

s.e+

&e

!e

~

Left recursion

Variables in the Grammar

Grammar actions

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases2

Packages0

Uh oh!

Contributors15

Uh oh!

Languages

`# comment`

`e1 e2`

`e1 | e2`

`( e )`

`[ e ] or e?`

`e*`

`e+`

`s.e+`

`&e`

`!e`

`~`

Packages