- Notifications
You must be signed in to change notification settings - Fork7
A pure-Python module that implements an LR(1) parser generator, as well as CFSM and GLR parser drivers.
License
MagicStack/parsing
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Theparsing module implements an LR(1) parser generator, as well as theruntime support for using a generated parser, via the Lr and Glr parserdrivers. There is no special parser generator input file format, but theparser generator still needs to know what classes/methods correspond tovarious aspects of the parser. This information is specified viadocstrings, which the parser generator introspects in order to generate aparser. Only one parser specification can be embedded in each module, butit is possible to share modules between parser specifications so that, forexample, the same token definitions can be used by multiple parserspecifications.
The parsing tables are LR(1), but they are generated using a fast algorithmthat avoids creating duplicate states that result when using the genericLR(1) algorithm. Creation time and table size are on par with the LALR(1)algorithm. However, LALR(1) can create reduce/reduce conflicts that don'texist in a true LR(1) parser. For more information on the algorithm, see:
A Practical General Method for Constructing LR(k) ParsersDavid PagerActa Informatica 7, 249-268 (1977)
Parsing table generation requires non-trivial amounts of time for largegrammars, however it is still quite fast. Internal pickling support makesit possible to cache the most recent version of the parsing table on disk,and use the table if the current parser specification is still compatiblewith the one that was used to generate the pickled parsing table. Sincethe compatibility checking is quite fast, even for large grammars, thisremoves the need to use the standard code generation method that is usedby most parser generators.
Parser specifications are encapsulated by theSpec class. Parserinstances useSpec instances, but are themselves based on separateclasses. This allows multiple parser instances to exist simultaneously,without requiring multiple copies of the parsing tables. There are twoseparate parser driver classes:
- Lr:
- Standard Characteristic Finite State Machine (CFSM) driver, based onunambiguous LR(1) parsing tables. This driver is faster than the Glrdriver, but it cannot deal with all parsing tables that the Glrdriver can.
- Glr:
Generalized LR driver, capable of tracking multiple parse treessimultaneously, if the %split precedence is used to mark ambiguousactions. This driver is closely based on Elkhound's design, whichis described in a technical report:
Elkhound: A Fast, Practical GLR Parser GeneratorScott McPeakReport No. UCB/CSD-2-1214 (December 2002)http://www.cs.berkeley.edu/~smcpeak/elkhound/
Parser generator directives are embedded in docstrings, and must begin witha '%' character, followed immediately by one of several keywords:
- Precedence:
%fail%nonassoc%left%right%split- Token:
%token- Non-terminal:
%start%nonterm- Production:
%reduce
All of these directives are associated with classes except for %reduce.%reduce is associated with methods within non-terminal classes. The Parsingmodule provides base classes from which precedences, tokens, andnon-terminals must be derived. This is not as restrictive as it sounds,since there is nothing preventing, for example, a master Token class thatsubclasses Parsing.Token, which all of the actual token types then subclass.Also, nothing prevents using multiple inheritance.
Folowing are the base classes to be subclassed by parser specifications:
- Precedence
- Token
- Nonterm
The Parsing module implements the following exception classes:
- SpecError - when there is a problem with the grammar specification
- ParsingException - any problem that occurs during parsing
- UnexpectedToken - when the input sequence contains a token that isnot allowed by the grammar (including end-of-input)
In order to maintain compatibility with legacy code, the Parsing moduledefines the following aliases. New code should use the exceptions abovethat do not shadow Python's builtin exceptions.
- Exception - superclass for all exceptions that can be raised
- SyntaxError - alias for UnexpectedToken
- Additionally, trying to set private attributes may raise:
- AttributeError
Author: Jason Evansjasone@canonware.com
Github repo:http://github.com/MagicStack/parsing
About
A pure-Python module that implements an LR(1) parser generator, as well as CFSM and GLR parser drivers.
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Uh oh!
There was an error while loading.Please reload this page.
Contributors7
Uh oh!
There was an error while loading.Please reload this page.