Movatterモバイル変換

[0]ホーム

Jump to content

Scannerless parsing

Português

Edit links

From Wikipedia, the free encyclopedia

Algorithm that combines tokenization and parsing

This article has multiple issues. Please helpimprove it or discuss these issues on thetalk page.(Learn how and when to remove these messages)

This articleneeds additional citations forverification. Please helpimprove this article byadding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Scannerless parsing" – news ·newspapers ·books ·scholar ·JSTOR(December 2016) (Learn how and when to remove this message)

This articlerelies largely or entirely on asingle source. Relevant discussion may be found on thetalk page. Please helpimprove this article byintroducing citations to additional sources.
Find sources: "Scannerless parsing" – news ·newspapers ·books ·scholar ·JSTOR(December 2016)

The topic of this articlemay not meet Wikipedia'sgeneral notability guideline. Please help to demonstrate the notability of the topic by citingreliable secondary sources that areindependent of the topic and provide significant coverage of it beyond a mere trivial mention. If notability cannot be shown, the article is likely to bemerged,redirected, ordeleted.
Find sources: "Scannerless parsing" – news ·newspapers ·books ·scholar ·JSTOR(June 2023) (Learn how and when to remove this message)

(Learn how and when to remove this message)

Incomputer science,scannerless parsing (also calledlexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into apipeline of alexer followed by aparser, executingconcurrently. Alanguage grammar is scannerless if it uses a single formalism to express both thelexical (word level) and phrase level structure of the language.

Dividing processing into a lexer followed by a parser is more modular; scannerless parsing is primarily used when a clear lexer–parser distinction is unneeded or unwanted. Examples of when this is appropriate includeTeX, mostwiki grammars,makefiles, simple application-specificscripting languages, andRaku.

Advantages

[edit]

Only onemetalanguage is needed
Non-regular lexical structure is handled easily
"Token classification" is unneeded which removes the need for design accommodations such as "the lexer hack" and languagereserved words (such as "while" inC)
Grammars can be compositional (can be merged without human intervention)^[a]

Disadvantages

[edit]

Since the lexical scanning and syntactic parsing are combined, the resulting parser tends to be more complicated and thus harder to understand and debug. The same will hold for the associated grammar, if a grammar is used to generate the parser.
The resulting parser tends to be significantly less efficient than a lexer-parser pipeline with regard to both time and memory.^[1]

Implementations

[edit]

SGLR is a parser for the modularSyntax Definition Formalism (SDF), and is part of theASF+SDF Meta-Environment and theStratego/XT program transformation system.
JSGLR, a pure Java implementation of SGLR, also based on SDF.
TXL supports character-level parsing.
dparser generates ANSI C code for scannerlessGLR parsers.
Spirit allows for both scannerless and scanner-based parsing.
SBP is a scannerless parser forBoolean grammars (a superset of context-free grammars), written in Java.
Laja is a two-phase scannerless parser generator with support for mapping the grammar rules into objects, written in Java.
TheRaku grammars feature of the general purpose programming languageRaku.
PyParsing is a scannerless parser written in pure Python.
META II Has built in token parsers functions.
TREE-META Like META II also is scannerless having builtin lexer functions.
CWIC compiler for writing and implementing compilers. Has token rules as part of its language. Rules in CWIC were compiled into Boolean functions returning success or failure.

Notes

[edit]

^a This is because parsing at the character level makes the language recognized by the parser a singlecontext-free language defined on characters, as opposed to a context-free language of sequences of strings inregular languages. Some lexerless parsers handle the entire class of context-free languages, which is closed under composition.

References

[edit]

^Economopoulos, Giorgios; Klint, Paul; Vinju, Jurgen (2009)."Faster Scannerless GLR Parsing"(PDF).Compiler Construction. Lecture Notes in Computer Science. Vol. 5501. pp. 126–141.doi:10.1007/978-3-642-00722-4_10.ISBN 978-3-642-00721-7.

v t e Parsing algorithms
Top-down	Earley LL Recursive descent Tail recursive
Bottom-up	Precedence Simple Operator Shunting-yard LR Simple Look-ahead Canonical Generalized CYK Recursive ascent Shift-reduce
Mixed, other	Combinator Chart Left corner Statistical
Related topics	PEG Definite clause grammar Deterministic parsing Dynamic programming Memoization Parser generator LALR Parse tree AST Scannerless parsing History of compiler construction Comparison of parser generators Operator-precedence grammar

Movatterモバイル変換

Advantages

Disadvantages

Implementations

Notes

References

Further reading