lagodiuk/earley-parser-jsPublic

NotificationsYou must be signed in to change notification settings
Fork11
Star122

Tiny JavaScript implementation of context-free languages parser - Earley parser (including generation of the parsing-forest).

License

Apache-2.0 license

122 stars 11 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
demo_classify_terminal_symbols		demo_classify_terminal_symbols
demo_hardcoded_terminal_symbols		demo_hardcoded_terminal_symbols
demo_live_parsing		demo_live_parsing
demo_nlp		demo_nlp
demo_nlp_deutsch		demo_nlp_deutsch
demo_nlp_live		demo_nlp_live
img		img
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
earley-oop.js		earley-oop.js
earley-oop.min.js		earley-oop.min.js

Repository files navigation

earley-parser-js

Tiny JavaScript implementation of context-free languages parser -Earley parser

General information

The Earley parser is an algorithm for parsing strings that belong to a givencontext-free language (the algorithm, named after its inventor,Jay Earley).

This algorithm is appealing because it can parse all context-free languages, unlikeLR parsers andLL parsers, which are more typically used in compilers but which can only handle restricted classes of languages.

Complexity of Earley parsing algorithm (in terms ofn - the length of the parsed string):

O(n³) - cubic time in the general case
O(n²) - quadratic time for unambiguous grammars
O(n) - linear time for almost all LR(k) grammars

Earley parser performs particularly well when the rules are written left-recursively.

Online demo

Parser of the tiny subset of German grammar

Parser of the tiny subset of German grammar:http://lagodiuk.github.io/nlp/deutsch.html

Parser of the tiny subset of English grammar

Parser of a tiny subset of English grammar:https://jsfiddle.net/2mb3w9c1/4/embedded/result/

Parser of arithmetic expressions

Parser of arithmetic expressions:https://jsfiddle.net/vsf982m9/embedded/result/

vargrammar=newtinynlp.Grammar(['R -> N','S -> S add_sub M | M','M -> M mul_div T | T','N -> S lt_gt S | S','T -> num | ( S )',]);grammar.terminalSymbols=function(token){if('<'===token||'>'===token)return['lt_gt'];if('+'===token||'-'===token)return['add_sub'];if('*'===token||'/'===token)return['mul_div'];if('('===token)return['('];if(')'===token)return[')'];// Otherwise - token considered as a number:return['num'];}

Usage

Attach to your project - single file with implementation of Earley algorithm:https://rawgithub.com/lagodiuk/earley-parser-js/master/earley-oop.min.js

Grammar with hardcoded terminal symbols

// Define grammarvargrammar=newtinynlp.Grammar([// Define grammar production rules'R -> S','S -> S add_sub M | M | num','M -> M mul_div T | T | num','T -> num',// Define terminal symbols'num -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0','add_sub -> + | -','mul_div -> * | /']);// You have to tokenize input by yourself!// Creating array of tokensvartokens='2 + 3 * 4'.split(' ');// ParsingvarrootRule='R';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();// Iterate over all parsed trees and display them on HTML pagefor(variintrees){console.log(JSON.stringify(trees[i]))}

Customizing logic of tokens classification into terminal symbols

Potentially, this approach allows to extend parser with custom classifier of tokens - into terminal symbols (e.g. recognize terminal symbols using Regular expressions or more sophisticated classifiers):

vargrammar=newtinynlp.Grammar(['R -> S','S -> S add_sub M | M | num','M -> M mul_div T | T | num','T -> num',]);// Define function, which will classify tokens into terminal typesgrammar.terminalSymbols=function(token){// Actually, this method might be customized// to use some more sophisticated classification mechanismsif('+'===token||'-'===token)return['add_sub'];if('*'===token||'/'===token)return['mul_div'];if(token.match(/^\d+$/))return['num'];// It is also possible that classifier returns// more than one terminal symbol, e.g.: ['term1', 'term2']// Otherwise:thrownewError("Can't recognize token: "+token);}// You have to tokenize input by yourself!// Creating array of tokensvartokens='2 + 3 * 4'.split(' ');// ParsingvarrootRule='R';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();// Iterate over all parsed trees and display them on HTML pagefor(variintrees){console.log(JSON.stringify(trees[i]))}

Traversing parsed trees

Following snippet shows how to transform parsed trees into nested HTML lists:

functiontoNestedList(tree){if(!tree.subtrees||tree.subtrees.length==0){return'<li>'+tree.root+'</li>';}varbuilder=[];builder.push('<li>');builder.push(tree.root);builder.push('<ul>')for(variintree.subtrees){builder.push(toNestedList(tree.subtrees[i]))}builder.push('</ul>')builder.push('</li>')returnbuilder.join('');}

Example of usage:

// Iterate over all parsed trees and display them on HTML pagefor(variintrees){htmlRepresentstion='<ul>'+toNestedList(trees[i])+'</ul>'// embed htmlRepresentstion into HTML page}

Parsing tiny subset of English language grammar

Grammar taken from:https://en.wikipedia.org/wiki/CYK_algorithm#Example

vargrammar=newtinynlp.Grammar(['S -> NP VP','VP -> VP PP | V NP | V','PP -> P NP','NP -> Det N | N']);grammar.terminalSymbols=function(token){if(token=='eats')return['V'];if(token=='fish')return['N'];if(token=='fork')return['N'];if(token=='she')return['N'];if(token=='a')return['Det'];if(token=='with')return['P'];// otherwise:return[];}// Tokenizing sentencevartokens='she eats a fish with a fork'.split(' ');// ParsingvarrootRule='S';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();for(variintrees){// visit each parse tree}

Used Tools and Libraries

Implementation of the Earley Parser is self-sufficient (https://github.com/lagodiuk/earley-parser-js/blob/master/earley-oop.js).

However, for producing of the minimized variant of library was used theGoogle Closure Compiler Service.

Additionally, some of the demo examples are using:

Some of the demo examples are currently hosted onhttps://jsfiddle.net/

This library is used by...

The web site of the book "Formale Sprachen, abstrakte Automaten und Compiler: Lehr- und Arbeitsbuch für Grundstudium und Fortbildung" (by Christian Wagenknecht (Autor), Michael Hielscher (Mitwirkende)):https://flaci.com/home/The library is used by this web site for creation of the browser environment, where users can experiment with different Context Free Grammars (define grammars and produce the parsing trees on the fly)

About

Tiny JavaScript implementation of context-free languages parser - Earley parser (including generation of the parsing-forest).

Releases

No releases published

Packages

No packages published

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

earley-parser-js

Table of contents

General information

Online demo

Parser of the tiny subset of German grammar

Parser of the tiny subset of English grammar

Parser of arithmetic expressions

Usage

Grammar with hardcoded terminal symbols

Customizing logic of tokens classification into terminal symbols

Traversing parsed trees

Parsing tiny subset of English language grammar

Used Tools and Libraries

This library is used by...

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Uh oh!

Languages

Movatterモバイル変換

License

lagodiuk/earley-parser-js

Folders and files

Latest commit

History

Repository files navigation

earley-parser-js

Table of contents

General information

Online demo

Parser of the tiny subset of German grammar

Parser of the tiny subset of English grammar

Parser of arithmetic expressions

Usage

Grammar with hardcoded terminal symbols

Customizing logic of tokens classification into terminal symbols

Traversing parsed trees

Parsing tiny subset of English language grammar

Used Tools and Libraries

This library is used by...

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Uh oh!

Languages

Packages