- Notifications
You must be signed in to change notification settings - Fork11
Tiny JavaScript implementation of context-free languages parser - Earley parser (including generation of the parsing-forest).
License
lagodiuk/earley-parser-js
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Tiny JavaScript implementation of context-free languages parser -Earley parser
- General information about Earley parsing algorithm
- Online demo
2.1Parser of a tiny subset of English grammar
2.2Parser of arithmetic expressions - Quick start
3.1Grammar with hardcoded terminal symbols
3.2Customizing logic of tokens classification into terminal symbols
3.3Traversing parsed trees (parsing-forest)
3.4Parsing tiny subset of English language grammar - Used Tools and Libraries
###General information###
The Earley parser is an algorithm for parsing strings that belong to a givencontext-free language (the algorithm, named after its inventor,Jay Earley).
This algorithm is appealing because it can parse all context-free languages, unlikeLR parsers andLL parsers, which are more typically used in compilers but which can only handle restricted classes of languages.
Complexity of Earley parsing algorithm (in terms ofn - the length of the parsed string):
- O(n3) - cubic time in the general case
- O(n2) - quadratic time for unambiguous grammars
- O(n) - linear time for almost all LR(k) grammars
Earley parser performs particularly well when the rules are written left-recursively.
###Online demo###
####Parser of a tiny subset of English grammar####
Parser of a tiny subset of English grammar:https://jsfiddle.net/2mb3w9c1/4/embedded/result/
####Parser of arithmetic expressions####
Parser of arithmetic expressions:https://jsfiddle.net/vsf982m9/embedded/result/
vargrammar=newtinynlp.Grammar(['R -> N','S -> S add_sub M | M','M -> M mul_div T | T','N -> S lt_gt S | S','T -> num | ( S )',]);grammar.terminalSymbols=function(token){if('<'===token||'>'===token)return['lt_gt'];if('+'===token||'-'===token)return['add_sub'];if('*'===token||'/'===token)return['mul_div'];if('('===token)return['('];if(')'===token)return[')'];// Otherwise - token considered as a number:return['num'];}
##Usage##
Attach to your project - single file with implementation of Earley algorithm:https://rawgithub.com/lagodiuk/earley-parser-js/master/earley-oop.min.js
###Grammar with hardcoded terminal symbols###
// Define grammarvargrammar=newtinynlp.Grammar([// Define grammar production rules'R -> S','S -> S add_sub M | M | num','M -> M mul_div T | T | num','T -> num',// Define terminal symbols'num -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0','add_sub -> + | -','mul_div -> * | /']);// You have to tokenize input by yourself!// Creating array of tokensvartokens='2 + 3 * 4'.split(' ');// ParsingvarrootRule='R';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();// Iterate over all parsed trees and display them on HTML pagefor(variintrees){console.log(JSON.stringify(trees[i]))}
###Customizing logic of tokens classification into terminal symbols###Potentially, this approach allows to extend parser with custom classifier of tokens - into terminal symbols (e.g. recognize terminal symbols using Regular expressions or more sophisticated classifiers):
vargrammar=newtinynlp.Grammar(['R -> S','S -> S add_sub M | M | num','M -> M mul_div T | T | num','T -> num',]);// Define function, which will classify tokens into terminal typesgrammar.terminalSymbols=function(token){// Actually, this method might be customized// to use some more sophisticated classification mechanismsif('+'===token||'-'===token)return['add_sub'];if('*'===token||'/'===token)return['mul_div'];if(token.match(/^\d+$/))return['num'];// It is also possible that classifier returns// more than one terminal symbol, e.g.: ['term1', 'term2']// Otherwise:thrownewError("Can't recognize token: "+token);}// You have to tokenize input by yourself!// Creating array of tokensvartokens='2 + 3 * 4'.split(' ');// ParsingvarrootRule='R';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();// Iterate over all parsed trees and display them on HTML pagefor(variintrees){console.log(JSON.stringify(trees[i]))}
###Traversing parsed trees###Following snippet shows how to transform parsed trees into nested HTML lists:
functiontoNestedList(tree){if(!tree.subtrees||tree.subtrees.length==0){return'<li>'+tree.root+'</li>';}varbuilder=[];builder.push('<li>');builder.push(tree.root);builder.push('<ul>')for(variintree.subtrees){builder.push(toNestedList(tree.subtrees[i]))}builder.push('</ul>')builder.push('</li>')returnbuilder.join('');}
Example of usage:
// Iterate over all parsed trees and display them on HTML pagefor(variintrees){htmlRepresentstion='<ul>'+toNestedList(trees[i])+'</ul>'// embed htmlRepresentstion into HTML page}
###Parsing tiny subset of English language grammar###Grammar taken from:https://en.wikipedia.org/wiki/CYK_algorithm#Example
vargrammar=newtinynlp.Grammar(['S -> NP VP','VP -> VP PP | V NP | V','PP -> P NP','NP -> Det N | N']);grammar.terminalSymbols=function(token){if(token=='eats')return['V'];if(token=='fish')return['N'];if(token=='fork')return['N'];if(token=='she')return['N'];if(token=='a')return['Det'];if(token=='with')return['P'];// otherwise:return[];}// Tokenizing sentencevartokens='she eats a fish with a fork'.split(' ');// ParsingvarrootRule='S';varchart=tinynlp.parse(tokens,grammar,rootRule);// Get array with all parsed trees// In case of ambiguous grammar - there might be more than 1 parsing treevartrees=chart.getFinishedRoot(rootRule).traverse();for(variintrees){// visit each parse tree}
###Used Tools and Libraries###Implementation of the Earley Parser is self-sufficient (https://github.com/lagodiuk/earley-parser-js/blob/master/earley-oop.js).
However, for producing of the minimized variant of library was used theGoogle Closure Compiler Service.
Additionally, some of the demo examples are using:
Some of the demo examples are currently hosted onhttps://jsfiddle.net/
About
Tiny JavaScript implementation of context-free languages parser - Earley parser (including generation of the parsing-forest).
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Releases
Packages0
Uh oh!
There was an error while loading.Please reload this page.

