Lexer Generator and Parser Generator as a Library in Nim.
loloicci/nimly
Lexer Generator and Parser Generator as a Macro Library in Nim.
With nimly, you can make a lexer/parser by writing definitions in formats like lex/yacc. nimly generates the lexer and parser with macros at compile time, so you can use nimly not as an external tool of your program but as a library.
`niml` is a macro to generate a lexer.

The macro `niml` makes a lexer. Almost all of the work of constructing the lexer is done at compile time. An example follows.
```nim
## This makes a LexData object named myLexer.
## This lexer returns value with type ``Token`` when a token is found.
niml myLexer[Token]:
  r"if":
    ## this part converted to procbody.
    ## the arg is (token: LToken).
    return TokenIf()
  r"else":
    return TokenElse()
  r"true":
    return TokenTrue()
  r"false":
    return TokenFalse()
  ## you can use ``..`` instead of ``-`` in ``[]``.
  r"[a..zA..Z\-_][a..zA..Z0..9\-_]*":
    return TokenIdentifier(token)
  ## you can define ``setUp`` and ``tearDown`` function.
  ## ``setUp`` is called from ``open``, ``newWithString`` and
  ## ``initWithString``.
  ## ``tearDown`` is called from ``close``.
  ## an example is ``test/lexer_global_var.nim``.
  setUp:
    doSomething()
  tearDown:
    doSomething()
```
Meta characters are as follows:
- `\`: escape character
- `.`: matches any character
- `[`: start of character class
- `|`: means or
- `(`: start of subpattern
- `)`: end of subpattern
- `?`: 0 or 1 times quantifier
- `*`: 0 or more times quantifier
- `+`: 1 or more times quantifier
- `{`: `{n,m}` is n or more and m or less times quantifier
In `[]`, meta characters are as follows:
- `\`: escape character
- `^`: negate character (only in first position)
- `]`: end of this class
- `-`: specify character range (`..` can be used instead of this)
Each of the following is recognized as a character set:
- `\d`: `[0..9]`
- `\D`: `[^0..9]`
- `\s`: `[ \t\n\r\f\v]`
- `\S`: `[^ \t\n\r\f\v]`
- `\w`: `[a..zA..Z0..9_]`
- `\W`: `[^a..zA..Z0..9_]`
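The character sets and quantifiers listed above can be combined in a single pattern. The following is a minimal sketch, not taken from the nimly sources: the names `WordToken`, `wordLex` and `kindsOf` are hypothetical, and the calls it relies on (`newWithString`, `ignoreIf`, `lexIter`, `token.token`) are the ones used in this README's own example. It assumes `patty` is installed alongside nimly.

```nim
import patty
import strutils
import nimly

# Hypothetical token type for this sketch; `variant` is provided by patty.
variant WordToken:
  NUMBER(num: int)
  WORD(text: string)
  SPACE

# Hypothetical lexer name. The first character of WORD excludes digits,
# so the NUMBER and WORD rules never match the same input.
niml wordLex[WordToken]:
  r"\d+":
    return NUMBER(parseInt(token.token))
  r"[a..zA..Z_]\w*":
    return WORD(token.token)
  r"\s+":
    return SPACE()

proc kindsOf(input: string): seq[WordTokenKind] =
  ## Lexes `input` and collects the kind of every non-space token.
  var lexer = wordLex.newWithString(input)
  lexer.ignoreIf = proc(r: WordToken): bool = r.kind == WordTokenKind.SPACE
  result = @[]
  for tkn in lexer.lexIter:
    result.add(tkn.kind)

echo kindsOf("abc 42")
```

Keeping the two rules disjoint avoids depending on how the lexer resolves a tie between patterns that match the same text, which this README does not specify.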
`nimy` is a macro to generate a LALR(1) parser.

The macro `nimy` makes a parser. Almost all of the work of constructing the parser is done at compile time. An example follows.
```nim
## This makes a LexData object named myParser.
## first clause is the top-level of the BNF.
## This parser receives tokens with type ``Token`` and token must have a value
## ``kind`` with type enum ``[TokenTypeName]Kind``.
## This is naturally satisfied when you use ``patty`` to define the token.
nimy myParser[Token]:
  ## the starting non-terminal
  ## the return type of the parser is ``Expr``
  top[Expr]:
    ## a pattern.
    expr:
      ## proc body that is used when parse the pattern with single ``expr``.
      ## $1 means first position of the pattern (expr)
      return $1

  ## non-terminal named ``expr``
  ## with returning type ``Expr``
  expr[Expr]:
    ## first pattern of expr.
    ## ``LPAR`` and ``RPAR`` is TokenKind.
    LPAR expr RPAR:
      return $2

    ## second pattern of expr.
    ## ``PLUS`` is TokenKind.
    expr PLUS expr:
      return $2
```
You can use the following EBNF functions:
- `XXX[]`: Option (0 or 1 `XXX`). The type is `seq[xxx]` where `xxx` is the type of `XXX`.
- `XXX{}`: Repeat (0 or more `XXX`). The type is `seq[xxx]` where `xxx` is the type of `XXX`.

An example of these is in the next section.
`tests/test_readme_example.nim` is an easy example.
```nim
import unittest
import patty
import strutils

import nimly

## variant is defined in patty
variant MyToken:
  PLUS
  MULTI
  NUM(val: int)
  DOT
  LPAREN
  RPAREN
  IGNORE

niml testLex[MyToken]:
  r"\(":
    return LPAREN()
  r"\)":
    return RPAREN()
  r"\+":
    return PLUS()
  r"\*":
    return MULTI()
  r"\d":
    return NUM(parseInt(token.token))
  r"\.":
    return DOT()
  r"\s":
    return IGNORE()

nimy testPar[MyToken]:
  top[string]:
    plus:
      return $1

  plus[string]:
    mult PLUS plus:
      return $1 & " + " & $3

    mult:
      return $1

  mult[string]:
    num MULTI mult:
      return "[" & $1 & " * " & $3 & "]"

    num:
      return $1

  num[string]:
    LPAREN plus RPAREN:
      return "(" & $2 & ")"

    ## float (integer part is 0-9) or integer
    NUM DOT[] NUM{}:
      result = ""
      # type of `($1).val` is `int`
      result &= $(($1).val)
      if ($2).len > 0:
        result &= "."
      # type of `$3` is `seq[MyToken]` and each elements are NUM
      for tkn in $3:
        # type of `tkn.val` is `int`
        result &= $(tkn.val)

test "test Lexer":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var ret: seq[MyTokenKind] = @[]

  for token in testLexer.lexIter:
    ret.add(token.kind)

  check ret == @[MyTokenKind.NUM, MyTokenKind.PLUS, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.MULTI,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM]

test "test Parser 1":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 101010]"

  testLexer.initWithString("1 + 42 * 1010")

  parser.init()
  check parser.parse(testLexer) == "1 + [42 * 1010]"

test "test Parser 2":
  var testLexer = testLex.newWithString("1 + 42 * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 1.01010]"

  testLexer.initWithString("1. + 4.2 * 101010")

  parser.init()
  check parser.parse(testLexer) == "1. + [4.2 * 101010]"

test "test Parser 3":
  var testLexer = testLex.newWithString("(1 + 42) * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "[(1 + 42) * 1.01010]"
```
```
nimble install nimly
```

Now, you can use nimly with `import nimly`.
While compiling a lexer/parser, you may encounter the error `interpretation requires too many iterations`. You can avoid this error by using the compiler option `maxLoopIterationsVM:N`, which is available since nim v1.0.6.

See #11 for details.
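One way to apply this option to every build of a project is a NimScript configuration file. The fragment below is a sketch under assumptions, not something this README or nimly prescribes: it assumes a `config.nims` file next to the module that uses `niml`/`nimy`, and the limit shown is an arbitrary illustrative value.

```nim
# config.nims (hypothetical location: next to the module that uses niml/nimy).
# The limit below is an arbitrary illustrative value, not a recommendation
# from nimly; raise it only as far as your grammar needs.
switch("maxLoopIterationsVM", "100000000")
```

Passing the option on the command line for a single build has the same effect; the configuration file only saves repeating it.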
- Fork this
- Create new branch
- Commit your change
- Push it to the branch
- Create new pull request
See changelog.rst.
You can use `nimldebug` and `nimydebug` as conditional symbols to print debug info.

Example: `nim c -d:nimldebug -d:nimydebug -r tests/test_readme_example.nim`