- Notifications
You must be signed in to change notification settings - Fork0
ashwin9999/LAG
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Lexical Analyzer Generator
This is a Lexical Analyzer Generator implemented in C++. It reads input specifications and outputs a Lexical Analyzer which can parse C++ code into a token list.
(Regex -> Postfix -> NFA -> DFA)
- Firstly, it parses the input specifications provided via the command-line (in the form of a .lex format file).
- It then generates a list of NFA for the regex entries.
- It combines all the NFAs into one giant NFA.
- And then, it converts the NFA to a DFA.
- It attempts at minimizing the DFA and finally, generates a Lexical Analyzer (c++ code) from the DFA.
https://swtch.com/~rsc/regexp/regexp1.html
https://en.wikipedia.org/wiki/Lex_(software)
https://www.epaperpress.com/lexandyacc/download/lex.pdf
To compile and run the LAG:
make ./lag files/input.specs output.cpp
Note: running the make command will give a warning, but that warning can be ignored. It will not give any errors.
This will generate the Lexical Analyzer in the file output.cpp.To run the Lexical Analyzer on the test samples.
g++ output.cpp -o lex./lex files/test.cpp
%{[Declaration]%}%%[Entry]%%[Code]
Reference for this format:https://en.wikipedia.org/wiki/Lex_(software)
%{#include <iostream>using namespace std;int lineNum = 1;%}%%\s|\t|\r { strcpy(token->type, "SKIP");}\n { strcpy(token->type, "SKIP"); lineNum++;}if|else|while|do|break|switch|for|case|struct|typedef|cout|cin { strcpy(token->type,"KEYWORD");}int|float|double|long|char { strcpy(token->type, "TYPE");}(un)?signed { strcpy(token->type, "TYPE_MODIFIER");}#(include|define|undef|ifdef|ifndef|if|endif|elif|else) { strcpy(token->type,"PREPROC_DERIV");}[a-zA-Z_][a-z0-9A-Z_]* { strcpy(token->type, "IDENTIFIER");}[1-9][0-9]*|[0-9] { strcpy(token->type, "INTEGER");}\+|\-|\*|/|% { strcpy(token->type, "ARITHMETIC_OP");}==|&&|(\|\|)|!|<|>|<=|>= { strcpy(token->type, "LOGICAL_OP");}^|&|~|\| { strcpy(token->type, "BIT_OP");}\-> { strcpy(token->type, "ACCESS_OP");}= { strcpy(token->type, "ASSIGN_OP");}; { strcpy(token->type, "SEMICOLON");}\. { strcpy(token->type, "DOT");}, { strcpy(token->type, "COMMA");}\[ { strcpy(token->type, "LEFT_SQBRAC");}\] { strcpy(token->type, "RIGHT_SQBRAC");}" { strcpy(token->type, "QUOTE");}<< { strcpy(token->type, "LEFT_SHIFT");}>> { strcpy(token->type, "RIGHT_SHIFT");}\( { strcpy(token->type, "LEFT_PAREN");}\) { strcpy(token->type, "RIGHT_PAREN");}} { strcpy(token->type, "BLOCK_END");}{ { strcpy(token->type, "BLOCK_START");}%%void print() { iter_list(print_token); cout << "Number of lines: " << lineNum << endl;}int main(int argc, char** argv) { lexicalAnalyzer(argv[1], print); return 0;}
The only part you need to change is this:
%%\s|\t|\r { strcpy(token->type, "SKIP");}\n { strcpy(token->type, "SKIP"); lineNum++;}if|else|while|do|break|switch|for|case|struct|typedef|cout|cin { strcpy(token->type,"KEYWORD");}int|float|double|long|char { strcpy(token->type, "TYPE");}(un)?signed { strcpy(token->type, "TYPE_MODIFIER");}#(include|define|undef|ifdef|ifndef|if|endif|elif|else) { strcpy(token->type,"PREPROC_DERIV");}[a-zA-Z_][a-z0-9A-Z_]* { strcpy(token->type, "IDENTIFIER");}[1-9][0-9]*|[0-9] { strcpy(token->type, "INTEGER");}\+|\-|\*|/|% { strcpy(token->type, "ARITHMETIC_OP");}==|&&|(\|\|)|!|<|>|<=|>= { strcpy(token->type, "LOGICAL_OP");}^|&|~|\| { strcpy(token->type, "BIT_OP");}\-> { strcpy(token->type, "ACCESS_OP");}= { strcpy(token->type, "ASSIGN_OP");}; { strcpy(token->type, "SEMICOLON");}\. { strcpy(token->type, "DOT");}, { strcpy(token->type, "COMMA");}\[ { strcpy(token->type, "LEFT_SQBRAC");}\] { strcpy(token->type, "RIGHT_SQBRAC");}" { strcpy(token->type, "QUOTE");}<< { strcpy(token->type, "LEFT_SHIFT");}>> { strcpy(token->type, "RIGHT_SHIFT");}\( { strcpy(token->type, "LEFT_PAREN");}\) { strcpy(token->type, "RIGHT_PAREN");}} { strcpy(token->type, "BLOCK_END");}{ { strcpy(token->type, "BLOCK_START");}%%
You can add or remove tokens as needed to test my program. Adding more tokens will increase the number of DFA states, and decreasing the tokens will reduce the number of states. Please do not change the code portion of the input specifications that contain (main and print) functions.