The half diminished C compiler tutorial.
Lex - based tokenizer RPN calculator Recursive descent expression calculator Yacc - based expression calculator x86 assembly overview Code generation for a stack machine global int 32 variables loops if then else Functions call convention +------------------+ +------------------+| | -------> | || Editor | | Source code __/| | <------- | __/+------------------+ \____________/ | v+------------------+ +------------------+| | <------- | || Assembly code __/ | Compiler || __/ | |\____________/ +------------------+ | v+------------------+ +------------------+| | | || Assembler | -------> | Object file __/| | | __/+------------------+ \____________/ | v+------------------+ +------------------+| | | || Executable | <------- | Linker || | | |+------------------+ +------------------+
+------------------+ +------------------+| | -------> | || Source code __/ | Lexical Analyzer || __/ | |\____________/ +------------------+ | v+------------------+ +------------------+| Syntactic | | || analyser / Code | <------- | Tokens __/| generator | | __/+------------------+ \____________/ | v+------------------+| || Assembly code __/| __/\____________/
+------------------+ +------------------+| | -------> | || Source code __/ | Lexical Analyzer || __/ | |\____________/ +------------------+ | v+------------------+ +------------------+| Syntactic | | || analyser | <------- | Tokens __/| | | __/+------------------+ \____________/ | v+------------------+ +------------------+| | -------> | || Parse tree __/ <------- | Code generator || __/ | |\____________/ +------------------+ | v +------------------+ | | | Assembly code __/ | __/ \____________/
A String tokenizer is the part of a compiler that performs lexical analysis.
Match regular expressions in a text. Return text content associated with the regexp. git checkout tokenizercat tokenizer.l
Lex-based number tokenizer git checkout tokenizergit clean -fmakeecho "123 * + 456 -" | ./tokenizer
The parser outputs an error for the first non-recognised character.
echo "123 hello" | ./tokenizer
RPN, or postfix notation, is a mathematical notation where the operator follows its operands.
Can be evaluated without parenthesis as long as arity is fixed. Well fitted for stack-based machines. Infix notation:
Postfix rpn equivalent:
git checkout simple_rpncat rpn.c
git checkout rpngit clean -fmakeecho "123 5 4 * - 3 - 4 /" | ./rpn
Semantic errors may cause a stack underflow.
Recursive descent expression calculator Read input from left to right Leftmost derivation git checkout simple_expression_parser_recursivecat recursive.c
Example expression:
Parse tree:
Right-recursive expression calculator git checkout recursivegit clean -fmakeecho "(123 - 5 * 4 - 3) / 4" | ./recursive
Explicit parenthesis are needed to get a correct interpretation.
echo "((123 - 5 * 4) - 3) / 4" | ./recursive
Supporting left recursion with a recursive parser is possible by building a parse tree and re-parenting the nodes.
Yacc - based expression calculator Read input from left to right Rightmost derivation git checkout simple_expression_parsercat parser.y
Example expression:
Parse tree:
Left-recursive expression calculator git checkout interpretergit clean -fmakeecho "(123 - 5 * 4 - 3) / 4" | ./parser
git checkout yydebuggit clean -fmakecat y.outputecho "2 * 3" | ./parser
A one pass compiler generates assembly "on the fly" code during the syntax analysis phase.
<----------- EAX -------------> <----- AX ---->+---------------+-------+-------+| | AH | AL |+---------------+-------+-------+31 15 7 0
All purpose: EAX, EBX, ECX, EDX, ESI, EDI Stack base: EBP, pointer: ESP Assembly code to dump 32 bit register in hex mode.
Like for a RPN machine, the generated code pops the second operand, then the first operand, does the operation and pushes the result. The second operand is stored in register ebx. The first operand and the operation result are computed in eax then pushed on the stack.
Example expression:
Generated assembly code:
push 5push 3pop ebxpop eaxsub eax, ebxpush eax
Begin:^ +-----+| / \| / \| | Condition |-- false --+| \ / || \ / || +-----+ || | || | true || v || +-----------+ || | | || | Bloc | || | | || +-----------+ || | |+---------+ | | End: <----------------+
+-----+ / \ / \ | Condition |-- false --+ \ / | \ / | +-----+ | | | | true | v | +-----------+ | | | | | Bloc | | | | | +-----------+ | | |+---------+ || || End: +-----------------+| || v| +-----------+| | || | Bloc || | || +-----------+| |v vFinal:
+-------------------------------+| variable 2 | <--- EBP - 8+-------------------------------+| variable 1 | <--- EBP - 4+-------------------------------+| saved EBP | <--- EBP+-------------------------------+| return address |+-------------------------------+| argument 1 | <--- EBP + 8+-------------------------------+| argument 2 | <--- EBP + 12+-------------------------------+| argument 3 | <--- EBP + 16+-------------------------------+