Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

The half diminished C compiler tutorial. x86 32 bit assembly, BSD license

License

NotificationsYou must be signed in to change notification settings

lyriarte/Cm7b5

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The half diminished C compiler tutorial.

  • Lex - based tokenizer
  • RPN calculator
  • Recursive descent expression calculator
  • Yacc - based expression calculator
  • x86 assembly overview
  • Code generation for a stack machine
  • global int 32 variables
  • loops
  • if then else
  • Functions call convention

Compiling toolchain

+------------------+          +------------------+|                  | -------> |                  ||      Editor      |          |  Source code   __/|                  | <------- |             __/+------------------+          \____________/                                     |                                     v+------------------+          +------------------+|                  | <------- |                  || Assembly code  __/          |     Compiler     ||             __/             |                  |\____________/                +------------------+       |       v+------------------+          +------------------+|                  |          |                  ||    Assembler     | -------> |  Object file   __/|                  |          |             __/+------------------+          \____________/                                     |                                     v+------------------+          +------------------+|                  |          |                  ||    Executable    | <------- |      Linker      ||                  |          |                  |+------------------+          +------------------+

Compiler components

One pass compiler

+------------------+          +------------------+|                  | -------> |                  ||   Source code  __/          | Lexical Analyzer ||             __/             |                  |\____________/                +------------------+                                     |                                     v+------------------+          +------------------+|    Syntactic     |          |                  || analyser / Code  | <------- |    Tokens      __/|    generator     |          |             __/+------------------+          \____________/       |       v+------------------+|                  || Assembly code  __/|             __/\____________/

Multi pass compiler

+------------------+          +------------------+|                  | -------> |                  ||   Source code  __/          | Lexical Analyzer ||             __/             |                  |\____________/                +------------------+                                     |                                     v+------------------+          +------------------+|    Syntactic     |          |                  ||    analyser      | <------- |    Tokens      __/|                  |          |             __/+------------------+          \____________/       |       v+------------------+          +------------------+|                  | -------> |                  ||   Parse tree   __/ <------- |  Code generator  ||             __/             |                  |\____________/                +------------------+                                     |                                     v                              +------------------+                              |                  |                              | Assembly code  __/                              |             __/                              \____________/

String tokenizer

A String tokenizer is the part of a compiler that performs lexical analysis.

  • Match regular expressions in a text.
  • Return text content associated with the regexp.

Bare-bones parser

git checkout tokenizercat tokenizer.l

Lex-based number tokenizer

git checkout tokenizergit clean -fmakeecho "123 * + 456 -" | ./tokenizer

The parser outputs an error for the first non-recognised character.

echo "123 hello" | ./tokenizer

RPN calculator

Reverse Polish Notation

RPN, or postfix notation, is a mathematical notation where the operator follows its operands.

  • Can be evaluated without parenthesis as long as arity is fixed.
  • Well fitted for stack-based machines.

Infix notation:

(123 - 5 * 4 - 3) / 4

Postfix rpn equivalent:

123 5 4 * - 3 - 4 /

Bare-bones parser

git checkout simple_rpncat rpn.c

Lex-based RPN calculator

git checkout rpngit clean -fmakeecho "123 5 4 * - 3 - 4 /" | ./rpn

Semantic errors may cause a stack underflow.

Recursive descent expression calculator

LL(1) grammar

  • Read input from left to right
  • Leftmost derivation

Bare-bones parser

git checkout simple_expression_parser_recursivecat recursive.c

Example expression:

a + b + c

Parse tree:

  + / \a   +   / \  b   c

Right-recursive expression calculator

git checkout recursivegit clean -fmakeecho "(123 - 5 * 4 - 3) / 4" | ./recursive

Explicit parenthesis are needed to get a correct interpretation.

echo "((123 - 5 * 4) - 3) / 4" | ./recursive

Supporting left recursion with a recursive parser is possible by building a parse tree and re-parenting the nodes.

Yacc - based expression calculator

LR grammar

  • Read input from left to right
  • Rightmost derivation

Bare-bones parser

git checkout simple_expression_parsercat parser.y

Example expression:

a + b + c

Parse tree:

    +   / \  +   a / \b   c

Left-recursive expression calculator

git checkout interpretergit clean -fmakeecho "(123 - 5 * 4 - 3) / 4" | ./parser

Stack automaton

git checkout yydebuggit clean -fmakecat y.outputecho "2 * 3" | ./parser

C subset compiler

A one pass compiler generates assembly "on the fly" code during the syntax analysis phase.

x86 32 bit assembly

Register layout

<-----------  EAX  ------------->                <-----  AX  ---->+---------------+-------+-------+|               |  AH   |  AL   |+---------------+-------+-------+31              15       7      0

Register usage

  • All purpose: EAX, EBX, ECX, EDX, ESI, EDI
  • Stack base: EBP, pointer: ESP

Example program

Assembly code to dump 32 bit register in hex mode.

Compiling expressions

Like for a RPN machine, the generated code pops the second operand, then the first operand, does the operation and pushes the result. The second operand is stored in register ebx. The first operand and the operation result are computed in eax then pushed on the stack.

Example expression:

5 - 3

Generated assembly code:

push 5push 3pop ebxpop eaxsub eax, ebxpush eax

Branches and labels

Loops

Begin:^      +-----+|     /       \|    /         \|   | Condition |-- false --+|    \         /            ||     \       /             ||      +-----+              ||         |                 ||         | true            ||         v                 ||   +-----------+           ||   |           |           ||   |   Bloc    |           ||   |           |           ||   +-----------+           ||         |                 |+---------+                 |                            |    End:   <----------------+

Conditions

       +-----+      /       \     /         \    | Condition |-- false --+     \         /            |      \       /             |       +-----+              |          |                 |          | true            |          v                 |    +-----------+           |    |           |           |    |   Bloc    |           |    |           |           |    +-----------+           |          |                 |+---------+                 ||                           ||   End:  +-----------------+|         ||         v|   +-----------+|   |           ||   |   Bloc    ||   |           ||   +-----------+|         |v         vFinal:

Calling convention

+-------------------------------+|           variable 2          | <--- EBP - 8+-------------------------------+|           variable 1          | <--- EBP - 4+-------------------------------+|            saved EBP          | <--- EBP+-------------------------------+|         return address        |+-------------------------------+|           argument 1          | <--- EBP + 8+-------------------------------+|           argument 2          | <--- EBP + 12+-------------------------------+|           argument 3          | <--- EBP + 16+-------------------------------+

About

The half diminished C compiler tutorial. x86 32 bit assembly, BSD license

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp