Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

A small C compiler

License

NotificationsYou must be signed in to change notification settings

rui314/chibicc

Repository files navigation

(The old master has moved tohistorical/oldbranch. This is a new one uploaded in September 2020.)

chibicc is yet another small C compiler that implements most C11features. Even though it still probably falls into the "toy compilers"category just like other small compilers do, chibicc can compileseveral real-world programs, includingGit,SQLite,libpng and chibiccitself, without making modifications to the compiled programs.Generated executables of these programs pass their corresponding testsuites. So, chibicc actually supports a wide variety of C11 featuresand is able to compile hundreds of thousands of lines of real-world Ccode correctly.

chibicc is developed as the reference implementation for a book I'mcurrently writing about the C compiler and the low-level programming.The book covers the vast topic with an incremental approach; in the firstchapter, readers will implement a "compiler" that accepts just a singlenumber as a "language", which will then gain one feature at a time in eachsection of the book until the language that the compiler accepts matcheswhat the C11 spec specifies. I took this incremental approach fromthepaper by AbdulazizGhuloum.

Each commit of this project corresponds to a section of the book. For thispurpose, not only the final state of the project but each commit wascarefully written with readability in mind. Readers should be able to learnhow a C language feature can be implemented just by reading one or a fewcommits of this project. For example, this is howwhile,[],?:,andthread-localvariableare implemented. If you have plenty of spare time, it might be fun to readit from thefirstcommit.

If you like this project, please consider purchasing a copy of the bookwhen it becomes available! 😀 I publish the source code here to give peopleearly access to it, because I was planing to do that anyway with apermissive open-source license after publishing the book. If I don't chargefor the source code, it doesn't make much sense to me to keep it private. Ihope to publish the book in 2021.You can sign uphere to receive anotification when a free chapter is available online or the book is published.

I pronounce chibicc aschee bee cee cee. "chibi" means "mini" or"small" in Japanese. "cc" stands for C compiler.

Status

chibicc supports almost all mandatory features and most optionalfeatures of C11 as well as a few GCC language extensions.

Features that are often missing in a small compiler but supported bychibicc include (but not limited to):

  • Preprocessor
  • float, double and long double (x87 80-bit floating point numbers)
  • Bit-fields
  • alloca()
  • Variable-length arrays
  • Compound literals
  • Thread-local variables
  • Atomic variables
  • Common symbols
  • Designated initializers
  • L, u, U and u8 string literals
  • Functions that take or return structs as values, as specified by thex86-64 SystemV ABI

chibicc does not support complex numbers, K&R-style function prototypesand GCC-style inline assembly. Digraphs and trigraphs are intentionallyleft out.

chibicc outputs a simple but nice error message when it finds an error insource code.

There's no optimization pass. chibicc emits terrible code which is probablytwice or more slower than GCC's output. I have a plan to add anoptimization pass once the frontend is done.

I'm using Ubuntu 20.04 for x86-64 as a development platform. I made afew small changes so that chibicc works on Ubuntu 18.04, Fedora 32 andGentoo 2.6, but portability is not my goal at this moment. It may ormay not work on systems other than Ubuntu 20.04.

Internals

chibicc consists of the following stages:

  • Tokenize: A tokenizer takes a string as an input, breaks it into a listof tokens and returns them.

  • Preprocess: A preprocessor takes as an input a list of tokens and outputa new list of macro-expanded tokens. It interprets preprocessordirectives while expanding macros.

  • Parse: A recursive descendent parser constructs abstract syntax treesfrom the output of the preprocessor. It also adds a type to each ASTnode.

  • Codegen: A code generator emits an assembly text for given AST nodes.

Contributing

When I find a bug in this compiler, I go back to the original commit thatintroduced the bug and rewrite the commit history as if there were no suchbug from the beginning. This is an unusual way of fixing bugs, but as apart of a book, it is important to keep every commit bug-free.

Thus, I do not take pull requests in this repo. You can send me a pullrequest if you find a bug, but it is very likely that I will read yourpatch and then apply that to my previous commits by rewriting history. I'llcredit your name somewhere, but your changes will be rewritten by me beforesubmitted to this repository.

Also, please assume that I will occasionally force-push my local repositoryto this public one to rewrite history. If you clone this project and makelocal commits on top of it, your changes will have to be rebased by handwhen I force-push new commits.

Design principles

chibicc's core value is its simplicity and the reability of its sourcecode. To achieve this goal, I was careful not to be too clever whenwriting code. Let me explain what that means.

Oftentimes, as you get used to the code base, you are tempted toimprove the code using more abstractions and clever tricks.But that kind ofimprovements don't always improve readability forfirst-time readers and can actually hurts it. I tried to avoid thepitfall as much as possible. I wrote this code not for me but forfirst-time readers.

If you take a look at the source code, you'll find a couple ofdumb-looking pieces of code. These are written intentionally that way(but at some places I might be actually missing something,though). Here is a few notable examples:

  • The recursive descendent parser contains many similar-looking functionsfor similar-looking generative grammar rules. You might be temptedtoimprove it to reduce the duplication using higher-order functionsor macros, but I thought that that's too complicated. It's better toallow small duplications instead.

  • chibicc doesn't try too hard to save memory. An entire input sourcefile is read to memory first before the tokenizer kicks in, for example.

  • Slow algorithms are fine if we know that n isn't too big.For example, we use a linked list as a set in the preprocessor, sothe membership check takes O(n) where n is the size of the set. Butthat's fine because we know n is usually very small.And even if n can be very big, I stick with a simple slow algorithmuntil it is proved by benchmarks that that's a bottleneck.

  • Each AST node type uses only a few members of theNode struct members.Other unusedNode members are just a waste of memory at runtime.We could save memory using unions, but I decided to simply put everythingin the same struct instead. I believe the inefficiency is negligible.Even if it matters, we can always change the code to use unionsat any time. I wanted to avoid premature optimization.

  • chibicc always allocates heap memory usingcalloc, which is avariant ofmalloc that clears memory with zero.calloc isslightly slower thanmalloc, but that should be neligible.

  • Last but not least, chibicc allocates memory usingcalloc but nevercallsfree. Allocated heap memory is not freed until the process exits.I'm sure that this memory management policy (or lack thereof) looksvery odd, but it makes sense for short-lived programs such as compilers.DMD, a compiler for the D programming language, uses the same memorymanagement scheme for the same reason, for example [1].

About the Author

I'm Rui Ueyama. I'm the creator of8cc,which is a hobby C compiler, and also the original creator of the currentversion ofLLVM lld linker, which is aproduction-quality linker used by various operating systems and large-scalebuild systems.

References

[1]https://www.drdobbs.com/cpp/increasing-compiler-speed-by-over-75/240158941

DMD does memory allocation in a bit of a sneaky way. Since compilersare short-lived programs, and speed is of the essence, DMD justmallocs away, and never frees.

About

A small C compiler

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages


[8]ページ先頭

©2009-2025 Movatter.jp