Atul's Mini-C Compiler
June 2, 2004
Download the tarball.
This is a compiler for a subset of the C programming language. It was
written in Python during the spring of 2004.
The lexer and parser were constructed using Dave Beazley's PLY (Python Lex-Yacc),
an open-source Python implementation of the lex and yacc tools. Stages of
compilation (symbol tree generation, type checking, flow control
checking, etc.) are performed using an object-oriented design pattern
called a visitor (GoF 1995). The output is annotated Intel 80x86
assembly, suitable for translation to machine language using the
GNU Assembler (GAS).
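As a rough illustration of the visitor idea (not the compiler's actual code), the sketch below runs two independent "stages" over the same small expression tree. The node classes, the `v<ClassName>` method naming, and the `Evaluator` and `Printer` stages are all hypothetical stand-ins for the real passes, and the code assumes a modern Python 3 rather than the Python 2.2/2.3 the compiler was tested on.

```python
class Node:
    """Base class for AST nodes (hypothetical, for illustration only)."""
    def accept(self, visitor):
        # Dispatch on the node's class name, e.g. vBinaryOp for BinaryOp.
        method = getattr(visitor, "v" + type(self).__name__)
        return method(self)

class Const(Node):
    def __init__(self, value):
        self.value = value

class BinaryOp(Node):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class Evaluator:
    """One stage: computes the value of a constant expression."""
    def vConst(self, node):
        return node.value

    def vBinaryOp(self, node):
        left = node.left.accept(self)
        right = node.right.accept(self)
        return {"+": left + right, "-": left - right, "*": left * right}[node.op]

class Printer:
    """Another stage: renders the same tree as fully parenthesized text."""
    def vConst(self, node):
        return str(node.value)

    def vBinaryOp(self, node):
        return "(%s %s %s)" % (node.left.accept(self), node.op,
                               node.right.accept(self))

# The tree a parser might build for 2 + 3 * 4.
tree = BinaryOp("+", Const(2), BinaryOp("*", Const(3), Const(4)))
print(tree.accept(Evaluator()))
print(tree.accept(Printer()))
```

The point of the pattern is that the node classes stay unchanged as stages are added: each new stage is just another class with one method per node type.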
Language Features
The subset of the C language implemented here includes:
- Functions, variables (local and global), and character and string
literals.
- Assignments (=, +=, etc.), standard arithmetic
binary and unary operators (+, -, *, etc.), and
logical and relational operators (!, ==,
<, etc.).
- Support for the C datatypes char and int, as
well as implicit type conversion between the two (warnings are raised
in situations of potential data loss). int variables are
assumed to be signed, and char variables are assumed to be
unsigned (this is not a violation of the ANSI C standard).
- Control flow elements including while and for
loops, if/else conditionals, and recursion.
- Support for the C keywords extern for functions and
variables, and static for functions.
- Pointers, including pointer dereferencing (the *
operator), multiple levels of indirection (double pointers, triple
pointers, etc), array indexing notation, and the address-of
(&) operator.
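The implicit char/int conversion rule from the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the byte sizes (8-bit char, 32-bit int), the function names `check_assignment` and `to_char`, and the warning text are all hypothetical and are not taken from the compiler's source.

```python
import warnings

# Assumed sizes in bytes; the write-up does not state them explicitly.
SIZES = {"char": 1, "int": 4}

def check_assignment(target_type, value_type):
    """Return the resulting type, warning when the conversion may lose data."""
    if target_type not in SIZES or value_type not in SIZES:
        raise TypeError("unsupported type")
    if SIZES[value_type] > SIZES[target_type]:
        warnings.warn("conversion from %s to %s may lose data"
                      % (value_type, target_type))
    return target_type

def to_char(value):
    """Narrow a signed int to an unsigned 8-bit char by keeping the low byte."""
    return value & 0xFF

print(check_assignment("int", "char"))   # widening: no warning
print(check_assignment("char", "int"))   # narrowing: emits a warning
print(to_char(300), to_char(-1))
```

Because char is treated as unsigned here, widening a char to an int never changes its value, so only the int-to-char direction needs a warning.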
What went right
- The different stages of compilation are encapsulated in visitor
classes, which (in my opinion) makes the code quite readable and
also made the compiler a lot easier to write. The yacc rules merely
generate the abstract syntax tree and visitors do the rest.
- The code generator is also a visitor, which makes the process
very modular. For instance, although this compiler doesn't generate
intermediate code (the approach most compilers targeting multiple
architectures use), one could simply write, say, a SPARC
code generation visitor and run the AST through it to generate
assembly for that architecture. This separation also means that the
rest of the compiler is independent of machine architecture.
- Writing the compiler in Python allowed me to focus entirely on the
task at hand (compilation), without being distracted by issues of
memory management and low-level data structure creation. Using such a
high-level language also made reading and refactoring the code a lot
easier.
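To make the point about swappable code generators concrete, here is a minimal sketch with two back ends that walk the same tree. It is not the compiler's real generator: the class names are hypothetical, the x86 back end handles only integer constants and three operators, and the second back end emits a made-up stack-machine assembly that merely stands in for a real target such as SPARC.

```python
class Const:
    def __init__(self, value):
        self.value = value

class BinaryOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class CodeGenVisitor:
    """Shared traversal plumbing; subclasses supply one method per node type."""
    def __init__(self):
        self.lines = []

    def visit(self, node):
        getattr(self, "v" + type(node).__name__)(node)

    def generate(self, node):
        self.visit(node)
        return self.lines

class X86CodeGen(CodeGenVisitor):
    """Emits AT&T-syntax x86; every expression leaves its result in %eax."""
    OPS = {"+": "addl", "-": "subl", "*": "imull"}

    def vConst(self, node):
        self.lines.append("movl $%d, %%eax" % node.value)

    def vBinaryOp(self, node):
        self.visit(node.left)
        self.lines.append("pushl %eax")        # save the left operand
        self.visit(node.right)
        self.lines.append("movl %eax, %ecx")   # right operand into %ecx
        self.lines.append("popl %eax")         # restore the left operand
        self.lines.append("%s %%ecx, %%eax" % self.OPS[node.op])

class StackCodeGen(CodeGenVisitor):
    """Emits a made-up stack-machine assembly (not a real architecture)."""
    OPS = {"+": "add", "-": "sub", "*": "mul"}

    def vConst(self, node):
        self.lines.append("push %d" % node.value)

    def vBinaryOp(self, node):
        self.visit(node.left)
        self.visit(node.right)
        self.lines.append(self.OPS[node.op])

# The same tree (2 + 3 * 4) goes through either back end unchanged.
tree = BinaryOp("+", Const(2), BinaryOp("*", Const(3), Const(4)))
for backend in (X86CodeGen, StackCodeGen):
    print("\n".join(backend().generate(tree)))
    print()
```

Nothing in the tree or in the earlier passes refers to a particular machine, so adding a target means adding one more visitor class.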
What went wrong
Examples
foo.c - Example C source file that uses
most of the language features of the compiler.
foo.ast - Printout of the compiler's
abstract syntax tree for foo.c after all passes of
compilation have been completed.
foo.s - Annotated x86 assembly output of
foo.c.
Other Notes
This software has been tested using Python 2.2 and Python 2.3 under
Windows and Linux.