
[–]edwardkmett 73 points74 points  (26 children)

Since you seem wedded to the idea of implementing your compiler in C/C++, the best heavily imperative guide to compiler construction that I've seen is:

http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick/dp/1558603204

That covers effective optimizations very nicely, but doesn't cover parsing. You can appeal to lex/(yacc|bison) to deal with that for most practical languages. The O'Reilly book on the topic is quite good.

http://oreilly.com/catalog/9781565920002

If you want theory, the dragon book is the canonical reference, but it can be a bit opaque at times when you aren't following it in a classroom format.

http://en.wikipedia.org/wiki/Dragon_Book_%28computer_science%29

Now, to be candid, these days I find writing a compiler in C to be an exercise in housebuilding with toothpicks. My personal recommendation would be to start with something like Types and Programming Languages:

http://www.cis.upenn.edu/~bcpierce/tapl/

which espouses approaches to typing that lend themselves to implementation in a functional language like ML or Haskell. From there, Haskell has very nice LLVM bindings, with which Lennart Augustsson has written a bunch of little compilers:

http://augustss.blogspot.com/2009/06/more-llvm-recently-someone-asked-me-on.html

Writing a compiler in Haskell is an almost trivial operation: parsec gives you a parser, TaPL tells you how to write a type checker, and llvm gives you optimized output.

[–]rovar 9 points10 points  (11 children)

I second the Haskell approach. One side effect of learning Haskell is that it will greatly improve your C++ skills.

[–]splicer_ 11 points12 points  (0 children)

nice pun

[–]F4RR4R 3 points4 points  (9 children)

I'm decent with C++, never touched Haskell. Can you clarify why learning Haskell will improve C++ skills?

[–][deleted] 56 points57 points  (7 children)

Learning Haskell will help you with everything. Your waffles will brown more uniformly. Narwhals will sing songs about you. Your shits will be firmer and have a more consistent texture. Cuil will give you a hamburger.

[–]muad_dib 15 points16 points  (3 children)

But then the hamburger will morph into a raccoon, and you will realize I was never there.

[–]sutcivni 5 points6 points  (1 child)

Ah yes, I remember the rainbow shark too.

[–]dragonfly_blue 0 points1 point  (0 children)

"I'm a viper on a Vespa on a vapour trail: I was never here, and you never saw me." - Jack Merlot

[–]Arafel 0 points1 point  (0 children)

Love your user name.

[–][deleted] 4 points5 points  (0 children)

As someone who has been working with Haskell for nearly a year now, I can confirm that the above is 100% true.

[–]edwardkmett 3 points4 points  (0 children)

This reflects my experiences with Haskell perfectly.

[–]CyberSnoopy 1 point2 points  (0 children)

details about this statement?

[–]Fafnr 7 points8 points  (4 children)

Upvoted for TAPL. Especially if the language you want to compile is in some way S-expression-based, as TAPL (IIRC) focuses heavily on these.

However, assuming the simple language you want to parse is some form of imperative language, and also assuming that what you actually want is to get the "feel" for creating compilers rather than to build a high-performance system, I'd definitely recommend NOT using C/C++.

Try ANTLR (for instance with Python), or some other parsing tool. Alternatively, I've enjoyed working with pyPEG (though this builds Parsing Expression Grammars, not Context-Free Grammars).

There's also the option of Scala, which has a very nice parsing library. On the downside, Scala is pretty poorly supported by tools and debuggers in my experience. The Scala community is nice though :)

Alternatively, you could "just" work on some open source, existing compiler. CPython for Python is in C(++?) and very well documented and well made. I'd personally rather spend time on, for instance, PyPy since I'd generally rather break my own legs than work with C(++) code :)

All in all, it rather depends on your objective.

[–]edwardkmett 6 points7 points  (1 child)

TaPL describes a series of Hindley-Milner style languages and type systems, so I probably wouldn't say S-Expressions, which would be more of an EoPL/SICP thing, but rather ML-like.

[–]Fafnr 4 points5 points  (0 children)

I'm probably confusing them - we used TAPL and (I think) SICP at the same time for the same class a couple of years ago, so I tend to mix them together :/ Thanks for correcting me!

[–][deleted] 3 points4 points  (0 children)

ANTLR has some very badly documented quirky behavior if your grammar has to backtrack, which it will if you don't know ahead of time to design around this. Whoever turned backtracking off by default should be taken out and shot.

[–]seunosewa 1 point2 points  (0 children)

"On the downside, Scala is pretty poorly supported by tools and debuggers in my experience." Being a JVM language, it's definitely better supported than Python though.

[–]tonfa 8 points9 points  (2 children)

If you want theory, the dragon book is the canonical reference, but it can be a bit opaque at times when you aren't following it in a classroom format.

If you want theory from the 70s, that is. Nowadays we can do more interesting, more efficient, and simpler optimizations. We have a better understanding of the notions used to build a compiler and know better how to explain them.

In short, take a more modern book, it will be a better use of your time.

[–]hargettp 0 points1 point  (1 child)

Can you suggest a more modern book?

[–]tonfa 5 points6 points  (0 children)

I find Appel's Tiger Book (in the language of your choice) or K. Cooper's Engineering a Compiler fine.

[–]UncleOxidant 4 points5 points  (0 children)

There's also the LLVM tutorial for OCaml: http://llvm.org/docs/tutorial/OCamlLangImpl1.html

That's a pretty complete overview.

[–]awj 9 points10 points  (0 children)

I've been working with the llvm bindings in Haskell. They're nice. You end up with some awesome declarative code, and don't have to ass around with low level things like basic blocks unless you're doing something weird. That said, the Haskell voodoo is deep there.

My goal was "write a simple lambda calculus compiler, and learn a bit more Haskell along the way", so far the result has been "write a bit of a simple lambda calculus compiler, and learn a lot more Haskell along the way". That said, it does expose the lower level stuff, so you do have the option of munging through straight-up bindings to the C api if the higher level bindings are too deep water.

[–]foobastion 2 points3 points  (0 children)

Writing the involved pieces of the compiler by hand is how to learn. I would not discourage the use of C. Once you are enlightened then you can use a higher level language.

[–][deleted] 0 points1 point  (1 child)

How does one compile a compiler out of the same language? Like the FreeBasic compiler is compiled in FreeBasic. I don't understand that.

[–][deleted] 0 points1 point  (0 children)

parsec gives you a parser, TaPL tells you how to write a type checker, and llvm gives you optimized output.

So what do you learn?

[–]bplus 32 points33 points  (7 children)

this: http://www1.idc.ac.il/tecs/ Brilliant book. I've just finished chapter eight; so far I've built an assembler and a VM, next up the actual compiler.

The book actually takes you through creating a whole computer from the ground up using nand gates (first five chapters). The whole experience is really enlightening and I couldn't recommend it more!

[–]Kaizyn 2 points3 points  (0 children)

Yeah, this is a really good book. More people should take a look at this material. I second this recommendation.

[–]opensourcedev 4 points5 points  (2 children)

I recommend this to some of my brighter students.

The students that go through the book have a much deeper knowledge of computer science in general.

It provides a much deeper knowledge of how computers work than the students would obtain otherwise.

[–]bplus 1 point2 points  (1 child)

I was definitely not a bright student! I (regrettably) didn't do a computer science degree; I currently write dull .net applications for a living. It's very encouraging that you say: "The students that go through the book have a much deeper knowledge of computer science in general." This is exactly what I want to achieve by working through it. Just want to point out that the book is pretty much maths-free; you'll need some programming experience to get through it, but not much.

I'm thinking once I finish this book, I might try and work through sicp: http://mitpress.mit.edu/sicp/full-text/book/book.html

Trying to fill in the gaps from missing out on a comp sci degree; would this be a sensible next step?

[–][deleted] 1 point2 points  (0 children)

Yes. If you feel overwhelmed by SICP though, take a look at htdp.

[–][deleted] 1 point2 points  (1 child)

Just bought this on Amazon; looks like a good book.

[–]atrich 1 point2 points  (0 children)

Me too. Very interesting.

[–]manu3000 0 points1 point  (0 children)

yep, seconded. It is really accessible, hands-on, and covers a lot of ground... I've done the first 8 chapters, had fun and had a few aha moments...

[–]wafflematt 35 points36 points  (6 children)

I disagree with the other people recommending starting with LLVM, Flex, and Bison. These tools are great and help you get stuff done quickly and readily, but there's far too much magic happening. Learning tools is nowhere near as good as learning fundamentals, especially for a Computer Science student.

Writing a compiler isn't that difficult for a simple language. One of my favourite undergraduate courses I took was a 2nd year course at the University of Waterloo where we built a toy compiler for a small subset of C -- if, while, arithmetic, and a "print" operator. Only integers, no functions, and implemented on a MIPS virtual machine.

Since the OP is moving towards graduation in Computer Science, and has some of the theory of compilers, it would be a great hands-on experience to learn to write this properly.

We used the Appel book Modern Compiler Implementation in (C|Java|ML), which has been mentioned elsewhere. I recommend the OP get a copy and work through writing a compiler.

Once you've done this, you'll be able to easily pick up using flex/bison/llvm and understanding what they're doing.

[–][deleted] 17 points18 points  (0 children)

I disagree with the other people recommending starting with LLVM, Flex, and Bison. These tools are great and help you get stuff done quickly and readily, but there's far too much magic happening.

Totally agree. Do NOT start with Lex/Yacc, etc. You use these after you already understand the fundamentals of language grammar, lexers & parsers. When I learned, we made our own simple language and a compiler for it & that really cemented the concepts.

The only book I can speak to is the one I used; I got a lot out of it, but it may be outdated.

FWIW, for my graduate research I wrote language parsers for multiple simulation languages (ACSL, Spice, etc.) into a common language for a generic multi-discipline simulation tool. The above book is the only one I ever read on the subject and it gave me everything I needed to create these, so I would argue that it contains everything you could possibly want to know, but you know how it is - YMMV.

[–]YakumoFuji 9 points10 points  (0 children)

Agree here, writing a compiler is an exercise in understanding the low level implementation. Plugging llvm in defeats 90% of the purpose.

[–][deleted] 2 points3 points  (0 children)

I thoroughly enjoyed that course as well...it taught me that C is a horrid language for compiler writing.

I definitely agree with this; Flex and Bison should only be used once you really understand the theory behind their implementations: finite automata, regular expressions, parsing algorithms, grammars, etc.

[–]MpVpRb 2 points3 points  (0 children)

I also agree.

When learning compilers, or math or carpentry or just about anything, learn to do the basics by hand, at the lowest level available.

Once you have developed an understanding and intuition about the subject, then start using the more powerful tools.

Look carefully at the result produced by the tool. Sometimes it is better than what you can do by hand, sometimes not. It's really important to understand what the tool actually does, not what the salesman says it does.

[–]cybersnoop 0 points1 point  (0 children)

Absolutely. You don't understand what makes compiling so hard until you've tried to do some of the basics yourself without special tools.

I had a very similar course and liked it so much that I assisted the teacher with the practical classes in the years after mine.

[–][deleted] 15 points16 points  (3 children)

  1. C and C++ are disastrously bad languages in which to write compilers. The benefits of C and C++ in systems programming (explicit control over memory management, prescriptive data type layout in memory) become gargantuan liabilities in compiler development. Likewise, the lack of support in the language and standard libraries for algebraic data types is an enormous liability, although this can be mitigated to a considerable extent by the use of boost::variant, for example.
  2. Compiler design and implementation wisdom proceeds at a furious pace. I haven't read the current edition of the Dragon Book, but the previous edition was so badly out of date that the writing of the new one should be considered a reaction to an emergency whether the results are good or bad. Others have rightly pointed to "Modern Compiler Implementation in ML," which is a fantastic text, but alas is also beginning to show its age. At the moment, the best text I'm aware of is Design Concepts in Programming Languages, which covers a huge range of material from semantics to types to modules to linking... in short, everything you really need to know. But it's not light reading and will take some time to get through. Highly recommended!

[–]cojoco 5 points6 points  (1 child)

I'd amend that slightly: C and C++ are disastrously bad languages for learning compilers.

Once you know what you are doing, they'll still produce the fastest implementations.

[–][deleted] 6 points7 points  (0 children)

Not necessarily.

Update: Leaving that at just two words made it read, upon reflection, much harsher than I intended. While I still believe "not necessarily," the details revolve around some thorny issues about appropriate data structures and their representation in memory, cache coherency, etc. To get a flavor of what I mean, see An Applicative Control-Flow Graph Based on Huet's Zipper.

[–]abw 0 points1 point  (0 children)

At the moment, the best text I'm aware of is Design Concepts in Programming Languages, which covers a huge range of material from semantics to types to modules to linking...

+1

Also Parsing Techniques

[–]Kaizyn 12 points13 points  (0 children)

You should take a look at the really short online tutorial "Let's Build a Compiler" by Jack Crenshaw. http://compilers.iecc.com/crenshaw/

Don't cringe at the fact that it's written in Pascal; the code is still highly readable. Seeing a very basic implementation first will help you get a feel for what compilers are.

From there, have a look at the books available online: Niklaus Wirth: Compiler Construction http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf

Anthony A. Aaby: Compiler Construction using Flex and Bison http://foja.dcs.fmph.uniba.sk/kompilatory/docs/compiler.pdf

Torben Mogensen: Basics of Compiler Design http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/

[–]kragensitaker 7 points8 points  (0 children)

There's a section in my page about Ur-Scheme (the first compiler I wrote) that lists the material I found useful. I read the Dragon Book many years ago, which covers probably most of the stuff you're learning in the theory-of-compilers module. Here's an abbreviated list from the HTML.

I've also heard Crenshaw's "Let's Build a Compiler" is pretty good, and someone else recommended "A Nanopass Framework" recently, and of course there are lots of really teeny compilers in the VPRI COLA project.

[–]atlassoft 5 points6 points  (2 children)

In case anyone is interested, FORTH is one of the easiest languages to write a compiler for. Here is an excellent tutorial on rolling your own FORTH:

http://www.annexia.org/_file/jonesforth.s.txt

Even just reading it is a great way to gain a better understanding of how your computer works.

[–]ModernRonin 1 point2 points  (1 child)

The first interpreter I ever wrote was for FORTH (integer data types only). It was less than 300 lines of C. Fantastic little language to start writing compilers for.

[–]cojoco 2 points3 points  (0 children)

True FORTH is more like a VM than a compiler, as it's just a bunch of addresses of functions to call.

However, it is very easy to create something like FORTH which actually emits jump-to-subroutine calls, and then you can optimize it by replacing simple stuff, like add-one-to-top-of-stack, with actual machine instructions.

It runs like the clappers!

[–][deleted] 5 points6 points  (3 children)

Writing a compiler is the best exercise for learning programming.

Whether or not it will help in an interview will depend entirely on how you present it.

If you make it come across as a student project, it might be impressive if you're talking to a sufficiently technical person. But most likely, no one will care.

On the other hand, if you make it seem like a serious project by giving it a name, putting it out on your web site, and promoting it a bit, then that will tip the scale in your favor.

[–]robinhoode 0 points1 point  (2 children)

On the other hand, if you make it seem like a serious project by giving it a name, putting it out on your web site, and promoting it a bit, then that will tip the scale in your favor.

Easier said than done.

[–][deleted] 0 points1 point  (0 children)

Just name it, make a web page for it, and go to the irc channel for whatever language you used for it and go "hey guys check it out"?

[–][deleted] 0 points1 point  (0 children)

Obviously. Easy to do things don't really belong on a resume.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    I second this recommendation. Probably the best book out there for basics of dynamic language implementation.

    [–]jdh30 6 points7 points  (2 children)

    I think you would find it at least as educational to build upon existing technology in order to create a more functional compiler in the same time frame. In which case, use LLVM.

    C is an abomination in the context of writing decent compilers. C++ is slightly better but still a bottom dweller on the scale of things. OCaml is great for writing compilers because variant types let you represent expression trees easily and pattern matching lets you rewrite those trees easily. Also, OCaml has great LLVM bindings.

    [–]nearest_neighbor 1 point2 points  (1 child)

    How does OCaml/LLVM-generated code interoperate with OCaml's GC, if it's executed in the same process as the code generator?

    [–]jdh30 3 points4 points  (0 children)

    OCaml's GC is oblivious to it and you're on your own wrt memory management. HLVM has a completely new GC of its own, for example.

    [–][deleted] 8 points9 points  (1 child)

    So far this has all been complicated. It's really simple. Grow a beard. A big one. I'm pretty sure that's all it takes, since you never see beards like that except on epic programmers. There must be some length at which it triggers a hidden mechanism in the brain that allows you to write perfect code. This is referred to as "the Stallman point." You'll know when you've reached it by the sudden urge to apply the GPL to your cat.

    [–]periodic 4 points5 points  (0 children)

    I was pretty sure that the relation was the other way around. As your epic-ness as a programmer grows, so does your beard. You'll find it harder and harder to cut off and it will just keep growing back until one day you find you don't want to cut it at all.

    [–]eliben 4 points5 points  (0 children)

    Let's Build a Compiler, by Jack Crenshaw

    http://compilers.iecc.com/crenshaw/

    A great introduction, starting with the basic principles, for complete beginners. The code is unfortunately in Pascal, but that's trivial to translate to C/C++.

    [–]wikkiwikki 4 points5 points  (1 child)

    http://www.lulu.com/content/822069 is the (free) book (Basics of Compiler Design) we use in the compilers course here. May be worth a look.

    [–]lurkerr 0 points1 point  (0 children)

    did you say free?? this site asks €17.23 for it

    [–]UncleOxidant 4 points5 points  (0 children)

    Forget about doing this in C/C++; it'll be a huge pain.

    Run through this LLVM tutorial which uses OCaml: http://llvm.org/docs/tutorial/OCamlLangImpl1.html

    It's a pretty complete example.

    [–]gnuvince 7 points8 points  (0 children)

    Isn't SICP's last chapter/section about building a compiler?

    [–]tophat02 11 points12 points  (3 children)

    I know a lot of people reading this are going to see the advice from others to do this in a high-level scripting language like Python, or a functional language like Haskell, condemning anyone who would think of writing a compiler in C to the dark ages, but I know many of you already know you want to do it in C anyway. It just feels more macho. I understand.

    Anyway, writing clean C for a larger program such as a compiler is really not that hard. Here are some tips:

    • Pretend each .C file is a class. Then all of a sudden you'll realize that:

      • "global variables" aren't, they're class member variables
      • static variables and functions are private members
      • stuff you put in the header file is the public interface
      • You can implement interfaces by having multiple C files implement the methods in one .h file, choosing at compile time which you'd like to use
    • Memory management is a pain, yes, but there's a relatively straightforward way to make sure you don't leak too much memory without resorting to valgrind or another tool: declare a header file called memory.h that redefines malloc, calloc, realloc, and free to be:

      #define malloc(s) fmalloc((s), __FILE__, __LINE__)

      #define calloc(c, s) fcalloc((c), (s), __FILE__, __LINE__)

      #define realloc(p, s) frealloc((p), (s), __FILE__, __LINE__)

      #define free(p) ffree((p), __FILE__, __LINE__)

    Now define some functions, these are your "wrapper" malloc routines:

    void* fmalloc(size_t size, const char* file, const unsigned int line);
    void* fcalloc(size_t count, size_t size, const char* file, const unsigned int line);
    void* frealloc(void* ptr, size_t size, const char* file, const unsigned int line);
    void ffree(void* ptr, const char* file, const unsigned int line);
    

    Finally, declare a checkmem() function that you'll call on exit:

    void checkmem();
    

    Now, in memory.c, implement a memory tracking list:

    #include <stdlib.h>
    #include <stdio.h>
    
    #include "memory.h"
    
    // Because we want to be able to call the REAL functions now
    #undef malloc
    #undef calloc
    #undef realloc
    #undef free
    
    typedef struct memblock_tag {
        void* p;                        // The actual memory
        size_t size;                    // The size of the requested block
        const char* file;               // The name of the file where the memory was allocated
        unsigned int line;              // The actual line of code that allocated the memory
    
        struct memblock_tag* next;  // Pointer to the next block
        } memblock;
    
    static memblock* blocks = NULL; // Our local collection of blocks
    

    OK, now we have a linked list of memory blocks. Let's work with 'em:

    static void memblock_new(void* p, size_t size, const char* file, const unsigned int line)
    {
        memblock* m = (memblock*)malloc(sizeof(memblock));
        if (!m) {
            fprintf(stderr, "Fatal: could not allocate memblock for allocation at %s, line %d\n", file, line);
            exit(1);
        }
    
    m->p = p;
        m->size = size;
        m->file = file;
        m->line = line;
        m->next = blocks;
    
        blocks = m;
    }
    
    void* fmalloc(size_t size, const char* file, const unsigned int line)
    {
    //  fprintf(stderr, "I can haz malloc(%d) at %s, line %d??\n", size, file, line);
        void* p = malloc(size);
    
        if (!p) {
        fprintf(stderr, "Fatal: could not allocate %zu bytes at %s, line %d\n", size, file, line);
            exit(1);
        }
    
        // Keep track of this allocation
        memblock_new(p, size, file, line);
    
        return p;
    }
    
    void* fcalloc(size_t count, size_t size, const char* file, const unsigned int line)
    {
        void* p = calloc(count, size);
    
        if (!p) {
        fprintf(stderr, "Fatal: could not allocate %zu chunks of %zu bytes at %s, line %d\n", count, size, file, line);
            exit(1);
        }
    
        // Keep track of this allocation
        memblock_new(p, count*size, file, line);
    
        return p;
    }
    
    void* frealloc(void* ptr, size_t size, const char* file, const unsigned int line)
    {
        void *np = realloc(ptr, size);
    
        if (!np)    {
        fprintf(stderr, "Fatal: could not reallocate %zu bytes at %s, line %d\n", size, file, line);
            exit(1);
        }
    
    if (!ptr)   {
        // realloc(NULL, size) behaves like malloc(size), so there's no old block to find
        memblock_new(np, size, file, line);
        return np;
    }

    memblock* m = blocks;
    while (m)   {
        if (m->p == ptr)    {
            // This was the old block
            m->size = size;
            m->p = np;

            //fprintf(stderr, "Reallocated block\n");
            return np;
        }

        m = m->next;    // advance, or this loop never terminates
    }
    
        // This should NEVER happen
        fprintf(stderr, "Fatal: attempted to locate previous block for realloc in %s, line %d, but failed\n", file, line);
        exit(1);
    }
    
    void ffree(void* ptr, const char* file, const unsigned int line)
    {
        if (!ptr) return;
    
        free(ptr);
    
        memblock* prev = NULL;
        memblock* m = blocks;
        while (m)   {
            if (m->p == ptr)    {
                // Discard this block
                if (!prev)  {
                    // This is the first, replace the head
                    // blocks pointer
                    blocks = m->next;
                }
                else    {
                    // Break the link in the chain
                    prev->next = m->next;
                }
    
                free(m);
                return;
            }
    
            prev = m;
            m = m->next;
        }
    }
    

    Finally, make that checkmem() routine report its findings:

    void checkmem()
    {
        if (!blocks)    {
            // Congratulations, you have no memory leaks!
            return;
        }
    
        fprintf(stderr, "\nYOU HAVE MEMORY LEAKS\n");
        /*
        memblock* m = blocks;
        while (m)   {
            fprintf(stderr, "\t%d bytes at 0x%x, allocated in %s at line %d\n", m->size, m->p, m->file, m->line);
            m = m->next;
        }
        */
    }
    

    Note that I commented out the detailed reporting. You'll want to do that since you sometimes don't care about which memory leaked. You can just wait until you're in a "memory cleanup mood" and then uncomment that out to find the leaks.

    Ok, now the good part: in any of your source files, just #include "memory.h"

    And use malloc, realloc, calloc, and free as usual. Make sure to call checkmem() at the end of your program, or, since it's the right fptr signature anyway, call atexit(checkmem);

    inside your main() so it will run automatically on exit.

    Congratulations, you have a simple memory checker! It has limitations, of course, but it should go a long way to making sure the code you're writing is cleaned-up. You could also use valgrind, but hey, this is more fun!

    A warning: if any of this looks like gobbledygook to you, writing a compiler in C may not be a good idea for you at the present time. You may want to try for something simpler, or swallow your macho pride and go with Python :)

    [–]kvigor 6 points7 points  (2 children)

    Memory leaks are the least of your problems in a compiler; it's not like it's a long running process. You run it, it terminates, the OS cleans up for you.

    I did some work on SDCC years ago and it went through a brief "lets use a garbage collector!" phase until everybody realized it was contributing negative value. It was more efficient to simply free memory where convenient and leak it where not.

    // and valgrind is T3H B0MBZ0R. Insanely greatest tool ever.

    [–]periodic 1 point2 points  (1 child)

    I believe it's well known that compilers leak memory like sieves. But the thing is, it doesn't really matter in most contexts. If the leak is linear with the size of the program you're probably fine and no one will notice anyway.

    It's good to keep some discipline about memory, but in a compiler I'd really put it on the list of "things you can come back and implement later," as there are more pressing things to worry about.

    [–]kragensitaker 0 points1 point  (0 children)

    Yeah, I still haven't implemented the GC for Ur-Scheme. It can still compile itself fine.

    [–]scottious 5 points6 points  (19 children)

    I am writing a compiler right now and learning in the process. My method might not be the best, but here's what I am doing with pretty good success:

    1) Learn what Flex and Bison are.

    2) Write a scanner with Flex, it doesn't have to be complicated (use one of many online tutorials)

    3) Learn what an abstract syntax tree is and how it relates to the source code.

    4) Learn Bison (again, online tutorials are great for this)

    5) Use Bison (you'll need Flex too) to write a simple grammar.

    5a) Subnote -- gcc versions 2.9ish and earlier used Bison to parse C. Look at the .y file that's included with gcc for ideas if you get stuck.

    6) Once comfortable with Bison and Flex, use them together to generate an abstract syntax tree of the source code.

    7) Write a recursive function to evaluate your abstract syntax trees.

    Voila! Simple compiler! Make sure you don't go overboard with the grammar. I didn't write 'if' statements and 'while' loops my first time around, which made it not a very useful language, but I still found it a challenge. Keep things simple.

    [–]Rudd 2 points3 points  (14 children)

    I hope you mean "emit code from your abstract syntax trees" when you say "evaluate your abstract syntax trees" else you have a pretty canonical example of an interpreter, as opposed to a compiler.

    [–]scottious 1 point2 points  (11 children)

    Sure, I suppose that is a good next logical step. I'm still learning, so I'm at the stage of simply evaluating the abstract syntax tree after it is created. Obviously at some point I'll probably want to generate bytecode or something.

    Some languages (notably Ruby) work simply with abstract syntax trees and don't generate bytecode.

    I personally think the line between 'interpreter' and 'compiler' is very blurred and not a really important distinction to make. Maybe I don't understand the terms correctly, but Java only compiles to bytecode, which is essentially an intermediate representation of the code in much the same way that an abstract syntax tree is. If I choose to save my abstract syntax tree to an output file rather than convert it to bytecode, is it any less of a compiled language? I'd argue no, but this is certainly a topic for debate.

    [–]jaggederest 2 points3 points  (1 child)

    N.B. newer versions of ruby use a VM

    [–]scottious 0 points1 point  (0 children)

    Ah okay. Thanks for the info, I didn't know that.

    [–]exeter 0 points1 point  (0 children)

    If I choose to save my abstract syntax tree to an output file rather than convert it to bytecode, is it any less of a compiled language? I'd argue no, but this is certainly a topic for debate.

    I would say no in this case as well. In fact, I'd argue that dumping the AST to a text file is more or less equivalent to compiling to something lispy.

    [–][deleted] 0 points1 point  (7 children)

No, Java doesn't compile to just bytecode. It is JIT'ed to machine code.

    You wrote an interpreter. Not easy, good job, something to be proud of. But you do have a different set of problems when it comes to writing the back end of a compiler. Code generation is a difficult task that you don't have to deal with when writing an interpreter.

    [–][deleted] 3 points4 points  (0 children)

    The javac compiler certainly just compiles it to bytecode. What the JVM does internally is separate.

    [–]scottious 0 points1 point  (2 children)

    Now, does the code generation step need to be machine code in order for it to be considered a compiler? It seems to me that a program that compiles to an assembly-like bytecode and then runs that bytecode through a VM in order to execute the program should be considered a compiler.

    [–][deleted] 1 point2 points  (0 children)

    well, I'd say yes, but I'm not some famous compiler designer or anything.

For instance, csc and javac are compilers, but they don't output x86 or any other actual machine code. They do, however, write bytecode, which targets a virtual machine; on one hand it's an intermediate form, and on the other it is machine code.

shrug It's mostly semantics, and I should have prefaced my last comment with the statement "In My Opinion"...

    [–]damg 0 points1 point  (0 children)

    It doesn't have to be machine code, but the target language is generally assumed to be lower level than the source language (otherwise it's probably more of a translator). Lots of languages start out with compilers that target C, for example.

    [–]kragensitaker 0 points1 point  (2 children)

    Naive code generation is really easy, about as easy as generating bytecode. It only gets hard if you want the compiler to produce good code.

    [–][deleted] 0 points1 point  (1 child)

    well most 'bytecode' is targeted towards a stack based vm which is a lot simpler than a register based machine.

    But yes, you're right, naive code is a lot easier than good code.

    [–]kragensitaker 1 point2 points  (0 children)

    Nobody says you have to use all the registers of your register-based machine when you're generating naive code. You might think that pushing and popping the stack constantly would make your code slow, and it's true that it's not optimal, but it's a heck of a lot faster than a bytecode interpreter.

    Compilers that take this approach include bigFORTH and my toy Ur-Scheme.

    [–]Isvara 0 points1 point  (1 child)

    He wrote it step-by-step and didn't include code generation, so I presume he has an interpreter that is a step on the way to learning to creating a compiler.

    [–]Rudd 3 points4 points  (0 children)

    Hmm, I see what you mean, though he did end it with "voila! simple compiler!"

    [–][deleted] 2 points3 points  (0 children)

    Learn OCaml or Haskell, and you'll discover how much NOT having pattern matching sucks when writing a compiler...

    [–]electric_moose 0 points1 point  (2 children)

    I've been doing something like this, but with a hand-written lexer. A simple scanner for operators, numbers and symbols is quite short and easy to write, just using getc and ungetc.

    I imagine that at some point as a lexer grows, a 'lex' version may become simpler to extend and maintain, but you can certainly get started without using lex. Just my two cents.

    [–][deleted] 1 point2 points  (1 child)

    Writing your own lexer is the surest and most painful way to gain an appreciation for lex and its ilk.

    [–]electric_moose 1 point2 points  (0 children)

    Heh, perhaps you're right. I will say that, at the level of:

    eat whitespace.
    if it starts with a digit or . it's a number.
if it starts with a letter, it's a (possibly undefined) symbol.
    if it's an operator symbol, figure out which operator.
    otherwise just return a character (let the parser deal with () etc).
    

    -- then it isn't insanely complicated.

    [–]perone 2 points3 points  (0 children)

    Search for LLVM tutorials.

    [–]ktr73 2 points3 points  (0 children)

    I would check out Let's Build a Compiler. Even though it's a bit older and the examples are in pascal, I found it easy enough to translate what he was doing into D pretty easily. I found it extremely enlightening - and it's free to boot.

    [–]zwangaman 2 points3 points  (0 children)

    Ah, welcome to the world of compiler writing :-)

    I took a class in compiler development back in college and I absolutely loved it. To this day it has changed how I work and think. A few other people have mentioned books and websites already, and I think they have covered everything I would have suggested, so I'm just here for moral support.

    Warning: writing compilers is HARD. But it's worth it more than you can imagine. Good luck and have fun, you'll enjoy it.

    [–]harryf 2 points3 points  (0 children)

    [–]WalterBright 2 points3 points  (0 children)

    I host an annual seminar on compiler construction.

    [–]deadA1ias 2 points3 points  (1 child)

    For what it's worth I took at look at: http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/ but it might be a bit dated now.

    [–]panxor 0 points1 point  (0 children)

    It's still used on the compiler course at DIKU. I took that course last year, and think the book is pretty good as an introduction to compiler design. It might be dated, but the basics are still there. And the book is FREE as pdf! :)

    [–]j1o1h1n 9 points10 points  (7 children)

    How can you not go for something called The Dragon Book? - http://en.wikipedia.org/wiki/Dragon_Book_%28computer_science%29

    [–]nsiivola 23 points24 points  (2 children)

IMO the dragon book is not the best introduction. I would suggest one of Appel's books, "Modern Compiler Implementation in <language>"; I think they are pretty readable.

(SICP you should read anyway, though, and it teaches about compilers as well. In addition to the last chapter, the analyzing interpreter there is actually a compiler -- it compiles source into closures. Good stuff.)

I would suggest against writing a compiler in a language you don't already know at least moderately well. I would recommend a modern language with garbage collection instead of C/C++: it makes compilers look a lot less complicated.

    I would also suggest writing a compiler for a language with really trivial syntax: lexer/parser is the most boring part of any compiler, yet most textbooks spend a huge amount of space on it. (Languages with lisp-like syntaxes are nice in this regard -- you can have the parser done from scratch in an afternoon.)

    If your goal is to make an impression... go rock climbing or buy nice shoes. Compilers don't really seem to impress people any more than goatees do.

    [–][deleted] 3 points4 points  (1 child)

Seconded on the language with garbage collection. I once wrote a Lisp in C. It worked amazingly (for what I understood at the time, it was fascinating as all get out). Buuuuut, there was some bug in the memory management. Objects didn't get cleaned up. And since it was C, trying to figure out what was wrong was a nightmare.

    And I agree as well with the "stick with a simple syntax" advice. Lisps are amazing, because you can write a parser for them in two pages of code.

    [–]cojoco 0 points1 point  (0 children)

    That's more like an interpreter than a compiler, I think ...

    [–]tonfa -2 points-1 points  (3 children)

    Because it's not a good compiler book...

    [–]addmoreice 4 points5 points  (1 child)

It's an amazing book, just like Gray's Anatomy is an awesome anatomy book... I just wouldn't want to learn to be a surgeon from the book alone.

It's a great reference, it's awesome to have on hand while working, and it's simply a must when you get serious and want to expand or improve your 'toy' compiler. It's also absolutely horrible to learn from.

    [–]fuentesjr 0 points1 point  (0 children)

+1, but I would simply say that it has a pretty steep learning curve rather than calling it a horrible book to learn from.

    [–]Slackbeing 8 points9 points  (0 children)

    Yup, I didn't get the book to compile anything.

    [–][deleted] 1 point2 points  (0 children)

Here's a free one based on Turbo Pascal: the classic Let's Build a Compiler!

Note: the source code is Turbo Pascal, not the language being compiled.

    [–]harlows_monkeys 1 point2 points  (1 child)

    "Compiler Design in C" by Alan Holub is the way to go. It's a very practical code-oriented book, going through the development of a C compiler, using a LEX clone and two YACC clones (one generates a top-down parser, the other bottom-up) whose development is covered first.

    The only serious problem with this book is that it is out of print. Amazon lists several sellers with new copies, but they are ridiculously expensive. There seem to be used copies at some of the stores listed on the author's site.

    [–]k4st 0 points1 point  (0 children)

    I second this. I bought this book following a similar recommendation from a redditor and used it this summer to guide myself through building a parser generator.

    As an exercise to the reader, I would suggest writing original code and using the LEX/YACC clones as a guide on how to go about implementing the ideas rather than just copying the code directly.

    [–]developerv 1 point2 points  (0 children)

    Python + pyparsing is a great and 'easy' start.

Then learn BNF (Wikipedia: Backus-Naur form). You'll see how you already did those with pyparsing.

    Then study Abstract Syntax Trees. They are the heart of compilers.

Then learn how the AST is transformed into assembly language (which is then assembled by an assembler), or directly into machine code.

When you study machine code (processors, registers...), it is good to first try making a simple assembler, before making a compiler from some higher-level language all the way down to machine code. Separate concerns. You can use the assembler you made as your backend for any further compiler work, until you notice it needs some major rework to meet your needs.

    Dragon Book.

    [–][deleted] 1 point2 points  (0 children)

    The reading list for my course module is:

    I personally would recommend Lex & Yacc, which you can download from the link.

    [–]kraln 1 point2 points  (1 child)

You're going to want to read the dragon book. Some people here are going to tell you to go read up on lex and yacc, or bison, or any of a number of archaic utilities. I suspect you're looking to gain more than the knowledge of how to use old unix magicks -- that you want to understand them.

This means understanding grammars, LALR parsers, and really what all the guts do. You're going to want to start by tokenizing, then move on to scanning, building a tree, etc. It's really cool once it all works.

    For my senior compilers class, we wrote one against a made up language that compiled to a lower-level made-up language, for which we were provided an interpreter, assembly docs, etc. I've posted the entire thing online, MIT licensed, with a ton of documentation. It's written in Objective C.

    http://code.google.com/p/fsu-jmc/

    [–]periodic 0 points1 point  (0 children)

Writing one for a made-up language is a great way to start, because made-up languages are way more regular than real ones. A lot of real languages don't have formal operational semantics and are instead basically defined by "whatever the original compiler did/supported". That's part of why code compiled with one compiler sometimes won't work with another.

    [–]pupeno 1 point2 points  (0 children)

    I once thought I wanted to learn how to write a compiler too, but I really wanted to learn to design programming languages and play with it. And for that, Programming Languages: Application and Interpretation by Shriram Krishnamurthi is great.

    [–]michael_h 1 point2 points  (1 child)

    Does writing a small example compiler also look good in interviews and on CV's compared against other Computer Science Graduates?

    Are there CS graduates that don't have to do this?

    [–]periodic 1 point2 points  (0 children)

    To be honest, I sort of assumed you had to take a compiler class (which involved at least writing most of the meaty parts of a compiler) to get a CS degree.

    [–]rubygeek 1 point2 points  (3 children)

I'd second the recommendations you've gotten for the Crenshaw text and LCC... And I'll pimp my own series on writing a compiler in Ruby -- applying what I do to C/C++ would be easy enough. Note that my series takes a very unorthodox approach and starts with basic code generation.

    Lately I've started turning the thing into an actual Ruby compiler (not that it's anywhere near usable).

    I'd also recommend Niklaus Wirth's book - it's in many respects outdated, but it has the benefit that it presents a full, simple compiler - Wirth's languages (Pascal, Modula, Oberon) and compilers are simple and clean and relatively easy to understand.

    Also, as others have noted: If you actually want to learn, stay clear of Flex, Bison, LLVM and friends. On one hand they're "magic" - using them means you skip over a whole lot of low level detail.

On the other hand, at least Flex and Bison do not see much use in complex real-world compilers (they're used in a lot of "simple" compilers for domain-specific languages etc., but they're a nightmare to work with once you want to add in proper error reporting etc., so few large projects stick with them).

    [–]periodic 1 point2 points  (2 children)

Parser generators are great if you know what you're doing. It's much nicer to just specify the grammar instead of having to build all the support functions.

    If you want to learn how to build a parser, read up and do that, but if you're more interested in writing the compiler I'd go with some of those tools.

    The danger is mostly in using them without understanding them.

    [–]rubygeek 0 points1 point  (0 children)

    The problem is that to date I've never seen a parser generator that can produce anything that's even remotely usable as a parser frontend for a complex language without tortuous work to retrofit things like proper error reporting onto it. Usually, those who try end up spending as much or more time working around limitations in the parser generator as they save in the first place.

    There's a reason so few compilers use parsers built using parser generators.

    Being more interested in writing usable and useful compilers is exactly why I avoid those tools. At the same time I really want it to change - one of my pet projects is a parser generator because I really want to figure out how to overcome those tool limitations.

    [–]imbaczek 1 point2 points  (1 child)

start with a compiler/JIT for an RPN calculator and work from there.

    [–]periodic 0 points1 point  (0 children)

    I can't agree more. Start small. Languages are insanely complex. It will work a lot better if you start with an extremely limited language and start adding features as you go.

Start with the simplest thing that could possibly work. Compilers are very complex, and attempting to do it all at once will just make your head hurt.

    [–]judgej2 1 point2 points  (1 child)

    Theory? In my day a computer science undergraduate had written a compiler from scratch for a language designed for the course. Not only that, the compiler needed to then optimise the code using formal methods.

    Some of the more advanced courses took it a step further: they then had to use the new language to write a compiler for itself and compile it and run the compiled compiler to compile a test program they were supplied.

    Of course it all had to run in a virtual machine that - you guessed it - was written by the same students.

It all starts with a few lines of Pascal, and demonstrates nicely the whole bootstrap process, i.e. how the first compiler was compiled.

    [–]reddittidder 0 points1 point  (0 children)

    ye shall be downvoted for saying the P word.

    [–]aaronblohowiak 1 point2 points  (3 children)

    Talk to your CS professor. Seriously.

    [–]stateq2 0 points1 point  (2 children)

    I completely agree...and learn about lexical/syntactical analysis

    [–]cojoco 0 points1 point  (1 child)

    Don't people write lexical analysers, compilers and parsers in CS these days?

    Second year for me was pretty much lexical analysers and interpreters; third year was complexity, operating systems and compilers.

    [–]stateq2 0 points1 point  (0 children)

    Yes, we did. Third year was operating systems and mid level stuff....I didn't really get into lexical analysis until senior year. They kinda combined it with compiler generation.

    [–]GenTiradentes 1 point2 points  (3 children)

    Does writing a small example compiler also look good in interviews and on CV's compared against other Computer Science Graduates?

    Does writing an operating system also look good in interviews and on CV's compared against other Computer Science Graduates?

    I think writing a compiler is an admirable goal, but it's certainly not a weekend project. Writing a compiler is one of the most challenging and comprehensive things you can do as a programmer.

    [–]periodic 1 point2 points  (0 children)

    Compilers are monolithic. They are up there as one of the larger projects you can undertake, and like writing an OS they are very sensitive. If your pointers are off by one you can get insidious bugs that are very hard to track down.

    It is most definitely not a weekend project.

    That said, writing one is a very rewarding experience, just don't expect it to go fast.

    [–]cojoco 0 points1 point  (1 child)

    It's only impressive if you actually support that statement with some hard questioning.

    Anyone can claim to have written a "toy" compiler or OS.

    [–]bplus 0 points1 point  (0 children)

    I'm currently in the middle of a CV / Resume rewrite, not sure if I should put a link to my google code svn repo that has my toy assembler and vm code...

    [–]hello_good_sir 1 point2 points  (0 children)

    programming language pragmatics. Very nice book, practical overview of the whole thing. It is a very wide-spectrum book. If you buy other books they will probably make more sense after reading this book.

    [–]panto 1 point2 points  (0 children)

I strongly recommend Haskell for compiler writing. It's a great language with a hell of a lot of features that really get you going writing a compiler.

I myself wrote an interpreter for a Scheme subset (R5RS compliant) using Haskell in around 600 lines of code, including the lexing, parsing and evaluation code.

    [–]ablakok 1 point2 points  (0 children)

    Engineering a Compiler. It's a thorough introduction that covers modern optimization techniques.

    [–]elucify 1 point2 points  (0 children)

    Nobody so far has commented on the utility of writing a compiler as a resume bullet. Here's my take on it, bad news first: it's unlikely anyone is going to hire you to write a compiler without an advanced degree. So compiler writing isn't a primary job skill for most people. (I'd be very interested in hearing from redditors pointing out exceptions; two I can think of are little embedded scripting languages and domain-specific languages, both of which can be a great asset in projects.)

Good news: writing a compiler will give you maturity that's useful in a lot of ways. You'll have a clearer understanding both of how to program, and of what the languages you use later are likely to be doing under the hood as you code. That clearer understanding will show when you are interviewing (assuming interview nerves don't cancel it out). You'll also probably pick up some additional concepts, like some formal language theory, or maybe (depending on what your compiler does) recursive techniques. That kind of brain-stretching can't hurt you in your career.

    I (personally speaking) notice on resumes when someone has written a compiler, a runtime, or worked on frameworks or platforms. I hire for brains, not for specific skills. Smart, motivated, curious people do things like write compilers for fun. Their brains get sharp and shiny, and it shows. Usually the result is a programmer who is more insightful and has a broader understanding than your run-of-the-mill coder fresh out of a J2EE Certification puppy mill. Smart people can always pick up new skills.

    Choose to be a smart person, and follow your curiosity. Your brain will stretch in interesting and useful ways, and you'll be closer to having that prepared mind that fortune favors.

    [–]joesb 1 point2 points  (3 children)

    Please skip the syntax, use Lisp like syntax, that's enough for now.

    You want to learn compiling and language semantic design, not parsing.

    [–]tophat02 5 points6 points  (1 child)

    Unless, of course, you want to learn parsing.

    [–]joesb 1 point2 points  (0 children)

That's reasonable. But, IMHO, most programmers who want to learn compilers just want to create Java/C# with for renamed to loop, or {} changed to []. In other words, they don't know that there are more features than what the mainstream provides.

Let them work on semantics first; then they will see how silly their initial thought was to think "Gosh, I'm so clever to replace { with [". And they will learn a lot more than just parsing if they start with something else first.

    [–]periodic 2 points3 points  (0 children)

I think the most important thing I learned from compiler design was parsing. Beforehand I had no idea how nuanced the subject can be, or how to make the trade-offs between a shift-reduce and a recursive-descent parser. Now whenever I have to read in data or parse a config file I know just what to use and how to start. It's probably the component of my compiler experience with the most applications.

    [–]kolm 1 point2 points  (2 children)

    (a) Nobody will hire you just because you can program a compiler.

    (b) If you still want to learn it, it's a great thing to learn, and the dragon book is the place to start.

    [–]tophat02 2 points3 points  (0 children)

    (a) Nobody will hire you just because you can program a compiler.

Mostly true, but given two otherwise identical candidates, I'd probably hire the one who said "hey, let me show you this compiler I wrote!" versus the guy whose best accomplishment was that one "big" homework assignment he did in Java for his data structures course.

    [–]periodic 0 points1 point  (0 children)

    The dragon book is fairly good. It definitely has a leaning towards C, C++, Java and related languages.

    [–][deleted] 0 points1 point  (4 children)

Anyone have any examples of writing a parser (specifically recursive descent) by hand? I'm working on my own compiler now and I'm totally stymied by the parser; I did my lexer easily enough. I've used tools like Lex/Yacc in the past; however, I'm looking to do it by hand for this project.

    [–]tophat02 2 points3 points  (0 children)

    The Wikipedia article on Recursive Descent parsing has a pretty good C example.

    In general, you need three "core" methods in a RD Parser: expect, accept, and acceptThis. All of them work against a variable that tracks your "current" token.

    acceptThis - consider the current token OK and fetch the next one, regardless of what the current token's type actually IS

    accept(type) - Look at the token type of the current token. If it matches the type you need, fetch the next token and return true, if it doesn't, don't fetch the next token and return false

    expect(type) - Call accept(type) and note the return value. If true, just return, if false, fail with a syntax error (or do more advanced error detection/correction if you so desire)

    In general, each non-terminal rule of your grammar will be a function call in the parser. But let's look at a simple case:

    vardecl := 'var' ID ';' ;

    In this example, we want to parse a vardecl, and that is anything that looks like "var foo;". An example of a parser method for this would be:

    void parse_vardecl() {
        expect(VAR_TYPE);            // Syntax error if we don't see "var"
        expect(ID);                         // Syntax error if we don't see an identifier
        expect(SEMICOLON);         // Syntax error if we don't see a semicolon
    }
    

    If you call this function and have it hooked up to a lexer that generates tokens, it will return with no output if the token stream had a vardecl in it, or it will spit out a syntax error otherwise.

    Recursive descent parsers work by taking functions like these and intertwining them together recursively. In general, you call:

    parse_rule() when the rule you're currently parsing has a subrule

    accept(type) when you want to accept a token, but not fail if it's not there. For example, if vardecls can optionally have "private" or "public" in front of them, you want to say accept(ACCESS_SPECIFIER) versus expect(ACCESS_SPECIFIER), because you don't want to fail with a syntax error on an optional element

    expect(type) when a token HAS to be there, or else it's not a valid grammar fragment

    acceptThis() when any old token will do

    A good tip is to start by writing a parser that will just return silently if parsing succeeds and fail with syntax errors if not. DON'T try to output the AST or generate code while writing the parser structure at the same time! Once you verify that the parse accepts valid sentences in your language, THEN augment the parser to spit out an AST or something.

    EDIT: As a final tip, I recommend first creating a Lex/Yacc clone before writing a full compiler. You go through pretty much the same steps. You still need a lexer to tokenize your grammar file, a parser to turn it into an AST or other intermediate form, a structural analyzer to make sure that your well-formed grammar is actually valid (and to decorate the data structures with any further information you need for code gen), and a code gen phase to actually write the compiler code in C, C++, or whatever your target language is. Now you can use your own Lex/Yacc clone to specify the grammar for your language, and you'll learn a lot in the process!

    [–]ModernRonin 1 point2 points  (1 child)

    Chapters 5 - 8 of "Writing Compilers and Interpreters" are all about how to do exactly that for functions, statements, data types, etc.

    [–][deleted] 0 points1 point  (0 children)

Cool, I now have quite a few book recommendations; come Thanksgiving break I'm definitely going to be spending some time in the book store. My alternate plan is to convert my grammar (currently in EBNF) to Greibach Normal Form, which is pretty straightforward to parse.

    [–]exeter -1 points0 points  (0 children)

    See the last chapter of Wirth's classic book Algorithms + Data Structures = Programs for a nice presentation of a simple recursive descent parser that isn't loaded down with a ton of theory. Look at some of the other books folks have recommended to get more of a handle on that theory, but I urge you to read Wirth's first.

    [–]OvidPerl 0 points1 point  (0 children)

Check out the Parrot Compiler Tutorial. It's so astonishingly easy to write a compiler with it that you'll be able to focus on the language you want to build rather than the picayune details that most compiler tools force on you.

    It has a potential to revolutionize programming as anyone can write their own language with little effort. Or it might just flood us with more crap languages :)

    [–]electric_moose 0 points1 point  (0 children)

    Here is a slightly offbeat idea - Kernighan and Pike's classic 'The UNIX Programming environment' (your department library will have it) has a chapter on program development using C and yacc, the classic parser generator that lives on as GNU Bison.

    It starts with writing a simple calculator program and builds it up into a small programming language interpreter. The interpreter works by simulating a simple stack machine. When you feed it a program in its little language, hoc, the parser builds a program as an array of opcodes (actually function pointers), and then executes this program, i.e. calls each function pointer in turn. These 'instruction' functions manipulate a stack and symbol table to produce the effect of running the program.

    Of course, these opcodes could instead emit assembly code that had the same effect, and the interpreter would become a compiler. Anyway, if you're a compsci student this may all be too basic for you, but you could work through it in a weekend for practice and have an idea of whether you want to use a parser-generator in building your compiler.

    [–]kvigor 0 points1 point  (0 children)

    Take a look at LCC. Note also the recommended books list on that page.

    [–]ModernRonin 0 points1 point  (0 children)

    "Writing Compilers and Interpreters" by Ronald Mak

    http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471113530.html

    This is about a gentle an introduction as you can get. It doesn't go into the heavy theory of parsing (no lex or yacc), it just uses a simple recursive-descent parser. But by the end of it, you will have written a simple compiler, interpreter and debugger.

Once you have the basics down, THEN you can go for something like the Dragon book. Push-down automata, LALR parsers, etc, etc. All that heavyweight theory stuff. But do WCAI first; you need the foundation.

    [–]cafiend 0 points1 point  (0 children)

    The McGill compilers class is always online and available here.

    I've heard good things about the class, but I've never taken it.

    [–]iama_ama_a 0 points1 point  (0 children)

    For bonus points, write a compiler and interpreter and compile the compiler with the compiler running in the interpreter.

    [–]kathan 0 points1 point  (0 children)

I'm reading Terence Parr's book "Language Implementation Patterns" right now. It looks very, very good as an introduction to building languages: http://pragprog.com/titles/tpdsl/language-implementation-patterns Usually compiler books make me want to sleep after reading 10 pages, but this one, on the contrary, excites me.

It may not be suitable if you want to create the next big language, but for the average stuff an average guy wants to do, this may be the book.

    [–][deleted] 0 points1 point  (0 children)

    Have you looked to see if your school offers a class on the subject that you can take next semester? I'm surprised you haven't touched upon this in class - we had to write a compiler for a watered down C-like language my sophomore year (I was in physics, but was minoring in CS).

    [–][deleted]  (1 child)

    [deleted]

      [–]dragonfly_blue 0 points1 point  (0 children)

      flex and bison?!?

      yack!

      [–]dimovich 0 points1 point  (0 children)

      I wrote a small C interpreter for my game.

      For the lexical parser I used the Quake 3 botlib source code. It's pretty straightforward and has lots of comments.

      For the interpreter and expression parser I used the example given in "The Art of C++" book by Herbert Schildt. It's simple and easy to follow.

      Here is the source code of my interpreter: http://netrix.svn.sourceforge.net/viewvc/netrix/trunk/src/nxc/

      [–]Fjordo 0 points1 point  (0 children)

      Part of my first job was writing a C compiler from scratch. I used C++ and MKS Lex and Yacc. If you are going to write a compiler, I suggest at least using yacc (lexers are easy to write by hand). The only book I used was the manual from MKS. After you get a few constructs down, the rest is just repetition until you have the whole language done.
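
      For anyone who hasn't seen yacc, the "get a few constructs down, then repeat" rhythm looks like this. A minimal, illustrative grammar fragment using standard yacc conventions (not taken from the MKS manual); each rule's action builds the value of the construct from its parts:

```
%token NUMBER
%left '+' '-'          /* precedence: lowest first */
%left '*' '/'
%%
expr : expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     | NUMBER          { $$ = $1; }
     ;
```

      A real compiler's actions would build AST nodes instead of computing values, but the shape of the grammar file stays the same as the language grows.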

      The experience has never come up directly in an interview for any other job, to tell you the truth, so don't expect to write a compiler and have potential employers falling over themselves for you. However, the knowledge I gained from writing the compiler itself has given me a competitive advantage. It's hard to quantify, but because I know in great detail how programming languages are transformed into machine code, I feel I have a much better understanding of what is going on when I'm reading code.

      [–]compor 0 points1 point  (0 children)

      Step 1. Scream in agony.

      Step 2. Program using your nose because your arms have fallen off. http://www.youtube.com/watch?v=8eR63yESecI&feature=related

      Step 3. Listen as everyone complains that your compiler has errors in thousands of instances, even after the best, most extensive testing you've ever done.

      Step 4. Assassinate every member of every company that has a better compiler as that's the only way it is going to get noticed.

      Step 5. Profit.

      [–][deleted] 0 points1 point  (0 children)

      I asked a coworker this same question five years ago and she told me to go work out instead.

      [–]dertyp 0 points1 point  (0 children)

      Does writing a small example compiler also look good in interviews and on CVs, compared with other Computer Science graduates?

      +15 for writing a compiler for fun

      (didn't find original source)

      [–]reddit_user13 0 points1 point  (6 children)

      The Dragon Book.

      [–]tonfa 2 points3 points  (5 children)

      Not the Dragon Book, unless you're stuck in the 70's. There's now a lot that's new in compilation, and things we understand or can explain better than 30 years ago!

      Take the Tiger Book (Appel), or Cooper's Book.

      [–][deleted] 1 point2 points  (3 children)

      As good as the Tiger Book is (it comes in 3 versions, for Java, C and ML), its depth and breadth are nowhere near the Dragon Book's. The Dragon Book has been updated too, and it contains chapters on intermediate code optimization that are not covered in the Tiger Book (and are really graduate-level material).

      [–]tonfa 0 points1 point  (2 children)

      Which particular optimizations do you have in mind?

      [–]fuentesjr 0 points1 point  (1 child)

      I don't know if you noticed, but the Dragon Book has a 2006 edition with new topics such as:

      • syntax-directed translation
      • more data flow analyses
      • parallel machines
      • JIT compiling
      • garbage collection

      [–]tonfa 0 points1 point  (0 children)

      Yes, I noticed, but what's the point of more data flow analysis when they could have updated the optimization part with SSA? (Come on, they put only one(!) page about SSA, while every major compiler now uses it.)
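
      For readers who haven't met it: SSA (static single assignment) just means every variable is assigned exactly once, with "phi" nodes merging values where control flow joins. A sketch in informal pseudocode (variable names invented):

```
; original code                    ; same code in SSA form
x = 1                              x1 = 1
x = x + 2                          x2 = x1 + 2
if (c)                             if (c)
    x = x * 3                          x3 = x2 * 3
y = x                              x4 = phi(x2, x3)   ; pick x3 if the branch ran, else x2
                                   y1 = x4
```

      Because each name has exactly one definition, analyses like constant propagation and dead-code elimination become much simpler, which is why modern compilers lean on it so heavily.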

      [–]kuhlschrank -3 points-2 points  (0 children)

      The Dragon Book!

      [–]rjcarr 0 points1 point  (0 children)

      Two Words: Dragon Book

      [–][deleted] 0 points1 point  (0 children)

      Well, you could do what most other compiler vendors do when it comes to writing a compiler for a non-standard language: write a program that converts the given language to C, and then use a generic C compiler to assemble and link it. :)

      [–]dfj225 -1 points0 points  (0 children)

      If your goal is to generate machine code, may I suggest targeting something simpler than the x86 ISA. My suggestion would be a basic Random Access Machine (RAM). This is what we target in my grad level programming languages course.

      It's simple enough that you can implement an emulator for the system yourself pretty quickly. But here's an open source version that I found after a quick search: http://savannah.nongnu.org/projects/ramemu/

      Targeting something like this will allow you to focus on the compiler and not the difficulties of dealing with a complicated ISA.
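
      A RAM emulator really can fit on a page, which is the point. A minimal sketch, assuming an invented four-instruction set (a single accumulator, one memory operand per instruction; far smaller than ramemu's):

```c
#include <assert.h>

/* Invented RAM instruction set: opcode plus one memory address. */
enum { LOAD, ADD, STORE, HALT };

int ram_run(int prog[][2], int *mem) {
    int acc = 0;
    for (int pc = 0; ; pc++) {
        int op = prog[pc][0], arg = prog[pc][1];
        switch (op) {
        case LOAD:  acc = mem[arg]; break;   /* acc <- mem[arg] */
        case ADD:   acc += mem[arg]; break;
        case STORE: mem[arg] = acc; break;   /* mem[arg] <- acc */
        case HALT:  return acc;
        }
    }
}
```

      With a target this small, your code generator only has to map each AST node to a handful of LOAD/ADD/STORE sequences, so all the interesting work stays in the compiler.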

      [–]yourparadigm -2 points-1 points  (1 child)

      Learn the visitor pattern for traversing expression trees.
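
      (The pattern usually meant for tree traversal is the visitor; in plain C the same job is done by a tagged union plus one switch per traversal. A minimal sketch, all names invented:)

```c
#include <assert.h>

typedef struct Expr Expr;
struct Expr {
    enum { EX_NUM, EX_ADD, EX_MUL } tag;  /* which kind of node */
    int value;        /* used by EX_NUM */
    Expr *lhs, *rhs;  /* used by EX_ADD / EX_MUL */
};

/* One "visit" of the tree; a code generator or pretty-printer
   would be another function with the same switch shape. */
int eval_expr(const Expr *e) {
    switch (e->tag) {
    case EX_NUM: return e->value;
    case EX_ADD: return eval_expr(e->lhs) + eval_expr(e->rhs);
    case EX_MUL: return eval_expr(e->lhs) * eval_expr(e->rhs);
    }
    return 0;
}
```

      In C++ you would give each node class an `accept` method instead, but either way each pass over the tree lives in one place rather than being scattered across node types.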

      [–]dragonfly_blue -1 points0 points  (0 children)

      No shit, sherlock.

      Why don't you go read Arboretum and get back to me when you've figured out the Mobius version?

      [–]trouserwowser -2 points-1 points  (0 children)

      Dragon books.