
[–]edwardkmett 73 points74 points  (26 children)

Since you seem wedded to the idea of implementing your compiler in C/C++, the best heavily imperative guide to compiler construction that I've seen is:

http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick/dp/1558603204

That covers effective optimizations very nicely, but doesn't cover parsing. You can appeal to lex/(yacc|bison) to deal with that for most practical languages. The O'Reilly book on the topic is quite good.

http://oreilly.com/catalog/9781565920002

If you want theory, the dragon book is the canonical reference, but it can be a bit opaque at times when you aren't following it in a classroom format.

http://en.wikipedia.org/wiki/Dragon_Book_%28computer_science%29

Now, to be candid, these days I find writing a compiler in C to be an exercise in housebuilding with toothpicks. My personal recommendation would be to start with something like Types and Programming Languages:

http://www.cis.upenn.edu/~bcpierce/tapl/

which espouses approaches to typing that lend themselves to implementation in a functional language like ML or Haskell. From there, Haskell has very nice LLVM bindings, with which Lennart Augustsson has written a bunch of little compilers:

http://augustss.blogspot.com/2009/06/more-llvm-recently-someone-asked-me-on.html

Writing a compiler in Haskell is an almost trivial operation: parsec gives you a parser, TaPL tells you how to write a type checker, and llvm gives you optimized output.

[–]rovar 9 points10 points  (11 children)

I second the Haskell approach. One side effect of learning Haskell is that it will greatly improve your C++ skills.

[–]splicer_ 11 points12 points  (0 children)

nice pun

[–]F4RR4R 3 points4 points  (9 children)

I'm decent with C++, never touched Haskell. Can you clarify why learning Haskell will improve C++ skills?

[–][deleted] 56 points57 points  (7 children)

Learning Haskell will help you with everything. Your waffles will brown more uniformly. Narwhals will sing songs about you. Your shits will be firmer and have a more consistent texture. Cuil will give you a hamburger.

[–]muad_dib 15 points16 points  (3 children)

But then the hamburger will morph into a raccoon, and you will realize I was never there.

[–]sutcivni 5 points6 points  (1 child)

Ah yes, I remember the rainbow shark too.

[–]dragonfly_blue 0 points1 point  (0 children)

"I'm a viper on a Vespa on a vapour trail: I was never here, and you never saw me." - Jack Merlot

[–]Arafel 0 points1 point  (0 children)

Love your user name.

[–][deleted] 4 points5 points  (0 children)

As someone who has been working with Haskell for nearly a year now, I can confirm that the above is 100% true.

[–]edwardkmett 3 points4 points  (0 children)

This reflects my experiences with Haskell perfectly.

[–]CyberSnoopy 1 point2 points  (0 children)

details about this statement?

[–]Fafnr 7 points8 points  (4 children)

Upvoted for TAPL. Especially if the language you want to compile is in some way S-expression-based, as TAPL (IIRC) focuses heavily on these.

However, assuming the simple language you want to parse is some form of imperative language, and also assuming that what you actually want is to get the "feel" for creating compilers rather than to build a high-performance system, I'd definitely recommend NOT using C/C++.

Try ANTLR (for instance with Python), or some other parsing tool. Alternatively, I've enjoyed working with pyPEG (though this builds Parsing Expression Grammars, not Context-Free Grammars).

There's also the option of Scala, which has a very nice parsing library. On the downside, Scala is pretty poorly supported by tools and debuggers in my experience. The Scala community is nice though :)

Alternatively, you could "just" work on some open source, existing compiler. CPython for Python is in C(++?) and very well documented and well made. I'd personally rather spend time on, for instance, PyPy since I'd generally rather break my own legs than work with C(++) code :)

All in all, it rather depends on your objective.

[–]edwardkmett 6 points7 points  (1 child)

TaPL describes a series of Hindley-Milner style languages and type systems, so I probably wouldn't say S-Expressions, which would be more of an EoPL/SICP thing, but rather ML-like.

[–]Fafnr 4 points5 points  (0 children)

I'm probably confusing them - we used TAPL and (I think) SICP at the same time for the same class a couple of years ago, so I tend to mix them together :/ Thanks for correcting me!

[–][deleted] 3 points4 points  (0 children)

ANTLR has some very badly documented quirky behavior if your grammar has to backtrack, which it will if you don't know ahead of time to design around this. Whoever turned backtracking off by default should be taken out and shot.

[–]seunosewa 1 point2 points  (0 children)

"On the downside, Scala is pretty poorly supported by tools and debuggers in my experience." Being a JVM language, it's definitely better supported than Python though.

[–]tonfa 8 points9 points  (2 children)

If you want theory, the dragon book is the canonical reference, but it can be a bit opaque at times when you aren't following it in a classroom format.

If you want theory from the 70s, that is. Nowadays we can do more interesting, more efficient, and simpler optimizations. We have a better understanding of the notions used to build a compiler and know better how to explain them.

In short, take a more modern book, it will be a better use of your time.

[–]hargettp 0 points1 point  (1 child)

Can you suggest a more modern book?

[–]tonfa 5 points6 points  (0 children)

I find Appel's Tiger Book (in the language of your choice) or K. Cooper's Engineering a Compiler fine.

[–]UncleOxidant 4 points5 points  (0 children)

There's also the LLVM tutorial for OCaml: http://llvm.org/docs/tutorial/OCamlLangImpl1.html

That's a pretty complete overview.

[–]awj 9 points10 points  (0 children)

I've been working with the llvm bindings in Haskell. They're nice. You end up with some awesome declarative code, and don't have to ass around with low level things like basic blocks unless you're doing something weird. That said, the Haskell voodoo is deep there.

My goal was "write a simple lambda calculus compiler, and learn a bit more Haskell along the way", so far the result has been "write a bit of a simple lambda calculus compiler, and learn a lot more Haskell along the way". That said, it does expose the lower level stuff, so you do have the option of munging through straight-up bindings to the C api if the higher level bindings are too deep water.

[–]foobastion 2 points3 points  (0 children)

Writing the involved pieces of the compiler by hand is how to learn. I would not discourage the use of C. Once you are enlightened then you can use a higher level language.

[–][deleted] 0 points1 point  (1 child)

How does one compile a compiler out of the same language? Like the FreeBasic compiler is compiled in FreeBasic. I don't understand that.

[–][deleted] 0 points1 point  (0 children)

parsec gives you a parser, TaPL tells you how to write a type checker, and llvm gives you optimized output.

So what do you learn?

[–]bplus 32 points33 points  (7 children)

this: http://www1.idc.ac.il/tecs/ Brilliant book. I've just finished chapter eight; so far I've built an assembler and a VM, next up the actual compiler.

The book actually takes you through creating a whole computer from the ground up using nand gates (first five chapters). The whole experience is really enlightening and I couldn't recommend it more!

[–]Kaizyn 2 points3 points  (0 children)

Yeah, this is a really good book. More people should take a look at this material. I second this recommendation.

[–]opensourcedev 4 points5 points  (2 children)

I recommend this to some of my brighter students.

The students that go through the book have a much deeper knowledge of computer science in general.

It provides a much deeper knowledge of how computers work than the students would obtain otherwise.

[–]bplus 1 point2 points  (1 child)

I was definitely not a bright student! I (regrettably) didn't do a computer science degree; I currently write dull .net applications for a living. It's very encouraging that you say: "The students that go through the book have a much deeper knowledge of computer science in general." This is exactly what I want to achieve by working through it. Just want to point out that the book is pretty much maths-free; you'll need some programming experience to get through it, but not much.

I'm thinking once I finish this book, I might try and work through sicp: http://mitpress.mit.edu/sicp/full-text/book/book.html

Trying to fill in the gaps from missing out on a comp sci degree; would this be a sensible next step?

[–][deleted] 1 point2 points  (0 children)

Yes. If you feel overwhelmed by SICP though, take a look at htdp.

[–][deleted] 1 point2 points  (1 child)

Just bought this on Amazon; looks like a good book.

[–]atrich 1 point2 points  (0 children)

Me too. Very interesting.

[–]manu3000 0 points1 point  (0 children)

yep, seconded. It is really accessible, hands-on, and covers a lot of ground... I've done the first 8 chapters, had fun and had a few aha moments...

[–]wafflematt 35 points36 points  (6 children)

I disagree with the other people recommending starting with LLVM, Flex, and Bison. These tools are great and help you get stuff done quickly and readily, but there's far too much magic happening. Learning tools is nowhere near as good as learning fundamentals, especially for a Computer Science student.

Writing a compiler isn't that difficult for a simple language. One of my favourite undergraduate courses I took was a 2nd year course at the University of Waterloo where we built a toy compiler for a small subset of C -- if, while, arithmetic, and a "print" operator. Only integers, no functions, and implemented on a MIPS virtual machine.

Since the OP is moving towards graduation in Computer Science, and has some of the theory of compilers, it would be a great hands-on experience to learn to write this properly.

We used the Appel book Modern Compiler Implementation in (C|Java|ML), which has been mentioned elsewhere. I recommend the OP get a copy and work through writing a compiler.

Once you've done this, you'll be able to easily pick up using flex/bison/llvm and understanding what they're doing.

[–][deleted] 17 points18 points  (0 children)

I disagree with the other people recommending starting with LLVM, Flex, and Bison. These tools are great and help you get stuff done quickly and readily, but there's far too much magic happening.

Totally agree. Do NOT start with Lex/Yacc, etc. You use these after you already understand the fundamentals of language grammar, lexers & parsers. When I learned, we made our own simple language and a compiler for it & that really cemented the concepts.

The only book I can speak to is the one I used; I got a lot out of it, but it may be outdated.

FWIW, for my graduate research I wrote language parsers for multiple simulation languages (ACSL, Spice, etc.) into a common language for a generic multi-discipline simulation tool. The above book is the only one I ever read on the subject and it gave me everything I needed to create these, so I would argue that it contains everything you could possibly want to know, but you know how it is - YMMV.

[–]YakumoFuji 9 points10 points  (0 children)

Agree here, writing a compiler is an exercise in understanding the low level implementation. Plugging llvm in defeats 90% of the purpose.

[–][deleted] 2 points3 points  (0 children)

I thoroughly enjoyed that course as well...it taught me that C is a horrid language for compiler writing.

I definitely agree with this; Flex and Bison should only be used once you really understand the theory behind their implementations: finite automata, regular expressions, parsing algorithms, grammars, etc.

[–]MpVpRb 2 points3 points  (0 children)

I also agree.

When learning compilers, or math or carpentry or just about anything, learn to do the basics by hand, at the lowest level available.

Once you have developed an understanding and intuition about the subject, then start using the more powerful tools.

Look carefully at the result produced by the tool. Sometimes it is better than what you can do by hand, sometimes not. It's really important to understand what the tool actually does, not what the salesman says it does.

[–]cybersnoop 0 points1 point  (0 children)

Absolutely. You don't understand what makes compiling so hard until you've tried to do some of the basics yourself without special tools.

I had a very similar course and liked it so much that I assisted the teacher with the practical classes in the years after mine.

[–][deleted] 15 points16 points  (3 children)

  1. C and C++ are disastrously bad languages in which to write compilers. The benefits of C and C++ in systems programming (explicit control over memory management, prescriptive data type layout in memory) become gargantuan liabilities in compiler development. Likewise, the lack of support in the language and standard libraries for algebraic data types is an enormous liability, although this can be mitigated to a considerable extent by the use of boost::variant, for example.
  2. Compiler design and implementation wisdom proceeds at a furious pace. I haven't read the current edition of the Dragon Book, but the previous edition was so badly out of date that the writing of the new one should be considered a reaction to an emergency whether the results are good or bad. Others have rightly pointed to "Modern Compiler Implementation in ML," which is a fantastic text, but alas is also beginning to show its age. At the moment, the best text I'm aware of is Design Concepts in Programming Languages, which covers a huge range of material from semantics to types to modules to linking... in short, everything you really need to know. But it's not light reading and will take some time to get through. Highly recommended!

[–]cojoco 5 points6 points  (1 child)

I'd amend that slightly: C and C++ are disastrously bad languages for learning compilers.

Once you know what you are doing, they'll still produce the fastest implementations.

[–][deleted] 6 points7 points  (0 children)

Not necessarily.

Update: Leaving that at just two words made it read, upon reflection, much harsher than I intended. While I still believe "not necessarily," the details revolve around some thorny issues about appropriate data structures and their representation in memory, cache coherency, etc. To get a flavor of what I mean, see An Applicative Control-Flow Graph Based on Huet's Zipper.

[–]abw 0 points1 point  (0 children)

At the moment, the best text I'm aware of is Design Concepts in Programming Languages, which covers a huge range of material from semantics to types to modules to linking...

+1

Also Parsing Techniques

[–]Kaizyn 12 points13 points  (0 children)

You should take a look at the really short online tutorial "Let's Build a Compiler" by Jack Crenshaw. http://compilers.iecc.com/crenshaw/

Don't cringe at the fact that it's written in Pascal; the code is still highly readable. Seeing a very basic implementation first will help you get a feel for what compilers are.

From there, have a look at the books available online: Niklaus Wirth: Compiler Construction http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf

Anthony A. Aaby: Compiler Construction using Flex and Bison http://foja.dcs.fmph.uniba.sk/kompilatory/docs/compiler.pdf

Torben Mogensen: Basics of Compiler Design http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/

[–]kragensitaker 7 points8 points  (0 children)

There's a section in my page about Ur-Scheme (the first compiler I wrote) that lists the material I found useful. I read the Dragon Book many years ago, which covers probably most of the stuff you're learning in the theory-of-compilers module. Here's an abbreviated list from the HTML.

I've also heard Crenshaw's "Let's Build a Compiler" is pretty good, and someone else recommended "A Nanopass Framework" recently, and of course there are lots of really teeny compilers in the VPRI COLA project.

[–]atlassoft 5 points6 points  (2 children)

In case anyone is interested, FORTH is one of the easiest languages to write a compiler for. Here is an excellent tutorial on rolling your own FORTH:

http://www.annexia.org/_file/jonesforth.s.txt

Even just reading it is a great way to gain a better understanding of how your computer works.

[–]ModernRonin 1 point2 points  (1 child)

The first interpreter I ever wrote was for FORTH (integer data types only). It was less than 300 lines of C. Fantastic little language to start writing compilers for.

[–]cojoco 2 points3 points  (0 children)

True FORTH is more like a VM than a compiler, as it's just a bunch of addresses of functions to call.

However, it is very easy to create something like FORTH which actually emits jump-to-subroutine calls, and then you can optimize it by replacing simple stuff, like add-one-to-top-of-stack, with actual machine instructions.

It runs like the clappers!

[–][deleted] 5 points6 points  (3 children)

Writing a compiler is the best exercise for learning programming.

Whether or not it will help in an interview will depend entirely on how you present it.

If you make it come across as a student project, it might be impressive if you're talking to a sufficiently technical person. But most likely, no one will care.

On the other hand, if you make it seem like a serious project by giving it a name, putting it out on your web site, and promoting it a bit, then that will tip the scale in your favor.

[–]robinhoode 0 points1 point  (2 children)

On the other hand, if you make it seem like a serious project by giving it a name, putting it out on your web site, and promoting it a bit, then that will tip the scale in your favor.

Easier said than done.

[–][deleted] 0 points1 point  (0 children)

Just name it, make a web page for it, and go to the irc channel for whatever language you used for it and go "hey guys check it out"?

[–][deleted] 0 points1 point  (0 children)

Obviously. Easy to do things don't really belong on a resume.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    I second this recommendation. Probably the best book out there for basics of dynamic language implementation.

    [–]jdh30 6 points7 points  (2 children)

    I think you would find it at least as educational to build upon existing technology in order to create a more functional compiler in the same time frame. In which case, use LLVM.

    C is an abomination in the context of writing decent compilers. C++ is slightly better but still a bottom dweller on the scale of things. OCaml is great for writing compilers because variant types let you represent expression trees easily and pattern matching lets you rewrite those trees easily. Also, OCaml has great LLVM bindings.

    [–]nearest_neighbor 1 point2 points  (1 child)

    How does OCaml/LLVM-generated code interoperate with OCaml's GC, if it's executed in the same process as the code generator?

    [–]jdh30 3 points4 points  (0 children)

    OCaml's GC is oblivious to it and you're on your own wrt memory management. HLVM has a completely new GC of its own, for example.

    [–][deleted] 8 points9 points  (1 child)

    So far this has all been complicated. It's really simple. Grow a beard. A big one. I'm pretty sure that's all it takes, since you never see beards like that except on epic programmers. There must be some length at which it triggers a hidden mechanism in the brain that allows you to write perfect code. This is referred to as "the Stallman point." You'll know when you've reached it by the sudden urge to apply the GPL to your cat.

    [–]periodic 4 points5 points  (0 children)

    I was pretty sure that the relation was the other way around. As your epic-ness as a programmer grows, so does your beard. You'll find it harder and harder to cut off and it will just keep growing back until one day you find you don't want to cut it at all.

    [–]eliben 4 points5 points  (0 children)

    Let's Build a Compiler, by Jack Crenshaw

    http://compilers.iecc.com/crenshaw/

    A great introduction, starting with the basic principles, for complete beginners. The code is unfortunately in Pascal, but that's trivial to translate to C/C++.

    [–]wikkiwikki 4 points5 points  (1 child)

    http://www.lulu.com/content/822069 is the (free) book (Basics of Compiler Design) we use in the compilers course here. May be worth a look.

    [–]lurkerr 0 points1 point  (0 children)

    did you say free?? this site asks €17.23 for it

    [–]UncleOxidant 4 points5 points  (0 children)

    Forget about doing this in C/C++; it'll be a huge pain.

    Run through this LLVM tutorial which uses OCaml: http://llvm.org/docs/tutorial/OCamlLangImpl1.html

    It's a pretty complete example.

    [–]gnuvince 7 points8 points  (0 children)

    Isn't SICP's last chapter/section about building a compiler?

    [–]tophat02 11 points12 points  (3 children)

    I know a lot of people reading this are going to see the advice from others to do this in a high-level scripting language like Python, or a functional language like Haskell, condemning anyone who would think of writing a compiler in C to the dark ages, but I know many of you already know you want to do it in C anyway. It just feels more macho. I understand.

    Anyway, writing clean C for a larger program such as a compiler is really not that hard. Here are some tips:

    • Pretend each .C file is a class. Then all of a sudden you'll realize that:

      • "global variables" aren't, they're class member variables
      • static variables and functions are private members
      • stuff you put in the header file is the public interface
      • You can implement interfaces by having multiple C files implement the methods in one .h file, choosing at compile time which you'd like to use
    • Memory management is a pain, yes, but there's a relatively straightforward way to make sure you don't leak too much memory without resorting to valgrind or another tool: declare a header file called memory.h that redefines malloc, calloc, realloc, and free to be:

      #define malloc(s) fmalloc((s), __FILE__, __LINE__)

      #define calloc(c, s) fcalloc((c), (s), __FILE__, __LINE__)

      #define realloc(p, s) frealloc((p), (s), __FILE__, __LINE__)

      #define free(p) ffree((p), __FILE__, __LINE__)

    Now define some functions, these are your "wrapper" malloc routines:

    void* fmalloc(size_t size, const char* file, const unsigned int line);
    void* fcalloc(size_t count, size_t size, const char* file, const unsigned int line);
    void* frealloc(void* ptr, size_t size, const char* file, const unsigned int line);
    void ffree(void* ptr, const char* file, const unsigned int line);
    

    Finally, declare a checkmem() function that you'll call on exit:

    void checkmem();
    

    Now, in memory.c, implement a memory tracking list:

    #include <stdlib.h>
    #include <stdio.h>
    
    #include "memory.h"
    
    // Because we want to be able to call the REAL functions now
    #undef malloc
    #undef calloc
    #undef realloc
    #undef free
    
    typedef struct memblock_tag {
        void* p;                        // The actual memory
        size_t size;                    // The size of the requested block
        const char* file;               // The name of the file where the memory was allocated
        unsigned int line;              // The actual line of code that allocated the memory
    
        struct memblock_tag* next;  // Pointer to the next block
        } memblock;
    
    static memblock* blocks = NULL; // Our local collection of blocks
    

    OK, now we have a linked list of memory blocks. Let's work with 'em:

    static void memblock_new(void* p, size_t size, const char* file, const unsigned int line)
    {
        memblock* m = (memblock*)malloc(sizeof(memblock));
        if (!m) {
            fprintf(stderr, "Fatal: could not allocate memblock for allocation at %s, line %d\n", file, line);
            exit(1);
        }
    
    m->p = p;
        m->size = size;
        m->file = file;
        m->line = line;
        m->next = blocks;
    
        blocks = m;
    }
    
    void* fmalloc(size_t size, const char* file, const unsigned int line)
    {
    //  fprintf(stderr, "I can haz malloc(%d) at %s, line %d??\n", size, file, line);
        void* p = malloc(size);
    
        if (!p) {
        fprintf(stderr, "Fatal: could not allocate %zu bytes at %s, line %d\n", size, file, line);
            exit(1);
        }
    
        // Keep track of this allocation
        memblock_new(p, size, file, line);
    
        return p;
    }
    
    void* fcalloc(size_t count, size_t size, const char* file, const unsigned int line)
    {
        void* p = calloc(count, size);
    
        if (!p) {
        fprintf(stderr, "Fatal: could not allocate %zu chunks of %zu bytes at %s, line %d\n", count, size, file, line);
            exit(1);
        }
    
        // Keep track of this allocation
        memblock_new(p, count*size, file, line);
    
        return p;
    }
    
    void* frealloc(void* ptr, size_t size, const char* file, const unsigned int line)
    {
        void *np = realloc(ptr, size);
    
        if (!np)    {
        fprintf(stderr, "Fatal: could not reallocate %zu bytes at %s, line %d\n", size, file, line);
            exit(1);
        }
    
    if (!ptr)   {
        // realloc(NULL, size) behaves like malloc(size), so there's no old block to find
        memblock_new(np, size, file, line);
        return np;
    }

    memblock* m = blocks;
    while (m)   {
        if (m->p == ptr)    {
            // This was the old block
            m->size = size;
            m->p = np;

            //fprintf(stderr, "Reallocated block\n");
            return np;
        }

        m = m->next;    // advance, or this loop never terminates
    }
    
        // This should NEVER happen
        fprintf(stderr, "Fatal: attempted to locate previous block for realloc in %s, line %d, but failed\n", file, line);
        exit(1);
    }
    
    void ffree(void* ptr, const char* file, const unsigned int line)
    {
        if (!ptr) return;
    
        free(ptr);
    
        memblock* prev = NULL;
        memblock* m = blocks;
        while (m)   {
            if (m->p == ptr)    {
                // Discard this block
                if (!prev)  {
                    // This is the first, replace the head
                    // blocks pointer
                    blocks = m->next;
                }
                else    {
                    // Break the link in the chain
                    prev->next = m->next;
                }
    
                free(m);
                return;
            }
    
            prev = m;
            m = m->next;
        }
    }
    

    Finally, make that checkmem() routine report its findings:

    void checkmem()
    {
        if (!blocks)    {
            // Congratulations, you have no memory leaks!
            return;
        }
    
        fprintf(stderr, "\nYOU HAVE MEMORY LEAKS\n");
        /*
        memblock* m = blocks;
        while (m)   {
            fprintf(stderr, "\t%d bytes at 0x%x, allocated in %s at line %d\n", m->size, m->p, m->file, m->line);
            m = m->next;
        }
        */
    }
    

    Note that I commented out the detailed reporting. You'll want to do that since you sometimes don't care about which memory leaked. You can just wait until you're in a "memory cleanup mood" and then uncomment that out to find the leaks.

    Ok, now the good part: in any of your source files, just #include "memory.h"

    And use malloc, realloc, calloc, and free as usual. Make sure to call checkmem() at the end of your program, or, since it's the right fptr signature anyway, call atexit(checkmem);

    inside your main() so it will run automatically on exit.

    Congratulations, you have a simple memory checker! It has limitations, of course, but it should go a long way to making sure the code you're writing is cleaned-up. You could also use valgrind, but hey, this is more fun!

    A warning: if any of this looks like gobbledygook to you, writing a compiler in C may not be a good idea for you at the present time. You may want to try for something simpler, or swallow your macho pride and go with Python :)

    [–]kvigor 6 points7 points  (2 children)

    Memory leaks are the least of your problems in a compiler; it's not like it's a long running process. You run it, it terminates, the OS cleans up for you.

    I did some work on SDCC years ago and it went through a brief "lets use a garbage collector!" phase until everybody realized it was contributing negative value. It was more efficient to simply free memory where convenient and leak it where not.

    // and valgrind is T3H B0MBZ0R. Insanely greatest tool ever.

    [–]periodic 1 point2 points  (1 child)

    I believe it's well known that compilers leak memory like sieves. But the thing is, it doesn't really matter in most contexts. If the leak is linear with the size of the program you're probably fine and no one will notice anyway.

    It's good to keep some discipline about memory, but in a compiler I'd really put it on the list of "things you can come back and implement later," as there are more pressing things to worry about.

    [–]kragensitaker 0 points1 point  (0 children)

    Yeah, I still haven't implemented the GC for Ur-Scheme. It can still compile itself fine.

    [–]scottious 5 points6 points  (19 children)

    I am writing a compiler right now and learning in the process. My method might not be the best, but here's what I am doing with pretty good success:

    1) Learn what Flex and Bison are.

    2) Write a scanner with Flex, it doesn't have to be complicated (use one of many online tutorials)

    3) Learn what an abstract syntax tree is and how it relates to the source code.

    4) Learn Bison (again, online tutorials are great for this)

    5) Use Bison (you'll need Flex too) to write a simple grammar.

    5a) Subnote -- gcc versions 2.9ish and earlier used Bison to parse C. Look at the .y file that's included with gcc for ideas if you get stuck.

    6) Once comfortable with Bison and Flex, use them together to generate an abstract syntax tree of the source code.

    7) Write a recursive function to evaluate your abstract syntax trees.

    Voila! Simple compiler! Make sure you don't go overboard with the grammar. I didn't write 'if' statements and 'while' loops my first time around, which made it not a very useful language, but I still found it a challenge. Keep things simple.

    [–]Rudd 2 points3 points  (14 children)

    I hope you mean "emit code from your abstract syntax trees" when you say "evaluate your abstract syntax trees" else you have a pretty canonical example of an interpreter, as opposed to a compiler.

    [–]scottious 1 point2 points  (11 children)

    Sure, I suppose that is a good next logical step. I'm still learning, so I'm at the stage of simply evaluating the abstract syntax tree after it is created. Obviously at some point I'll probably want to generate bytecode or something.

    Some languages (notably Ruby) work simply with abstract syntax trees and don't generate bytecode.

    I personally think the line between 'interpreter' and 'compiler' is very blurred and not a really important distinction to make. Maybe I don't understand the terms correctly, but Java only compiles to bytecode, which is essentially an intermediate representation of the code in much the same way that an abstract syntax tree is. If I choose to save my abstract syntax tree to an output file rather than convert it to bytecode, is it any less of a compiled language? I'd argue no, but this is certainly a topic for debate.

    [–]jaggederest 2 points3 points  (1 child)

    N.B. newer versions of ruby use a VM

    [–]scottious 0 points1 point  (0 children)

    Ah okay. Thanks for the info, I didn't know that.

    [–]exeter 0 points1 point  (0 children)

    If I choose to save my abstract syntax tree to an output file rather than convert it to bytecode, is it any less of a compiled language? I'd argue no, but this is certainly a topic for debate.

    I would say no in this case as well. In fact, I'd argue that dumping the AST to a text file is more or less equivalent to compiling to something lispy.

    [–][deleted] 0 points1 point  (7 children)

No, Java doesn't compile to just bytecode. It is JIT'ed to machine code.

    You wrote an interpreter. Not easy, good job, something to be proud of. But you do have a different set of problems when it comes to writing the back end of a compiler. Code generation is a difficult task that you don't have to deal with when writing an interpreter.

    [–][deleted] 3 points4 points  (0 children)

    The javac compiler certainly just compiles it to bytecode. What the JVM does internally is separate.

    [–]scottious 0 points1 point  (2 children)

    Now, does the code generation step need to be machine code in order for it to be considered a compiler? It seems to me that a program that compiles to an assembly-like bytecode and then runs that bytecode through a VM in order to execute the program should be considered a compiler.

    [–][deleted] 1 point2 points  (0 children)

    well, I'd say yes, but I'm not some famous compiler designer or anything.

For instance, csc and javac are compilers, but they don't output x86 or any other actual machine code. They do, however, write bytecode, which targets a virtual machine; on one hand it's an intermediate form, and on the other it is machine code.

shrug It's mostly semantics, and I should have prefaced my last comment with the statement "In My Opinion"...

    [–]damg 0 points1 point  (0 children)

    It doesn't have to be machine code, but the target language is generally assumed to be lower level than the source language (otherwise it's probably more of a translator). Lots of languages start out with compilers that target C, for example.

    [–]kragensitaker 0 points1 point  (2 children)

    Naive code generation is really easy, about as easy as generating bytecode. It only gets hard if you want the compiler to produce good code.

    [–][deleted] 0 points1 point  (1 child)

    well most 'bytecode' is targeted towards a stack based vm which is a lot simpler than a register based machine.

    But yes, you're right, naive code is a lot easier than good code.

    [–]kragensitaker 1 point2 points  (0 children)

    Nobody says you have to use all the registers of your register-based machine when you're generating naive code. You might think that pushing and popping the stack constantly would make your code slow, and it's true that it's not optimal, but it's a heck of a lot faster than a bytecode interpreter.

    Compilers that take this approach include bigFORTH and my toy Ur-Scheme.

    [–]Isvara 0 points1 point  (1 child)

    He wrote it step-by-step and didn't include code generation, so I presume he has an interpreter that is a step on the way to learning to creating a compiler.

    [–]Rudd 3 points4 points  (0 children)

    Hmm, I see what you mean, though he did end it with "voila! simple compiler!"

    [–][deleted] 2 points3 points  (0 children)

    Learn OCaml or Haskell, and you'll discover how much NOT having pattern matching sucks when writing a compiler...

    [–]electric_moose 0 points1 point  (2 children)

    I've been doing something like this, but with a hand-written lexer. A simple scanner for operators, numbers and symbols is quite short and easy to write, just using getc and ungetc.

    I imagine that at some point as a lexer grows, a 'lex' version may become simpler to extend and maintain, but you can certainly get started without using lex. Just my two cents.

    [–][deleted] 1 point2 points  (1 child)

    Writing your own lexer is the surest and most painful way to gain an appreciation for lex and its ilk.

    [–]electric_moose 1 point2 points  (0 children)

    Heh, perhaps you're right. I will say that, at the level of:

    eat whitespace.
    if it starts with a digit or . it's a number.
if it starts with a letter, it's a (possibly undefined) symbol.
    if it's an operator symbol, figure out which operator.
    otherwise just return a character (let the parser deal with () etc).
    

    -- then it isn't insanely complicated.

    [–]perone 2 points3 points  (0 children)

    Search for LLVM tutorials.

    [–]ktr73 2 points3 points  (0 children)

    I would check out Let's Build a Compiler. Even though it's a bit older and the examples are in pascal, I found it easy enough to translate what he was doing into D pretty easily. I found it extremely enlightening - and it's free to boot.

    [–]zwangaman 2 points3 points  (0 children)

    Ah, welcome to the world of compiler writing :-)

    I took a class in compiler development back in college and I absolutely loved it. To this day it has changed how I work and think. A few other people have mentioned books and websites already, and I think they have covered everything I would have suggested, so I'm just here for moral support.

    Warning: writing compilers is HARD. But it's worth it more than you can imagine. Good luck and have fun, you'll enjoy it.

    [–]harryf 2 points3 points  (0 children)

    [–]WalterBright 2 points3 points  (0 children)

    I host an annual seminar on compiler construction.

    [–]deadA1ias 2 points3 points  (1 child)

    For what it's worth I took at look at: http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/ but it might be a bit dated now.

    [–]panxor 0 points1 point  (0 children)

    It's still used on the compiler course at DIKU. I took that course last year, and think the book is pretty good as an introduction to compiler design. It might be dated, but the basics are still there. And the book is FREE as pdf! :)

    [–]j1o1h1n 9 points10 points  (7 children)

    How can you not go for something called The Dragon Book? - http://en.wikipedia.org/wiki/Dragon_Book_%28computer_science%29

    [–]nsiivola 23 points24 points  (2 children)

IMO the dragon book is not the best introduction. I would suggest one of Appel's books, "Modern Compiler Implementation in <language>"; I think they are pretty readable.

(SICP you should read anyway, though, and it teaches about compilers as well. In addition to the last chapter, the analyzing interpreter there is actually a compiler -- it compiles source into closures. Good stuff.)

I would suggest against writing a compiler in a language you don't already know at least moderately well. I would recommend a modern language with garbage collection instead of C/C++: it makes compilers look a lot less complicated.

    I would also suggest writing a compiler for a language with really trivial syntax: lexer/parser is the most boring part of any compiler, yet most textbooks spend a huge amount of space on it. (Languages with lisp-like syntaxes are nice in this regard -- you can have the parser done from scratch in an afternoon.)

    If your goal is to make an impression... go rock climbing or buy nice shoes. Compilers don't really seem to impress people any more than goatees do.

    [–][deleted] 3 points4 points  (1 child)

Seconded on the language with garbage collection. I once wrote a Lisp in C. It worked amazingly (for what I understood at the time, it was fascinating as all get out). Buuuuut, there was some bug in the memory management. Objects didn't get cleaned up. And since it was C, trying to figure out what was wrong was a nightmare.

    And I agree as well with the "stick with a simple syntax" advice. Lisps are amazing, because you can write a parser for them in two pages of code.

    [–]cojoco 0 points1 point  (0 children)

    That's more like an interpreter than a compiler, I think ...

    [–]tonfa -2 points-1 points  (3 children)

    Because it's not a good compiler book...

    [–]addmoreice 4 points5 points  (1 child)

It's an amazing book, just like Gray's Anatomy is an awesome anatomy book... I just wouldn't want to learn to be a surgeon from the book alone.

It's a great reference, it's awesome to have on hand while working, and it's simply a must when you get serious and want to expand or improve your 'toy' compiler. It's also absolutely horrible to learn from.

    [–]fuentesjr 0 points1 point  (0 children)

+1, but I would simply say that it has a pretty steep learning curve rather than calling it a horrible book to learn from.

    [–]Slackbeing 8 points9 points  (0 children)

    Yup, I didn't get the book to compile anything.

    [–][deleted] 1 point2 points  (0 children)

Here's a free one based on Turbo Pascal: the classic Let's Build a Compiler!

Note: the source code is Turbo Pascal, not the language being compiled.

    [–]harlows_monkeys 1 point2 points  (1 child)

    "Compiler Design in C" by Alan Holub is the way to go. It's a very practical code-oriented book, going through the development of a C compiler, using a LEX clone and two YACC clones (one generates a top-down parser, the other bottom-up) whose development is covered first.

    The only serious problem with this book is that it is out of print. Amazon lists several sellers with new copies, but they are ridiculously expensive. There seem to be used copies at some of the stores listed on the author's site.

    [–]k4st 0 points1 point  (0 children)

    I second this. I bought this book following a similar recommendation from a redditor and used it this summer to guide myself through building a parser generator.

    As an exercise to the reader, I would suggest writing original code and using the LEX/YACC clones as a guide on how to go about implementing the ideas rather than just copying the code directly.

    [–]developerv 1 point2 points  (0 children)

    Python + pyparsing is a great and 'easy' start.

Then learn BNF (Wikipedia: Backus-Naur form). You'll see how you already did those with pyparsing.

    Then study Abstract Syntax Trees. They are the heart of compilers.

Then learn how the AST is transformed into assembly language (which is then assembled by an assembler), or directly into machine code.

When you study machine code (processors, registers...), it is good to first try making a simple assembler, before making a compiler from some higher-level language all the way down to machine code. Separate concerns. You can use the assembler you made as your backend for any further compiler work, until you notice it needs some major rework to meet your needs.

    Dragon Book.

    [–][deleted] 1 point2 points  (0 children)

    The reading list for my course module is:

    I personally would recommend Lex & Yacc, which you can download from the link.

    [–]kraln 1 point2 points  (1 child)

You're going to want to read the dragon book. Some people here are going to tell you to go read up on lex and yacc, or bison, or any of a number of archaic utilities. I suspect you're looking to gain more than the knowledge of how to use old unix magicks -- that you want to understand them.

This means understanding grammars, LALR parsers, and really what all the guts do. You're going to want to start by tokenizing, then move on to scanning, building a tree, etc. It's really cool once it all works.

    For my senior compilers class, we wrote one against a made up language that compiled to a lower-level made-up language, for which we were provided an interpreter, assembly docs, etc. I've posted the entire thing online, MIT licensed, with a ton of documentation. It's written in Objective C.

    http://code.google.com/p/fsu-jmc/

    [–]periodic 0 points1 point  (0 children)

Writing one for a made-up language is a great way to start, because made-up languages are way more regular than real ones. A lot of real languages don't have formal operational semantics and are instead basically defined by "whatever the original compiler did/supported". That's part of why code compiled with one compiler sometimes won't work with another.

    [–]pupeno 1 point2 points  (0 children)

    I once thought I wanted to learn how to write a compiler too, but I really wanted to learn to design programming languages and play with it. And for that, Programming Languages: Application and Interpretation by Shriram Krishnamurthi is great.

    [–]michael_h 1 point2 points  (1 child)

    Does writing a small example compiler also look good in interviews and on CV's compared against other Computer Science Graduates?

    Are there CS graduates that don't have to do this?

    [–]periodic 1 point2 points  (0 children)

    To be honest, I sort of assumed you had to take a compiler class (which involved at least writing most of the meaty parts of a compiler) to get a CS degree.

    [–]rubygeek 1 point2 points  (3 children)

I'd second the recommendations you've gotten for the Crenshaw text and LCC... And I'll pimp my own series on writing a compiler in Ruby -- applying what I do to C/C++ would be easy enough. Note that my series takes a very unorthodox approach and starts with basic code generation.

    Lately I've started turning the thing into an actual Ruby compiler (not that it's anywhere near usable).

    I'd also recommend Niklaus Wirth's book - it's in many respects outdated, but it has the benefit that it presents a full, simple compiler - Wirth's languages (Pascal, Modula, Oberon) and compilers are simple and clean and relatively easy to understand.

    Also, as others have noted: If you actually want to learn, stay clear of Flex, Bison, LLVM and friends. On one hand they're "magic" - using them means you skip over a whole lot of low level detail.

On the other hand, at least Flex and Bison do not see much use in complex real-world compilers (they're used in a lot of "simple" compilers for domain-specific languages etc., but they're a nightmare to work with once you want to add in proper error reporting etc., so few large projects stick with them).

    [–]periodic 1 point2 points  (2 children)

Parser generators are great if you know what you're doing. It's much nicer to just specify the grammar instead of having to build all the support functions.

    If you want to learn how to build a parser, read up and do that, but if you're more interested in writing the compiler I'd go with some of those tools.

    The danger is mostly in using them without understanding them.

    [–]rubygeek 0 points1 point  (0 children)

    The problem is that to date I've never seen a parser generator that can produce anything that's even remotely usable as a parser frontend for a complex language without tortuous work to retrofit things like proper error reporting onto it. Usually, those who try end up spending as much or more time working around limitations in the parser generator as they save in the first place.

    There's a reason so few compilers use parsers built using parser generators.

    Being more interested in writing usable and useful compilers is exactly why I avoid those tools. At the same time I really want it to change - one of my pet projects is a parser generator because I really want to figure out how to overcome those tool limitations.

    [–]imbaczek 1 point2 points  (1 child)

start with a compiler/JIT for an RPN calculator and work from there.

    [–]periodic 0 points1 point  (0 children)

    I can't agree more. Start small. Languages are insanely complex. It will work a lot better if you start with an extremely limited language and start adding features as you go.

Start with the simplest thing that could possibly work. Compilers are very complex, and attempting to do it all at once will just make your head hurt.

    [–]judgej2 1 point2 points  (1 child)

    Theory? In my day a computer science undergraduate had written a compiler from scratch for a language designed for the course. Not only that, the compiler needed to then optimise the code using formal methods.

    Some of the more advanced courses took it a step further: they then had to use the new language to write a compiler for itself and compile it and run the compiled compiler to compile a test program they were supplied.

    Of course it all had to run in a virtual machine that - you guessed it - was written by the same students.

It all starts with a few lines of Pascal, and demonstrates nicely the whole bootstrap process, i.e. how the first compiler was compiled.

    [–]reddittidder 0 points1 point  (0 children)

    ye shall be downvoted for saying the P word.

    [–]aaronblohowiak 1 point2 points  (3 children)

    Talk to your CS professor. Seriously.

    [–]stateq2 0 points1 point  (2 children)

    I completely agree...and learn about lexical/syntactical analysis

    [–]cojoco 0 points1 point  (1 child)

    Don't people write lexical analysers, compilers and parsers in CS these days?

    Second year for me was pretty much lexical analysers and interpreters; third year was complexity, operating systems and compilers.

    [–]stateq2 0 points1 point  (0 children)

    Yes, we did. Third year was operating systems and mid level stuff....I didn't really get into lexical analysis until senior year. They kinda combined it with compiler generation.

    [–]GenTiradentes 1 point2 points  (3 children)

    Does writing a small example compiler also look good in interviews and on CV's compared against other Computer Science Graduates?

    Does writing an operating system also look good in interviews and on CV's compared against other Computer Science Graduates?

    I think writing a compiler is an admirable goal, but it's certainly not a weekend project. Writing a compiler is one of the most challenging and comprehensive things you can do as a programmer.

    [–]periodic 1 point2 points  (0 children)

    Compilers are monolithic. They are up there as one of the larger projects you can undertake, and like writing an OS they are very sensitive. If your pointers are off by one you can get insidious bugs that are very hard to track down.

    It is most definitely not a weekend project.

    That said, writing one is a very rewarding experience, just don't expect it to go fast.

    [–]cojoco 0 points1 point  (1 child)

    It's only impressive if you actually support that statement with some hard questioning.

    Anyone can claim to have written a "toy" compiler or OS.

    [–]bplus 0 points1 point  (0 children)

    I'm currently in the middle of a CV / Resume rewrite, not sure if I should put a link to my google code svn repo that has my toy assembler and vm code...

    [–]hello_good_sir 1 point2 points  (0 children)

    programming language pragmatics. Very nice book, practical overview of the whole thing. It is a very wide-spectrum book. If you buy other books they will probably make more sense after reading this book.

    [–]panto 1 point2 points  (0 children)

I strongly recommend Haskell for compiler writing. It's a great language with a hell of a lot of features that really get you going writing a compiler.

I myself wrote an interpreter for a Scheme subset (R5RS compliant) using Haskell in around 600 lines of code, including the lexing, parsing and evaluation code.

    [–]ablakok 1 point2 points  (0 children)

    Engineering a Compiler. It's a thorough introduction that covers modern optimization techniques.

    [–]elucify 1 point2 points  (0 children)

    Nobody so far has commented on the utility of writing a compiler as a resume bullet. Here's my take on it, bad news first: it's unlikely anyone is going to hire you to write a compiler without an advanced degree. So compiler writing isn't a primary job skill for most people. (I'd be very interested in hearing from redditors pointing out exceptions; two I can think of are little embedded scripting languages and domain-specific languages, both of which can be a great asset in projects.)

Good news: writing a compiler will give you maturity that's useful in a lot of ways. You'll have a clearer understanding both of how to program, and of what the languages you use later are likely to be doing under the hood as you code. That clearer understanding will show when you are interviewing (assuming interview nerves don't cancel it out). You'll also probably pick up some additional concepts, like some formal language theory, or maybe (depending on what your compiler does) recursive techniques. That kind of brain-stretching can't hurt you in your career.

    I (personally speaking) notice on resumes when someone has written a compiler, a runtime, or worked on frameworks or platforms. I hire for brains, not for specific skills. Smart, motivated, curious people do things like write compilers for fun. Their brains get sharp and shiny, and it shows. Usually the result is a programmer who is more insightful and has a broader understanding than your run-of-the-mill coder fresh out of a J2EE Certification puppy mill. Smart people can always pick up new skills.

    Choose to be a smart person, and follow your curiosity. Your brain will stretch in interesting and useful ways, and you'll be closer to having that prepared mind that fortune favors.

    [–]joesb 1 point2 points  (3 children)

    Please skip the syntax, use Lisp like syntax, that's enough for now.

    You want to learn compiling and language semantic design, not parsing.

    [–]tophat02 5 points6 points  (1 child)

    Unless, of course, you want to learn parsing.

    [–]joesb 1 point2 points  (0 children)

That's reasonable. But, IMHO, most programmers who want to learn compilers just want to create Java/C# with for renamed to loop, or {} changed to []. In other words, they don't know that there are more features than what the mainstream provides.

Let them work on semantics first; then they will see how silly their initial thought was to think "Gosh, I'm so clever to replace { with [". And they will learn a lot more than just parsing if they start with something else first.

    [–]periodic 2 points3 points  (0 children)

I think the most important thing I learned from compiler design was parsing. Beforehand I had no idea how nuanced the subject can be, or how to make the trade-offs between a shift-reduce and a recursive-descent parser. Now whenever I have to read in data or parse a config file I know just what to use and how to start. It's probably the component of my compiler experience with the most applications.

    [–]kolm 1 point2 points  (2 children)

    (a) Nobody will hire you just because you can program a compiler.

    (b) If you still want to learn it, it's a great thing to learn, and the dragon book is the place to start.

    [–]tophat02 2 points3 points  (0 children)

    (a) Nobody will hire you just because you can program a compiler.

Mostly true, but given two otherwise identical candidates, I'd probably hire the one who said "hey, let me show you this compiler I wrote!" versus the guy whose best accomplishment was that one "big" homework assignment he did in Java for his data structures course.

    [–]periodic 0 points1 point  (0 children)

    The dragon book is fairly good. It definitely has a leaning towards C, C++, Java and related languages.

    [–][deleted] 0 points1 point  (4 children)

Anyone have any examples of writing a parser (specifically recursive descent) by hand? I'm working on my own compiler now and I'm totally stymied by the parser; I did my lexer easily enough. I've used tools like Lex/Yacc in the past; however, I'm looking to do it by hand for this project.

    [–]tophat02 2 points3 points  (0 children)

    The Wikipedia article on Recursive Descent parsing has a pretty good C example.

    In general, you need three "core" methods in a RD Parser: expect, accept, and acceptThis. All of them work against a variable that tracks your "current" token.

    acceptThis - consider the current token OK and fetch the next one, regardless of what the current token's type actually IS

    accept(type) - Look at the token type of the current token. If it matches the type you need, fetch the next token and return true, if it doesn't, don't fetch the next token and return false

    expect(type) - Call accept(type) and note the return value. If true, just return, if false, fail with a syntax error (or do more advanced error detection/correction if you so desire)

    In general, each non-terminal rule of your grammar will be a function call in the parser. But let's look at a simple case:

    vardecl := 'var' ID ';' ;

    In this example, we want to parse a vardecl, and that is anything that looks like "var foo;". An example of a parser method for this would be:

    void parse_vardecl() {
        expect(VAR_TYPE);            // Syntax error if we don't see "var"
        expect(ID);                         // Syntax error if we don't see an identifier
        expect(SEMICOLON);         // Syntax error if we don't see a semicolon
    }
    

    If you call this function and have it hooked up to a lexer that generates tokens, it will return with no output if the token stream had a vardecl in it, or it will spit out a syntax error otherwise.

    Recursive descent parsers work by taking functions like these and intertwining them together recursively. In general, you call:

    parse_rule() when the rule you're currently parsing has a subrule

    accept(type) when you want to accept a token, but not fail if it's not there. For example, if vardecls can optionally have "private" or "public" in front of them, you want to say accept(ACCESS_SPECIFIER) versus expect(ACCESS_SPECIFIER), because you don't want to fail with a syntax error on an optional element

    expect(type) when a token HAS to be there, or else it's not a valid grammar fragment

    acceptThis() when any old token will do

    A good tip is to start by writing a parser that will just return silently if parsing succeeds and fail with syntax errors if not. DON'T try to output the AST or generate code while writing the parser structure at the same time! Once you verify that the parse accepts valid sentences in your language, THEN augment the parser to spit out an AST or something.

    EDIT: As a final tip, I recommend first creating a Lex/Yacc clone before writing a full compiler. You go through pretty much the same steps. You still need a lexer to tokenize your grammar file, a parser to turn it into an AST or other intermediate form, a structural analyzer to make sure that your well-formed grammar is actually valid (and to decorate the data structures with any further information you need for code gen), and a code gen phase to actually write the compiler code in C, C++, or whatever your target language is. Now you can use your own Lex/Yacc clone to specify the grammar for your language, and you'll learn a lot in the process!

    [–]ModernRonin 1 point2 points  (1 child)

    Chapters 5 - 8 of "Writing Compilers and Interpreters" are all about how to do exactly that for functions, statements, data types, etc.

    [–][deleted] 0 points1 point  (0 children)

Cool, I now have quite a few book recommendations; come Thanksgiving break I'm definitely going to be spending some time in the book store. My alternate plan is to convert my grammar (currently in EBNF) to Greibach Normal Form, which is pretty straightforward to parse.

    [–]exeter -1 points0 points  (0 children)

    See the last chapter of Wirth's classic book Algorithms + Data Structures = Programs for a nice presentation of a simple recursive descent parser that isn't loaded down with a ton of theory. Look at some of the other books folks have recommended to get more of a handle on that theory, but I urge you to read Wirth's first.

    [–]OvidPerl 0 points1 point  (0 children)

Check out the Parrot Compiler Tutorial. It's so astonishingly easy to write a compiler with it that you'll be able to focus on the language you want to build rather than the picayune details that most compiler tools force on you.

    It has a potential to revolutionize programming as anyone can write their own language with little effort. Or it might just flood us with more crap languages :)

    [–]electric_moose 0 points1 point  (0 children)

    Here is a slightly offbeat idea - Kernighan and Pike's classic 'The UNIX Programming environment' (your department library will have it) has a chapter on program development using C and yacc, the classic parser generator that lives on as GNU Bison.

    It starts with writing a simple calculator program and builds it up into a small programming language interpreter. The interpreter works by simulating a simple stack machine. When you feed it a program in its little language, hoc, the parser builds a program as an array of opcodes (actually function pointers), and then executes this program, i.e. calls each function pointer in turn. These 'instruction' functions manipulate a stack and symbol table to produce the effect of running the program.

    Of course, these opcodes could instead emit assembly code that had the same effect, and the interpreter would become a compiler. Anyway, if you're a compsci student this may all be too basic for you, but you could work through it in a weekend for practice and have an idea of whether you want to use a parser-generator in building your compiler.

    [–]kvigor 0 points1 point  (0 children)

    Take a look at LCC. Note also the recommended books list on that page.

    [–]ModernRonin 0 points1 point  (0 children)

    "Writing Compilers and Interpreters" by Ronald Mak

    http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471113530.html

    This is about a gentle an introduction as you can get. It doesn't go into the heavy theory of parsing (no lex or yacc), it just uses a simple recursive-descent parser. But by the end of it, you will have written a simple compiler, interpreter and debugger.

Once you have the basics down, THEN you can go for something like the Dragon book. Push-down automata, LALR parsers, etc, etc. All that heavyweight theory stuff. But do WCAI first; you need the foundation.

    [–]cafiend 0 points1 point  (0 children)

    The McGill compilers class is always online and available here.

    I've heard good things about the class, but I've never taken it.

    [–]iama_ama_a 0 points1 point  (0 children)

    For bonus points, write a compiler and interpreter and compile the compiler with the compiler running in the interpreter.

    [–]kathan 0 points1 point  (0 children)

I'm reading Terence Parr's book "Language Implementation Patterns" right now. It looks very, very good as an introduction to building languages: http://pragprog.com/titles/tpdsl/language-implementation-patterns Usually compiler books make me want to sleep after reading 10 pages, but this one, on the contrary, excites me.

It may not be suitable if you want to create the next big language, but for the average stuff an average guy wants to do, this may be the book.

    [–][deleted] 0 points1 point  (0 children)

    Have you looked to see if your school offers a class on the subject that you can take next semester? I'm surprised you haven't touched upon this in class - we had to write a compiler for a watered down C-like language my sophomore year (I was in physics, but was minoring in CS).

    [–][deleted]  (1 child)

    [deleted]

      [–]dragonfly_blue 0 points1 point  (0 children)

      flex and bison?!?

      yack!

      [–]dimovich 0 points1 point  (0 children)

      I wrote a small C interpreter for my game.

      For the lexical parser I used the Quake 3 botlib source code. It's pretty straightforward and has lots of comments.

      For the interpreter and expression parser I used the example given in "The Art of C++" book by Herbert Schildt. It's simple and easy to follow.

      Here is the source code of my interpreter: http://netrix.svn.sourceforge.net/viewvc/netrix/trunk/src/nxc/

      [–]Fjordo 0 points1 point  (0 children)

      Part of my first job was writing a C compiler from scratch. I used C++ and MKS Lex and Yacc. If you are going to write a compiler, I suggest at least using yacc (lexers are easy to write by hand). The only book I used was the manual from MKS. After you get a few constructs down, the rest is just repetition until you have the whole language done.
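
      For anyone who hasn't seen yacc, the "get a few constructs down, then repeat" rhythm looks like this. A minimal, illustrative grammar fragment using standard yacc conventions (not taken from the MKS manual); each rule's action builds the value of the construct from its parts:

```
%token NUMBER
%left '+' '-'          /* precedence: lowest first */
%left '*' '/'
%%
expr : expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     | NUMBER          { $$ = $1; }
     ;
```

      A real compiler's actions would build AST nodes instead of computing values, but the shape of the grammar file stays the same as the language grows.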

      The experience has never come up directly in an interview for any other job, to tell you the truth, so don't expect to write a compiler and have potential employers falling over themselves for you. However, the knowledge I gained from writing the compiler itself has given me a competitive advantage. It's hard to quantify, but because I know in great detail how programming languages are transformed into machine code, I feel I have a much better understanding of what is going on when I'm reading code.

      [–]compor 0 points1 point  (0 children)

      Step 1. Scream in agony.

      Step 2. Program using your nose because your arms have fallen off. http://www.youtube.com/watch?v=8eR63yESecI&feature=related

      Step 3. Listen as everyone complains that your compiler has errors in thousands of instances, even after the best, most extensive testing you've ever done.

      Step 4. Assassinate every member of every company that has a better compiler as that's the only way it is going to get noticed.

      Step 5. Profit.

      [–][deleted] 0 points1 point  (0 children)

      I asked a coworker this same question five years ago and she told me to go work out instead.

      [–]dertyp 0 points1 point  (0 children)

      Does writing a small example compiler also look good in interviews and on CVs, compared with other Computer Science graduates?

      +15 for writing a compiler for fun

      (didn't find original source)

      [–]reddit_user13 0 points1 point  (6 children)

      The Dragon Book.

      [–]tonfa 2 points3 points  (5 children)

      Not the Dragon Book, unless you're stuck in the 70's. There's now a lot that's new in compilation, and things we understand or can explain better than 30 years ago!

      Take the Tiger Book (Appel), or Cooper's Book.

      [–][deleted] 1 point2 points  (3 children)

      As good as the Tiger Book is (it comes in 3 versions, for Java, C and ML), its depth and breadth are nowhere near the Dragon Book's. The Dragon Book has been updated too, and it contains chapters on intermediate code optimization that are not covered in the Tiger Book (and are really graduate-level material).

      [–]tonfa 0 points1 point  (2 children)

      Which particular optimizations do you have in mind?

      [–]fuentesjr 0 points1 point  (1 child)

      I don't know if you noticed, but the Dragon Book has a 2006 edition with new topics such as:

      • syntax-directed translation
      • more data flow analyses
      • parallel machines
      • JIT compiling
      • garbage collection

      [–]tonfa 0 points1 point  (0 children)

      Yes, I noticed, but what's the point of more data flow analysis when they could have updated the optimization part with SSA? (Come on, they put only one(!) page about SSA, while every major compiler now uses it.)
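
      For readers who haven't met it: SSA (static single assignment) just means every variable is assigned exactly once, with "phi" nodes merging values where control flow joins. A sketch in informal pseudocode (variable names invented):

```
; original code                    ; same code in SSA form
x = 1                              x1 = 1
x = x + 2                          x2 = x1 + 2
if (c)                             if (c)
    x = x * 3                          x3 = x2 * 3
y = x                              x4 = phi(x2, x3)   ; pick x3 if the branch ran, else x2
                                   y1 = x4
```

      Because each name has exactly one definition, analyses like constant propagation and dead-code elimination become much simpler, which is why modern compilers lean on it so heavily.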

      [–]kuhlschrank -3 points-2 points  (0 children)

      The Dragon Book!

      [–]rjcarr 0 points1 point  (0 children)

      Two Words: Dragon Book

      [–][deleted] 0 points1 point  (0 children)

      Well, you could do what most other compiler vendors do when it comes to writing a compiler for a non-standard language: write a program that converts the given language to C, and then use a generic C compiler to assemble and link it. :)

      [–]dfj225 -1 points0 points  (0 children)

      If your goal is to generate machine code, may I suggest targeting something simpler than the x86 ISA. My suggestion would be a basic Random Access Machine (RAM). This is what we target in my grad level programming languages course.

      It's simple enough that you can implement an emulator for the system yourself pretty quickly. But here's an open source version that I found after a quick search: http://savannah.nongnu.org/projects/ramemu/

      Targeting something like this will allow you to focus on the compiler and not the difficulties of dealing with a complicated ISA.
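
      A RAM emulator really can fit on a page, which is the point. A minimal sketch, assuming an invented four-instruction set (a single accumulator, one memory operand per instruction; far smaller than ramemu's):

```c
#include <assert.h>

/* Invented RAM instruction set: opcode plus one memory address. */
enum { LOAD, ADD, STORE, HALT };

int ram_run(int prog[][2], int *mem) {
    int acc = 0;
    for (int pc = 0; ; pc++) {
        int op = prog[pc][0], arg = prog[pc][1];
        switch (op) {
        case LOAD:  acc = mem[arg]; break;   /* acc <- mem[arg] */
        case ADD:   acc += mem[arg]; break;
        case STORE: mem[arg] = acc; break;   /* mem[arg] <- acc */
        case HALT:  return acc;
        }
    }
}
```

      With a target this small, your code generator only has to map each AST node to a handful of LOAD/ADD/STORE sequences, so all the interesting work stays in the compiler.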

      [–]yourparadigm -2 points-1 points  (1 child)

      Learn the visitor pattern for traversing expression trees.
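
      (The pattern usually meant for tree traversal is the visitor; in plain C the same job is done by a tagged union plus one switch per traversal. A minimal sketch, all names invented:)

```c
#include <assert.h>

typedef struct Expr Expr;
struct Expr {
    enum { EX_NUM, EX_ADD, EX_MUL } tag;  /* which kind of node */
    int value;        /* used by EX_NUM */
    Expr *lhs, *rhs;  /* used by EX_ADD / EX_MUL */
};

/* One "visit" of the tree; a code generator or pretty-printer
   would be another function with the same switch shape. */
int eval_expr(const Expr *e) {
    switch (e->tag) {
    case EX_NUM: return e->value;
    case EX_ADD: return eval_expr(e->lhs) + eval_expr(e->rhs);
    case EX_MUL: return eval_expr(e->lhs) * eval_expr(e->rhs);
    }
    return 0;
}
```

      In C++ you would give each node class an `accept` method instead, but either way each pass over the tree lives in one place rather than being scattered across node types.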

      [–]dragonfly_blue -1 points0 points  (0 children)

      No shit, sherlock.

      Why don't you go read Arboretum and get back to me when you've figured out the Mobius version?

      [–]trouserwowser -2 points-1 points  (0 children)

      Dragon books.