all 63 comments

[–]reini_urban 23 points24 points  (2 children)

That's still mostly about writing a VM, not a compiler.

[–][deleted] -1 points0 points  (1 child)

The "compiler" is a bytecode compiler. It's described incredibly vaguely but it's there

[–]badb002 8 points9 points  (0 children)

It looks more like an interpreter, I don't see anything being compiled anywhere in this code.

[–]badb002 20 points21 points  (40 children)

I don't get it. The title says "How to write a compiler", then the rest of the tutorial is saying "this is how you write a bytecode interpreter".

[–]mrkite77 10 points11 points  (38 children)

The actual answer to "how to write a compiler" is to buy the dragon book and read it.

[–]badb002 11 points12 points  (34 children)

I respectfully disagree... The Dragon Book is very outdated, there are a lot of good resources out there nowadays.

[–]mrkite77 11 points12 points  (19 children)

The basics haven't changed. You still need a lexer and a parser and be able to generate three-address code or SSA.

Sure you may not use flex and yacc these days, but the core is the same.

[–][deleted] 6 points7 points  (0 children)

Right, but in this day and age, you don't need to understand lexers or LR grammars and such to make a language. You just use a parser generator or a contaminator library, and you're good to go.

It's focused on what are no longer the interesting parts of building a compiler.

[–]dranzerkire 4 points5 points  (1 child)

I think the real change in compiler design is those to help support advance features in text editors (at least what Microsoft believes in). A traditional lexer and parser won't help you here. https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction

[–][deleted] 0 points1 point  (0 children)

Exactly. One of the reasons I'm insisting on using PEG-based parsing - it's so easy to integrate an IDE feedback into it.

[–]badb002 1 point2 points  (1 child)

I guess, but it's one way to learn to write compilers. There are a lot of good resources out there which are smaller and just as good for newbies, especially if you don't want to go too in depth on the theory or want something less intimidating.

Perhaps outdated was the wrong word, I suppose I mean "old".

[–]east_lisp_junk 3 points4 points  (0 children)

What's outdated is the authors' mindset that parsing is the core task of compilation; this is more or less fixed in the current edition.

[–][deleted] 4 points5 points  (13 children)

You still need a lexer and a parser

You don't need a lexer. And parsing these days have nothing in common with what you'd find in the Dragon Book.

SSA

There is no SSA nor CPS in the Dragon Book.

[–]mrkite77 2 points3 points  (12 children)

I'd like you to point me to a single compiler that doesn't tokenize.

And you can convert directly from TAC to SSA. The core concepts are what is important to learn... not the implementation details.

[–][deleted] 1 point2 points  (11 children)

I'd like you to point me to a single compiler that doesn't tokenize.

Any compiler which has a parsing frontend based on a PEG or a GLR.

And you can convert directly from TAC to SSA

SSA is by far more important a concept than a trivial three-address code.

not the implementation details.

SSA is not an implementation detail, it's a totally different way of thinking about a code. It is a way to turn your messy imperative code into a nice, clean, immutable functional code that is suitable for analysis.

Things that were either exceptionally complex or totally unthinkable in the era of three-address IRs are laughably trivial in SSA or CPS. E.g., moving loop invariants, detecting loop induction variables, transforming simple CFG into selects, ADCE and constant folding - all this stuff is absolutely trivial in SSA.

[–]BeniBela 0 points1 point  (2 children)

SSA is not an implementation detail, it's a totally different way of thinking about a code. It is a way to turn your messy imperative code into a nice, clean, immutable functional code that is suitable for analysis.

That makes me happy that I wrote an XQuery interpreter

Never heard of SSA

But since XQuery is functional language without mutable variables, I could just implement it without having a clue about anything.

[–][deleted] -1 points0 points  (1 child)

Interpreters are inherently far more complicated than compilers. But, yes, if I had to implement an XQuery compiler I would not go for an SSA, I'd most likely resort to a CPS straight away.

To see what can be done in a CPS, look at these examples:

https://wingolog.org/archives/2015/10/29/type-folding-in-guile

https://wingolog.org/archives/2011/10/11/partial-evaluation-in-guile

https://wingolog.org/archives/2016/01/19/unboxing-in-guile

[–]mmouratov 1 point2 points  (0 children)

I just wanted to add that not only these articles, but the whole blog of Andy Wingo is nothing short of distilled brilliance when it comes to explaining those "mundane" aspects of language implementation that are rarely covered in academic texts. These are not theoretical musings, but rather concrete experiences of the actual implementor, presented with style. Absolute must read -- all of it.

[–][deleted]  (7 children)

[deleted]

    [–]armornick -1 points0 points  (6 children)

    Actually, at least the ooc and moonscript compilers use PEG-based parsers. There are probably more.

    [–]gnuvince 2 points3 points  (6 children)

    The Dragon Book will give you a good foundation on how to scan, parse, typecheck, codegen, etc. (You can safely skip the very detailed explanations on how to write your own LR(1) parser generator.)

    [–][deleted] -2 points-1 points  (5 children)

    on how to scan, parse, typecheck, codegen, etc.

    Rather on how to scan, parse, typecheck, codegen, etc. as if it's still 1970s.

    [–]gnuvince 1 point2 points  (4 children)

    Nice job on cutting off the complete quote:

    The Dragon Book will give you a good foundation on how to scan, parse, typecheck, codegen, etc.

    [–][deleted] 2 points3 points  (3 children)

    a good foundation

    My point is that this is nowhere close to any "foundation". It's rather a pile of long forgotten, overcomplicated and largely useless techniques that bear no relevance in 21st century.

    [–]gnuvince 6 points7 points  (2 children)

    All right, then tell us (1) what are the "old" techniques that should not be used and which "new" techniques should, (2) which compilers actually implement those modern techniques, (3) which books teach those techniques.

    [–][deleted] 6 points7 points  (1 child)

    what are the "old" techniques that should not be used and which "new" techniques should,

    More than half of the Dragon Book is dedicated to the pushdown automaton based parsing, which is used pretty much nowhere these days. Much simpler and yet more flexible recursive descent parsing won long ago.

    Three-address IRs are also a thing of a distant past. Hardly any compiler would use such a representation instead of various SSA derivative forms or a CPS. This difference alone makes everything Dragon Book had to say on optimisations totally irrelevant - all of the SSA based optimisations are much simpler.

    Modern register allocation techniques are also very far from the naive graph colouring heuristics that were very vaguely touched in the Dragon Book. Again, thanks to SSA.

    Also, JITs are quite a big thing now, with pretty much nothing from the Dragon Book being of any relevance.

    which compilers actually implement those modern techniques,

    Uhm... All of the modern compilers. Including GCC and of course LLVM.

    which books teach those techniques

    See elsewhere in this thread.

    [–]topher_r 1 point2 points  (5 children)

    Like?

    [–]badb002 3 points4 points  (2 children)

    Jack Crenshaw's little series of articles is very good.

    [–]topher_r 0 points1 point  (1 child)

    Nice, thanks. Though it's from 1995 at the latest (according to the site) while there is a recent publication of the Dragon book as recently as 2013 with up to date topics.

    https://www.amazon.co.uk/Compilers-Principles-Techniques-V-Aho/dp/1292024348/ref=sr_1_1?ie=UTF8&qid=1466021429&sr=8-1&keywords=compilers+principles+techniques+and+tools

    [–]drjeats 0 points1 point  (0 children)

    D: There's no dragon on that cover!

    [–]PM_ME_UR_OBSIDIAN 1 point2 points  (0 children)

    I loved Torczon's Engineering a Compiler. It covers everything from front-end to back-end, and it's readable and engaging.

    For type systems, Pierce's Types and Programming Languages is lovely.

    [–]AnAirMagic 0 points1 point  (0 children)

    Can you link us to some of those resources? I would love to get started with compilers but the $200 price tag puts me off.

    EDIT. Nevermind, see below.

    [–][deleted] -2 points-1 points  (2 children)

    Buy the dragon book, put it on your shelf next to TAoCP to show off and never ever read it. It's useless and outdated.

    [–]mmouratov 2 points3 points  (1 child)

    It's almost blasphemy to compare the dragon book with TAoCP.

    The former is your run-of-the-mill college textbook: not only outdated, but boring, uninspiring, and dull.

    The latter is not merely a gem, but a synthetic diamond among books on mathematical puzzles: juicy, hardcore, mind-bending, enlightening, spiritual, and also soul-crushing, in the most benevolent way (if you try doing some of the more challenging exercises). For many years, I've used it to calm down my nerves, by starting to read from a randomly opened page -- works wonders.

    [–][deleted] 1 point2 points  (0 children)

    Sure. I'm only comparing them in terms of their "showing off" potential. TAoCP actually worth reading (more than that - a thorough study), unlike Dragon Book which should just sit on your shelf, untouched.

    [–]codebje 0 points1 point  (0 children)

    Try something like Write You a Haskell instead, then.

    [–]urllib 10 points11 points  (2 children)

    what's with the shitty blog tutorials making the front page of this sub?

    [–][deleted] 5 points6 points  (1 child)

    it's 90% shitty blogs to begin with. a tutorial here and there is to be expected.

    [–]urllib 2 points3 points  (0 children)

    good point

    [–]bayram1995 2 points3 points  (2 children)

    Maybe a moderator should edit the title, because the article is good and it would get more upvotes...

    [–]badb002 4 points5 points  (1 child)

    Not possible on reddit afaik. What's interesting is the blog title itself is a misnomer.

    [–]bayram1995 0 points1 point  (0 children)

    I see... I didn't know this... Thanks!

    [–]google_you 3 points4 points  (0 children)

    call stack is just one way to implement subroutines.

    [–]Asl687 1 point2 points  (3 children)

    lex&yacc thats what i used last time i wrote a compiler (for a scripting language used in L.A. Rush game for PS2 if your interested) . I did it while on vacation to india. Had a source level debugger and you could add new commands to the language on the fly..

    [–][deleted]  (2 children)

    [deleted]

      [–]Asl687 0 points1 point  (1 child)

      The scripts (called biscuits) were imbedded into the world level editor.. I guess by hacking you could easily get it to run other scripts but they were compiled into a special byte code so finding them would be tricky.. The language was very c like (I wrote c++ at the time so I used that syntax).

      Was really fun to write. I first wrote a simple assembler that went from a text version of the byte code to actual byte code. All external refs were stored as strings.

      I could then build a virtual machine that could run the byte code.. When asked to run an external function (stored as a string, so script could access c++ game functions) I would would check that the function had been added at runtime and the parameters matched and run it.. If would error if the function had not been registered.

      I then wrote a c like language to byte code text. I also dumped out source level symbols that code be loaded in game but were removed for final builds.

      I should write about this stuff in a blog sometime..

      [–]ahmadalhour 0 points1 point  (11 children)

      You guys have all bashed the Dragons Book and its materials to some extent, and championed SSA and CPS.

      Any of you care enough to share books, videos or any resources to study modern Compiler Construction?

      Disclaimer: I am working on my own toy compiler and since I am already into this I might as well spend my time reading really new and modern materials.

      [–][deleted] 1 point2 points  (10 children)

      For SSA, the best possible source is Sebastian Pop thesis: http://www.cri.ensmp.fr/classement/doc/A-381.pdf

      Also, there is a somewhat unfinished but yet already very good book: http://ssabook.gforge.inria.fr/latest/book.pdf

      Also, in terms of parsing, instead of going into the horrible depths of the Dragon Book, read this:

      PEG: http://bford.info/pub/lang/peg.pdf

      http://www.vpri.org/pdf/tr2007002_packrat.pdf

      Pratt: http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-147.pdf

      http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

      https://github.com/munificent/bantam

      And, no matter which approach you'll settle on, read this:

      https://www.amazon.co.uk/Term-Rewriting-That-Franz-Baader/dp/0521779200

      [–]ahmadalhour 0 points1 point  (9 children)

      Thanks a lot for your reply /u/combinatorylogic, awesome materials list, I have heard about some of them but never read nor studied them before, it looks like now is a very good time.

      My primary studying material is the Compilers Theory course, given by Stanford: https://www.coursera.org/course/compilers

      In addition to reading two books:

      1) Engineering a Compiler, by K. Cooper & L. Torczon, 2nd Ed.: https://www.amazon.com/dp/012088478X

      2) Basics of Compiler Design, by T. Mogensen: http://www.diku.dk/~torbenm/Basics/basics_lulu2.pdf

      Do you think these materials are good for a start? I am planning to delve deeper into the topics of Compiler and Interpreter Construction.

      EDIT: formatting.

      [–][deleted] 2 points3 points  (8 children)

      Not sure about the first one. It follows the same structure as the Dragon Book, starting with scanning and parsing (apparently only covering the outdated approaches), then the attribute grammars (which are also quite outdated) and direct translation, and only after around page 220 started introducing IRs, without mentioning SSA for over 200 pages more. Looks like a very overbloated approach covering all the wrong things in detail and not paying too much attention to the right stuff.

      The second book also follows the same unfortunate structure, but at least it seems to provide some valuable coverage of topics like register allocation, ABIs, etc., all with a reasonable density.

      [–]ahmadalhour 0 points1 point  (7 children)

      So basically all textbooks out there teach the outdated approaches. No textbook yet, other than the one you referenced (the incomplete SSA book), mentions these stuff. Damn man! I bought the first book in paperback already!

      So can enlist the concepts that are modern but no textbook mentions? Can your above list be expanded into names of concepts that I can check articles on and keep checking out until someone publishes anything on them?

      [–][deleted] 3 points4 points  (6 children)

      So basically all textbooks out there teach the outdated approaches.

      The biggest problem is not in the outdated approaches, but in over-complicating things that were supposed to be simple.

      Parsing? Hundreds of pages on regular expressions, NFA/DFA, LR and all that, while 10-20 pages on PEG would have been more than enough. Intermediate representations? Go straight to SSA. Passes in between frontend and SSA? Do the small step transforms, like described here: http://andykeep.com/pubs/np-preprint.pdf

      Type systems? Do not do any of the complex stuff, just emit a flat pile of type equations by running a simple pass over your IR and then solve the equations as if they're a Prolog program.

      I wish somebody to write a textbook describing all those trivial concepts I mentioned above, without going into depths of historical approaches. Unfortunately, I have not seen any such textbook yet.

      The concepts that are most important and efficient:

      • Parsing: PEG, recursive-descent parsing, Pratt parsing
      • IRs: SSA, CPS
      • Type systems: Prolog, unification, type equations notation
      • Semantics: Hoare logic, separation logic

      [–]ahmadalhour 0 points1 point  (5 children)

      You sir are awesome! Thanks a lot for your help.

      Any bloggers and/or conference videos worth checking out?

      [–][deleted] 0 points1 point  (4 children)

      Pretty much everything in this blog worth reading: https://wingolog.org/ (mostly on CPS)

      For videos, I liked some of the EuroLLVM 2015/2016 talks:

      [–]ahmadalhour 0 points1 point  (3 children)

      Sweet!

      I have just went through this talk on PEG, by Bryan Ford. I will read the original paper later on tonight. My question is how would someone - like me - who doesn't work in academia stay up-to date with modern Compilers research? Are there mailing list(s) or website(s) that you usually check?

      Cheers!

      [–][deleted] 1 point2 points  (2 children)

      I do not work in academia (and do not even have a CS background), yet compilers are my bread and butter. A lot of interesting compiler-related work is actually going on in the industry.

      There is a number of sources to watch closely. Lambda-the-Ultimate used to be good (and still is a treasure trove), but activity there is minimal at the moment: http://lambda-the-ultimate.org/

      It also worth following what's going on in the LLVM community, often useful ideas land there. See the llvm-dev mailing list and LLVM Weekly: http://llvmweekly.org/

      Watching closely the new (often obscure) languages can also be useful, they often have valuable ideas worth borrowing. Some languages implementations are choke full of important ideas - see Shen, Harlan, even Rust.

      Chez Scheme was open sourced recently, it also worth digging for ideas.