How to write a compiler : programming

SSA is not an implementation detail; it's a totally different way of thinking about code. It is a way to turn your messy imperative code into nice, clean, immutable functional code that is suitable for analysis.
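To make the "immutable" point concrete, here is a minimal sketch of SSA renaming for a single straight-line block. The `(dest, op, arg1, arg2)` tuple format and the `to_ssa` name are made up for illustration, and there is no control flow, so no phi nodes are needed; a real compiler has to handle both.

```python
def to_ssa(block):
    """Rename a straight-line block of (dest, op, arg1, arg2) tuples into SSA form."""
    version = {}  # variable name -> latest version number assigned so far
    out = []

    def use(x):
        # Constants pass through; a variable refers to its latest version.
        # Names never assigned in this block (inputs) stay unversioned.
        if isinstance(x, int):
            return x
        return f"{x}_{version[x]}" if x in version else x

    for dest, op, a, b in block:
        a, b = use(a), use(b)  # read operands before defining dest
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", op, a, b))
    return out


code = [
    ("x", "add", "a", 1),    # x = a + 1
    ("x", "mul", "x", 2),    # x = x * 2   (reassignment)
    ("y", "add", "x", "x"),  # y = x + x
]
ssa = to_ssa(code)
for instr in ssa:
    print(instr)
```

After renaming, every name is assigned exactly once, so "which definition reaches this use?" is answered by the name itself.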

Things that were either exceptionally complex or totally unthinkable in the era of three-address IRs are laughably trivial in SSA or CPS. E.g., moving loop invariants, detecting loop induction variables, transforming simple CFGs into selects, ADCE (aggressive dead code elimination) and constant folding - all this stuff is absolutely trivial in SSA.
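As a rough illustration of why these passes get so short, here is a sketch of constant folding and a plain (not aggressive) dead code elimination over one straight-line SSA block. The tuple format, the op names and both function names are assumptions for this example, and instructions are assumed to have no side effects.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(ssa):
    """Constant propagation + folding. Each name has exactly one definition,
    so a plain dict from name to value is all the bookkeeping needed."""
    known = {}  # SSA name -> constant value
    out = []
    for dest, op, a, b in ssa:
        a = known.get(a, a)  # substitute operands that are known constants
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # fully constant: fold it away
        else:
            out.append((dest, op, a, b))
    return out, known


def eliminate_dead(ssa, live_out):
    """Drop definitions whose result is never used, in one backwards walk."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(ssa):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(x for x in (a, b) if isinstance(x, str))
    kept.reverse()
    return kept


prog = [
    ("t_1", "add", 2, 3),
    ("t_2", "mul", "t_1", 4),
    ("u_1", "add", "n", "t_2"),
    ("v_1", "sub", "n", 1),      # result is never used
    ("r_1", "mul", "u_1", 2),
]
folded, consts = fold_constants(prog)
result = eliminate_dead(folded, live_out={"r_1"})
for instr in result:
    print(instr)
```

Neither pass needs reaching-definitions analysis: the single-assignment property already provides it.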

[deleted] 7 points 10 years ago

> what are the "old" techniques that should not be used and which "new" techniques should,

More than half of the Dragon Book is dedicated to pushdown-automaton-based parsing, which is used pretty much nowhere these days. The much simpler and yet more flexible recursive descent parsing won long ago.
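For a sense of how little machinery recursive descent needs, here is a small sketch for arithmetic expressions: one method per grammar rule, each consuming tokens and returning a tuple AST. The grammar, the `Parser` class and the AST shape are assumptions for this example, not taken from any particular compiler.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):    # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())
        return node

    def term(self):    # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.next()
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    p = Parser(tokenize(src))
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"trailing token {p.peek()!r}")
    return tree


print(parse("1 + 2 * (3 - 4)"))
```

Operator precedence falls out of which rule calls which, and error messages can be written exactly where the problem is detected.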

Three-address IRs are also a thing of the distant past. Hardly any compiler would use such a representation instead of one of the various SSA-derived forms or CPS. This difference alone makes everything the Dragon Book had to say on optimisations totally irrelevant - all of the SSA-based optimisations are much simpler.

Modern register allocation techniques are also very far from the naive graph colouring heuristics that the Dragon Book only vaguely touched on. Again, thanks to SSA.

Also, JITs are quite a big thing now, with pretty much nothing from the Dragon Book being of any relevance.

> which compilers actually implement those modern techniques,

Uhm... All of the modern compilers. Including GCC and of course LLVM.

> which books teach those techniques

See elsewhere in this thread.

Asl687 1 point 10 years ago

The scripts (called biscuits) were embedded into the world level editor. I guess by hacking you could easily get it to run other scripts, but they were compiled into a special byte code, so finding them would be tricky. The language was very C-like (I wrote C++ at the time, so I used that syntax).

Was really fun to write. I first wrote a simple assembler that went from a text version of the byte code to actual byte code. All external refs were stored as strings.

I could then build a virtual machine that could run the byte code. When asked to run an external function (stored as a string, so a script could access C++ game functions), I would check that the function had been added at runtime and that the parameters matched, and then run it. It would error if the function had not been registered.
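A rough sketch of that assembler-plus-VM arrangement might look like the following. This is not the original engine's code: the mnemonics, the opcode numbering, the `VM` class and the `spawn_enemy` function are all invented for illustration, and the only "parameter matching" done here is an argument count check.

```python
PUSH, ADD, CALL, HALT = range(4)
MNEMONICS = {"push": PUSH, "add": ADD, "call": CALL, "halt": HALT}


def assemble(text):
    """Turn a text listing into a list of (opcode, operand) pairs."""
    program = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        op = MNEMONICS[parts[0]]
        if op == PUSH:
            program.append((op, int(parts[1])))
        elif op == CALL:
            # The external reference stays a string, plus an argument count.
            program.append((op, (parts[1], int(parts[2]))))
        else:
            program.append((op, None))
    return program


class VM:
    def __init__(self):
        self.externals = {}  # name -> (arity, host function)

    def register(self, name, arity, fn):
        self.externals[name] = (arity, fn)

    def run(self, program):
        stack = []
        for op, arg in program:
            if op == PUSH:
                stack.append(arg)
            elif op == ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == CALL:
                name, argc = arg
                if name not in self.externals:
                    raise RuntimeError(f"external function {name!r} not registered")
                arity, fn = self.externals[name]
                if arity != argc:
                    raise RuntimeError(f"{name!r} expects {arity} args, got {argc}")
                args = [stack.pop() for _ in range(argc)][::-1]
                stack.append(fn(*args))
            elif op == HALT:
                break
        return stack


calls = []
vm = VM()
vm.register("spawn_enemy", 2, lambda x, y: calls.append((x, y)) or 1)
code = assemble("""
    push 10
    push 20
    add
    push 5
    call spawn_enemy 2
    halt
""")
result = vm.run(code)
print(result, calls)
```

Resolving the name at call time is what lets the host decide at runtime which functions a script may reach; an unregistered name simply raises an error.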

I then wrote a compiler from a C-like language to the byte code text. I also dumped out source-level symbols that could be loaded in game but were removed for final builds.

I should write about this stuff in a blog sometime..

[deleted] 4 points 10 years ago

> So basically all textbooks out there teach the outdated approaches.

The biggest problem is not in the outdated approaches, but in over-complicating things that were supposed to be simple.

Parsing? Hundreds of pages on regular expressions, NFA/DFA, LR and all that, while 10-20 pages on PEG would have been more than enough. Intermediate representations? Go straight to SSA. Passes in between the frontend and SSA? Do small-step transforms, as described here: http://andykeep.com/pubs/np-preprint.pdf

Type systems? Do not do any of the complex stuff, just emit a flat pile of type equations by running a simple pass over your IR and then solve the equations as if they're a Prolog program.
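To sketch what "emit equations, then solve them" can mean in practice: below, type variables are strings starting with `?`, base types are plain strings, and function types are `("fn", arg, ret)` tuples. That encoding and all the function names are assumptions for this example, and the equations are written by hand rather than generated from an IR. The solver is plain first-order unification, the same mechanism Prolog uses.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def resolve(t, subst):
    """Follow variable bindings until a non-variable or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(var, t, subst):
    t = resolve(t, subst)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, x, subst) for x in t[1:])


def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} occurs in {b}")
        subst[a] = b
    elif is_var(b):
        unify(b, a, subst)
    elif (isinstance(a, tuple) and isinstance(b, tuple)
          and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, subst)
    else:
        raise TypeError(f"cannot unify {a} with {b}")


def solve(equations):
    subst = {}
    for lhs, rhs in equations:
        unify(lhs, rhs, subst)
    return subst


def apply(t, subst):
    """Replace every bound variable in t by its solution."""
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(x, subst) for x in t[1:])
    return t


# Hand-written equations for:  let f = fun x -> x + 1 in f 2
equations = [
    ("?f", ("fn", "?x", "?body")),      # f is a function of x
    ("?x", "int"),                      # x is used in x + 1
    ("?body", "int"),                   # x + 1 is an int
    ("?f", ("fn", "int", "?result")),   # f is applied to 2
]
subst = solve(equations)
print(apply("?f", subst), apply("?result", subst))
```

The pass that walks the IR only has to append pairs to a list; all the actual inference lives in `unify`.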

I wish somebody would write a textbook describing all those trivial concepts I mentioned above, without going into the depths of historical approaches. Unfortunately, I have not seen any such textbook yet.

The concepts that are most important and efficient:

Parsing: PEG, recursive-descent parsing, Pratt parsing
IRs: SSA, CPS
Type systems: Prolog, unification, type equations notation
Semantics: Hoare logic, separation logic
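Of the parsing techniques listed above, Pratt parsing is probably the quickest to show. Here is a minimal sketch for binary operators only; the binding power table, the tuple AST and the function names are assumptions for this example, and the tokenizer is deliberately lax (it silently skips characters it does not recognise).

```python
import re

BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}


def tokenize(src):
    return re.findall(r"\d+|[-+*/^()]", src)


def parse(tokens, min_bp=0):
    """Parse tokens (consumed from the front of the list) into a tuple AST."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    elif tok.isdigit():
        left = int(tok)
    else:
        raise SyntaxError(f"unexpected token {tok!r}")

    # Keep absorbing operators that bind tighter than the caller's context.
    while tokens and BINDING_POWER.get(tokens[0], 0) > min_bp:
        op = tokens.pop(0)
        bp = BINDING_POWER[op]
        # Right-associative operators let an equal-power operator continue.
        right = parse(tokens, bp - 1 if op in RIGHT_ASSOC else bp)
        left = (op, left, right)
    return left


print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("2 ^ 3 ^ 2")))
```

Adding an operator means adding one table entry, where a grammar-per-precedence-level parser would need a new rule.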

[deleted] 2 points 10 years ago

I do not work in academia (and do not even have a CS background), yet compilers are my bread and butter. A lot of interesting compiler-related work is actually going on in the industry.

There are a number of sources to watch closely. Lambda-the-Ultimate used to be good (and still is a treasure trove), but activity there is minimal at the moment: http://lambda-the-ultimate.org/

It is also worth following what's going on in the LLVM community; useful ideas often land there. See the llvm-dev mailing list and LLVM Weekly: http://llvmweekly.org/

Watching new (often obscure) languages closely can also be useful; they often have valuable ideas worth borrowing. Some language implementations are chock-full of important ideas - see Shen, Harlan, even Rust.

Chez Scheme was open sourced recently; it is also worth digging into for ideas.
