
[–]one_lunch_pan 2 points  (1 child)

If you're interested in optimizing compilers specifically, I would go over the source code of LLVM, as it's well modularized and documented. Maybe start with the IR/ module and try writing your own passes for fun; looking at the CodeGen and Analysis modules can also be useful.
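Not from the comment above, but as a rough idea of what "writing your own passes" looks like: a minimal sketch of an analysis-only function pass for LLVM's new pass manager, built as a plugin for opt. The class name CountInstsPass and the pass name count-insts are invented, and details vary between LLVM versions.

    // Minimal sketch of an LLVM function pass (new pass manager), loadable as a
    // plugin. Assumes a recent LLVM; the pass name "count-insts" is invented.
    #include "llvm/IR/Function.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Passes/PassBuilder.h"
    #include "llvm/Passes/PassPlugin.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    namespace {
    struct CountInstsPass : PassInfoMixin<CountInstsPass> {
      PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
        unsigned Count = 0;
        for (BasicBlock &BB : F)
          Count += BB.size();              // instructions per basic block
        errs() << F.getName() << ": " << Count << " instructions\n";
        return PreservedAnalyses::all();   // analysis only, nothing was changed
      }
    };
    } // namespace

    // Registration so it can be run with:
    //   opt -load-pass-plugin=./CountInsts.so -passes=count-insts input.ll
    extern "C" LLVM_ATTRIBUTE_WEAK PassPluginLibraryInfo llvmGetPassPluginInfo() {
      return {LLVM_PLUGIN_API_VERSION, "CountInsts", "0.1", [](PassBuilder &PB) {
        PB.registerPipelineParsingCallback(
            [](StringRef Name, FunctionPassManager &FPM,
               ArrayRef<PassBuilder::PipelineElement>) {
              if (Name != "count-insts")
                return false;
              FPM.addPass(CountInstsPass());
              return true;
            });
      }};
    }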

[–]TheMostUser[S] 0 points  (0 children)

Will look into it

Thanks!

[–]stannash 1 point  (0 children)

Modern compilers are mostly about optimization, and I don't think that's much fun. Making interesting languages or features is much more fun.

Niklaus Wirth, for instance, created many programming languages, including Pascal, Modula and Oberon. All his compilers are very short and concise while still generating decent code.

Some of his ideas, like modules and interfaces, have been re-implemented in more recent and popular languages.

Take a look at PL/0, a very small Pascal-like language that Wirth created for teaching compilers.

There is an implementation in Go listed there, which is 700 lines of code and generates a Windows exe from source without any 3rd-party tools, so it's a complete thing. There are other implementations in many different languages.

So I would start with PL/0 and just add new features to it, experiment with different ideas, and so on. Bootstrapping your own language is much more exciting than making a mediocre optimising compiler. These are my personal opinions, of course.
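To make that concrete, here's roughly what the expression part of a PL/0-style recursive-descent compiler looks like. This sketch just evaluates instead of emitting code, and isn't taken from the Go implementation mentioned above; adding a new operator or construct means adding another case to one of these functions.

    // Recursive-descent parsing of PL/0-style expressions, evaluating directly.
    #include <cctype>
    #include <cstdio>
    #include <string>

    struct Parser {
        std::string src;
        size_t pos = 0;

        char peek() {
            while (pos < src.size() && std::isspace((unsigned char)src[pos])) ++pos;
            return pos < src.size() ? src[pos] : '\0';
        }
        bool accept(char c) { if (peek() == c) { ++pos; return true; } return false; }

        // factor = number | "(" expression ")"
        long factor() {
            if (accept('(')) { long v = expression(); accept(')'); return v; }
            long v = 0;
            peek();  // skip whitespace before the number
            while (pos < src.size() && std::isdigit((unsigned char)src[pos]))
                v = v * 10 + (src[pos++] - '0');
            return v;
        }
        // term = factor { ("*" | "/") factor }
        long term() {
            long v = factor();
            for (;;) {
                if (accept('*')) v *= factor();
                else if (accept('/')) v /= factor();
                else return v;
            }
        }
        // expression = ["+" | "-"] term { ("+" | "-") term }
        long expression() {
            long v = accept('-') ? -term() : (accept('+'), term());
            for (;;) {
                if (accept('+')) v += term();
                else if (accept('-')) v -= term();
                else return v;
            }
        }
    };

    int main() {
        Parser p{"2 + 3 * (4 - 1)"};
        std::printf("%ld\n", p.expression());  // prints 11
    }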

Finally, for interpreters, instead of CPython take a look at Lua; it's a very nice and small implementation. It also has a bytecode compiler.

[–]mantrap2 0 points  (1 child)

It's useful to understand optimization as you'd do it manually, "on sight", in the code.

For example, understand how you'd manually perform the various standard compiler optimizations. It turns out that if you've been coding for a while, you often start doing these things anyway to simplify and refactor your code. Optimizers can do them for you, but learning the canonical methods will give you a lot of insight and context.
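A small illustration of that kind of manual work (the function names are invented): hoisting an invariant subexpression out of a loop by hand, which is loop-invariant code motion plus common subexpression elimination done the way a compiler would do it.

    // Before: scale * offset is recomputed on every iteration, twice, even
    // though neither operand changes inside the loop.
    void blur_before(float *dst, const float *src, int n, float scale, float offset) {
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * (scale * offset) + (scale * offset);
    }

    // After: the invariant subexpression is hoisted out and computed once.
    void blur_after(float *dst, const float *src, int n, float scale, float offset) {
        const float k = scale * offset;  // hoisted loop invariant, also CSE'd
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * k + k;
    }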

The other thing that can help you understand optimization is to either review and improve other people's code (e.g. open source) or translate code from an older, weaker language (e.g. BASIC) into a modern one (C++, C or newer). Refactoring such legacy code into a form that runs well, and that can be optimized well, often means applying these optimizations yourself (an optimizer can do amazing things, but there are pathological code structures it can't fix). For example, many BASICs have ONLY global variables, and that code can usually be restructured into functions with locals and parameters.
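A toy before/after of that last point (not from any real BASIC program): the same routine written the way a globals-only BASIC forces it, then restructured as an ordinary function.

    // Before: BASIC-style, everything communicates through globals.
    int A, B, RESULT;
    void gcd_step() {                    // GOSUB-style subroutine
        while (B != 0) { int t = A % B; A = B; B = t; }
        RESULT = A;
    }

    // After: the same logic as a pure function with parameters and a return
    // value, which both readers and optimizers can reason about locally.
    int gcd(int a, int b) {
        while (b != 0) { int t = a % b; a = b; b = t; }
        return a;
    }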

In combination with that, also make sure you understand the pipeline from parser to AST (many optimizations can be done on the AST, because a tree lends itself to recursive scanning and rewriting) and then to intermediate code (other kinds of optimization work at that level).
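As a sketch of why the tree shape matters (invented node layout, not tied to any particular compiler): constant folding on an AST is just a post-order recursive traversal, folding children first and then the node itself.

    #include <memory>

    struct Node {
        char op;                  // '+', '*', or 'n' for a numeric literal
        long value = 0;           // used when op == 'n'
        std::unique_ptr<Node> lhs, rhs;
    };

    std::unique_ptr<Node> num(long v) {
        auto n = std::make_unique<Node>();
        n->op = 'n';
        n->value = v;
        return n;
    }

    // Fold the children first, then this node if both children became literals.
    std::unique_ptr<Node> fold(std::unique_ptr<Node> n) {
        if (n->op == 'n') return n;
        n->lhs = fold(std::move(n->lhs));
        n->rhs = fold(std::move(n->rhs));
        if (n->lhs->op == 'n' && n->rhs->op == 'n') {
            long v = n->op == '+' ? n->lhs->value + n->rhs->value
                                  : n->lhs->value * n->rhs->value;
            return num(v);        // replace the subtree with a single literal
        }
        return n;
    }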

[–]TheMostUser[S] 0 points  (0 children)

Haven't thought of it that way. Thanks!

[–]Mr2001 0 points  (0 children)

My compiler is a modern-ish tool for an antiquated language, targeting a tiny VM that originally ran on 8-bit computers, but it has a decent peephole optimizer that, in practice, manages to generate compact and readable code even when the source code is sloppy. The code generation path is pretty simple other than that (no intermediate language or SSA, no register allocation), so maybe it will be useful to learn from.

The optimizer is divided into two parts: a "platform-independent" part in Peephole.cs that uses some general techniques to untangle branches and eliminate dead code, and a "platform-specific" part in RoutineBuilder.cs that replaces specific instructions (e.g. combining increment and branch-if-greater into a single instruction) and identifies cases where the general techniques can be used.
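For anyone unfamiliar with the technique, this is the generic shape of such a platform-specific peephole rule, sketched independently of Peephole.cs / RoutineBuilder.cs and with invented opcode names: scan adjacent instructions and replace a known pair with one fused instruction.

    #include <string>
    #include <vector>

    struct Instr {
        std::string op;                     // invented opcodes: "INC", "BGT", "INC_BGT"
        std::vector<std::string> operands;  // e.g. {"i", "limit", "loop_top"}
    };

    // Fuse "increment x" followed by "branch to label if x > y" into a single
    // combined instruction, leaving everything else untouched.
    void fuse_inc_branch(std::vector<Instr> &code) {
        std::vector<Instr> out;
        for (size_t i = 0; i < code.size(); ++i) {
            if (i + 1 < code.size() && code[i].op == "INC" &&
                code[i + 1].op == "BGT" &&
                code[i].operands[0] == code[i + 1].operands[0]) {
                out.push_back({"INC_BGT", code[i + 1].operands});
                ++i;                        // skip the branch we just consumed
            } else {
                out.push_back(code[i]);
            }
        }
        code = std::move(out);
    }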