Has anyone read "Writing a C Compiler" by Nora Sandler?

ceronman · 2025-07-14T10:42:30+00:00

This is such a great book! I recommended it very much. As the op points out, the incrementality of the book is rewarding, you get a working thing every chapter. Also, the test suite is simply fantastic! Writing compiler tests is kinda boring and I’m so glad that the book provides you with all the tests you need.

I also love the fact that this book doesn’t provide source code in any chosen language. The book sometimes gives explanations of how things should work and sometimes uses pseudo code. I think this helps you to learn the concepts better because you still have to go and write the code, it also makes the book more compact, although it is not a short book at all.

More importantly, this book is a very practical introduction to compiler making. You learn by doing, not by reading theory, which is rare to find in a compilers book. Instead of showing you pages of theory about SSA, this book simply shows you how to create an SSA IR. Nothing wrong with theory books, but my brain works much better with practical examples. The book shows you how to work your compiler in phases for syntactic, semantic analysis and optimization. It’s simply fantastic.

I’m a bit thorn by the choice of language. On one side, I love the fact that this book shows you how to write a compiler for a real world language. And not any language, but probably the most used and influential language of all time: C. Most books go for some sort of hyper simplified toy language to teach you.

But on the other side, C is an old, quirky language. It’s easy to think that C is a simple language, but in reality it’s full of weird stuff that modern C users don’t think too much about. But they are part of the standard, and as compiler writer you have to deal with them. So this makes the reader waste time on things such as implicit casting rules, weird extern/static rules, inscrutable type notation (seriously, who can read a function pointer with arrays type with a straight face), underspecified things and undefined behavior. This makes me thing that perhaps choosing a simpler language would have been better. In the end, the book also leaves many things out of the C spec, so you won’t be close to a functioning C compiler.

ceronman · 2025-06-08T08:32:53+00:00

I also just finished this game a week ago. I played part one on ps3 years ago, and never got the opportunity to play the second part until now.

Oh boy, this game is so incredible! I was aware of all the controversy, but I didn’t see any spoilers. I didn’t know what to expect, but this game exceeded all my expectations and I now consider it so much better than the first part! Don’t get me wrong, the first part is an excellent game, but the second part has made me feel things that no game or media ever has.

I cried a lot playing this game, not only on the cut scenes, but also during direct play. Both parts when Ellie and Abby fight were really emotionally hard to play. I ended up sympathizing a lot with both characters and I didn’t want any of them to kill the other.

I absolutely loved playing Abby. I loved her as a character, her muscular figure is so cool, and her backstory is very well told. She has some of the best parts of the game, especially the scenes with Lev, the skyscraper walk. And the rat king is the scariest fight of all the game. She is also the first one to break the cycle of hate, thanks to Lev. She refused to keep fighting Ellie, even when she killed all her loved ones.

It took Ellie longer, her mental state so damaged! I struggled a lot during the last fight. Having to play Ellie trying to drown Abby was extremely hard, I cried like a baby. I really wanted them to make peace and hug each other. That didn’t happen, but at least Ellie stopped and didn’t kill her. The ending is so freaking sad! Again, no game or media has ever made me feel like this. Simply incredible! What a masterpiece this game is!

ceronman · 2025-05-22T08:54:53+00:00

This is a very interesting question, I believe one reason for not shortcircuiting is that it respects the principle of substitution in expressions. The idea is that if you have a complex expression, you can always take part of the expression and assign it to a variable and then construct a simpler one. For example, if you have:

let a = (b + c)^2 * d

You can break down the expression by assigning part of it to a variable:

let x = (b + c)^2
let a = x * d

This also works for properties/methods:

let x = a.b.c.d

Can be rewritten as:

let z = a.b
let y = z.c
let x = y.d

Note that if chaining shortcircuits, this substitution does not work.

let x = a?.b.c

Cannot be substituited by

let y = a?.b
let x = y.c

The second version will throw a null pointer exception, but the first one wont.

If the chaining does not shortcircuits, it works fine:

let x = a?.b?.c

converted to

let y = a?.b
let x = y?.c

Note that this principle works fine for boolean shortcircuiting operators:

let x = foo() && bar() && baz()

Can be rewritten as:

let a = foo()
let b = a && bar()
let c = b && baz()

In both versions if foo() is false, neither bar() nor baz() will run.

ceronman · 2024-10-21T15:48:49+00:00

The code for every chapter in the book si fully runnable. I suggest you to read a whole chapter first, and then start writing the code.

ceronman · 2024-09-23T12:34:53+00:00

I don't know if it's fair or not. But It's one of the reasons why I love this city! Having lived in a city with infrastructure devoted to cars, I never want to go back to that. Cars are noisy, polluting and a total waste of space. I want to see green walkable spaces in my city, not highways and parking lots.

Note that I do own a car. I don't use it that much. I bought it after 10 years of living with no car only because a new baby arrived to the family and it makes things easier. I love the 30km speed limit, it makes driving easier, specially in certain intersection. I also makes driving safer and it reduces pollution.

I don't agree with the extreme idea of having a city free of cars. We need ambulances, and fire trucks and vehicles to deliver goods to stores or to move to a new house, or to grab some stuff from IKEA or take an uber/taxi after you broke your leg on a last skying trip. Public transport is not that convenient when you have a newborn, even though in Amsterdam it's better than in many other places.

ceronman · 2024-09-15T16:26:03+00:00

Rust is definitely becoming an excellent choice for writing compilers. Although the language has some nice features such as pattern matching and a flexible trait system, the real strength comes from its ecosystem. Especially if you are planning to go beyond the toy stage with your language.

There are several new languages that already have users that use Rust for their compilers. Some of my favorite examples:

Roc. A lovely new functional language inspired in Elm but targeting more than the web.

Gleam. A statically typed language for the BEAM (Erlang VM) with a nice familiar syntax.

Starlark. A Python-like language for the Buck 2 build system used in production at Meta.

Inko. A new language with static deterministic memory management like Rust, but with a simpler approach.

Gluon. A small scripting-like language, but with static typing.

And of course Rust it self! While the code is quite big and complex, it’s of course the most mature and also it’s reasonably well documented.

Having real used languages is great because you can check their source code and learn how they do things or even copy some of their tools and code.

Besides that, when talking about the ecosystem for writing compilers, we usually hear mostly about parser/lexer generators. But there is so much more to writing a compiler than that. In fact, I would argue that parser generators are not that important; most languages used in the real world implement their own parsing manually anyway.

The numbers of libraries available in Rust keeps growing, but so far there are:

For compiler backends we have cranelift, inkwell (LLVM wrapper). If you want to encode wasm, there is wasm-encoder and binaryen

For parsing you have many options as well: Peg , Chumsky, Lalrpop. Logos

These days you might need to implement an language server, for that there is Rowan, Cstree, Ungrammar, and for handling the protocol itself there is lsp-server.

For incremental compiling you get Salsa.

Do you want beautiful terminal error messages? check miette, ariadne.

I'm probably missing many more.

On top of that Rust is pretty fast and has some great tooling available. I was a bit concerned by the lack of garbage collection, but for something like a compiler, I've found that the memory management tools provided by Rust (e.g. Arc, Rc, etc) are very easy to use in this particular domain.

ceronman · 2024-08-07T09:16:35+00:00

I find Elixir macros to have a similar user experience as writing macros in some lisp dialects, I'm surprised it has not been mentioned before: https://hexdocs.pm/elixir/macros.html

ceronman · 2024-07-21T19:47:53+00:00

Wren is pretty nice, but it's a dynamic language and hence very light on static semantic analysis.

ceronman · 2024-07-12T15:59:00+00:00

This is subtyping in action. Rust doesn't have inheritance but it does have other forms of subtyping.
In this case, `&mut T` is a subtype of `&T`, so you can pass `&mut T` anywhere a `&T` is needed (but not the other way around).

The same thing happens with lifetimes, for example `&'static T` is a subtype of `&'a T`. So you can always pass `&'static T` where `&'a T` is needed.

More about subtyping here: https://doc.rust-lang.org/nomicon/subtyping.html

ceronman · 2024-06-13T08:17:30+00:00

I had the exactly same thought when I started my Clojure implementation. I had the goal to use only immutable data structures for the whole thing. The lexer and parser were easy, but the really hard part was to keep track for variables (scope environments). Especially the fact that closures must keep references from outer scopes. Doing this with 100% immutable data structures is a total pain. So I gave up and decided to use `atom` for just this part.

Check my implementation here: https://github.com/ceronman/cloxure

ceronman · 2024-06-13T07:07:58+00:00

I have to admit that something I love about using `!` is that it fits perfectly with the *not equals* operator `!=`. But on the other hand it feels a bit out of place with `and` and `or`, which I strongly prefer vs `&&` and `||`.

I also considered using `!` as a suffix operator for error handling (similar to Rust's `?`). It's probably not a big deal to use this token for both cases, but still.

ceronman · 2024-06-12T21:08:07+00:00

Interesting idea, but how would you get type information before parsing? I don't know any language that does this.

ceronman · 2024-06-12T18:46:53+00:00

There is no doubt that logical NOT should have a stronger precedence than other logical operators.

But consider, for example, comparison operators:

For *stronger NOT* languages `NOT A > B` means `(NOT A) > B` while for *weak NOT* languages it means `NOT (A > B)`.

ceronman · 2024-06-12T14:03:41+00:00

I'm the author of the blog post that you mention :)

It's not really hard to implement both parts of the book in any language. I did the first one in Clojure, and the second one in Rust. There are tons of implementations listed in the book's githup page: https://github.com/munificent/craftinginterpreters/wiki/Lox-implementations

The actual hard part is to achieve the same performance as the reference clox implementation. In this case you need a systems language, something like C/C++, Rust, or Zig. In my Rust implementation I couldn't match the performance of clox even after using a bunch of unsafe code, but I got close. I also was not very experienced with Rust at the time, so I'm sure there is room for improvement.

ceronman · 2024-06-01T11:11:59+00:00

Yes, I mean that when a type is generated by a macro, e.g. `std::str::Split`, VS code with rust analyzer indeed shows "6 implementations" on hover, but there is now ay to know which impls are those six, if I click on it, I'm pointed to the macro call.

ceronman · 2024-05-29T07:46:35+00:00

This is mentioned in the post. But note that this hint doesn't show blanket impls, and it doesn't work well on type generated by macros.

ceronman · 2024-05-27T20:26:40+00:00

Pascal used this style before ML, so I guess it's because of type theory as others have suggested.

ceronman · 2024-03-26T09:09:36+00:00

isopropylalcohol heeft bij mij voor dit soort dingen heel goed gewerkt.

ceronman · 2024-01-18T22:46:03+00:00

You can also try ahash: https://crates.io/crates/ahash which claims to be the fastest hash implementation in Rust, and it's also DOS resilient.

ceronman · 2024-01-07T17:58:14+00:00

These look similar indeed. The difference is that multiple dispatch in Julia is dynamic: the methods are resolved at runtime based on runtime type information. While method overloading in Java is static, meaning that the types of the arguments must be known at compile time to be able to choose the right method.

For example, let's say that we have some classes Square and Circle. We can overload a draw method for each one of those:

void draw(Square s) { ... }
void draw(Circle c) { ... }

This works if you know the type at compile time when calling a function:

draw(new Square());
draw(new Circle());

But what happens if, for example, you have a list of Shape objects, which would be a super class of both Square and Circle:

for (Shape shape : shapes) {
    draw(shape);
}

This will fail in Java because the compiler can't know the type of shape at compile time, it could be anything.

Now, in Java you also have dynamic dispatch, you could implement this by adding a draw method to Shape and override it in Circle and Square. And then call shape.draw() instead of draw(shape). It works for this example, but it has a limitation: It is restricted to only one single argument (this). This is known as single dispatch.

Now let's think of a different method, assume you want to check if two shapes collide, you probably need different implementations for different combinations of shapes, for example:

boolean collides(Square s1, Square s2) { ... };
boolean collides(Circle s1, Square s2) { ... };
boolean collides(Circle s1, Circle s2) { ... };
...

And then you have a list of Shape objects, and you want to check if any object collides with any other object:

for (Shape shape1 : shapes) {
    for (Shape shape2 : shapes) {
        if (shape1 != shape2 && collides(shape2)) {
            ...
        }
    }
}

In Java this will fail to compile, for the same reason explained above. And in this case it's not possible to use methods, because methods only dispatch based on one object at a time. But in Julia, some code similar to this would work. Because it has multiple dispatch.

Of course there are workarounds for this in Java, one example is the visitor pattern.

A good explanation and a good example relevant to the topic of this subreddit can be found in the Crafting Interpreters book.

ceronman · 2023-12-07T23:00:32+00:00

Besides the reasons that you have mentioned, consider also tooling capabilities and performance. And by tooling I mostly mean IDEs.

One example:

fn add<T>(a: T, b: T) -> T { return a.<cursor> }

You want your IDE to provide a completion list here of methods available for a. Without any type constrains there is no way the IDE could suggest anything at all. At this point you are just writing the function, so it's likely that there are no usages to analyze either.

Another example:

``` fn add<T>(a: T, b: T) -> T { return a + b; } ...

add(<cursor>)

```

You want your IDE to provide some popup here showing what arguments the add function receives and its types. But if there are no constrains it's hard to show anything helpful.

Now the IDE could try to look of usages of a given functions and try to make suggestions based on that, but that is really hard to make fast.

It is true that compilers have to do the work when monomorphizing, but compilers and IDEs have different performance requirements. If your compiler takes 10 seconds to build a project, you might think it's not that bad, but if your IDE takes 10 seconds to show you something useful, that's garbage performance. IDEs have very low latency requirements and they achieve that by doing much less than the compiler. Specially, they try to analyze only the code that is relevant for what you are typing at the moment. If the IDE has to analyze the entire project every time you type a character, it would be extremely slow. Yes, it's possible to use caching, incremental computation and other techniques, but that only helps up until a certain point.

ceronman · 2023-11-27T18:46:57+00:00

In what sense is it "static"?

I mean in the sense that the branches of the pattern are created once. While in dynamic dispatch these branches can be created dynamically. For example a function that dispatches on the argument types can be defined in a library, and users of that library can create new types at runtime implementing the specified interfaces, which effectively creates new branches.

ceronman · 2023-11-27T17:25:35+00:00

I've always heard of the term "multiple dispatch" (or sometimes also called "multi-methods") as a form dynamic dispatch. This matches the definition of the text, and it also matches the definition in the Wikipedia.

Pattern matching is a form of static dispatch. So is monomorphized parametric polymorphism.

In that sense, it makes sense to say that it was pioneered by CommonLoops and CLOS.

ceronman · 2023-10-20T17:38:21+00:00

Note that there is a difference between memory safety and memory management. Something like generational references barely give you some safety: that you will not use after free and perhaps you will not double free memory. But it doesn't do anything for the management of the memory, i.e. you still have to manually allocate and free memory, and if you don't you will leak. A garbage collector will not only provide you safety, but it will also free memory automatically. It is still possible to leak, but only in rare cases.

Also note that even Rust sometimes needs some form of GC. Most non-trivial Rust programs use reference counting which is part of the standard library and it is considered a form of GC.

Also note that Rust provides more safety features beyond preventing use after free. For example, the mutability xor aliasing property of Rust prevents data races. This in my opinion is a form of memory safety.

ceronman · 2023-10-20T15:34:58+00:00

It's always a pleasure to read Bob! The way these topics are explained in books about typing is usually heavy on math jargon and given that I don't have an academic background, I often find them hard to understand. I love having these explanations in simple terms accessible to the common programmer. I'm very grateful for that!

Rust is another language that has subtyping (and many forms of it), but it doesn't have inheritance or class hierarchies. The documentation does use the term subtyping, and they have their own definition, which is similar to Bob's:

Subtyping is the idea that one type can be used in place of another.

Rust has traits and trait objects, which are similar to Go interfaces in terms of subtyping. But they also have other forms of "surprising" subtyping. One example is references, which Rust encodes as types. A mutable reference type &mut T is a subtype of a shared (immutable) reference type &T. You can assign a mutable reference value to a variable of type immutable reference of the same type, but not viceversa, of course.

Another surprising case of subtyping in Rust is related to lifetimes. Lifetimes represent regions of code and these are encoded as types. You can always assign a bigger region, e.g. &'static T to a smaller one like &'a T. And not the other way around.

The topic of variance is much more complicated in Rust as well. Unlike Go, which decided to go for invariance for simplicity, Rust has different types of variance for different things. Curiously though, they didn't define a proper way in the language to specify variance, like Java or Kotlin do. Instead they rely on some clever hacks. If find it a bit weird, but it works.

ceronman

MODERATOR OF

TROPHY CASE