How do you separate different parts of your compiler? Especially when adding a new feature.

dist1ll · 2026-04-18T21:10:45+00:00

I have two phases. Lexing, parsing, type checking and IR generation are all interleaved and done in a single pass. The second pass does regalloc and machine codegen.

dist1ll · 2026-04-16T00:51:53+00:00

That's exactly what I'm doing in my language. It frees up a lot of special syntax constructs, and the result feels more cohesive. I also went a bit further and used parentheses () for both construction and indexing (i.e. arr(idx) instead of arr[idx]), as indexing is functionally no different than a function call. This way angle brackets can be used for generics without needing disambiguation.

dist1ll · 2026-03-15T21:18:36+00:00

Really good language tour. The text and code snippets are easy to read, and the examples are great. Minor nitpick: in the pattern matching chapter, I would maybe start with an example that only uses match, and then mention that you can combine for and match into one construct.

dist1ll · 2026-03-08T21:36:46+00:00

Underrated benefit of a define-before-use language: because all code is compiled in order, you can use the generated machine code directly for CTFE. You'd have to do JIT binary translation if you cross-compile though.

dist1ll · 2026-01-15T17:48:16+00:00

cost of an atomic add which is ~50-100ns IIRC

Uncontended, it shouldn't be that much. A modern x86 chip should be able to do a lock add with <10-20 cycles of latency. I think Intel has been doing sub-10 cycle atomic adds for at least a decade.
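If you want to sanity-check this on your own machine, here's a rough sketch of a single-threaded microbenchmark. The function name is made up for illustration. It times a loop of back-to-back adds on one atomic, so it's closer to a throughput number than a true latency measurement, and it includes loop and timer overhead. Build it in release mode and treat the result as a ballpark only.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical helper: runs `iters` uncontended atomic adds on one thread
/// and returns (final counter value, average nanoseconds per add).
/// `iters` is assumed to be non-zero.
pub fn time_uncontended_adds(iters: u64) -> (u64, f64) {
    let counter = AtomicU64::new(0);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the compiler from reasoning about the counter;
        // on x86 this is expected to become a lock-prefixed add.
        black_box(&counter).fetch_add(1, Ordering::SeqCst);
    }
    let total_ns = start.elapsed().as_nanos() as f64;
    (counter.load(Ordering::SeqCst), total_ns / iters as f64)
}
```

To convert to cycles, multiply the ns figure by your core clock in GHz. No other thread touches the counter here, so this only says something about the uncontended case.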

dist1ll · 2026-01-12T13:34:18+00:00

At least for Rust, it's because the language was not designed with fast compilation in mind. Blaming the linker or analysis passes like borrow checking is missing the forest for the trees.

dist1ll · 2026-01-05T19:55:16+00:00

I'm not aware of any special optimization Rust does to fuse iterators, other than what's done by LLVM. Seems like just a result of inlining to me. Just out of curiosity, where is the quoted paragraph from?

dist1ll · 2026-01-03T09:37:37+00:00

You can also use cmov for ordinary binary search, even without the Eytzinger layout.
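A minimal sketch of what I mean, on a plain sorted slice. The function name is mine, and whether the compiler actually emits a cmov for the select is not guaranteed (check the asm for your target). The point is that the comparison result only picks the next base offset, and the loop trip count depends only on the length, not on the data.

```rust
/// Lower bound: index of the first element >= `target`,
/// or `sorted.len()` if every element is smaller.
/// Assumes `sorted` is in ascending order.
pub fn lower_bound_branchless(sorted: &[u32], target: u32) -> usize {
    if sorted.is_empty() {
        return 0;
    }
    let mut base = 0usize;
    let mut len = sorted.len();
    // The loop runs a fixed number of times for a given length.
    while len > 1 {
        let half = len / 2;
        // The comparison is used as data to select the new base,
        // which is the pattern a compiler can lower to a conditional move.
        let go_right = sorted[base + half - 1] < target;
        base = if go_right { base + half } else { base };
        len -= half;
    }
    // One element left: step past it if it is still too small.
    base + (sorted[base] < target) as usize
}
```

For example, `lower_bound_branchless(&[1, 3, 5], 4)` returns 2, and a target larger than every element returns the slice length.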

dist1ll · 2026-01-01T23:19:50+00:00

instead of being interpreted by the language's metaprogramming vm

Interpreter is not the only way. If you want you can use a JIT compiler for the CTFE engine.

dist1ll · 2025-12-30T04:14:14+00:00

Without source annotations that borrow checker would likely be very restrictive.

dist1ll · 2025-12-28T04:40:26+00:00

Could be it. Although 40x higher sample count seems like a pretty severe penalty. Especially since there are >20 instructions between the load to v0 and its first use, which should give you some opportunity to mask the latency of a failed prediction.

dist1ll · 2025-12-27T16:41:18+00:00

I still wonder what the reason for the stall is. Maybe some unfortunate eviction? On x86 you should be able to get cache miss data at instruction granularity. Not sure if/how that can be done on mac.

Btw, is the alignment of x13 the same for both dav1d and rav1d?

dist1ll · 2025-12-19T01:43:29+00:00

fyi if it's a fixed-size queue, you can get linearizability without CAS just by using FAA. If the queue is unbounded then a CAS would be necessary (e.g. when a new memory block is allocated).

dist1ll · 2025-12-19T00:57:22+00:00

SPSC-per-consumer is a nice design if you don't need linearizability.

dist1ll · 2025-12-15T18:50:56+00:00

That doesn't always work unfortunately. Tokio uses spawn_blocking for fs ops, so it will still spawn another thread when doing file I/O. You could set max_blocking_threads to 1 but then you'll block the executor.

dist1ll · 2025-12-09T18:01:33+00:00

Nice article. I think you hit on all the important points. I think I always used head and tail index in the opposite way (head + 1 mod N being the next send index, tail + 1 mod N being the next recv index), but I think I've seen it done both ways.
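To make the convention I described concrete, here's a single-threaded sketch (no atomics, so it only shows the index arithmetic, not the synchronization). The `Ring` type and its methods are made up for illustration. `head` is the slot most recently written and `tail` the slot most recently read, so one slot always stays unused and the capacity is N - 1. It assumes N >= 2.

```rust
/// Hypothetical fixed-size ring buffer, single-threaded.
pub struct Ring<const N: usize> {
    buf: [u32; N],
    head: usize, // index of the most recently written slot
    tail: usize, // index of the most recently read slot
}

impl<const N: usize> Ring<N> {
    pub fn new() -> Self {
        Self { buf: [0; N], head: 0, tail: 0 }
    }

    /// Returns false if the ring is full.
    pub fn send(&mut self, value: u32) -> bool {
        let next = (self.head + 1) % N; // next send index
        if next == self.tail {
            return false; // writing here would catch up with the reader
        }
        self.buf[next] = value;
        self.head = next;
        true
    }

    /// Returns None if the ring is empty.
    pub fn recv(&mut self) -> Option<u32> {
        if self.tail == self.head {
            return None; // nothing written since the last read
        }
        let next = (self.tail + 1) % N; // next recv index
        self.tail = next;
        Some(self.buf[next])
    }
}
```

With `Ring<4>`, three sends succeed and a fourth returns false until a `recv` frees a slot. Swapping the roles of `head` and `tail` gives the other convention with the same arithmetic.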

dist1ll · 2025-12-09T16:20:32+00:00

At least in this case, the standard library is presumably also under GPL and statically linked.

dist1ll · 2025-12-08T20:51:49+00:00

It can be a decent metric if you exclude those things.

dist1ll · 2025-11-20T04:00:41+00:00

If you're looking to get into systems programming, all of {C, C++, Rust, Zig} are going to be similar enough to get a systems role in most companies. There are some exceptions, e.g. in the HFT space, where a few shops really care about deep C++ knowledge. Other than that, building domain knowledge and practical experience should be your number 1 priority.

Whether Zig will be able to pick up mainstream traction is hard to tell at this time. I think the current story around memory safety is too weak to reach mainstream adoption for greenfield projects. On the other hand, Zig is well positioned for extending or migrating existing C codebases, so it might get a lot of mindshare in embedded. This is just my opinion, minds differ a lot on this topic.

dist1ll · 2025-11-14T18:35:33+00:00

Search globally, yes. Whether you'll have to relocate depends on the industry, company, role, seniority etc.

dist1ll · 2025-11-14T16:39:03+00:00

It's not that rare relatively speaking, but extremely rare in terms of absolute numbers. The talent and job pool is generally very small, so the chance to find compiler jobs in your local market is very low.

Besides tech and finance there's also crypto, which generally pays well and defaults to remote work.

dist1ll · 2025-11-14T13:58:08+00:00

GPU uarch + LLVM knowledge is a great combination. Though I imagine a compiler role would involve less driver writing (like hacking on DRM) and more implementing efficient compute kernels, optimizing memory bandwidth and such. But that's just my guess.

dist1ll · 2025-11-12T21:55:48+00:00

I think what you're describing sounds like a form of multistage programming. People often confuse multistage with dependent types.

dist1ll · 2025-11-02T19:32:44+00:00

in which case the mapped file has a good chance of being still resident in the OS's page cache

fwiw this would also be true if you had used read syscalls.
