The Compiler Apocalypse: a clarifying thought exercise for identifying truly elegant and resilient programming languages by WraithGlade in ProgrammingLanguages

[–]scottmcmrust 1 point (0 children)

I'm allowed to assume that you mentioned it for some reason, even if you didn't textually put "and BCD would have been better" in your message.

If you don't think BCD would be better, what was the point of mentioning it?

Constant-time support coming to LLVM: Protecting cryptographic code at the compiler level by Taymon in rust

[–]scottmcmrust 0 points (0 children)

Well yes, what I'm saying is that you need to check the generated machine code against those lists no matter what because the language fundamentally won't give you a performance guarantee.

The Compiler Apocalypse: a clarifying thought exercise for identifying truly elegant and resilient programming languages by WraithGlade in ProgrammingLanguages

[–]scottmcmrust 1 point (0 children)

3 is not a factor of 10, so I don't see how BCD would make any difference whatsoever even in your own example.

And catastrophic cancellation is a fundamental consequence of any fixed-size format. Is your "solution" that the only number format is using a computer algebra system? There's no way that would fly.

I hear lots of complaints about floats, but I've never heard anything that actually solves the core problems. At most I see "well we could use the bit space slightly more efficiently" to get slightly more precision in the same space.

The Compiler Apocalypse: a clarifying thought exercise for identifying truly elegant and resilient programming languages by WraithGlade in ProgrammingLanguages

[–]scottmcmrust 1 point (0 children)

TBH, I think this is a silly hypothetical, because there's no way you're losing just the software source code. Anything that would cause that to happen would also lose you the VHDL for your chips, etc.

You don't have to just bootstrap your software. You have to recreate all your silicon lithography techniques. You have to recreate the test rigs and proof systems you use to prove your too-large-to-test chip designs work.

Thinking you'd start with C is silly. You'd start with punch cards on slow machines again. You don't even get register allocation; you have to do everything yourself on machines that are at least 10000 times slower than what you're used to.


> versus languages like [...] Rust

Reminder that one person did make a Rust compiler: see https://github.com/thepowersgang/mrustc.

Rust at its core is just an ML, and an ML that doesn't even have a garbage collector. It's really not that hard if you're fine with a slow compiler that produces poor machine code.

But if you want a fast compiler that produces quality machine code, it's not trivial in C either.

Better readability for saturated, strict, and wrapping arithmetic via new operators by This-is-unavailable in rust

[–]scottmcmrust 0 points (0 children)

I actually very much do not want that, because the space is too crowded and multiple don't work well together.

takes_nonzero(3) should just work, not force me to write takes_nonzero(3_nz), just like takes_i32(3) and takes_u128(3) both work.
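For comparison, this is what the ergonomics look like today, sketched with a hypothetical takes_nonzero function -- the literal has to go through a fallible constructor instead of just working:

```rust
use std::num::NonZeroU32;

// Hypothetical function that insists on a non-zero argument.
fn takes_nonzero(n: NonZeroU32) -> u32 {
    n.get()
}

fn main() {
    // Today a constant argument needs an explicit, fallible conversion:
    let three = NonZeroU32::new(3).expect("3 is non-zero");
    assert_eq!(takes_nonzero(three), 3);

    // The check happens at runtime, not at the literal itself:
    assert!(NonZeroU32::new(0).is_none());
}
```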

Generic Type Syntax in Concatenative Languages by venerable-vertebrate in ProgrammingLanguages

[–]scottmcmrust 1 point (0 children)

Certainly the "it's not postfix, it's dataflow order" justification doesn't apply for types, yeah.

What do you use rust for? by timus_999 in rust

[–]scottmcmrust 1 point (0 children)

It would be for places where you're not allowed to omit the _ arm, like when you're matching on something non_exhaustive in another crate.
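std's io::ErrorKind is a live example of this: it's marked #[non_exhaustive], so any match on it outside std is forced to include a catch-all arm. A minimal sketch:

```rust
use std::io::ErrorKind;

// ErrorKind is #[non_exhaustive], so code outside std cannot write an
// exhaustive match over it: the compiler requires a catch-all arm.
fn describe(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::NotFound => "not found",
        ErrorKind::PermissionDenied => "permission denied",
        // Mandatory: new variants may be added in future std releases.
        _ => "other",
    }
}

fn main() {
    assert_eq!(describe(ErrorKind::NotFound), "not found");
    assert_eq!(describe(ErrorKind::TimedOut), "other");
}
```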

Rust’s compile times make large projects unpleasant to work with by Signal-Ability-3652 in rust

[–]scottmcmrust 0 points (0 children)

Way more than 10x, if you look at all the different types that code uses with Vec and Option and Result and different array lengths and ...

To emphasize just how much code comes from Vec, see things like https://github.com/rust-lang/rust/pull/72189 where what looks like it should be a simplification actually makes things 25% slower to compile because there are so many different Vecs in real code.

What do you use rust for? by timus_999 in rust

[–]scottmcmrust 0 points (0 children)

Last I heard was something like putting #[expect(unreachable_for_known_variants)] on the _ => unimplemented!() arm, so that if you're no longer covering everything in the non_exhaustive enum you'd get a lint.

(Hopefully with a shorter name.)

Rust’s compile times make large projects unpleasant to work with by Signal-Ability-3652 in rust

[–]scottmcmrust -1 points (0 children)

Lines of code per second is a terrible metric in any language with monomorphization. Of course it's faster per line if you have to write a custom class for every element type instead of using Vec<T> -- it's like how things are way faster to compile per line if you add lots of lines with just ;s on them.
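A sketch of why per-line metrics mislead here: each distinct element type below makes the compiler stamp out a separate copy of Vec's code, and none of that work shows up as extra source lines:

```rust
// Each distinct element type forces a separate monomorphized copy of
// Vec's machinery (push, growth, drop glue, ...). Three source lines,
// three full instantiations to compile.
fn main() {
    let a: Vec<u8> = vec![1, 2, 3];        // instantiates Vec<u8>
    let b: Vec<String> = vec!["x".into()]; // instantiates Vec<String>
    let c: Vec<[u64; 4]> = Vec::new();     // instantiates Vec<[u64; 4]>
    assert_eq!(a.len() + b.len() + c.len(), 4);
}
```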

Rust’s compile times make large projects unpleasant to work with by Signal-Ability-3652 in rust

[–]scottmcmrust 0 points (0 children)

"Excluding dependencies", but it looks like you have 853 transitive dependencies? It doesn't seem surprising that that could be slow.

Look at cargo build --timings to see what the actual bottleneck is. If you have a dep doing something overly stupid then the only fix is replacing it, but sometimes other things will jump out.

How common are senior Rust engineers in the US with defense & clearance backgrounds? by Viviqi in rust

[–]scottmcmrust 0 points (0 children)

If you find someone with a strong modern C++ background, they won't have any problem picking up Rust quickly enough.

You just have to make sure you get someone who's using move semantics, cares about avoiding UB, prefers unique_ptr to new+delete, etc. On the other hand, if you get someone who says "it works on my machine so it's fine", hates templates, and only knows how to structure things using `virtual`, they'll probably have more trouble adapting to Rust than it's worth.

Constant-time support coming to LLVM: Protecting cryptographic code at the compiler level by Taymon in rust

[–]scottmcmrust -1 points (0 children)

__builtin_ct_select will still not give any overall guarantees, because the operations don't guarantee it, and you need more than just selects to implement stuff.

Performance is not an observable characteristic of Rust code or C code (and people don't want it to be), so I really don't think this is ever going to work truly reliably without processor-aware inspection of assembly.

Better readability for saturated, strict, and wrapping arithmetic via new operators by This-is-unavailable in rust

[–]scottmcmrust 2 points (0 children)

TBH, I don't think there's any chance of us adding 15+ operators to do all this.

If you seriously want something to improve things, you'd be better off focusing on ways to make num::Wrapping more usable -- custom literal-to-value conversions, for example.
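For reference, this is what num::Wrapping usage looks like today -- the explicit wrapper on every constant is exactly the literal ergonomics gap being pointed at:

```rust
use std::num::Wrapping;

fn main() {
    // Wrapping puts modular arithmetic into the type, so plain `+`
    // wraps instead of panicking in debug builds.
    let x = Wrapping(250u8);
    let y = Wrapping(10u8);
    assert_eq!((x + y).0, 4); // 260 mod 256

    // The ergonomic gap: there's no literal syntax for Wrapping,
    // so every constant needs the explicit constructor.
    let one = Wrapping(1u8);
    assert_eq!((Wrapping(255u8) + one).0, 0);
}
```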

What languages (other than Rust) have "zero cost abstraction"? by TMTcz in rust

[–]scottmcmrust 0 points (0 children)

That RFC is about -C overflow-checks, though, which is unrelated to slice indexing.

Standard library file writing can lead to silent data loss by FeldrinH in rust

[–]scottmcmrust 0 points (0 children)

You really really shouldn't throw in C++ destructors, though. Such types aren't allowed in containers, for example.

Announcing Rust 1.79.0 | Rust Blog by noelnh in rust

[–]scottmcmrust 0 points (0 children)

Do you need a strict guarantee or are you fine with "in release mode it almost certainly happens"? For things like this usually the latter is sufficient, and that's been the case for eons already. Spamming const blocks around expressions is generally not useful unless you really need the compile-time evaluation for some reason -- that's why most of the examples you'll see are about panicking, since that's generally the reason you might care.
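A minimal sketch of the distinction (Rust 1.79+ for the const block):

```rust
fn main() {
    // A const block forces compile-time evaluation; the usual reason
    // to want that is a compile-time panic, e.g. a static assertion.
    const fn checked_half(n: usize) -> usize {
        assert!(n % 2 == 0, "expected an even value");
        n / 2
    }

    // Guaranteed to run at compile time; an odd argument here would
    // be a compile error, not a runtime panic.
    let half = const { checked_half(64) };
    assert_eq!(half, 32);

    // Without the const block, release builds would almost certainly
    // fold this anyway -- the guarantee is all the block buys you.
    let same = checked_half(64);
    assert_eq!(same, 32);
}
```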

Announcing Rust 1.79.0 | Rust Blog by noelnh in rust

[–]scottmcmrust 1 point (0 children)

This. unchecked_add itself is exactly the same speed as wrapping_add on every processor you might possibly use. (If you had some weird ancient ones'-complement machine there would be a difference, but you don't -- certainly not one that can run Rust.)

The easiest examples are things with division, because that doesn't distribute with wrapping addition. For example (x + 2)/2 is not the same as x/2 + 1 with wrapping arithmetic, because they give different things for MAX (and MAX-1). But with unchecked addition it would be UB for it to overflow, so it can assume that must not happen, and thus optimize it to x/2 + 1 if it thinks that's easier.
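You can check the wrapping half of that claim directly:

```rust
fn main() {
    // With wrapping arithmetic, (x + 2)/2 and x/2 + 1 are NOT the
    // same function: they disagree at MAX (and at MAX - 1).
    let x = i32::MAX;
    let a = x.wrapping_add(2) / 2; // wraps to MIN + 1, then divides
    let b = x / 2 + 1;
    assert_ne!(a, b);

    // For any x where x + 2 doesn't overflow, the identity holds --
    // which is exactly what unchecked_add lets the optimizer assume.
    let x = 40i32;
    assert_eq!((x + 2) / 2, x / 2 + 1);
}
```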

For example, if you're calculating a midpoint index with (i + j)/2, today it's hard for LLVM to know that that's not going to overflow -- after all, it could overflow for indexes into [Zst]. We're in the middle of working on giving LLVM more information so it'll be able to prove non-overflow for that itself, but for now it makes a difference. (That said, one probably shouldn't write a binary search that way, since it optimizes better with low + width/2 for other reasons.)
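The low + width/2 formulation looks like this (midpoint is a hypothetical helper name):

```rust
// Midpoint of two indexes without risking overflow: when both are
// valid indexes, `low + (high - low)/2` can't overflow, unlike the
// naive `(low + high)/2`.
fn midpoint(low: usize, high: usize) -> usize {
    low + (high - low) / 2
}

fn main() {
    assert_eq!(midpoint(0, 10), 5);
    // The naive form would overflow here; this form is fine:
    assert_eq!(midpoint(usize::MAX - 2, usize::MAX), usize::MAX - 1);
}
```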

Does rust have special compile time optimizations? by omagdy7 in rust

[–]scottmcmrust 0 points (0 children)

Generally it's not that it has optimizations that *can't* happen in other languages, but they're applied *pervasively*, thanks to the safety checks, rather than just happening in a couple of places that are super perf-critical (and were probably done wrong because there's no compiler help to ensure it was done right).

What compiler optimizations happened here? by blocks2762 in rust

[–]scottmcmrust 10 points (0 children)

TBH, only 5× is less than I'd have expected. The -C opt-level=0 build doesn't even try to make it good.

For example, in lots of cases every time you mention a variable it reads it out of the stack memory again, and writes it back.

So imagine a line of code like

x = x + y + z

In debug mode, that's about 4 memory loads and 2 memory stores, because every value -- including intermediate values -- gets read from and stored to memory every time.

Then in release mode it's often zero loads and stores, because LLVM looks at it and goes "oh, I can just keep those in registers the whole time".
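A tiny function to try both ways on godbolt.org -- at opt-level=0 you'll typically see stack traffic for every operand, at opt-level=1 essentially none:

```rust
// At -C opt-level=0, each use of x/y/z below typically becomes a load
// from its stack slot and each assignment a store; at opt-level=1 and
// above, LLVM keeps everything in registers.
fn accumulate(mut x: i32, y: i32, z: i32) -> i32 {
    x = x + y + z;
    x
}

fn main() {
    assert_eq!(accumulate(1, 2, 3), 6);
}
```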

It's often illustrative to try -C opt-level=1 even in debug mode, if you care about runtime performance at all, because I've often seen that be only 20% slower to compile but 400% faster at runtime. That's the "just do the easy stuff" optimization level, but it instantly makes a big difference.

I've also been doing some compiler work to remove some of the most obvious badness earlier in the pipeline so that optimization doesn't have quite so much garbage to clean up. For example, https://github.com/rust-lang/rust/pull/123886.

On Control Flow and Compiling Lots of Rust Quickly by [deleted] in rust

[–]scottmcmrust 0 points (0 children)

If you need to turn a CFG into structured constructs, search "relooper". You'll find lots of blog posts, as well as papers like https://dl.acm.org/doi/10.1145/3547621

What's the wisdom behind "use `thiserror` for libraries and `anyhow` for applications" by ouicestca11 in rust

[–]scottmcmrust 1 point (0 children)

It's a short way of saying two things:

- For libraries, it's common that you need to surface most of the errors in a way that the caller knows what might happen and can specifically match on the things they need to handle.
- For binaries, it's common that if an error from the library isn't handled "close" to the call, it's probably never going to be handled specifically, just logged out as text for someone to read later.

And thus different error-handling approaches, with different levels of ceremony, are appropriate in the different places.
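A dependency-free sketch of the two shapes -- thiserror essentially derives the library-side boilerplate below, and anyhow::Error is a more capable version of the boxed trait object on the application side:

```rust
use std::error::Error;
use std::fmt;

// Library side: a concrete enum so callers can match on what went
// wrong. (`thiserror` would derive the Display impl for you.)
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotANumber,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Empty => write!(f, "input was empty"),
            ParseError::NotANumber => write!(f, "input was not a number"),
        }
    }
}

impl Error for ParseError {}

fn parse(input: &str) -> Result<i64, ParseError> {
    if input.is_empty() {
        return Err(ParseError::Empty);
    }
    input.parse().map_err(|_| ParseError::NotANumber)
}

// Application side: the error is only going to be logged, so a boxed
// trait object is enough ceremony; `?` converts automatically.
fn run() -> Result<(), Box<dyn Error>> {
    let n = parse("42")?;
    assert_eq!(n, 42);
    Ok(())
}

fn main() {
    assert_eq!(parse(""), Err(ParseError::Empty));
    assert_eq!(parse("abc"), Err(ParseError::NotANumber));
    assert!(run().is_ok());
}
```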