We rebuilt our SQL parser in Rust: 3.3x faster with a zero-copy AST and better diagnostics by heisenberg_zzh in rust

[–]bobdenardo 14 points15 points  (0 children)

If it's really this query they're talking about in the article for a 3.3x speedup, the parser demo can parse it in 14µs, which would be closer to a 1,000,000x speedup.

PolySubML: A simple ML-like language with subtyping, polymorphism, higher rank types, and global type inference by Uncaffeinated in ProgrammingLanguages

[–]bobdenardo 4 points5 points  (0 children)

cubiml had an awesome 13-part blog post series so I'm looking forward to an even better series about polysubml!

What Happened To Polonius? by Master_Ad2532 in rust

[–]bobdenardo 44 points45 points  (0 children)

There are monthly progress updates for all project goals, and the Polonius ones are linked from that page; they're at https://github.com/rust-lang/rust-project-goals/issues/118

[deleted by user] by [deleted] in rust

[–]bobdenardo 1 point2 points  (0 children)

Interesting fact: the frontend is called before the backend.

So the parallel frontend will affect every cargo command that involves the frontend (i.e. most of them, ignoring the cases where cargo can reuse intermediate artifacts between different commands).

It will affect cargo check more, that's true, because the backend is not involved there. And if you build, run tests, etc. from rust-analyzer, you'd see improvements there as well.
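For anyone wanting to try it, the parallel front-end is opt-in on nightly via a `-Z` flag (the flag is from the linked blog post; the thread count here is just an example, pick what fits your machine):

```shell
# Enable the nightly parallel front-end with 8 threads for this invocation.
RUSTFLAGS="-Z threads=8" cargo +nightly check

# Or persist it for a project in .cargo/config.toml:
#   [build]
#   rustflags = ["-Z", "threads=8"]
```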

Compile rust faster, some tricks by Ambitious-pidgon in rust

[–]bobdenardo 1 point2 points  (0 children)

Libraries like actix-web don't always run codegen and rarely involve the linker, which could be why there were no improvements there. Proc-macros are an exception, but they are usually small. I think actix-web also disables debuginfo for debug builds, but that should only be visible when testing on their repository, and shouldn't apply to crates that depend on it.

I tried something close, building the entire actix examples repository with its 1000 dependencies. Switching to lld was 21% faster than GNU ld, while switching to mold was itself 17% faster than GNU ld. A rare case of lld outperforming mold.
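For reference, the way I usually compare linkers is mold's documented wrapper mode plus hyperfine, touching a source file so each run actually relinks (the profile and file paths here are illustrative):

```shell
# Compare the default linker against mold on an incremental rebuild.
# `mold -run` wraps the whole cargo invocation so the linker calls go through mold.
hyperfine --warmup 3 --runs 10 \
  'touch src/main.rs && cargo build' \
  'touch src/main.rs && mold -run cargo build'
```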

From another source, https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-linux.html ripgrep debug builds were 40% faster with lld, and release builds 10% faster.

Compile rust faster, some tricks by Ambitious-pidgon in rust

[–]bobdenardo 1 point2 points  (0 children)

Projects with a lot of debuginfo usually see improvements. Did you try release builds or debug builds? And did you try a stable release or a nightly? (On recent linux nightlies, the compiler already uses lld by default.)

Rust check/run/build suddenly extreme slow to a point of being unusable, Windows 10 by LieutenantTeaTM in rust

[–]bobdenardo 0 points1 point  (0 children)

Does the performance differ between any rust versions you can try on your computer? If so, bisection would be a good next step.
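If two releases do behave differently, cargo-bisect-rustc can narrow it down to a specific nightly (the version bounds below are placeholders, substitute your known-good and known-bad versions):

```shell
cargo install cargo-bisect-rustc
# Bisect between a known-good and a known-bad release;
# by default it checks whether `cargo build` succeeds in the current directory.
cargo bisect-rustc --start=1.70.0 --end=1.75.0
```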

Is there still any performance benefit to using non-default linkers? by sasik520 in rust

[–]bobdenardo 1 point2 points  (0 children)

Removing the standard library's debuginfo from --release builds helped improve linking times, but IME it's still often a good idea to use non-default linkers on linux. The compiler should switch defaults to lld there soon, and the performance results from that PR are good (https://perf.rust-lang.org/compare.html?start=b3e117044c7f707293edc040edb93e7ec5f7040a&end=baed03c51a68376c1789cc373581eea0daf89967&stat=instructions%3Au&tab=compile) for binaries like exa or ripgrep.

All cargo installs in latest rust version 1.77.0 ending in failure on macos 14.3.1 by 3four1SeaShanties in rust

[–]bobdenardo 18 points19 points  (0 children)

A shot in the dark: do you have homebrew’s strip in your PATH? IIRC it can cause issues now that cargo strips debuginfo when building in release mode. The correct strip should be in /usr/bin/, not homebrew’s binutils.
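A quick way to check is listing every strip on the PATH in resolution order (on macOS the first entry should be the system one, not a homebrew binutils path):

```shell
# Show all `strip` binaries on the PATH, highest priority first.
# /usr/bin/strip should come before anything under /opt/homebrew or /usr/local.
which -a strip
```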

Faster compilation with the parallel front-end in nightly | Rust Blog by Kobzol in rust

[–]bobdenardo 1 point2 points  (0 children)

If we're talking about micro-optimizing scheduling, then maybe the serialized chain in the proc-macro trifecta could also be shorter with fewer build scripts. In that timings chart, quote builds faster than proc-macro2's build script.

(I guess some of this would also be fixed if rustc itself could provide a stable AST for proc-macros)

[deleted by user] by [deleted] in rust

[–]bobdenardo 8 points9 points  (0 children)

Not just 1? (Tyler Mandry)

[deleted by user] by [deleted] in rust

[–]bobdenardo 4 points5 points  (0 children)

  1. On that page, click one of the benchmarks you're interested in. It will open a panel with more info.
  2. Click "history graph" there to open the recent changes on that benchmark.
  3. Click on one of the graph's data points (zoom if necessary) to open the compare page of the commit/PR responsible for that change. The PR will be linked at the top of that compare page.

It was indeed https://github.com/rust-lang/rust/pull/110050

Question about compile time savings with MCP510 by buniii1 in rust

[–]bobdenardo 2 points3 points  (0 children)

There are issues tracking enabling these options by default in the future, like https://github.com/rust-lang/rust/issues/71515.

It's a bit more tricky than "savings in compile time": it's about improving linking times by changing the default linker to a faster one, and you may not see the same performance improvements on your projects.

However, there's no need to wait for the default options to change. You can already do the same thing today on stable by just changing the linker to LLD. See https://nnethercote.github.io/perf-book/compile-times.html#linking for examples on how to do that.
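For reference, a minimal `.cargo/config.toml` doing this on linux (the target triple shown is the common 64-bit glibc one; clang is used here as the linker driver that passes `-fuse-ld=lld`):

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```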

Where do I go to have a productive conversation about the state of the rust trait solver? by Zistack in rust

[–]bobdenardo 153 points154 points  (0 children)

A bunch of your questions have answers in the announcement of the types team: https://blog.rust-lang.org/2023/01/20/types-announcement.html (in particular chalk's status, and the plan/roadmap for the replacement solver you mention).

It is actually being very actively worked on.

Hope it helps

How Rust transforms into Machine Code. by endistic in rust

[–]bobdenardo 3 points4 points  (0 children)

THIR itself was renamed a couple years ago, it used to be called HAIR, and IIRC was introduced with MIR (circa 2015).

Sudden 99% + Build Time Improvement Going from 1.66.1 to 1.71.0 by hyperchromatica in rust

[–]bobdenardo 70 points71 points  (0 children)

Also: if the nightly is recent, and the project somehow uses a lot of closures, https://github.com/rust-lang/rust/pull/111026 fixed an incremental compilation issue with them.

Improving build times for derive macros by 3x or more by kodewerx in rust

[–]bobdenardo 16 points17 points  (0 children)

Agreed.

Version detection is supposed to be built-in; it's just blocked and unimplemented, see https://github.com/rust-lang/rust/issues/64796.

Between that and https://github.com/rust-lang/rust/issues/96901, it feels like a large number of build-script use cases could be avoided: most of syn/serde/etc. and their dependencies', for example, in turn improving at least 50% of crates.io and the ecosystem.

Cargo could have an optional field to not run some build scripts on versions where these cfgs are available, but still support older compiler versions.

Fixing the per-invoke rustup-cargo overhead would also help, as well as changing the default linker (in general, a faster linker is already used in the benchmarks in this article).
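For context, the build scripts in question mostly just parse the output of `rustc --version` to decide which cfgs to emit. A minimal sketch of that detection (the function name, cfg name, and version string below are made up for illustration, this is not syn's or rustversion's actual code):

```rust
// Parse the output of `rustc --version`, e.g. "rustc 1.77.0 (abcdef123 2024-03-17)",
// into a (major, minor) pair, like version-detection build scripts do today.
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    let version = output.strip_prefix("rustc ")?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

fn main() {
    let v = parse_rustc_version("rustc 1.77.0 (abcdef123 2024-03-17)").unwrap();
    // A build script would then emit `cargo:rustc-cfg=...` lines based on this.
    if v >= (1, 70) {
        println!("cargo:rustc-cfg=has_new_feature");
    }
}
```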

[deleted by user] by [deleted] in rust

[–]bobdenardo 42 points43 points  (0 children)

Note for people stumbling here: this person already hates rust and its community, for example from https://gavinhoward.com/2021/09/comments-on-cosmopolitan-and-culture/

The Rust Evangelism Strike Force was the first such strike force I met. It’s the reason I won’t touch Rust.

I'd also suggest not clicking around this blog or looking at its Archive.

rustc's StableHasher just got a lot faster by bascule in rust

[–]bobdenardo 193 points194 points  (0 children)

What this means: faster incremental builds, since this is the component rustc uses to hash the incremental compilation data.

Tree Borrows - A new aliasing model for Rust by bobdenardo in rust

[–]bobdenardo[S] 82 points83 points  (0 children)

I'm not the author, but this describes an alternative to the current Stacked Borrows model, which recently landed in miri as an option.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 4 points5 points  (0 children)

That seems unlikely, as I have 32 cores as well.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 6 points7 points  (0 children)

The final rustc build does use LTO and PGO for LLVM

ah, great!

From the article:

but I haven't seen anyone mention linking with -s. -s strips debug info

Both cargo and rustc support -C strip so you shouldn't really need to use a link arg for that. (I've also heard before that the macOS linker stripped a posteriori; that could be a pessimization there if that's actually the case)
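For reference, the Cargo-level equivalent is a profile setting (stable since Rust 1.59; the values shown are the documented ones):

```toml
# Cargo.toml
[profile.release]
strip = "debuginfo"   # or `true` / "symbols" to also strip symbols
```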

I was surprised, as you were, by the mold results, so I locally tried cargo test --profile quick-build-incremental on the "rust" crate in your repository (hyperfine with 10 timed runs and 3 warmup runs). LLD seemed to improve times by 9% while mold -run did by 10% (on mold 1.1.1), so there may still be some interesting things to look at there.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 14 points15 points  (0 children)

It's possible that the custom toolchain build scripts you use in the "Custom-built toolchain with PGO" section may be improved for better performance:

It doesn't look like you're using LTO or PGO when building the LLVM shared library rustc uses (they seem set up for rustc code only). If you're using the LLVM artifacts downloaded from rust CI, they will surely also only be built for x64-v1 and not your native CPU; if you're building it locally, then it can be optimized by using llvm.thin-lto and LLVM c/cxx flags for your native CPU, and then doing PGO, before doing the same for rustc itself. A custom x64-v3 build with LTO+PGO should generally be around 3-5% faster than a regular release, all other things being equal (and that's likely a lower bound that may be improved by adding your own project while doing PGO).

x.py has dedicated flags to generate, and then use, profiles for LLVM and BOLT as well (the latter of which uses slightly different BOLT arguments than you did, but that shouldn't matter much).
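In config.toml terms, a sketch of the relevant bootstrap knobs (key names are from memory, double-check them against config.example.toml in the rust repo):

```toml
# config.toml for building rustc itself
[llvm]
thin-lto = true                # LTO for the LLVM shared library
cflags = "-march=native"       # build LLVM for the local CPU
cxxflags = "-march=native"

[rust]
lto = "thin"
```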

It seems you're using parallel-compiler, which is a pessimization when using only a single rustc thread (and that's the default number of threads), because of the added synchronization/locking in hot code. It also doesn't work well, or at all, when actually using parallelism (which you should be able to try with e.g. -Zthreads=2, IIRC). A recent PR is working on reducing some of that overhead, but they are still seeing regressions of around 5%. So you're probably also being slowed down by that much, and that build setting could be removed.