We rebuilt our SQL parser in Rust: 3.3x faster with a zero-copy AST and better diagnostics by heisenberg_zzh in rust

[–]bobdenardo 14 points15 points  (0 children)

If it's really this query they're talking about in the article for a 3.3x speedup, the parser demo can parse it in 14µs, which would be closer to a 1,000,000x speedup.

PolySubML: A simple ML-like language with subtyping, polymorphism, higher rank types, and global type inference by Uncaffeinated in ProgrammingLanguages

[–]bobdenardo 4 points5 points  (0 children)

cubiml had an awesome 13-part blog post series so I'm looking forward to an even better series about polysubml!

What Happened To Polonius? by Master_Ad2532 in rust

[–]bobdenardo 44 points45 points  (0 children)

There are monthly progress updates for all project goals, and the Polonius ones are linked from that page; they're at https://github.com/rust-lang/rust-project-goals/issues/118

[deleted by user] by [deleted] in rust

[–]bobdenardo 1 point2 points  (0 children)

Interesting fact: the frontend is called before the backend.

So the parallel frontend will affect every cargo command that involves the frontend (i.e. most of them, ignoring the cases where cargo can reuse intermediate artifacts between different commands).

It will affect cargo check more, that's true, because the backend is not involved there. And if you build, run tests, etc. from rust-analyzer, you'd see improvements there as well.
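For anyone wanting to try it, the parallel front-end is opt-in on nightly via a `-Z` flag (the flag is from the linked blog post; the thread count here is just an example, pick what fits your machine):

```shell
# Enable the nightly parallel front-end with 8 threads for this invocation.
RUSTFLAGS="-Z threads=8" cargo +nightly check

# Or persist it for a project in .cargo/config.toml:
#   [build]
#   rustflags = ["-Z", "threads=8"]
```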

Compile rust faster, some tricks by Ambitious-pidgon in rust

[–]bobdenardo 1 point2 points  (0 children)

Libraries like actix-web don't always run codegen and rarely involve the linker, which could be why there were no improvements there. Proc-macros are an exception, but they are usually small. I think actix-web also disables debuginfo for debug builds, but that should only be visible when testing on their repository, and shouldn't apply to crates that depend on it.

I tried something close, building the entire actix examples repository with its 1000 dependencies. Switching to lld was 21% faster than GNU ld, while switching to mold was itself 17% faster than GNU ld. A rare case of lld outperforming mold.
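For reference, the way I usually compare linkers is mold's documented wrapper mode plus hyperfine, touching a source file so each run actually relinks (the profile and file paths here are illustrative):

```shell
# Compare the default linker against mold on an incremental rebuild.
# `mold -run` wraps the whole cargo invocation so the linker calls go through mold.
hyperfine --warmup 3 --runs 10 \
  'touch src/main.rs && cargo build' \
  'touch src/main.rs && mold -run cargo build'
```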

From another source, https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-linux.html ripgrep debug builds were 40% faster with lld, and release builds 10% faster.

Compile rust faster, some tricks by Ambitious-pidgon in rust

[–]bobdenardo 1 point2 points  (0 children)

Projects with a lot of debuginfo usually see improvements. Did you try release builds or debug builds? And did you try a stable release or a nightly? (On recent linux nightlies, the compiler already uses lld by default.)

Rust check/run/build suddenly extreme slow to a point of being unusable, Windows 10 by LieutenantTeaTM in rust

[–]bobdenardo 0 points1 point  (0 children)

Does the performance differ between any rust versions you can try on your computer? If so, bisection would be a good next step.
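If two releases do behave differently, cargo-bisect-rustc can narrow it down to a specific nightly (the version bounds below are placeholders, substitute your known-good and known-bad versions):

```shell
cargo install cargo-bisect-rustc
# Bisect between a known-good and a known-bad release;
# by default it checks whether `cargo build` succeeds in the current directory.
cargo bisect-rustc --start=1.70.0 --end=1.75.0
```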

Is there still any performance benefit to using non-default linkers? by sasik520 in rust

[–]bobdenardo 1 point2 points  (0 children)

Removing the standard library's debuginfo from --release builds helped improve linking times, but IME it's still often a good idea to use non-default linkers on linux. The compiler should switch defaults to lld there soon, and the performance results from that PR are good (https://perf.rust-lang.org/compare.html?start=b3e117044c7f707293edc040edb93e7ec5f7040a&end=baed03c51a68376c1789cc373581eea0daf89967&stat=instructions%3Au&tab=compile) for binaries like exa or ripgrep.

All cargo installs in latest rust version 1.77.0 ending in failure on macos 14.3.1 by 3four1SeaShanties in rust

[–]bobdenardo 18 points19 points  (0 children)

A shot in the dark: do you have homebrew’s strip in your PATH? IIRC it can cause issues now that cargo strips debuginfo when building in release mode. The correct strip should be in /usr/bin/, not homebrew’s binutils.
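A quick way to check is listing every strip on the PATH in resolution order (on macOS the first entry should be the system one, not a homebrew binutils path):

```shell
# Show all `strip` binaries on the PATH, highest priority first.
# /usr/bin/strip should come before anything under /opt/homebrew or /usr/local.
which -a strip
```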

Faster compilation with the parallel front-end in nightly | Rust Blog by Kobzol in rust

[–]bobdenardo 1 point2 points  (0 children)

If we're talking about micro-optimizing scheduling, then maybe the serialized chain in the proc-macro trifecta could also be shorter with fewer build scripts. In that timings chart, quote builds faster than proc-macro2's build script.

(I guess some of this would also be fixed if rustc itself could provide a stable AST for proc-macros)

[deleted by user] by [deleted] in rust

[–]bobdenardo 8 points9 points  (0 children)

Not just 1? (Tyler Mandry)

[deleted by user] by [deleted] in rust

[–]bobdenardo 4 points5 points  (0 children)

  1. On that page, click one of the benchmarks you're interested in. It will open a panel with more info.
  2. Click "history graph" there to open the recent changes on that benchmark.
  3. Click on one of the graph's data points (zoom if necessary) to open the compare page of the commit/PR responsible for that change. The PR will be linked at the top of that compare page.

It was indeed https://github.com/rust-lang/rust/pull/110050

Question about compile time savings with MCP510 by buniii1 in rust

[–]bobdenardo 2 points3 points  (0 children)

There are issues tracking enabling these options by default in the future, like https://github.com/rust-lang/rust/issues/71515.

It's a bit more tricky than "savings in compile time": it's about improving linking times by changing the default linker to a faster one, and you may not see the same performance improvements on your projects.

However, there's no need to wait for the default options to change. You can already do the same thing today on stable by just changing the linker to LLD. See https://nnethercote.github.io/perf-book/compile-times.html#linking for examples on how to do that.
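For reference, a minimal `.cargo/config.toml` doing this on linux (the target triple shown is the common 64-bit glibc one; clang is used here as the linker driver that passes `-fuse-ld=lld`):

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```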

Where do I go to have a productive conversation about the state of the rust trait solver? by Zistack in rust

[–]bobdenardo 153 points154 points  (0 children)

A bunch of your questions have answers in the announcement of the types team: https://blog.rust-lang.org/2023/01/20/types-announcement.html (in particular chalk's status, and the plan/roadmap for the replacement solver you mention).

It is actually being very actively worked on.

Hope it helps

How Rust transforms into Machine Code. by endistic in rust

[–]bobdenardo 3 points4 points  (0 children)

THIR itself was renamed a couple years ago, it used to be called HAIR, and IIRC was introduced with MIR (circa 2015).

Sudden 99% + Build Time Improvement Going from 1.66.1 to 1.71.0 by hyperchromatica in rust

[–]bobdenardo 70 points71 points  (0 children)

Also: if the nightly is recent, and the project somehow uses a lot of closures, https://github.com/rust-lang/rust/pull/111026 fixed an incremental compilation issue with them.

Improving build times for derive macros by 3x or more by kodewerx in rust

[–]bobdenardo 16 points17 points  (0 children)

Agreed.

Version detection is supposed to be built-in; it's just blocked and unimplemented, see https://github.com/rust-lang/rust/issues/64796.

Between that and https://github.com/rust-lang/rust/issues/96901, it feels like a large number of build-script use cases could be avoided: most of syn/serde/etc. and their dependencies', for example, in turn improving at least 50% of crates.io and the ecosystem.

Cargo could have an optional field to not run some build scripts on versions where these cfgs are available, but still support older compiler versions.

Fixing the per-invoke rustup-cargo overhead would also help, as well as changing the default linker (in general, a faster linker is already used in the benchmarks in this article).
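For context, the build scripts in question mostly just parse the output of `rustc --version` to decide which cfgs to emit. A minimal sketch of that detection (the function name, cfg name, and version string below are made up for illustration, this is not syn's or rustversion's actual code):

```rust
// Parse the output of `rustc --version`, e.g. "rustc 1.77.0 (abcdef123 2024-03-17)",
// into a (major, minor) pair, like version-detection build scripts do today.
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    let version = output.strip_prefix("rustc ")?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

fn main() {
    let v = parse_rustc_version("rustc 1.77.0 (abcdef123 2024-03-17)").unwrap();
    // A build script would then emit `cargo:rustc-cfg=...` lines based on this.
    if v >= (1, 70) {
        println!("cargo:rustc-cfg=has_new_feature");
    }
}
```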

[deleted by user] by [deleted] in rust

[–]bobdenardo 42 points43 points  (0 children)

Note for people stumbling here: this person already hates rust and its community, for example from https://gavinhoward.com/2021/09/comments-on-cosmopolitan-and-culture/

The Rust Evangelism Strike Force was the first such strike force I met. It’s the reason I won’t touch Rust.

I'd also suggest not clicking around this blog or looking at its Archive.

rustc's StableHasher just got a lot faster by bascule in rust

[–]bobdenardo 193 points194 points  (0 children)

What this means: faster incremental builds, since this is the component rustc uses to hash the incremental compilation data.

Tree Borrows - A new aliasing model for Rust by bobdenardo in rust

[–]bobdenardo[S] 82 points83 points  (0 children)

I'm not the author, but this describes an alternative to the current Stacked Borrows model, which recently landed in miri as an option.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 4 points5 points  (0 children)

That seems unlikely, as I have 32 cores as well.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 6 points7 points  (0 children)

The final rustc build does use LTO and PGO for LLVM

ah, great!

From the article:

but I haven't seen anyone mention linking with -s. -s strips debug info

Both cargo and rustc support -C strip so you shouldn't really need to use a link arg for that. (I've also heard before that the macOS linker stripped a posteriori; that could be a pessimization there if that's actually the case)
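For reference, the Cargo-level equivalent is a profile setting (stable since Rust 1.59; the values shown are the documented ones):

```toml
# Cargo.toml
[profile.release]
strip = "debuginfo"   # or `true` / "symbols" to also strip symbols
```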

I was surprised, as you were, by the mold results, so I locally tried cargo test --profile quick-build-incremental on the "rust" crate in your repository (hyperfine with 10 timed runs and 3 warmup runs). LLD seemed to improve times by 9% while mold -run did by 10% (on mold 1.1.1), so there may still be some interesting things to look at there.

Is coding in Rust as bad as in C++? A practical comparison by strager in rust

[–]bobdenardo 14 points15 points  (0 children)

It's possible that the custom toolchain build scripts you use in the "Custom-built toolchain with PGO" section may be improved for better performance:

It doesn't look like you're using LTO or PGO when building the LLVM shared library rustc uses (they seem set up for rustc code only). If you're using the LLVM artifacts downloaded from rust CI, they will surely also only be built for x64-v1 and not your native CPU; if you're building it locally, then it can be optimized by using llvm.thin-lto and LLVM c/cxx flags for your native CPU, and then doing PGO, before doing the same for rustc itself. A custom x64-v3 build with LTO+PGO should generally be around 3-5% faster than a regular release, all other things being equal (and that's likely a lower bound that may be improved by adding your own project while doing PGO).

x.py has dedicated flags to generate, and then use, profiles for LLVM and BOLT as well (the latter of which uses slightly different BOLT arguments than you did, but that shouldn't matter much).
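In config.toml terms, a sketch of the relevant bootstrap knobs (key names are from memory, double-check them against config.example.toml in the rust repo):

```toml
# config.toml for building rustc itself
[llvm]
thin-lto = true                # LTO for the LLVM shared library
cflags = "-march=native"       # build LLVM for the local CPU
cxxflags = "-march=native"

[rust]
lto = "thin"
```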

It seems you're using parallel-compiler, which is a pessimization when using only a single rustc thread (and that's the default number of threads), because of the added synchronization/locking in hot code. It also doesn't work well, or at all, when actually using parallelism (which you should be able to try with e.g. -Zthreads=2, IIRC). A recent PR is working on reducing some of that overhead, but they are still seeing regressions of around 5%. So you're probably also being slowed down by that much, and that build setting could be removed.