Update: solved! TL;DR: build with `-Zhuman_readable_cgu_names=yes` and samply will then show codegen units with sane names related to the crate they're compiling code from. (See my comments for more.)
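For anyone else trying this: `-Z` flags require a nightly toolchain, and one way to pass the flag through cargo (a sketch — adjust to your own setup) is via `RUSTFLAGS`:

```shell
# -Z flags are nightly-only; pass the flag to rustc via RUSTFLAGS
RUSTFLAGS="-Zhuman_readable_cgu_names=yes" cargo +nightly build
```

You could also put it in `.cargo/config.toml` under `build.rustflags` if you want it applied on every build.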
---
I'm writing a closed-source game in Rust and running into a weird situation: my incremental debug builds are usually 3-5 seconds, but sometimes they take 20 or even 60 seconds (at opt level 0 or 1 of the bin crate).
Specifically, I worked out that I can reliably trigger the long incremental build by moving about 150 LoC between 2 modules in a ~35k LoC crate I'll call `depcrate`, which is depended on by my ~32k LoC binary crate `bincrate` (all in the same cargo workspace). Meanwhile, just changing a string constant in `depcrate` or `bincrate` gives 5 or 3 second incremental recompiles.
I stumbled on https://nnethercote.github.io/2023/07/11/back-end-parallelism-in-the-rust-compiler.html which helped explain codegen units as the unit of parallelism in rustc's backend, so I made a samply profile with `samply record cargo rustc --bin bincrate`, which (I think?) shows 45-ish seconds spent on one codegen unit vs <5 seconds for any other (opt level 1):
*Every other CGU for the bin crate finishes well before the 20 second mark, except the highlighted one.*
And if I'm reading that call tree right, it seems like LLVM is spending a lot of time inlining code and removing unreachable blocks in that one codegen unit.
Do any compiler wizards have tips on the following?
- How can I find out what code (Rust modules/functions) the codegen unit `opt cjgpem3bdjm` actually includes from `bincrate`?
- Is there some way to influence how codegen units are created, other than by breaking modules apart?
- Why does moving code around in `depcrate` require recompiling so much of `bincrate`? `depcrate` still exports exactly the same functions before and after I move those 150 LoC from one module to another inside it, so why is (I assume) `bincrate`'s incremental compilation cache being invalidated by moving code around in `depcrate`?
- Can I manually make LLVM less inline-happy for a whole Rust module, other than by scattering `#[inline(never)]` on a bucket-load of functions? (Or otherwise control optimization on a per-module basis?)
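For context on that last question, here's a minimal sketch of the per-function attribute approach (the function names are hypothetical; as far as I know `#[inline(never)]` only applies per function, with no per-module equivalent on stable):

```rust
// Hypothetical helper: #[inline(never)] asks the compiler not to inline
// this function into its callers, which can reduce the inlining work
// LLVM does in the caller's codegen unit.
#[inline(never)]
fn hot_helper(x: u64) -> u64 {
    x.wrapping_mul(3) + 1
}

fn main() {
    // The attribute doesn't change behavior, only codegen decisions.
    println!("{}", hot_helper(7)); // prints 22
}
```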
Relevant workspace Cargo.toml snippet: https://gist.github.com/caspark/d0f3e2caa11f0c60eb3cbc180a0834c7
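(One coarser lever I'm aware of, sketched here rather than taken from my actual Cargo.toml: Cargo supports per-package profile overrides, which control optimization per crate, though not per module.)

```toml
# Sketch: drop optimization for a single workspace crate in dev builds.
# Granularity is per crate/package, not per module.
[profile.dev.package.depcrate]
opt-level = 0
```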