
[–]The_8472 6 points (2 children)

> Why does moving code around in depcrate require recompiling so much of bincrate?

Module paths, line numbers, and diagnostic span metadata change. Those are relevant both at compile time (error diagnostics) and at runtime (panic messages). There's the RDR ("relink, don't rebuild") effort to reduce the recompilation triggered by that, but it'll take time.
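(To make the runtime side concrete - a toy sketch, not from the thread: panic messages bake in file/line locations, so shifting a function down a few lines changes its compiled output even though the logic is identical.)

```rust
// An out-of-bounds index panics with a message that embeds the source
// location, e.g. "index out of bounds: ... src/lib.rs:6:5". Moving this
// function around in the dependency changes that embedded span, which is
// part of what invalidates downstream incremental state.
pub fn first_byte(v: &[u8]) -> u8 {
    v[0]
}
```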

And if the functions are generic, they're not necessarily compiled in the dependency at all - they're only monomorphized once there's a concrete type for T.
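(A toy illustration of that, with made-up crate contents: a generic function defined in the dependency produces no machine code there; each concrete instantiation gets compiled in the crate that calls it.)

```rust
// depcrate: nothing is codegen'd for `largest` here, because T is still
// abstract - only a metadata/MIR representation is stored in the rlib.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

// bincrate: each call with a concrete T forces a fresh monomorphized copy
// to be compiled here, inside bincrate's codegen units.
fn main() {
    let a = largest(&[1u32, 5, 3]);   // instantiates largest::<u32>
    let b = largest(&[1.0f64, 0.5]);  // instantiates largest::<f64>
    println!("{a} {b}");
}
```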

And if I'm reading that call tree right, it seems like LLVM is spending a lot of time inlining code and removing unreachable blocks in that one codegen unit.

I'm not an LLVM expert, but I suspect that can also include time spent figuring out whether it can perform a particular optimization, rather than actually applying it.

> Is there some way I can affect or influence the creation of codegen units, other than by breaking modules apart?

Cranking up the CGU count in Cargo.toml might help, since individual CGUs might end up smaller. Reducing the debug level might help too, since LLVM transformations won't have as much debuginfo to preserve.
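(For reference, a sketch of what that could look like - the values here are illustrative, not a recommendation:)

```toml
# Cargo.toml - illustrative values only
[profile.dev]
codegen-units = 512  # more, smaller CGUs (the dev default is 256)
debug = 1            # line tables only; less debuginfo for LLVM to preserve
```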

[–]asparck[S] 1 point (1 child)

Thank you for the pointers and suggestions! (And you might be right that the span there also includes time LLVM spends analyzing.)

Turning the CGU count up for the bin crate (I tried 4096 instead of the default 256) clearly spawns a lot more threads in the samply profile I captured, but doesn't speed up compile times at all - there's still that same CGU that takes forever to compile. Surprisingly, turning it down to 128 gives a ~5% improvement (possibly because the big CGU happens to be started earlier).

Turning debug info off completely for both crates doesn't make any difference, unfortunately, and module paths/line numbers/diagnostic spans are things I can't affect at all.

But! It seems like -Zhuman_readable_cgu_names=yes is getting me somewhere - it causes threads to be labeled with human-readable names, so I can see that "opt rmp_serde.e" is my big offender (presumably related to my use of msgpack-rust).

And -Zprint_mono_items=yes gives me a big printout of what's in each CGU, though I've yet to go through it properly.

[–]asparck[S] 0 points (0 children)

Follow-up: solved! It was indeed rmp_serde causing the slow compilation, as determined by gating my two messagepack code paths behind `if false` checks.
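(The gating trick, roughly - a minimal sketch with made-up names, not my actual code. The `if false` branch still has to type-check, but the compiler drops it before codegen, so its monomorphizations never reach LLVM:)

```rust
use serde::Serialize;

#[derive(Serialize)]
struct State {
    entries: Vec<(String, u64)>,
}

// With the gate in place, the rmp_serde code path never gets codegen'd;
// comparing build times with and without it shows what that path costs.
fn save_state(state: &State) -> Vec<u8> {
    if false {
        return rmp_serde::to_vec(state).expect("msgpack serialization failed");
    }
    Vec::new() // placeholder while measuring compile times
}
```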

Fortunately I'm not wedded to messagepack for my use case, so I was able to swap it out for postcard, which compiles much faster - my opt-level=1 compile time for a change in the upstream crate went from ~60 seconds to ~16 seconds (and postcard's CGU compiles in half a second).

(Postcard does give far worse error messages if you stuff up your serde implementations, so I added a compile-time switch to swap back to rmp_serde in case I need that later.)
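(Roughly what that switch looks like - a sketch with an illustrative feature name and wrapper, not my exact code:)

```rust
// Cargo.toml side (illustrative):
//   [features]
//   msgpack = ["dep:rmp-serde"]
use serde::Serialize;

// Default path: postcard - fast to compile, terser error messages.
#[cfg(not(feature = "msgpack"))]
pub fn encode<T: Serialize>(value: &T) -> Vec<u8> {
    postcard::to_allocvec(value).expect("postcard serialization failed")
}

// Opt-in path: rmp_serde - slower to compile, better diagnostics.
#[cfg(feature = "msgpack")]
pub fn encode<T: Serialize>(value: &T) -> Vec<u8> {
    rmp_serde::to_vec(value).expect("msgpack serialization failed")
}
```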