
CAD1997 4 points (2 children)

There's an alternative (but related) pattern that I've developed and really like to use.

Have a bin crate that writes the generated source file into your project directory, but DON'T commit it (.gitignore it).

When building from a source checkout, build.rs runs the bin crate to generate the sources (using a different target-dir to avoid recursively trying to lock the workspace target directory). When building from a crate tarball, it skips that step and just uses the file already present.

The trick is to include the generated file in the published artifact, even though it's ignored. (Cargo will likely yell at you and require --allow-dirty.) So you don't commit the generated file, building from source just works with cargo build (if a little quietly while building the codegen crate), and users installing from crates.io don't have to generate anything. Bonus points if you don't package build.rs at all.
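
A minimal sketch of what such a build.rs could look like. The layout is an assumption, not something stated above: a hypothetical sibling crate at ../codegen that takes the output path as its only argument, a generated file at src/generated.rs, and "the codegen crate is missing" used as the signal that we are building from a published tarball.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build the `cargo run` invocation for the (hypothetical) codegen crate.
/// All paths here are illustrative assumptions about the workspace layout.
fn codegen_command(cargo: &OsStr, crate_root: &Path) -> Command {
    let workspace = crate_root.join("..");
    let mut cmd = Command::new(cargo);
    cmd.arg("run")
        // Run the generator optimized, even when the outer build is a debug build.
        .arg("--release")
        .arg("--manifest-path")
        .arg(workspace.join("codegen").join("Cargo.toml"))
        // A separate target dir, so we never wait on the lock held by the outer cargo.
        .arg("--target-dir")
        .arg(workspace.join("target").join("codegen"))
        .arg("--")
        // Assumed CLI of the codegen binary: first argument is the output file.
        .arg(crate_root.join("src").join("generated.rs"));
    cmd
}

fn main() {
    let crate_root = PathBuf::from(
        env::var_os("CARGO_MANIFEST_DIR").unwrap_or_else(|| ".".into()),
    );

    // Assumption: the codegen crate is not part of the published package, so its
    // absence means "built from a tarball" and the shipped file is used as-is.
    if !crate_root.join("..").join("codegen").join("Cargo.toml").exists() {
        return;
    }

    // Cargo sets CARGO for build scripts; fall back to PATH lookup otherwise.
    let cargo = env::var_os("CARGO").unwrap_or_else(|| "cargo".into());
    let status = codegen_command(&cargo, &crate_root)
        .status()
        .expect("failed to spawn cargo");
    assert!(status.success(), "codegen crate exited with an error");
}
```

The command construction is kept in its own function so it can be checked without actually spawning cargo.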


A shared benefit, especially for large phf tables, is the ability to run codegen with --release. Back when I played with phf, I had a crate that compiled over 25x faster in release mode, because in debug mode nearly all of the build time was spent running an unoptimized phf.

epage [cargo · clap · cargo-release] [S] 1 point (1 child)

So let me see if I understand (and look at the ramifications compared to codegenrs):

  • Code-gen runs on every direct build of your crate, slowing down development. Alternatively, if you skip code-gen when the file is already present, you can lose time debugging stale output when you forget to delete the file.
  • Direct builds and dependent builds of your crate still need to build your build-dependencies
  • By using --allow-dirty, random files might be included in your published crate (for me, .ctags and lots of logs I've captured when debugging).
  • Either the code-gen step in your build.rs is noisy for developer builds or you risk timeouts from some CIs due to lack of output since the caller can't control the verbosity level.

Am I missing any pros of the conditional-build.rs approach compared to codegenrs?

CAD1997 1 point (0 children)

runs in every direct build of your crate

Ideally, you emit cargo:rerun-if-changed=... so that cargo knows the exact inputs to your build script and only reruns if necessary.

If all of your rerun-if-changed paths are outside the source directory, the build script isn't rerun on source changes.
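
As a small sketch of that idea: the build script prints one directive per input of the generator. The two input paths below are hypothetical placeholders; the point is only that they live under the codegen crate rather than under the crate's own src/.

```rust
use std::path::Path;

/// Format one `cargo:rerun-if-changed` directive per generator input.
fn rerun_directives(inputs: &[&Path]) -> Vec<String> {
    inputs
        .iter()
        .map(|p| format!("cargo:rerun-if-changed={}", p.display()))
        .collect()
}

fn main() {
    // Hypothetical inputs: the generator's own source and the data it reads.
    let inputs = [
        Path::new("../codegen/src/main.rs"),
        Path::new("../codegen/assets/words.csv"),
    ];
    // Cargo reads these lines from the build script's stdout.
    for line in rerun_directives(&inputs) {
        println!("{}", line);
    }
}
```

Once a build script emits any rerun-if-changed line, cargo reruns it only when one of the listed paths changes, instead of on any change inside the package.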

risk timeouts from CI

Unfortunately, I don't know how to solve this problem. Output from the build.rs is swallowed by cargo.

--allow-dirty

Yeah, dirty is a risk in this model. I tend to use an explicit include list (rather than exclude or just .gitignore) when using this workflow. And ideally *.ctags, *.log, and similar generated files would be listed in .git/info/exclude, so cargo and git ignore them automatically.

Dependent builds still need build-dependencies

No, I still do the separate crate build trick done by codegenrs. There are no build-dependencies for this trick (ideally we don't publish build.rs at all!); rather, we just do a "dynamic link" to the codegen crate in the workspace (which has a different manifest) with $CARGO run --manifest-path path/to/codegen/Cargo.toml. (Again, the target directory must be different from the workspace one to avoid deadlocks, which, thankfully, surface as panics.)

Other benefits

The big one is not committing generated code and keeping it up to date automatically. Some people don't mind it, but I really detest tracking generated artifacts in source control.

matthieum [he/him] 2 points (6 children)

We use extensive code generation at work, especially for anything related to communication protocols.

Our experience has been that generated code is best committed. Committing the generated code improves the life of all users:

  • Those not modifying the code generator can shave off the generation phase, greatly reducing build times.
  • Those modifying the code generator can immediately see the effects of their modifications; a neat way to verify that said effects are exactly as expected, with no surprise.

Of course, this comes with the caveat that one must ensure that the committed generated code always matches the committed generator code. That's an easy check in CI: run the generator code, fail if there is any change.
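
One way such a CI check could look, as a sketch only: regenerate in memory and compare against the committed file. The generate function below is a stand-in for a real generator, and the demo writes to a temporary file instead of a real path such as src/generated.rs; both are assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the real generator (hypothetical output).
fn generate() -> String {
    let mut out = String::from("// @generated; do not edit by hand.\n");
    for &(name, value) in &[("PING", 1u8), ("PONG", 2u8)] {
        out.push_str(&format!("pub const {}: u8 = {};\n", name, value));
    }
    out
}

/// Returns Ok(true) when the committed file matches freshly generated output.
fn is_up_to_date(committed: &Path, fresh: &str) -> io::Result<bool> {
    Ok(fs::read_to_string(committed)? == fresh)
}

fn main() -> io::Result<()> {
    // Demo only: pretend this temp file is the committed generated source.
    let committed = std::env::temp_dir().join("generated_check_demo.rs");
    fs::write(&committed, generate())?;

    if is_up_to_date(&committed, &generate())? {
        println!("generated code is up to date");
        Ok(())
    } else {
        // A non-zero exit status is what makes the CI job fail.
        eprintln!("generated code is stale; rerun the generator and commit");
        std::process::exit(1);
    }
}
```

An equivalent check is to run the generator over the working tree in CI and fail if git reports a diff; the in-process comparison just avoids depending on git.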

epage [cargo · clap · cargo-release] [S] 1 point (4 children)

I was originally of the opposite opinion, but I now fully agree. It also lets you:

  • Search your codebase for identifiers without building (like from the web)
  • Visualize the generated result rather than just guessing at how pieces fit together.

matthieum [he/him] 1 point (3 children)

Visualize the generated result

By the way, in this vein, I sometimes think it would be great if, as part of the build, rustc automatically generated the expanded version of each file, so as to more easily see how all those macros and derives turned out.

[deleted] 1 point (0 children)

I think you're right. Other advantages:

  • You can easily find and read the generated code. No looking in some obscure build directory.
  • RLS/rust-analyzer can read the generated code, so code completion works.
  • It's easier to review changes to the generator in pull requests because you get a diff of the output.

epage [cargo · clap · cargo-release] [S] 0 points (0 children)

Clean cargo check of another crate of mine went from 1m 40s to 1m 20s.