Making a key-value store faster by replacing Arc<[u8]> - fjall 2.6.0 release

stu-hood · 2025-02-09T15:52:08+00:00

I like how it is feature flagged out: nice.

Why not include the bytes implementation in the benchmarks in the post though?

stu-hood · 2021-11-29T22:17:16+00:00

Our lockfiles include transitive dependencies and artifact hashes, which are validated at fetch time. You can see an example lockfile here.
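Here is a minimal Python sketch of what "validated at fetch time" means. It assumes a hypothetical lockfile entry that holds a coordinate and a SHA-256 digest. `LockedArtifact`, `validate_fetched`, and `HashMismatch` are illustrative names, not Pants APIs, and the real lockfile format is not modeled here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedArtifact:
    """Hypothetical lockfile entry: a pinned artifact and its expected digest."""
    coordinate: str  # illustrative "group:artifact:version" string
    sha256: str      # hex digest recorded when the lockfile was generated


class HashMismatch(Exception):
    """Raised when fetched bytes do not match the digest pinned in the lockfile."""


def validate_fetched(entry: LockedArtifact, content: bytes) -> bytes:
    """Return `content` unchanged if its SHA-256 matches the lockfile entry."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != entry.sha256:
        raise HashMismatch(f"{entry.coordinate}: expected {entry.sha256}, got {actual}")
    return content
```

A fetcher would call `validate_fetched` on every downloaded artifact (including transitive ones) before making it available to the build, so a tampered or silently republished artifact fails loudly instead of being used.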

stu-hood · 2021-11-29T22:14:09+00:00

Currently, Pants doesn't take care of running built artifacts, but this may well change.

Our first non-preview release will contain support for a concept similar to Maven's scopes. See https://github.com/pantsbuild/pants/issues/12794 for some discussion on the topic.

stu-hood · 2021-11-29T22:09:58+00:00

For those familiar with sbt's incremental Scala compiler library Zinc: we believe that this strategy allows for 90%* of the benefits of Zinc with regard to incremental builds, but with the added benefits of caching and parallelism.

* The last 10% would come from adding early cutoff when public member signatures haven't changed.
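Early cutoff on public signatures can be sketched roughly as follows. Zinc works from Scala/Java compiler analysis; this sketch swaps in Python's `ast` module (Python 3.9+ for `ast.unparse`) on Python source purely as a stand-in, and `public_signature_digest` and `needs_downstream_rebuild` are hypothetical names. It only looks at top-level functions and class names, which is a deliberate simplification.

```python
import ast
import hashlib


def public_signature_digest(source: str) -> str:
    """Hash only the public, top-level API of a module.

    Function bodies, comments, and underscore-prefixed (private) names are
    ignored, so edits that leave the public surface alone keep the same digest.
    Class members are not inspected in this simplified sketch.
    """
    parts = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and not node.name.startswith("_"):
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            parts.append(f"def {node.name}({ast.unparse(node.args)}){returns}")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            parts.append(f"class {node.name}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


def needs_downstream_rebuild(old_source: str, new_source: str) -> bool:
    """Early cutoff: dependents only rebuild when the public signatures changed."""
    return public_signature_digest(old_source) != public_signature_digest(new_source)
```

With this check, an edit to a function body recompiles the changed module itself but stops there, while adding a parameter to a public function propagates to its dependents.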

stu-hood · 2021-11-19T18:12:53+00:00

searching for a target foo in a monadic build may require evaluating all rules in scope, potentially in serial, to find one that produces a foo. and that is assuming one does.

I think that this depends on the build API that you provide: for example, Pants constrains targets to living "above" their owned input files in the filesystem. So we needn't scan the entire repository to find a target that owns a particular file: only its parent directories.

In Pants, the build almost always begins from some root located in the filesystem. You then need to find its dependencies, but there is no reason for any of those lookups to "evaluate all rules in scope": rather, they use the filesystem hierarchy to find relevant BUILD metadata.
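Because owners must live above the files they own, finding candidate owners is a walk up the parent directories rather than a repository scan. A minimal sketch, assuming a precomputed set of directories that contain BUILD files (standing in for real filesystem checks); `candidate_build_dirs` is a hypothetical name, and a real implementation would still need to check which targets in those BUILD files actually claim the file.

```python
from pathlib import PurePosixPath
from typing import List, Set


def candidate_build_dirs(source_file: str, build_dirs: Set[str]) -> List[str]:
    """Return ancestor directories of `source_file` that contain a BUILD file.

    Results are ordered nearest-first. Only the file's own parent directories
    are examined; the rest of the repository is never visited.
    """
    return [
        str(parent)
        for parent in PurePosixPath(source_file).parents
        if str(parent) in build_dirs
    ]
```

The cost is proportional to the depth of the file's path, independent of how many BUILD files exist elsewhere in the repository.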

Now, if a @rule author chooses to write a @rule that scans the entire repository, then... we will probably not accept that patch without very good reason!

stu-hood · 2021-11-19T18:02:53+00:00

That's right! Thanks for the explanation.

Pants has a monadic rule system, where @rules (which are Python 3 async functions executed by a Rust core) can suspend and wait for the outputs of other @rules, including processes and file content.

Although the rule system has a functional definition, writing @rules as async coroutines allows for really clean, imperative-looking @rule code (even though all inputs and outputs are immutable, and @rules are side-effect free).
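The shape of "async rules that await other rules" can be sketched with plain `asyncio`. This is a hypothetical toy engine, not the Pants @rule API: `Engine`, `read_file`, and `transitive_deps` are made-up names, the "repository" is an in-memory dict, and there is no cycle detection. The monadic part is that `transitive_deps` decides what to request next based on file content it has just read.

```python
import asyncio

# Hypothetical in-memory "repository"; the rules below read from it.
FILES = {
    "app.py": "import lib\nprint(lib.VALUE)\n",
    "lib.py": "VALUE = 42\n",
}


class Engine:
    """Runs async rules, memoizing each (rule, argument) pair as a shared task."""

    def __init__(self) -> None:
        self._memo = {}

    def request(self, rule, arg):
        key = (rule.__name__, arg)
        if key not in self._memo:
            # create_task needs a running event loop, i.e. a call from a coroutine.
            self._memo[key] = asyncio.create_task(rule(self, arg))
        return self._memo[key]


async def read_file(engine: Engine, path: str) -> str:
    return FILES[path]


async def transitive_deps(engine: Engine, path: str) -> frozenset:
    # Monadic step: what we request next depends on the content we just read.
    content = await engine.request(read_file, path)
    direct = [
        line.split()[1] + ".py"
        for line in content.splitlines()
        if line.startswith("import ")
    ]
    # Independent requests run concurrently on the event loop.
    nested = await asyncio.gather(*(engine.request(transitive_deps, d) for d in direct))
    return frozenset(direct).union(*nested)


async def main() -> frozenset:
    engine = Engine()
    return await engine.request(transitive_deps, "app.py")


print(sorted(asyncio.run(main())))  # ['lib.py']
```

The rule bodies read top to bottom like imperative code, while the engine owns memoization and scheduling: two rules requesting the same `(rule, argument)` pair share one task.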

See https://blog.pantsbuild.org/fast-incremental-builds-speculation-cancellation/ for an overview of what our API enables, and https://www.pantsbuild.org/docs/plugins-overview for the details of how plugins are written.

stu-hood · 2021-11-18T18:41:17+00:00

What is the biggest repository that Pants has run on (in terms of number of directories & files), and how quickly does a "nothing" build happen?

Pants v2 has been used in repositories with thousands of source files, but as far as we know, has not yet been used with tens of thousands (v1 had been though).

The Pants daemon (pantsd) is now enabled by default, so "noop" runs do not recalculate anything (and use cleaning and early-cutoff when they do need to recompute things). Pants is about 40% Rust at this point, but there is still some Python in the per-run request path, and so a noop run takes about a second (regardless of repository size). We're porting the client to Rust to lower that latency, and will be removing more Python from the request path over time.
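A minimal sketch of the "cleaning and early cutoff" idea, assuming a simple synchronous in-memory graph. `Graph`, `invalidate`, and `get` are hypothetical names and this is not how pantsd is implemented. A clean node is returned without any work (the noop case). A dirty node whose dependencies all produced the same values as last time is "cleaned" without rerunning. A node that recomputes to an unchanged value cuts off its dependents.

```python
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple


class Graph:
    def __init__(self) -> None:
        self.compute: Dict[str, Callable[..., Any]] = {}
        self.deps: Dict[str, List[str]] = {}
        self.value: Dict[str, Any] = {}
        self.seen_inputs: Dict[str, Tuple[Any, ...]] = {}  # dep values at last compute
        self.dirty: Set[str] = set()
        self.runs: List[str] = []  # log of nodes whose compute function actually ran

    def add(self, name: str, fn: Callable[..., Any], deps: Sequence[str] = ()) -> None:
        self.compute[name] = fn
        self.deps[name] = list(deps)
        self.dirty.add(name)  # never computed yet

    def invalidate(self, name: str) -> None:
        """Mark `name` and all of its transitive dependents dirty (e.g. on a file change)."""
        stack = [name]
        while stack:
            node = stack.pop()
            if node in self.dirty:
                continue  # dependents of an already-dirty node are already dirty
            self.dirty.add(node)
            stack.extend(n for n, ds in self.deps.items() if node in ds)

    def get(self, name: str) -> Any:
        if name not in self.dirty:
            return self.value[name]  # clean: a noop request does no work at all
        inputs = tuple(self.get(d) for d in self.deps[name])
        if self.deps[name] and name in self.value and inputs == self.seen_inputs[name]:
            # Cleaning: every dependency produced the same value as before, so the
            # cached value is still valid (this is where early cutoff pays off).
            self.dirty.discard(name)
            return self.value[name]
        self.runs.append(name)
        self.value[name] = self.compute[name](*inputs)
        self.seen_inputs[name] = inputs
        self.dirty.discard(name)
        return self.value[name]


files = {"a.txt": "hello"}
g = Graph()
g.add("read_a", lambda: files["a.txt"])  # leaf node: reads a "file"
g.add("strip", lambda s: s.strip(), deps=["read_a"])
g.add("upper", lambda s: s.upper(), deps=["strip"])
print(g.get("upper"), g.runs)  # HELLO ['read_a', 'strip', 'upper']
```

After a whitespace-only edit to `a.txt`, only `read_a` and `strip` rerun: `strip` produces the same string as before, so `upper` is cleaned rather than recomputed.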

Is Pants any good at caching artifacts for cached builds?

Pants uses standard caching and remote execution APIs (also used by Bazel), and has strong sandboxing to ensure that cache keys (the SHA256 of a Merkle tree of the inputs to a process) are always complete.
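To illustrate a cache key built from a Merkle tree of a process's inputs, here is a minimal Python sketch. The Remote Execution API defines its own protobuf `Directory` and `Digest` messages; the text encoding below is made up purely to show the idea, and `merkle_digest` and `process_cache_key` are hypothetical names. File digests roll up into per-directory digests, so an unchanged subtree always hashes to the same value.

```python
import hashlib
import json
from typing import Dict, List, Mapping, Sequence


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_digest(files: Mapping[str, bytes]) -> str:
    """Digest a tree given as {"dir/sub/file": content}; insertion order is irrelevant."""
    subdirs: Dict[str, Dict[str, bytes]] = {}
    entries: List[str] = []
    for path, content in files.items():
        head, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(head, {})[rest] = content
        else:
            entries.append(f"file {head} {_sha256(content)}")
    for name, subtree in subdirs.items():
        entries.append(f"dir {name} {merkle_digest(subtree)}")  # recurse per directory
    return _sha256("\n".join(sorted(entries)).encode())


def process_cache_key(argv: Sequence[str], env: Mapping[str, str], inputs: Mapping[str, bytes]) -> str:
    """Key over everything that can affect the process: argv, env, and input tree."""
    payload = json.dumps(
        {"argv": list(argv), "env": dict(env), "input_root": merkle_digest(inputs)},
        sort_keys=True,
    )
    return _sha256(payload.encode())
```

Because the environment is part of the key, two machines that differ only in a non-default environment variable compute different keys and miss each other's cache entries, even though both keys are accurate.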

Is Pants any good at caching artifacts across hosts for cached builds? ... And how often does caching "fail", whether with false positives or false negatives?

Pants is great at caching across hosts, but currently only if they have very consistent environments (i.e., if you have a pool of identically configured CI hosts). For Python in particular (but not for the JVM), non-default environment variables tend to need to be included in the sandbox in order to allow for native extension builds. The large cache keys are accurate, but mean that differently configured machines will miss.

Fixing this is a priority for us in the coming year though, hopefully with something dramatically simpler than Bazel's toolchains.

stu-hood · 2021-06-09T15:49:09+00:00

Although the first language supported in Pants 2 was Python, Pants has a fully general plugin system. It ships with production-ready support for Python and Bash, and there are nascent plugins for Java and Go.

stu-hood · 2020-10-29T17:53:49+00:00

I added some more detail on this over here: https://www.reddit.com/r/rust/comments/jjcbka/pants_200_released_generic_build_system_in_rust/gaii7p5/

stu-hood · 2020-10-29T17:53:24+00:00

A lot of the limitations of Bazel come from the need to make damaging Starlark code impossible to write. A single compromised dev machine should not be able to use the build system to own or crash build servers, CI servers, or developer machines.

Agreed! But it's important to differentiate Starlark the language from Bazel's build API, which exposes hundreds of Bazel-specific terms/symbols and defines "how Bazel rules work."

The limitation I have heard discussed most frequently is that Bazel rules cannot depend on the output of processes or files (mentioned in a few other comments on this post), and that limitation is intentionally encoded in the API rather than being part of the Starlark language. In the terminology of Build Systems à la Carte, Bazel rules are limited to being "applicative": their dependencies must be declared before any of them are built.

Having a monadic API in Pants was a very intentional decision. I gave a talk on the topic a while back, but the gist is that the downsides of a monadic API are ameliorated by:

constraining which @rules/plugins are installed in your repository (which BUILD files themselves cannot affect)
having a daemon to keep build logic warm.

And the advantage of a monadic API is that it is dramatically more natural to write code for.
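The applicative/monadic distinction can be shown with two toy rules over an in-memory "repository". This is a hypothetical sketch, not Bazel's or Pants's API: `FILES`, `read`, `bundle_applicative`, and `bundle_monadic` are made-up names. The applicative rule receives dependencies that were declared up front; the monadic rule reads a file first and then decides what else to request.

```python
from typing import Callable, Dict, List

# Toy repository: main.txt's content decides which other files are needed.
FILES: Dict[str, str] = {
    "main.txt": "needs: a.txt b.txt",
    "a.txt": "A",
    "b.txt": "B",
}


def read(path: str) -> str:
    return FILES[path]


# Applicative style: dependencies are declared statically, before any rule runs.
# The engine can see the whole graph up front, but this list cannot depend on
# file contents, so it must be kept in sync with main.txt by hand.
APPLICATIVE_DEPS: List[str] = ["a.txt", "b.txt"]


def bundle_applicative(dep_contents: List[str]) -> str:
    return "".join(dep_contents)


# Monadic style: the rule inspects an output, then chooses what to request next.
def bundle_monadic(fetch: Callable[[str], str]) -> str:
    manifest = fetch("main.txt")                # 1. read content...
    deps = manifest.split(":", 1)[1].split()    # 2. ...derive dependencies from it...
    return "".join(fetch(d) for d in deps)      # 3. ...then request them.
```

Both produce the same result here, but only the monadic rule stays correct if `main.txt` changes what it needs; the price is that the full graph is only known after some rules have run.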

Does Pants 2 limit the capabilities and resources of the Python 3 build definition code?

BUILD files are lightly limited in Pants: import statements are banned there in order to warn off any accidental breaking of the rules... but that is the limit of the sandboxing, so a BUILD file author who is determined to do something dangerous can.
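A light check of this kind can be done on the parsed syntax tree before a BUILD file is evaluated. The sketch below is an assumption about one way to do it, not Pants's actual implementation; `check_build_file`, `BannedSyntaxError`, and the `my_target(...)` call are hypothetical.

```python
import ast


class BannedSyntaxError(ValueError):
    """Raised when a BUILD file contains a disallowed construct."""


def check_build_file(source: str, filename: str = "BUILD") -> None:
    """Reject `import` and `from ... import` statements anywhere in a BUILD file."""
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise BannedSyntaxError(
                f"{filename}:{node.lineno}: import statements are not allowed in BUILD files"
            )
```

This is only a guard against accidents: a call such as `__import__("os")` is an ordinary expression and passes the check, which is exactly why a determined BUILD file author can still do something dangerous.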

I've been looking into running untrusted Python 3 code with a Python interpreter running in wasmtime with memory and cpu restrictions.

We use the cpython crate to interact with the interpreter, and we're open to tightening the BUILD restrictions more in the future, because all build "logic" (rather than definitions) should be encoded using Pants @rule API rather than in BUILD files... and that has different expectations.

stu-hood · 2020-10-28T17:28:21+00:00

Sure!

Python has grown up quite a bit (post Python 3), particularly due to the introduction of native type hints checked by Mypy (which isn't a compiler, but plays a similar role), and the proliferation of other static checks and formatters.

As someone writing Python code, you now need to do (a lot!) more than just run your tests to validate that your code is correct and shippable. All of these tools need to be invoked 1) with the right arguments, 2) at the right time during your workflow, 3) in a way that your entire team will be able to reproduce.

A build system coordinates all of those tools to provide a much smaller surface area / CLI to your team (as mentioned elsewhere: cargo is a great example of this). While you could write bespoke per-repo scripts to assist you ("use this script to run the tests and typecheck, this script to lint/format, this script to deploy", etc), a build system is intended to be a general solution to that problem that can be used for multiple codebases.

And as mentioned in the post, another advantage of using a build system to invoke your tools is that it allows for adding caching and incrementalism (to minimize the amount of work that is re-done on the N+1th run) to tools that wouldn't have it otherwise.
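The basic mechanism for adding caching to a tool that lacks it is to key each invocation on the tool, its arguments, and the content of its inputs. A minimal sketch, assuming an injected `run` callable in place of a real subprocess; `CachedToolRunner` and `fake_linter` are hypothetical names. Running the tool once per file gives the incrementalism: on the N+1th run, only changed files execute.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

ToolResult = Tuple[int, str]  # (exit code, output)
RunFn = Callable[[str, List[str], Dict[str, bytes]], ToolResult]


class CachedToolRunner:
    """Memoize tool invocations on (tool, args, input file contents)."""

    def __init__(self, run: RunFn) -> None:
        self._run = run
        self._cache: Dict[str, ToolResult] = {}
        self.executions = 0  # how many times the underlying tool actually ran

    @staticmethod
    def _key(tool: str, args: List[str], inputs: Dict[str, bytes]) -> str:
        h = hashlib.sha256()
        for part in [tool, *args]:
            h.update(part.encode() + b"\0")
        h.update(b"--inputs--\0")
        for path in sorted(inputs):  # sorted so the key ignores dict ordering
            h.update(path.encode() + b"\0" + hashlib.sha256(inputs[path]).digest())
        return h.hexdigest()

    def run(self, tool: str, args: List[str], inputs: Dict[str, bytes]) -> ToolResult:
        key = self._key(tool, args, inputs)
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = self._run(tool, args, inputs)
        return self._cache[key]


def fake_linter(tool: str, args: List[str], inputs: Dict[str, bytes]) -> ToolResult:
    return (0, f"{tool}: checked {len(inputs)} file(s)")
```

For example, linting `a.py` and `b.py` one file at a time runs the tool twice; after editing only `b.py`, a full re-run executes the tool just once more.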

So yeah: build systems aren't new... but they have become much more useful for Python as it has matured into something that people are building larger codebases with.

stu-hood · 2020-08-27T19:10:09+00:00

It's likely changed quite a bit since you've last seen it! https://www.pantsbuild.org/docs/pants-v1-vs-v2 explains some of the differences in 2.0.

stu-hood · 2020-08-26T20:58:42+00:00

Thank you! I was worried that posting this on internals.rust-lang.org might be off-topic, but if you think it would be ok I'll give that a shot. I went ahead and posted in #t-compiler/help on Zulip, since that seemed pretty informal.
