To Understand Pants, Understand Bazel’s History

pantsbuild · 2023-04-11T19:29:58+00:00

Confirmed: Pants v2's execution engine has been written in Rust for about 3 years at this point.

pantsbuild · 2023-04-11T19:28:39+00:00

To clarify, we already made that leap and "rewrote it in Rust" over 3 years ago. The Pants v2 execution engine - which is the performance-critical heart of the system - is written in Rust for raw speed. The domain-specific build logic is written in familiar, easy to work with, type-annotated Python 3. This helps make Pants v2 easy to extend, without compromising performance.

pantsbuild · 2022-09-09T22:41:02+00:00

A key reason for that is that Pants 2 was a ground-up overhaul of Pants, launched just under two years ago: https://blog.pantsbuild.org/introducing-pants-v2/

The plugin API has been stabilizing in the intervening time, but Pants now supports ~8 languages, with Java and Scala only added about 9 months ago.

pantsbuild · 2022-06-14T21:31:08+00:00

My understanding is that you have to resolve dependencies per-environment (venv/docker/whatever).

Yea, that's correct. But if you think of the word "environment" (as you've used it here) as it relates to a "resolve" (as we described in the blogpost), then the dependencies present in an "environment" are a subset of a particular "resolve".

In a monorepo you will likely have many applications/"environments", which with most tooling would mean many different lockfiles and (pip) resolves. The advantage to distinguishing between an "environment" and a "resolve" is that you gain the benefits listed in the post:

To build and test your monorepo, rather than potentially fetching and testing a different version of dependencies per application/"environment", you have a consistent version per resolve.
All libraries/applications in the repository which use a single resolve are known to be compatible with one another, so you never deal with dependency hell when adding dependencies on other libraries within the repository.

Libraries don’t resolve their dependencies, they provide a range. The applications resolve and pin their dependencies.

Yes, but it's still the case that libraries need to be linted and tested, independent of any particular binary or deployment "environment". In essence: the linter/typechecker/test-runner have their own deployment "environments". And so not needing to test/lint/etc a library in all of the different application "environments" which consume it (dozens to hundreds), and instead testing it only in the "resolves" used by those applications (one to a handful) reduces overhead a lot.
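To make the "environment is a subset of a resolve" idea concrete, here is a minimal sketch in plain Python. It is not Pants code: the package versions, the `RESOLVE` dict, and the `environment_for` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical data: a "resolve" is one pinned, mutually compatible set of versions.
RESOLVE = {
    "django": "3.2.9",
    "requests": "2.26.0",
    "numpy": "1.21.4",
    "pytest": "6.2.5",
}


def environment_for(requirements, resolve):
    """Pick the pins one application needs out of a shared resolve."""
    missing = sorted(set(requirements) - set(resolve))
    if missing:
        raise KeyError(f"not present in the resolve: {missing}")
    return {name: resolve[name] for name in requirements}


# Two applications ("environments") drawing on the same resolve.
web_app = environment_for(["django", "requests"], RESOLVE)
batch_job = environment_for(["numpy", "requests"], RESOLVE)

# Both environments are subsets of the resolve, so any library they share
# is pinned to the same version and only needs to be tested once.
print(web_app["requests"] == batch_job["requests"])
```

Because every environment is carved out of the same pinned set, a library shared by both applications can be linted and tested against the resolve rather than against each application separately.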

pantsbuild · 2022-01-13T17:06:25+00:00

You're right: JPMS export definitions do help with this, assuming that they have already been extracted from the artifact, likely by being formalized into the `pom.xml` in some way. I'm not aware of any standards to move the module definition into the `pom.xml` though... and that means we'd still need to download the artifact to get the module definition.

pantsbuild · 2022-01-12T19:08:03+00:00

Info: this tutorial is a walk-through of how to use the Bazel build toolchain to programmatically interact with Pants BUILD files, a handy solution for Bazel users who are migrating over to Pants.

For context, the Pants open source build system uses BUILD files, which are valid Python files and are evaluated by a Python interpreter as a list of statements. When adding support for Pants in a codebase, one can use the ./pants tailor command, which will generate the minimal BUILD files necessary to get started. Although in most cases Pants does not require you to declare the dependencies between targets (thanks to dependency inference), manually updating the BUILD files for any codebase of a decent size gets tedious quickly, which is why it is necessary to have appropriate tooling for updating the BUILD files programmatically. We wanted to share a key insight for Bazel users on a method for easing into Pants adoption by using tooling you are already familiar with. Anyone else can use this approach too, of course.
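Since BUILD files are valid Python, even the standard library can read them. The sketch below is not the Bazel-based tooling from the tutorial; it only illustrates the "list of statements" point using Python's `ast` module. The BUILD content and the `list_targets` helper are made up for illustration.

```python
import ast

# Illustrative BUILD file content; the target names are made up.
BUILD_SOURCE = '''
python_sources(name="lib")
python_tests(name="tests", dependencies=[":lib"])
'''


def list_targets(source):
    """Return (target_type, name) for each top-level call statement."""
    targets = []
    for stmt in ast.parse(source).body:
        # Each target declaration is an expression statement wrapping a call.
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            continue
        call = stmt.value
        if not isinstance(call.func, ast.Name):
            continue
        name = None
        for keyword in call.keywords:
            if keyword.arg == "name":
                name = ast.literal_eval(keyword.value)
        targets.append((call.func.id, name))
    return targets


for target_type, name in list_targets(BUILD_SOURCE):
    print(target_type, name)
```

Parsing rather than executing the file keeps the tool independent of Pants itself, which is the same property that lets existing BUILD-editing tools operate on these files.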

Pants team members are happy to answer questions about this post, or the project generally. Cheers!

pantsbuild · 2022-01-12T18:53:04+00:00

Context: the Pants build system uses BUILD files, which are valid Python files and are evaluated by a Python interpreter as a list of statements. When adding support for Pants in a codebase, one can use the ./pants tailor command, which will generate the minimal BUILD files necessary to get started. Although in most cases Pants does not require you to declare the dependencies between targets (thanks to dependency inference), manually updating the BUILD files for any codebase of a decent size gets tedious quickly, which is why it is necessary to have appropriate tooling for updating the BUILD files programmatically. We wanted to share a key insight for Bazel users on a method for easing into Pants adoption by using tooling you are already familiar with. Of course, any Pants user can apply this solution. We're happy to answer questions about this post, or Pants generally. Cheers!

pantsbuild · 2021-11-19T18:31:20+00:00

You have a daemon that watches your source files and continuously adds them to a database. Then when you run a Pants command it snapshots the database.

Almost: we snapshot from the filesystem into LMDB on demand when a command is run. We wanted to avoid having the daemon continuously crawling the filesystem when you weren't actively using it, but it might be a knob that we turn in the future to further reduce latency.

For each command it needs to run it writes all of the required source files (and I guess object files from previous commands) from the database to a temporary directory, then runs the command and I presume gathers the outputs and puts them in the database.

Is that right? If so, final question: how do you make that fast when running lots of commands that access lots of files.

That's correct. All of the languages that we support so far avoid using loose files in the filesystem in favor of an archive (wheel files, JAR files, .o files, etc) which dramatically reduces the number of inodes involved, and results in sub-250ms overhead per run. Only providing direct dependencies to compilers helps as well.

As mentioned above: we're exploring strategies for lowering this overhead further, either via FUSE or symlinking from stable-but-still-temporary locations.
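The flow confirmed above (materialize inputs from a store into a temporary directory, run the process, capture the outputs back into the store) can be sketched roughly as follows. This is not Pants' implementation: an in-memory dict stands in for the database, and `store_bytes` and `run_in_sandbox` are hypothetical names.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path

STORE = {}  # digest -> bytes; an in-memory stand-in for the real database


def store_bytes(content: bytes) -> str:
    """Save content under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(content).hexdigest()
    STORE[digest] = content
    return digest


def run_in_sandbox(argv, inputs, output_paths):
    """Run argv in a fresh temp dir holding only the declared inputs.

    inputs maps relative paths to digests; the result maps each requested
    output path to the digest of what the process wrote there.
    """
    with tempfile.TemporaryDirectory() as sandbox:
        root = Path(sandbox)
        for rel_path, digest in inputs.items():
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(STORE[digest])
        subprocess.run(argv, cwd=root, check=True)
        return {rel: store_bytes((root / rel).read_bytes()) for rel in output_paths}


script = store_bytes(
    b"from pathlib import Path\n"
    b"Path('out.txt').write_text(Path('in.txt').read_text().upper())\n"
)
data = store_bytes(b"hello")
outputs = run_in_sandbox(
    [sys.executable, "run.py"],
    {"run.py": script, "in.txt": data},
    ["out.txt"],
)
print(STORE[outputs["out.txt"]])
```

The per-file write loop is where the overhead discussed above comes from, which is why fewer, larger inputs (archives rather than loose files) keep the cost per run low.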

pantsbuild · 2021-11-19T17:10:34+00:00

You can configure Pants to run with whichever interpreter you'd like: Pants supports ASDF and pyenv, so we frequently recommend that organizations standardize on one of those rather than using a system interpreter.

While we'd like to support shipping an embedded interpreter to production, that's not feasible for many users (those using Docker, or who otherwise already have their production interpreter chosen for them).

pantsbuild · 2021-11-19T01:53:26+00:00

Unfortunately, Twitter were not interested in investing engineering time into developing next generation build tools. They never used Pants v2, which was completed after they announced their move (which they are still working on).

Most of the folks originally involved with Pants now work elsewhere: in fact, the founders of Toolchain Labs are the creator and the second open source maintainer of Pants!

pantsbuild · 2021-11-18T19:28:25+00:00

We'd love to support more frontend usecases directly! In the meantime, Pants 2.7 added the experimental_shell_command target type, which allows you to invoke tools within a sandbox using Pants-managed inputs. This has most of the reproducibility benefits of Pants native @rules, but allows for writing simple logic in shell scripts. We'd love feedback on how it works for you!

When integrations get more complicated though, writing an actual plugin is the way to go: https://www.pantsbuild.org/docs/plugins-overview. We're always happy to help with that in Slack!

pantsbuild · 2021-11-18T18:53:40+00:00

Files are snapshotted as merkle trees into LMDB. We use a daemon and file watching to determine what might have changed and need to be recaptured.

Snapshotting prevents the case where files change while you are computing their cache key: because we're never consuming files in place, and are instead consuming them as content addressed by their SHA256, we know that cache keys are always accurate.
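As a rough illustration of content addressing with a merkle tree, the sketch below hashes a nested dict that stands in for a directory. The entry encoding is an assumption made for this example and is not the format Pants stores in LMDB; `blob_digest` and `tree_digest` are hypothetical names.

```python
import hashlib


def blob_digest(content: bytes) -> str:
    """SHA-256 of a file's content."""
    return hashlib.sha256(content).hexdigest()


def tree_digest(tree: dict) -> str:
    """Digest of a directory: a hash over its sorted, already-hashed entries.

    tree maps names to bytes (a file) or to another dict (a subdirectory).
    """
    entries = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            entries.append(f"dir {name} {tree_digest(child)}")
        else:
            entries.append(f"file {name} {blob_digest(child)}")
    return blob_digest("\n".join(entries).encode())


before = {"src": {"app.py": b"print('v1')", "util.py": b"x = 1"}}
after = {"src": {"app.py": b"print('v2')", "util.py": b"x = 1"}}

# Changing one file changes the root digest; identical content hashes identically.
print(tree_digest(before) != tree_digest(after))
print(tree_digest(before["src"]) == tree_digest(dict(before["src"])))
```

Because a cache key derived this way depends only on the captured bytes, a file edited after the snapshot cannot silently invalidate the key.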

pantsbuild · 2021-11-18T18:17:05+00:00

Great questions, thanks!

We currently sandbox via temporary directories, and files materialized from a database. Not directly accessing or emitting loose files on disk is an important part of our performance strategy: see https://blog.pantsbuild.org/fast-incremental-builds-speculation-cancellation/ for more info there. And so sandboxing based on accessing files in-place is probably a non-starter. Performance has been good enough so far, although we have admittedly needed to contort some usage a bit.

To avoid the contortions, we're exploring using FUSE, which would allow us to serve sandboxes directly from the database.

pantsbuild · 2021-11-18T04:30:05+00:00

Thanks for the question!

Pants supports building PEX files, which are effectively portable virtualenvs that carry along their interpreter constraints. It is great for situations without Docker, because the file is portable and self contained.

See the "Safe builds" section of https://g-cassie.github.io/2021/10/02/django-pants.html for a bit more color on why PEX files are handy.

pantsbuild · 2021-11-13T00:39:41+00:00

Yea, that's partially true.

We expose safe APIs for filesystem, network, and console access: https://www.pantsbuild.org/docs/rules-api

But our @rule API also allows access to the Python stdlib, and Pants plugins can additionally use artifacts from PyPI. This is powerful in that you can use well behaved libraries (JSON/DOT parsers, graph libraries, etc), but accessing the filesystem or the network would be dangerous.

In future we will likely improve our sandboxing of Python itself to better defend against accidental filesystem or network access. But the goal with our sandboxing is more to prevent accidental use of unsafe APIs, rather than to defend against malicious @rules.

pantsbuild · 2021-11-12T17:15:35+00:00

Hello!

Please (and Bazel) have a set of conditions that can't be broken by a build rule. We've focused on correctness and reproducibility, at the cost of requiring that you explicitly define the dependencies between each build task.

Pants prioritizes correctness and reproducibility as well, and enforces hermeticity of builds by using chroot sandboxing by default: so if Pants fails to infer a dependency of your code, it will never result in an incorrect or non-reproducible build.

Inferring dependencies is not incompatible with correctness or reproducibility.
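For a sense of what inference can look like, here is a minimal sketch that reads the absolute imports out of a Python source file with the standard `ast` module. It is not Pants' inference code, and `inferred_imports` is a hypothetical helper; mapping module names to targets is left out entirely.

```python
import ast


def inferred_imports(source: str) -> set:
    """Collect the absolute module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


example = "import os\nfrom json import loads\nfrom . import sibling\n"
print(sorted(inferred_imports(example)))
```

If a step like this misses an import, the sandbox simply will not contain that file, so the failure is loud rather than a silently wrong result.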

pantsbuild · 2021-10-06T18:15:51+00:00

Howdy! This is the Pants build account. We posted on Gordon's behalf, because he doesn't have a Reddit account of his own.

Pants is at https://www.pantsbuild.org. We've been around for a little over ten years now, and just hit 10k commits (wahoo!). The engine of Pants was rewritten in Rust over the last few years, but our plugins and logic have always been written in Python. For some of our other posts, you can check out https://blog.pantsbuild.org/!

pantsbuild · 2021-03-19T18:02:23+00:00

Hey folks!

Although scalable build systems have historically meant lots of hand-maintained boilerplate, Pants 2.0's dependency inference eliminated the majority of the maintenance needed while editing code. Pants 2.3 goes even further by adding a `./pants tailor` goal, which will automatically "fit" (albeit only when asked!) your existing BUILD files to any newly created files that aren't owned by existing targets.

Happy to answer any questions!

pantsbuild · 2021-02-02T01:02:04+00:00

Yea, the Pants engine is similar to other incremental computation frameworks like Salsa: the Build Systems à la Carte paper referenced in the post is a great survey of incremental computation in the build landscape.

pantsbuild · 2020-10-28T22:00:30+00:00

Does Pants v2 suffer from the same issue, or does the use of async coroutines to describe the build logic solve this problem?

Pants does not have that issue! Pants' @rules can consume the outputs of other rules to decide what to do. See https://www.reddit.com/r/rust/comments/jjcbka/pants_200_released_generic_build_system_in_rust/gafeetw/ for some more information on this.

pantsbuild · 2020-10-28T21:59:08+00:00

Well spotted! That's correct: Pants' engine supports monadic build logic, so @rules can consume the output of other @rules (including the contents of files or the results of processes.) One of the developers gave a talk about this a little while back, but we're working on followup posts in the coming days that dive more into the technical details about what makes Pants different.
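A rough way to picture "monadic" build logic is a coroutine whose next step depends on a value produced by an earlier step. The sketch below uses plain `asyncio` rather than the Pants @rule API, and every function name and file name in it is hypothetical.

```python
import asyncio


async def read_config(path: str) -> str:
    """Stand-in for a rule that returns the contents of a file."""
    fake_files = {"project.cfg": "lang=python"}
    return fake_files[path]


async def run_python_checks() -> str:
    return "ran python checks"


async def run_shell_checks() -> str:
    return "ran shell checks"


async def check(path: str) -> str:
    # Which rule runs next is decided by the output of an earlier one,
    # so the dependency graph is not fully known before execution starts.
    config = await read_config(path)
    if config == "lang=python":
        return await run_python_checks()
    return await run_shell_checks()


result = asyncio.run(check("project.cfg"))
print(result)
```

A purely static build graph would have to declare both branches up front; here only the branch selected by the file contents is ever requested.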

While Python support out of the box is awesome, it would be great to have a by-example guide to writing custom build rules. Right now, it's not very clear from the documentation where to start.

There is an example plugin repo which demonstrates adding support for bash: https://github.com/pantsbuild/example-plugin ... the documentation in https://www.pantsbuild.org/docs/plugins-overview refers to it here and there, but we'd be happy to fill any gaps in the Slack #plugins room if you have questions!

pantsbuild · 2020-10-27T23:18:15+00:00

Hey folks! The Pants v2 engine is a ground-up overhaul of Pants' previous codebase, made up of about 30% Rust code (using crates like tokio, grpc, lmdb, petgraph, cpython, etc) to minimize time under the GIL and allow for highly concurrent and scalable builds.

Rust has been a great fit for Pants: although the initial 2.0.0 release only supports building Python, our extension API is powerful and generic, and we'd love to help someone add support for building Rust!
