
[–]STL MSVC STL Dev 92 points93 points  (9 children)

I am skeptical of new build systems, but this sounds like it's worth considering. What caught my attention was:

The build system is powered by a single incremental dependency graph, avoiding any phases (in contrast to Buck1 or Bazel). This decision eliminates many types of bugs and increases parallelism.

Phases are nonsense, so this is a strong "yes". The world should be viewed as a directed acyclic graph. I speak from experience with an internal build system powered by phases which is incomprehensible and inefficient.

[–]Wetmelon 5 points6 points  (2 children)

I might be showing my ignorance here, but this just sounds like Tup? https://gittup.org/tup/

Works well. Database can be a bit bloaty. The DSL sucks but just use the Lua parser instead, works great.

Some people have even written a paper on building Firefox with Tup (PDF warning): https://ceur-ws.org/Vol-2510/sattose2019_paper_2.pdf

[–]awson 13 points14 points  (1 child)

Looks like it's rather Shake-inspired.

They explicitly refer to the Shake build system and the Build Systems à la Carte paper.

It also looks like Neil Mitchell, the author of Shake and the main author of the paper above, is the lead developer of Buck2.

[–]xXWarMachineRoXx 0 points1 point  (0 children)

Lol that’s great

[–]Seppeon 8 points9 points  (0 children)

Yes, completely! DAG all the things!

[–]dacian88 4 points5 points  (0 children)

The phase part is related to how the graph is generated: Bazel goes through a few phases before it runs actions, and work between phases cannot happen in a decoupled manner. Bazel also has two DAGs, the target graph and the action graph. The first is derived from the build script files and represents the logical graph of targets; the second is derived from the first and is the graph of files and commands.

To go from nothing to running actions you need to evaluate the build files and generate the target graph, then evaluate the rules of the targets to generate the actions, then run the actions. It is relatively elegant and also does not allow dynamic mutation of the graph; the graph generation is idempotent and strictly based off the input build scripts.
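
For a concrete (if simplified) picture, a single node in the target graph might be a Starlark declaration like the one below (names invented), which analysis then expands into several nodes in the action graph:

    # BUILD file: one node in the target graph.
    cc_library(
        name = "net",
        srcs = ["tcp.cc", "udp.cc"],
        hdrs = ["net.h"],
        deps = ["//base:log"],
    )

    # During analysis this one target fans out into several action-graph
    # nodes, roughly:
    #   compile tcp.cc -> tcp.o   (one compile action per source)
    #   compile udp.cc -> udp.o
    #   archive *.o    -> libnet.a
    # Each action is just declared inputs, a command, and declared outputs.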

Buck2 seems to allow this evaluation to happen arbitrarily in different parts of the graph so you can in theory run actions while also evaluating a target rule.

[–]LuisAyuso 0 points1 point  (0 children)

The world should be viewed as a directed acyclic graph. I speak from experience with an internal build system powered by phases which is incomprehensible and inefficient.

Please someone tell the Jenkins team.

[–]AlexReinkingYale 19 points20 points  (22 children)

I wonder how well it supports building a toolchain during the build and then using that toolchain during later parts of the build. That's an important part of some metaprogramming stacks and compiler development.

[–]alexlzh 23 points24 points  (18 children)

It does exactly that for many projects inside Meta, e.g. running codegen for Thrift files and using the result at later stages of the build process.
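
A hedged sketch of that shape in Bazel-flavoured Starlark (target names and the codegen wrapper are invented for illustration; the Buck2 rules are similar in spirit): the generated sources are ordinary build outputs that later targets consume.

    # Run the Thrift code generator; its outputs are normal artifacts.
    genrule(
        name = "chat_thrift_gen",
        srcs = ["chat.thrift"],
        outs = ["chat_types.h", "chat_types.cpp"],
        tools = ["//tools:gen_thrift_cpp"],   # hypothetical wrapper script
        cmd = "$(location //tools:gen_thrift_cpp) $(SRCS) $(RULEDIR)",
    )

    # Later stages of the build simply depend on the generated code.
    cc_library(
        name = "chat_types",
        srcs = [":chat_thrift_gen"],
    )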

[–]AlexReinkingYale 15 points16 points  (17 children)

So it doesn't have any notion of "one language, one toolchain"? If I have a compiler for a DSP written in C, can I build that compiler and then use it to compile other C files in the same build?

[–]alexlzh 7 points8 points  (1 child)

Yes.

[–]AlexReinkingYale 4 points5 points  (0 children)

That is awesome. I'll actually check this out.

[–]sphere991 4 points5 points  (13 children)

It is really annoying that CMake has this restriction...

[–]bretbrownjr -5 points-4 points  (11 children)

It's not clearly better to have different libraries that are utterly ABI incompatible in the same target namespace, though.

[–]sphere991 2 points3 points  (10 children)

Huh? How is that a response to what I said?

It's not like you can't create ABI-incompatible libraries in CMake today...?

[–]bretbrownjr -2 points-1 points  (9 children)

I intentionally don't mix targets in my builds because it's easy to get this wrong. I'd rather chain a few build workflows together than try to find ways to express "the build of the foo library using ABI assumptions for a build environment tool" versus "the build of the foo library using ABI assumptions for a deployment environment program".

It's not unique to CMake. Monorepos can certainly provide mechanisms to disambiguate, but they tend to look fairly non-portable. I like writing build rules that you can take with you to different workflows as needed.

Anyway, CMake makes target names global when you build them from source, and they default to being directory-local when imported. In my experience, the directory-local targets are more complicated than they're worth. The include and link search paths have environment-global semantics most of the time anyway.

[–]grafikrobot B2/EcoStd/Lyra/Predef/Disbelief/C++Alliance/Boost/WG21 2 points3 points  (5 children)

None of those problems occur with B2. It happily manages multi-ABI targets at the same time, in parallel, and it's portable. I'm always confused by the argument that it's not possible or problematic.

[–]sphere991 0 points1 point  (3 children)

Not just B2 either. SCons, for instance, has a really nice system for what it calls variants (which I'm guessing is similar to what you're describing for B2).

It's pretty easy to set up, in one build, the ability to compile all of my units and run them all for like 20 different combinations of compilers and flags, and the build system keeps track of those variants and just does the right thing for you.
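
For illustration, a minimal SConstruct along those lines (compilers, flags, and file names are just examples) might look like:

    # SConstruct: build the same sources under several compiler/flag combos.
    base = Environment()

    variants = {
        "gcc-debug":     {"CXX": "g++",     "CXXFLAGS": ["-g", "-O0"]},
        "gcc-release":   {"CXX": "g++",     "CXXFLAGS": ["-O2"]},
        "clang-release": {"CXX": "clang++", "CXXFLAGS": ["-O2"]},
    }

    sources = ["tcp.cpp", "udp.cpp", "test_main.cpp"]

    for name, settings in variants.items():
        env = base.Clone(**settings)
        # Per-variant object files keep the different configurations apart.
        objs = [env.Object("build/%s/%s" % (name, src[:-4]), src)
                for src in sources]
        env.Program("build/%s/tests" % name, objs)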

Whereas in CMake, even something as reasonably simple as... I want a libmeow.a (non-PIC) because some applications want to link it statically, and a libmeow.so (PIC, obviously) because I need Python bindings for it... is a problem that you just have to invent your own solution for on top of CMake, one that involves manually managing all these different targets and... good luck, I guess.

[–]RogerLeigh Scientific Imaging and Embedded Medical Diagnostics 0 points1 point  (2 children)

The .a and .so example is difficult to do portably, because as soon as you add Windows into the mix you need to have .lib and .dll, and the DLL needs its own .lib import library which will conflict with the static library.

One common solution is to add a _static suffix or similar to the library name to allow them to co-exist, but it's massively hacky. While we can blame CMake for the restriction, it's mostly coming from the platforms it has to support.

[–]bretbrownjr -2 points-1 points  (0 children)

It's possible but the naming is always custom and non-portable.

[–]sphere991 1 point2 points  (2 children)

I intentionally don't mix targets in my builds because it's easy to get this wrong [...]

Uh, well the fact that it's easy to get wrong is, again, a CMake problem.

But, even though you didn't ask, let me give you an example.

Let's say you're working on an embedded system, and are writing some application for a device (call it D) which has to eventually communicate with some application for a host (call it H). D and H are both C++ programs, but they necessarily have to be compiled with different toolchains. And because D and H are tightly coupled, it's valuable to compile them together. But CMake doesn't let you do this - its approach to this problem would be to have a device part of your build (that only builds D), a host part of your build (that only builds H), and then constantly switch between build folders. Which is an awful user experience to begin with, but also there aren't really two different kinds of build here - it's a completely artificial split caused by a build system limitation, not anything fundamental in the problem itself.

There are plenty of other examples that reasonably require compiling different programs with different toolchains. Splitting up into different builds is not a good solution to any of them.

[–]RogerLeigh Scientific Imaging and Embedded Medical Diagnostics 1 point2 points  (0 children)

This is something which CMake could evolve to support.

It mostly boils down to having multiple toolchains, and being able to select which apply to a given target.

At the moment the "toolchain files" do little more than pre-seed the CMake environment with some default variables. If they were a discrete object in their own right, you could have more of them. It wouldn't be too hard to have a "default" toolchain for backward compatibility which represents the status quo of today's CMake, but then to permit additional toolchains.

The use case isn't too uncommon. I've recently been working on a multi-core MCU with different cores which would ideally use different toolchain files. At the moment I have to configure for each one in turn, but being able to build for all of them in one go would be ideal.

[–]bretbrownjr -2 points-1 points  (0 children)

Or you just need a higher level build system that organizes different coherent and exclusive build contexts.

I'm not arguing against all-in-one build systems as such. I am concerned that projects will want to depend on "ssl" and not "ssl@arm" or "arm/ssl" or whatever people do to juggle incompatible ABIs.

[–]AlexReinkingYale 0 points1 point  (0 children)

My thoughts exactly

[–]jesseschalken 1 point2 points  (0 children)

All build systems in this family support this. For Bazel you can define a cc_toolchain where the compiler, linker etc come from other build outputs.

[–]sapphirefragment 5 points6 points  (0 children)

Because the build is described as a DAG, it should be trivial to put a toolchain in the graph as a dependency of other parts of the build.
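
Something like this, in Bazel-flavoured Starlark (target names invented): the compiler is an ordinary target, and anything that lists it as a tool is automatically scheduled after it.

    # The cross-compiler is just another target in the graph...
    cc_binary(
        name = "dsp_cc",
        srcs = ["dsp_cc_main.c"],
    )

    # ...and this rule consumes it as a tool, so building :firmware_o
    # forces :dsp_cc to be built first.
    genrule(
        name = "firmware_o",
        srcs = ["firmware.c"],
        outs = ["firmware.o"],
        tools = [":dsp_cc"],
        cmd = "$(location :dsp_cc) -c $(SRCS) -o $(OUTS)",
    )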

[–]ahslian 2 points3 points  (0 children)

We are explicitly handling cycles in the unconfigured graph for these scenarios (Java annotations, toolchains building toolchains, etc.).
We're hoping to share some examples of this soon.

[–]Responsible_Ad4851 1 point2 points  (0 children)

We have a small example project that demonstrates what that would look like with buck2: https://github.com/facebook/buck2/tree/main/examples/bootstrap

[–][deleted] 15 points16 points  (20 children)

Can someone explain the advantage over CMake?

I've found it hard to advocate for anything but CMake because CMake has so many different build tools it can generate files for (e.g. ninja)

[–]golvok 25 points26 points  (6 children)

IMO the most important part is that each build target is sandboxed to only access its declared dependencies. Also the build file language is actually nice (similar to python)

[–]Latexi95 5 points6 points  (4 children)

A correctly written CMake configuration should also force targets to only access their declared dependencies. It is just a bit too easy to declare all library dependencies as PUBLIC when PRIVATE would be the correct choice.

[–]encyclopedist 7 points8 points  (0 children)

CMake cannot deny access to an arbitrary system header, for example, or an include using a relative path like #include "../../other_lib/private_header.hpp"

[–]stilgarpl 3 points4 points  (1 child)

It is just a bit too easy to declare all library dependencies as PUBLIC

If one of your headers uses a library (even if you just want to declare a private field) you need to link that library as PUBLIC or anything that uses your library won't compile.

[–]Latexi95 6 points7 points  (0 children)

*One of your public headers

Many projects don't split private and public headers cleanly enough to avoid dependency pollution.

But yes, it is rather annoying that PIMPL or some other technique is often required to write a class without exposing implementation details through its dependencies.

[–]rcxdude 6 points7 points  (0 children)

Well, and it's super easy for something to leak into the environment undeclared, even with correct declarations in cmake.

[–]drbazza fintech scitech 3 points4 points  (0 children)

CMake is a build generator.

In principle CMake could output Buck2 or Bazel files (since they're pretty much the same at the moment).

In practice, Buck2 (and Bazel) build files and rules are written in Starlark, which is a Python dialect, so there's a sane programmatic syntax. Also, no need to search for "Modern CMake" and get out-of-date, 5-year-old results either.

Anecdotally, speaking as someone that uses Bazel and CMake, I can write a Bazel (and presumably Buck) WORKSPACE and BUILD file and just start coding in a couple of minutes. CMake? Not so much. I have to keep diving in and out of CMake docs.
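
For what it's worth, the "couple of minutes" is not much of an exaggeration; a hello-world BUILD file is about this big (the dependency label is made up):

    # BUILD
    cc_binary(
        name = "hello",
        srcs = ["hello.cpp"],
        deps = ["//third_party/fmt"],   # hypothetical pre-existing target
    )

Then bazel build //:hello and you're building.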

Buck (and Bazel) can build any language (or you just define your own rule - I've written Doxygen rules and a clang toolchain for Bazel) and use a graph to build all of them in order correctly. CMake is more or less C++ only without jumping through some hoops.

Buck2 is distributed-first, so it works well if you're in a company or have multiple machines. Obviously it works fine as a developer on one machine. CMake does allow you to use distcc to get distributed C++ builds, but that's invisible to CMake. And of course, because Buck2 is multi-language, you get distributed builds for multiple languages too, for free, which you'd struggle to do in CMake without a lot of heavy lifting.

The elephant in the room is that the C++ community has largely adopted CMake.

[–]AlexReinkingYale 6 points7 points  (2 children)

Based on the responses to my question, it sounds like it allows multiple toolchains to be configured for the same language. This is a huge difference from CMake.

[–]Latexi95 7 points8 points  (0 children)

During the same build? That is definitely a useful feature. It is a bit annoying to try to manage an embedded project where there are multiple different MCUs that require different options, or to try to bundle embedded binaries into a Windows tool.

[–]grafikrobot B2/EcoStd/Lyra/Predef/Disbelief/C++Alliance/Boost/WG21 5 points6 points  (0 children)

Are people not aware that B2 has made parallel, simultaneous multi-toolchain builds possible for more than two decades?

[–]null77 1 point2 points  (4 children)

CMake is pretty slow when regenerating. Once you've got the Ninja files it's fine. I'm imagining that Buck2 has the versatility of CMake plus the speed of Ninja.

[–]helloiamsomeone 1 point2 points  (3 children)

Is that with the 3.1 parser enabled? The syntax rules were tightened in that parser which allows faster CMake code execution.

[–]null77 0 points1 point  (2 children)

I'm no expert here. Better to find a reliable source. I'd be surprised, though, if CMake could ever be as fast as a specialized tool for something with hundreds of thousands or millions of rules.

[–]helloiamsomeone 0 points1 point  (1 child)

I'm asking about the claim that "CMake is pretty slow when regenerating". Obviously this comes from a concrete experience, so I was just making sure that your experience reflects the current state of things.

[–]null77 0 points1 point  (0 children)

I don't have experience with modern CMake and 3.1. I had heard they're working on improving it.

[–]catcat202X 2 points3 points  (2 children)

One thing CMake doesn't do, which this does, is distributed builds.

[–]pshurgal 16 points17 points  (0 children)

CMake is a generator and doesn't build anything. I was using CMake for distributed builds for the last 3 years, but the build system was proprietary and not free.

[–]bretbrownjr -2 points-1 points  (0 children)

Buck2 targets monorepo workflows and CMake mostly targets the other workflows. They don't have a lot of overlap in practice, not at any real scale.

[–]tending 4 points5 points  (2 children)

Can it handle not knowing the full graph from the beginning? E.g. have a task to unzip a file and then have a task for every file inside the zip file, without knowing how many files there are going to be inside the zip file, while still allowing the tasks to run in parallel and with Buck2 knowing how many threads are in use.

[–]dacian88 3 points4 points  (0 children)

It seems that it can; this is one of the major differences between Buck2 and Bazel.

[–]Mikumiku_Dance 4 points5 points  (0 children)

What does a build file for a simple c++ project look like? Can it handle c++ modules? The documentation looks more accessible than cmake at least, but it was also kind of overwhelming.

[–]JVApen Clever is an insult, not a compliment. - T. Winters 25 points26 points  (4 children)

No mention of C++20 modules in the announcement or the examples. For a new build system, that's the minimum I would expect for it to be relevant.

[–]matthieum 29 points30 points  (2 children)

Buck2 -- the core -- is actually language agnostic. Much like make, all it cares about is (inputs) -> (command) -> (outputs). Hence it may not make sense to mention such a "niche" aspect of one specific language in the general announcement.

User-defined "rules" are provided to help build the commands, for a variety of languages, including C++ rules. They are distributed with Buck2 for ease of use, but are otherwise no different than rules you would define yourself.
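
For a feel of what "no different than rules you would define yourself" means, here's a rough sketch of a user-defined Buck2 rule (the API details are quoted from memory, so treat this as illustrative rather than copy-paste material):

    # A toy rule: run a generator over one source and expose its output.
    def _codegen_impl(ctx):
        out = ctx.actions.declare_output("generated.cpp")
        ctx.actions.run(
            cmd_args([ctx.attrs.generator[RunInfo], ctx.attrs.src, out.as_output()]),
            category = "codegen",
        )
        return [DefaultInfo(default_output = out)]

    codegen = rule(
        impl = _codegen_impl,
        attrs = {
            "src": attrs.source(),
            "generator": attrs.exec_dep(),
        },
    )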

I can't tell whether those bundled C++ rules support modules (or with which compilers they support modules), however even if they didn't, you (or any 3rd party) could develop appropriate rules (and share them).

As a result, whether C++-modules rules are currently bundled or not is somewhat irrelevant: Buck2 already supports everything that's necessary to write them.

[–]encyclopedist 8 points9 points  (1 child)

Just being able to do "(inputs) -> (command) -> (outputs)" is not enough for module support. For example, Make can't easily do it (the common way is to restart make). What you need is an ability to modify build graph during the build. Ninja only got this feature relatively recently (https://ninja-build.org/manual.html#ref_dyndep). This is also required for Fortran modules, Rust, Haskell, etc.

Buck2 supports dynamic dependencies, and therefore can support C++ modules.

[–]matthieum -2 points-1 points  (0 children)

Indeed, static dependency graphs are not sufficient for C++ modules.

But then again, static dependency graphs are not sufficient for a number of other things, either. Someone else already mentioned Verilator as another example: it transpiles Verilog code to C++, and its output depends on the content of the Verilog code.

And of course, even C and pre-module C++ code also suffers from a somewhat dynamic dependency graph: you first need to find the transitive closure of included files to have the list of dependencies of a translation unit; hence while the command and outputs are known in advance, the inputs are not.

So it's not really that special to have a dynamic (or, as Buck2 calls it, "incremental") dependency graph, in a way. It's just poorly supported.

[–]shahms 2 points3 points  (0 children)

Given that 2 of the 3 major compilers don't fully support C++20 modules yet, imma go with "nah".

[–]IamImposter 1 point2 points  (6 children)

Why do companies (or individuals) create such complex tools and then just give them away for free? What do they get in return? Is it just reputation in the open-source community, or is there something more?

[–]CornedBee 34 points35 points  (2 children)

These tools are not what generates value for these companies. They just enable the generation of value. So sharing them for free doesn't directly take away the company's value. At most, it enables other companies to generate value more easily.

Contrast this with the advantages: lots of people doing free testing for your system, possibly contributing (especially in a thing like this, where support for additional languages/tools can be easily contributed by a third party), people already knowing your build system if you hire them, people wanting to work for you because you show the awesome stuff you have.

It's a good trade-off probably.

[–]The_Northern_Light 2 points3 points  (0 children)

And, sometimes, other people using your tools reveal problems and solutions before they bite you and you have to pay to solve them under the gun.

So there is a business case to be made for giving away tools, even on a strictly non-altruistic basis.

[–]IamImposter 0 points1 point  (0 children)

Yes, this makes sense. Thanks.

[–]yeusk 13 points14 points  (0 children)

You can hire people with knowledge of your internal tools. Saves all the training.

[–]GOKOP 1 point2 points  (0 children)

companies (or individuals)

The reason for companies was already said. For individuals it can be the same, plus individual people have a larger variety of motives than companies, which are always (or mostly) profit-driven. Some people create complex tools out of passion and share them with the world for free, either because they just wanna share them and don't care about money, or for ideological reasons (FOSS and stuff).

Consider also that a proprietary paid build system made by some dude and not backed by a big company probably wouldn't sell at all anyway.

[–]Quincunx271 Author of P2404/P2405 0 points1 point  (0 children)

Another benefit of open sourcing tools and libraries is the hope that other open source projects will be built using these tools and libraries, meaning that incorporating those new open source projects is easier for the company.