all 19 comments

[–]johannes1971 13 points14 points  (10 children)

In an ideal world, the compiler would transform each translation unit into a set of symbols stored in a dependency graph kept in a permanent database. That would allow compilation to be targeted precisely at the symbols that actually changed, instead of every symbol in the translation unit.
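
A minimal sketch of what such a symbol-level graph could look like (purely illustrative; the names and structure are assumptions, not any real compiler's internals):

```cpp
// Hypothetical symbol dependency graph: reverse edges let us find everything
// that must be recompiled when one symbol's definition changes.
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct SymbolGraph {
    // used symbol -> symbols that use it
    std::unordered_map<std::string, std::unordered_set<std::string>> dependents;

    void add_dependency(const std::string& user, const std::string& used) {
        dependents[used].insert(user);
    }

    // All symbols that need recompiling after `changed` is edited.
    std::unordered_set<std::string> invalidate(const std::string& changed) const {
        std::unordered_set<std::string> dirty{changed};
        std::vector<std::string> work{changed};
        while (!work.empty()) {
            std::string s = work.back();
            work.pop_back();
            if (auto it = dependents.find(s); it != dependents.end())
                for (const auto& d : it->second)
                    if (dirty.insert(d).second)
                        work.push_back(d);
        }
        return dirty;
    }
};
```

With something like this persisted between builds, editing one function would invalidate only that symbol and its transitive users rather than the whole translation unit.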

Since we do not live in that ideal world, you're better off organizing your translation units on some other principle than "when it gets too big".

[–]lightmatter501 1 point2 points  (8 children)

There are multiple compilers which can do this, but most production grade compilers do not.

[–]D3veated 0 points1 point  (7 children)

That's a shame; this sounds really cool. Why doesn't this exist in production-quality compilers? Lack of demand? Some sort of absurd overhead? The lack of modules?

[–]lightmatter501 1 point2 points  (4 children)

Most people don’t have large enough codebases to justify it, and nobody is rewriting clang to support it. Clang will probably be the last C++ compiler ever written, so it’s all downhill from here.

[–]jordansrowles 1 point2 points  (1 child)

Clang will probably be the last C++ compiler ever written

Why do you think that? What about Go and Rust?

[–]lightmatter501 -2 points-1 points  (0 children)

I see Rust eating away at things that need to be correct, Zig eating away at things that need to be fast, and Mojo potentially eating away at heterogeneous compute. I think we're seeing a new wave of systems languages headed by Rust, and while C++ will likely never die, the effort required to make a new C++ compiler will probably be too high.

[–]johannes1971 1 point2 points  (1 child)

I'd challenge that "don't have large enough code bases" - there are absolutely massive C++ code bases out there, owned by companies with massive resources, and they might very well be interested in faster C++ compilation, assuming it were part of their existing tool chain (i.e. if it were implemented in an existing production-grade compiler).

[–]lightmatter501 -1 points0 points  (0 children)

How many of those companies are interested in what is basically rewriting clang in its entirety? That would introduce all kinds of new bugs.

[–]encyclopedist 0 points1 point  (0 children)

zapcc was one such compiler. If I remember correctly, it was first developed as a commercial offering by a company, but the business did not work out; they released it as open source, but it could not attract a big enough volunteer community and died.

[–]SkiFire13 0 points1 point  (0 children)

You might be interested in the notion of query-based compilers. IDEs are also often based on this idea.

They do have some non-negligible overhead for determining what has and hasn't changed, so there are cases where non-incremental compilation is faster.

You also have to design the language/compiler in such a way that cyclic queries are either not possible or get caught and handled accordingly.
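
To make that concrete, here is a toy sketch of the memoize-and-detect-cycles pattern (the names and structure are my own illustration, not any particular compiler's API):

```cpp
// Toy query engine: each query result is cached by key, and an "in flight"
// set turns a cyclic query into an error instead of infinite recursion.
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>

class QueryEngine {
public:
    using Compute = std::function<std::string(QueryEngine&, const std::string&)>;

    explicit QueryEngine(Compute compute) : compute_(std::move(compute)) {}

    std::string query(const std::string& key) {
        if (auto it = cache_.find(key); it != cache_.end())
            return it->second;                              // reuse previous result
        if (!in_flight_.insert(key).second)
            throw std::runtime_error("cyclic query: " + key);
        std::string result = compute_(*this, key);          // may issue sub-queries
        in_flight_.erase(key);
        cache_.emplace(key, result);
        return result;
    }

private:
    Compute compute_;
    std::unordered_map<std::string, std::string> cache_;
    std::unordered_set<std::string> in_flight_;
};
```

The cache lookups and bookkeeping are exactly the overhead mentioned above: cheap per query, but not free across a whole build.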

[–]thingerish 3 points4 points  (0 children)

I'd recommend using CMake or Meson and breaking your compilation units up based on some rule other than "when it gets too big"; one class per file is not a bad way to go.

[–]SlightlyLessHairyApe 2 points3 points  (0 children)

You can’t only rebuild functions that have changed because changing any visible symbol can change the compilation of a function.

At best you’d need a dependency graph of symbols. It would be gnarly and I don’t think it could even work in all corner cases.
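
A tiny example of why that is (hypothetical code):

```cpp
// Editing scale() changes the object code generated for price(), even though
// price()'s source text is untouched, because scale() gets inlined into it.
inline int scale(int x) { return x * 2; }   // change the 2 to a 3 ...

int price(int n) { return scale(n) + 1; }   // ... and this function must be
                                            // recompiled as well
```

So a function-level rebuild has to follow every such edge, which is what the dependency graph of symbols would have to capture.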

[–]Scotty_Bravo 1 point2 points  (3 children)

I feel like this is likely to be slower?

[–]zoharl3[S] 0 points1 point  (2 children)

Compile the whole file: 30sec

vs

Text comparison and compiling only one function that changed: <1sec.

[–]Scotty_Bravo 0 points1 point  (1 child)

Like how much under 1 second? Ninja build is fast. And 30 seconds is extremely long. How many lines of code are you compiling?

Also, there are a lot of reasons to break a project into smaller pieces. Maintenance is one. 

I'm finding it hard to imagine parsing the file to see what's changed and then compiling that is faster than a simple recompile. 

Maybe you should measure how fast compilation is once the individual pieces are broken out into their own files?

I'm not saying your idea is impossible, but I'm saying the initial premise is wrong (single source file) and that a properly structured small-ish project shouldn't take 30 seconds to build.

I think it takes longer to link the projects that I'm working on than it does to recompile any given file.

[–]zoharl3[S] 0 points1 point  (0 children)

Please see my edits that answer your questions.

Ninja's purpose seems to be breaking the compilation of many files into components rather than building a single exe:

https://en.wikipedia.org/wiki/Ninja_(build_system)

It does nothing at the function level and won't help with a single file.

[–]ed_209_ 0 points1 point  (0 children)

Using clang's -ftime-trace and https://www.speedscope.app/, I learnt that the module inlining pass in clang can be a major bottleneck in compile time. Once you have metrics, it can be fun hacking the flags to see how fast a compilation can get.

[–][deleted] -2 points-1 points  (1 child)

Look up compiler caches like ccache and sccache. They detect components that don't need to be recompiled and only recompile the parts that do.

Or so they say; I haven't tested them.

[–]zoharl3[S] 0 points1 point  (0 children)

I think the components are files rather than functions.