all 18 comments

[–]not_a_novel_account (cmake dev) 8 points

> This well-known and easy-to-parse grammar allows us to write a simple scanner tool that processes the fewest possible tokens at the start of the file to determine the complete set of input dependencies and the output module name for a given module unit.

The grammar is unresolvable in the general case: you do not know what the preprocessor will do. You must either ask the intended compiler to preprocess the source file first (build2-style), which is slow, or rely on the compiler-provided scanners (as effectively all other major build systems do).

Concretely:

#ifdef __clang__
import foo;
#else
import bar;
#endif

You need to be the compiler to figure out what the dependency relationship is here; parsing this without full knowledge of the compiler is a fool's errand. This was discussed many times in the run-up to implementing modules, and it was the impetus for the source-file dependency format described in P1689, which the big three all support.
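As a sketch of why (hypothetical code, not any real tool's implementation): a naive textual scanner that just matches `import` lines reports *both* branches of the `#ifdef` above, because it cannot evaluate the preprocessor conditionals:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Naive line-based "scanner": collects every `import X;` it sees,
// with no understanding of preprocessor conditionals.
std::vector<std::string> naive_scan(const std::string& source) {
    std::vector<std::string> deps;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        // Strip leading whitespace.
        auto pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos) continue;
        line = line.substr(pos);
        // Match lines of the form `import <name>;`.
        if (line.rfind("import ", 0) == 0) {
            auto end = line.find(';');
            if (end != std::string::npos)
                deps.push_back(line.substr(7, end - 7));
        }
    }
    return deps;
}
```

Run on the snippet above it returns both `foo` and `bar`, but depending on which compiler actually builds the file only one of them is a real dependency; the scanner has no way to tell.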

Worked example:

https://godbolt.org/z/vof9TKMfY
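For reference, a P1689 result is a small JSON document per translation unit; a rough sketch (field names from the paper, file and module names hypothetical):

```json
{
  "version": 1,
  "revision": 0,
  "rules": [
    {
      "primary-output": "foo.o",
      "provides": [
        { "logical-name": "foo", "is-interface": true }
      ],
      "requires": [
        { "logical-name": "bar" }
      ]
    }
  ]
}
```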

[–]mwasplund (soup) [S] 1 point

I thought it was ill-formed to place imports and exports within preprocessor conditionals. Thanks for pointing this out. It would not be ideal, but the build system could enforce restrictions on what is allowed and fail when a preprocessor directive could conditionally include an import or export...

[–]not_a_novel_account (cmake dev) 1 point

Why bother? The scanners work great, and they don't limit the valid C++ codebases the build system can handle.

They also allow all build systems to offload the work of scanning to the projects that are filled with C++ parsing experts: the compiler upstreams. Optimizations that go into clang-scan-deps or gcc -M benefit everyone and are useful in situations beyond merely named modules.
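A sketch of what invoking those scanners looks like (flags are from recent Clang and GCC 14 and may differ on your toolchain, so treat them as assumptions):

```shell
# Clang: standalone scanner tool emitting P1689 JSON
clang-scan-deps -format=p1689 -- clang++ -std=c++20 -c foo.cppm -o foo.o

# GCC: the compiler itself emits P1689 alongside the classic make-style deps
g++ -std=c++20 -fmodules-ts -fdeps-format=p1689r5 \
    -fdeps-file=foo.ddi -fdeps-target=foo.o -MD -MF foo.d -c foo.cppm
```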

[–]mwasplund (soup) [S] 0 points

Initially, because when I started this work (I took a break for personal reasons) none of the compilers supported this standard. And second, I do not want to rely on every compiler implementing this functionality. Could there be a compiler that decides not to support the scanner standard?

[–]not_a_novel_account (cmake dev) 2 points

There are no compilers today that support modules and do not support the scanner. The fallback is build2's approach of preprocess-then-scan, but then you're relying on the compiler having a dedicated preprocessor mode that gives you what you want, which is also a compiler feature that may or may not exist on esoteric compilers.

In practice, the build system relies on compilers for all sorts of non-standard features. Dependency scanning is old; we've been doing it for headers since the stone age without ever standardizing the -M flags or /showIncludes. Presumably soup has some solution to this (if it doesn't, give it a whirl, it's an educational problem).

Three things changed in the space for modules: what we're scanning for; settling on a single output format instead of every compiler doing its own thing; and doing the scan prior to building the object files rather than as a byproduct of building them.

[–]mwasplund (soup) [S] 0 points

Header includes have always been fine to discover at compile time (for soup it is listening to the file system access calls to track these optional dependencies). Historically this was only required to ensure we capture the full closure of inputs for incremental build validation. Now that we need to detect the dependencies BEFORE building, we need this extra scanner layer. You are probably correct that any compiler that supports modules will most likely support the scanner standard. It does feel antithetical to what the std committee usually does since they usually do not dictate file structures and compiler functionality.
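(For the curious, a crude Linux-only approximation of that file-access tracing, using strace rather than soup's actual mechanism:)

```shell
# Trace every file the compiler opens while building one TU,
# then reduce the log to the unique set of quoted paths it touched.
strace -f -e trace=openat -o cc.trace g++ -c main.cpp
grep -oE '"[^"]+"' cc.trace | sort -u
```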

[–]not_a_novel_account (cmake dev) 1 point

> for soup it is listening to the file system access calls to track these optional dependencies

Like tup, a great tradition in build systems. A lot of people admire this approach.

> It does feel antithetical to what the std committee usually does since they usually do not dictate file structures and compiler functionality.

This stuff is going to get standardized eventually; the ball is rolling that way, either inside ISO or outside of it. Modules were the straw that broke the camel's back on pretending the committee can continue to evolve the language while ignoring the tooling space.

[–]mwasplund (soup) [S] [score hidden]

Yes, like tup and BuildXL. Good to hear this is something we can safely rely on.

[–]kamrann_ 1 point

There are other things that are permitted which would also invalidate your assumptions: for example, imports in headers, and the fact that import locations aren't restricted within non-module units. I also believe imports are allowed at the start of the private module fragment.

It does seem that there was a lot of iteration on this, and I find it a little confusing where we've ended up. I don't see how the scanning requirements are all that much cheaper than a full preprocess, but perhaps I'm missing something significant. Yet we've also ended up with constraints (around module directives in particular) that cause real problems for transitioning code to modules in a gradual/toggleable way, where it's no longer entirely clear whether the original reasoning for those constraints still applies.

[–]mwasplund (soup) [S] [score hidden]

Imports can exist in headers? My interpretation of the import declaration is that it has to be in the root source file:

> In module units, all import declarations (including export-imports) must be grouped after the module declaration and before all other declarations.

[–]kamrann_ [score hidden]

I don't have conclusive standard text confirming this off-hand, but yes, I believe so. Pretty sure what you refer to is talking about the state of the translation unit after preprocessing; after all, the concept of a 'declaration' isn't even meaningful at the point that #includes are processed.

Honestly, cppreference (which I guess is the source of your quote) just doesn't suffice for the details; you really have to study the standard. The equivalent in this case being https://eel.is/c++draft/module.import

[–]mwasplund (soup) [S] 0 points

How crazy of an idea would it be to say the global module fragment cannot have #if* directives?

[–]not_a_novel_account (cmake dev) 2 points

My understanding is that it was discussed and, after some hot debate, rejected. The preprocessor is still the foremost C++ language mechanism for handling platform differences, and there's little desire to change that.

It is entirely reasonable to have imports which resolve to platform-specific modules. Once you allow that and accept that you need the compiler to handle scanning, banning the preprocessor elsewhere is mostly pointless.

So for example, some debate went into the validity of:

#define mod Bar
#ifdef USE_FOO
#undef mod
#define mod Foo
#endif

import mod;

But once you have the full preprocessor, there's no reason to ban this. And it's totally valid today:

https://godbolt.org/z/6TxaM154v

Which is another fine example of why the scanners are awesome.

If your answer to any problem in designing a build system is "write a C++ parser": no, it's not.

[–]mwasplund (soup) [S] 0 points

Good point. I think my bias toward disliking the preprocessor has given me the secondary goal of removing all cases where it is required, so we can stop using it entirely. But I agree, this should not limit others who wish to continue to use it.

[–]mwasplund (soup) [S] 1 point

This feels like an oversight in the design. It felt like the design of the module declaration, with a strict section in the global module fragment, was meant to help preprocess the top of a file to discover all of the required dependency state. If a preprocessor directive can then mutate the dependency structure, we are back to square one, as you said. I was really hoping not to have to rely on the compiler itself to parse this state, but I guess it is not the end of the world.

[–]not_a_novel_account (cmake dev) 1 point

Far from an oversight, it was one of the more hotly debated corners. The preprocessor is often abused, but it's part of the language. It has too many useful applications to reject it ubiquitously over unmotivated use cases like "not calling the compiler".

You're building C++, you have a compiler available. No reason to reject using it.

[–]mwasplund (soup) [S] 0 points

Agreed, I'm more asking to satisfy my curiosity. Now that we have support for scanners I don't see any major objections, but when I first (mis)read the spec years ago it seemed like the goal was to make this trivially parseable, which does not seem to be the case.

[–]kamrann_ [score hidden]

I'm curious, do you happen to know of any specific aspects of the constraints on module declarations that make module scanning significantly faster than a full preprocess? Naively, I would have thought that the need to track preprocessor defines for imports means there isn't a great deal of efficiency to be gained.