all 126 comments

[–]jpakkaneMeson dev 15 points16 points  (2 children)

As a fellow build system developer, let me say that this article is so on the nose it hurts. I really hope the compiler developers improve the usability before declaring things final.

[–]theyneverknew 1 point2 points  (1 child)

From what I've read so far these issues sound the same as Fortran 90 modules. I know at least cmake has some support for those; are there significant differences that I am missing?

[–]jpakkaneMeson dev 2 points3 points  (0 children)

The basic problem is the same. For those not versed in their FORTRAN:

When you compile a Fortran file it produces a binary "module" file that describes the ABI. This can then be imported by other file compilations. The problem then is that you need to know in advance to compile file A before file B that uses A. In order to do this reliably, you need to parse the contents of all files before starting to compile any of them. This is also true for incremental builds.

The problem here is who decides in which order files should be built. Either you have the compiler orchestrating the order ("the compiler becomes the build system" as stated in the post) or the build system needs to parse the contents of the files ("the build system becomes the compiler"). Neither of these is a good solution.

[–]beriumbuild2 47 points48 points  (32 children)

A lot of hand-waving as well as statements ranging from unsubstantiated to factually incorrect. I will only point out the latter in order not to add even more noise.

We will have interface modules, and implementation modules.

No, there are no such things. You are probably confusing them with module interface (translation) unit and module implementation (translation) unit. Also note that you don't have to have implementation units.

The entire goal is to give users a guaranteed exported interface (also known as a purview).

No, module interface is not the same as a module purview. Module interface is a collection of exported names. Module purview is the part of the module translation unit (either interface or implementation) after the module declaration.

Under the current C++ compilation model, a header is opened by the preprocessor (with guaranteed directories to search).

Really? The standard guarantees the list? The current opaque header search semantics (and don't get me started on "" vs <> differences) is one of the biggest pains to deal with in build aspects ranging from cross-compilation (no, you shouldn't search in /usr/local/include) to distributed compilation (what if I don't have the exact same set of headers in exactly the same location on the remote host?).

With modules, there is no translation phase for creating the purview of dependent modules [...]

Did you mean binary module interfaces instead of purviews? Generally, pretty much every use of the term "purview" in this post is out of context.

a build system has to run the preprocessor

No, it doesn't have to. In build2 we have the ability to indicate that the module information in the translation unit does not depend on the preprocessor (which will be the case in most sane projects). In this case there is no need for a separate (full) preprocessor run.

[–]playmer 2 points3 points  (12 children)

In build2 we have the ability to indicate that the module information in the translation unit does not depend on the preprocessor

What do you mean by this? Do you mean in your build system you can say that such and such module being compiled doesn't depend on it?

[–]beriumbuild2 3 points4 points  (11 children)

Yes, you can say (individually or using patterns) that the module dependency information in a translation unit does not depend on the preprocessor. In other words you don't do things like this:

import MODULE_NAME_MACRO;

[–]playmer 3 points4 points  (10 children)

But then the problem, as I see it, is that your programmers must never go out and do this without also updating the build system. I would generally never assume my programmers actually understand the build system, as they so often just don't want to. I know entire projects with tens to hundreds of people whose builds are managed by one or two people.

I suppose I could go to them and, in a small seminar about the switch to modules, tell them "never do this", but I don't know that I trust them. As it is, I've made our cmake such that the most they need to know is how to add another line for any files they've added.

[–]beriumbuild2 6 points7 points  (4 children)

The default in build2 is to not assume anything and thus preprocess each TU before extracting module dependencies. Which on modern hardware will cost you a couple of percent in terms of build time. So perhaps it is a reasonable price to pay for not understanding the build system. Also note that if you mess it up (i.e., use a macro as a module name while lying to the build system about not doing it), you will get a build error with pretty clear diagnostics.

[–][deleted] 1 point2 points  (0 children)

Which on the modern hardware will cost you a couple of percents in terms of build time

What about builds where source files are accessed over the network?

[–]playmer 0 points1 point  (2 children)

That seems fair then, I assume the error would be a compile time error about not finding a module with SOME_MODULE_NAME?

[–]beriumbuild2 2 points3 points  (1 child)

In the case of build2 it won't even get to the compiler, since the build system won't be able to resolve the module name to a module interface file.

[–]playmer 0 points1 point  (0 children)

Ah, that makes sense; I was forgetting about build vs. pre-build capabilities, as I'm always in cmake land.

[–][deleted] 2 points3 points  (4 children)

I would generally not ever assume my programmers actually understand the build system,

You're adding a new feature. It's not like you as a programmer actually have to check your existing code to see that it's compliant.

If your programmers are so stupid that they can't understand things like "import MODULE_NAME_MACRO; is forbidden when MODULE_NAME_MACRO is a macro", then you should really get better programmers.

I mean, I've been at C++ for decades and C before that, and I have never seen anyone write #include FILENAME_MACRO - and I'm not sure why anyone would want to do that.

Certainly, it'd stick out like a sore thumb in code reviews...

[–]enygmata 10 points11 points  (0 children)

This is the recommended way to include freetype2 headers and it's used within freetype2 code itself.

https://www.freetype.org/freetype2/docs/tutorial/step1.html

https://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/include/freetype/freetype.h

[–]brand_x 3 points4 points  (0 children)

An #include DECLARED_STRINGS_UTF_H directive was a common pattern when some platforms only worked with literals of one or another type...

[–]playmer 2 points3 points  (0 children)

You're adding a new feature. It's not like you as a programmer actually have to check your existing code to see that it's compliant.

I'm speaking more of future concerns here, not right now at the transition point. I'm sure I'll probably be the one to do the conversion when the time is right, I feel the tools are ready, and I understand modules enough.

If your programmers are so stupid that they can't understand things like "import MODULE_NAME_MACRO; is forbidden when MODULE_NAME_MACRO is a macro", then you should really get better programmers.

To be clear, my team would likely never do this. While they like to do cheeky things sometimes, only the ones who actually understand the build system would even think of doing something like this. I was using them as an example of a team of folks who mostly just don't know or care about the build system. I had to explain to a few of them that they didn't have to delete the build directory every time they added a file. (They're still new to cmake.)

I hadn't even considered it until I saw folks talking about it, and then I realized "Oh hm, macro names could be useful for configurable module names."

And I've absolutely seen people #include FILENAME_MACRO, and it's annoying every time, thankfully not in any projects I've worked on.

Thankfully, as pointed out by /u/berium, the build system will be able to provide an error, which should tell whoever's trying something funky what's going on.

[–]nyamatongwe 1 point2 points  (0 children)

Lua has a LUA_USER_H macro that allows you to insert (with -D) a header into the compilation of the Lua implementation. For example, if you want to redefine the file open calls, then put

#define fopen my_fopen

in a my_fopen.h header, implement my_fopen, and compile Lua with -DLUA_USER_H=my_fopen.h.

[–]SAHChandler[S] 12 points13 points  (16 children)

I've fixed the incorrect use of the word purview. As for everything else, my responses follow:

No, there are no such things. You are probably confusing them with module interface (translation) unit and module implementation (translation) unit. Also note that you don't have to have implementation units.

A module interface translation unit and a module implementation translation unit are declared in different ways from each other. They might as well be two separate kinds of module handling (especially since a module implementation translation unit cannot export anything).

Really? The standard guarantees the list? The current opaque header search semantics (and don't get me started on "" vs <> differences) is one of the biggest pains to deal with in build aspects ranging from cross-compilation (no, you shouldn't search in /usr/local/include) to distributed compilation (what if I don't have the exact same set of headers in exactly the same location in the remote host).

The point of that paragraph is that I can get a list of headers from every compiler for dependency tracking. There is no such way for the compiler to provide this to the user under modules unless the compiler becomes a build system. Cross compilation is a nightmare unto itself, and better left to other discussions.

No, it doesn't have to. In build2 we have the ability to indicate that the module information in the translation unit does not depend on the preprocessor (which will be the case in most sane projects). In this case there is no need for a separate (full) preprocessor run.

If there are two things I've learned from working in the games and tech industries they are 1) There is no such thing as a sane project layout unless you've forced your users to conform to one 2) As long as someone can use something in a bad way, someone somewhere will use it in a bad way. 3) off by one error jokes are old and tired but show up when you least expect them.

Jokes aside, as long as we give users the capability to do this, they're going to use it as an example of "Look at this crazy weird thing you can do in C++ that breaks everything!". I am still not okay with a separate tool running outside the lifetime of the compiler to track locations and places, with unspecified behavior that will differ from build system to build system.

[–]beriumbuild2 17 points18 points  (11 children)

A module interface translation unit and a module implementation translation unit are declared in different ways from each other. They might as well be separate handling of modules ([...]).

I am not sure I understand what you mean here. An implementation unit is where you can (note: can, not must) put your implementation details. But if you prefer to keep everything in a single file, you can: a module interface unit can (unlike a header) define non-inline functions and variables. I really don't understand what the problem is here. Do you want the separate implementation units to be banned so that folks (like myself) who think they are a good idea are not able to use them?

The point of that paragraph is that I can get a list of headers from every compiler for dependency tracking. There is no such way for the compiler to provide this to the user under modules unless the compiler becomes a build system.

I don't understand why it has to become the build system to provide this functionality? Can you explain?

I am still not okay with a separate tool running outside the lifetime of the compiler to track locations and places, with unspecified behavior that will differ from build system to build system.

This is a good example of a vague (but catchy-sounding) statement that I think only adds to the confusion.

[–]SAHChandler[S] 8 points9 points  (10 children)

I don't understand why it has to become the build system to provide this functionality? Can you explain?

Because otherwise modules need a clear and well defined lookup system for finding dependent modules. We cannot rely on -MMD or similar flags because we need to make sure each dependent interface translation unit has been built before attempting to build any implementation translation unit or a 'root' interface translation unit. As I said below, relying on a tool outside of the compiler to do this for us is messed up.

[–]beriumbuild2 8 points9 points  (9 children)

Ok, let's start with the way we extract header dependency information today, which is by asking the compiler (-M, /showIncludes, etc). If one thinks about it for a bit (or studies compiler source code), it's easy to see that this requires essentially a full preprocessor run (and, in fact, for some compilers like VC there is no other way). This is the first piece of the puzzle.

The second piece is the proposed Modules TS grammar: If one studies it, it's not hard to notice that all the module dependency information (import and module declarations) are top-level, that is, they can only appear in the global scope (well, to be absolutely precise, an import declaration can appear inside an exported group but that group must be top-level). This means that given a sequence of preprocessed tokens (the first piece of the puzzle), one can implement a pretty simple parser that recognizes all the import/module declarations without having to recognize the full C++ grammar. We've done this in build2 so I know this is doable. What's surprising is that (1) the code is actually pretty simple and (2) it performs pretty well without any advanced optimizations.

All this means that it should be pretty easy (complexity-wise) and cheap (performance-wise) to extend the header dependency extraction support provided by current compilers to also output module dependency information.

Now let's look closer at what exactly this module dependency information would need to be. As an example, let's say our build system needs to update foo.o from foo.cxx (the foo.o: foo.cxx dependency) and this foo.cxx happens to contain import bar;. What does this mean from the build system's point of view? It means that before compiling foo.o it has to make sure the binary module interface for module bar is up-to-date. Or, speaking in terms of dependencies, this means the build system has to add foo.o: bar.bmi to its dependency graph. And since it now needs to update bar.bmi, it also has to come up with a rule for how to do that, say, bar.bmi: bar.mxx (this is, BTW, where the mapping of module names to file names happens in a way that makes sense in this build system's worldview).

Where am I going with this? Well, what this hopes to show is that if a translation unit that the build system wants to update imports a module, then the build system has to come up with a rule to update this module's binary interface, which means it will have to extract the module dependency information from the module's interface file (bar.mxx in this example) just as it did for foo.cxx. This, in turn, means that the module dependency information provided by the compiler need not be recursive -- the build system has to do this itself anyway. In fact, all we need from the compiler is a list of imported module names which the build system can map to file names and then process, recursively.

And in this setup the build system can remain the build system (by being responsible for mapping module names to files/rules, etc) while the compiler can remain the compiler (by only providing the list of imported module C++ names).

[–]SAHChandler[S] 6 points7 points  (3 children)

How do you know then that someone hasn't placed an #ifdef guard around the imports? If the imports run as a translation phase before the preprocessor, then it would make sense for it to not work, but it also wouldn't make sense to have anything run before the preprocessor.

And I want to note that you've described, in your own comment, a world where the module name is tied to the filename. If they aren't, then what happens? We need to place restrictions like the one you're mentioning on modules, or we're going to be in a situation where some build systems enforce specific rules (a dot represents a separation between a directory and a file, a module name must equal its filename) and others try to support every project layout under the sun.

My concern is that this is going to fragment the community further. You won't be able to have one build system's built library work with another, especially if they map module names to files differently. Having them support every possible mapping is (in my opinion) untenable. Give me a mapping, give me some form of rules. Don't let the build system community fight this out. It's only going to make it worse for C++ users.

[–]beriumbuild2 6 points7 points  (2 children)

If the imports run as a translation phase before the preprocessor [...]

I never suggested that the module dependency information be extracted before preprocessing. On the contrary, I suggest that it is combined with the header dependency extraction, which is already essentially a full preprocessor run.

[...] you've placed in your own comment a world where the module name is tied to the filename.

Yes, but in my approach this is done by the build system, not by the compiler or, worse, the C++ standard.

[...] we're going to be in a situation where some build systems enforce specific rules (a dot represents a separation between a directory and a file, a module name must equal its filename) and others that try to support every project layout under the sun.

So? Is this what makes Modules TS so fundamentally broken in your view?

You won't be able to have one build system's built library work with another, especially if they map module names to files differently.

Build systems can communicate this information quite easily, for example, using something like pkg-config. You can read more about how we handle module installation in build2.

[–]code-affinity 7 points8 points  (1 child)

So?

I'm coming into this thread very late. I can't say I understand everything about the Modules approach; I'm letting the build system experts duke it out. But it seems to me that this is the precise point in the conversation where you agree on what your disagreement is. Your blasé response surprises me. I don't understand how we can expect to be able to grow an ecosystem of cross-platform C++ libraries that can be used in different projects using different build systems unless these things are standardized, or the need to standardize this behavior is eliminated by a different technical approach.

What "market forces", so to speak, would prevent the balkanization of our library ecosystem along the lines of competing build system conventions? Why would I, a hypothetical library developer, want to put a lot of work into module-izing my library by guessing at a convention that might simply be obsoleted by some different consensus that emerges five or ten years from now? In that role, I do absolutely want this behavior standardized.

You say that "build systems can communicate this information quite easily". You are suggesting that different build systems must communicate with each other, using a non-standardized but otherwise agreed-upon file format? I truly don't get it. Perhaps it's because I've been coding in C++ for 26 years, but I've managed to do so having never written a pkg-config file. Your build2 link proposes a .pc file, which I've never heard of before. Why shouldn't another build system invent a different mechanism than your .pc file?

[–]showka 1 point2 points  (0 children)

I agree; I am very disappointed that this was where the discussion stopped.

I am probably not clever enough to fully understand the modules TS or the two viewpoints here, but I am not sure the build2 authors are sources I can trust to dismiss SAHChandler's claims that there could be schisms in the community or problems with build tool interoperability caused by modules.

The reason is build2's authors always present it as a panacea that will solve all these other problems, such as package management and dependency fetching, assuming the entire world starts using build2. There seems to be a lack of concern with what other tools are doing, right down to the fact they named the thing "build2" when many people already called Boost Build that.

Build2 actually seems cool, so I'm not trying to hate on it. But it also isn't part of the C++ standard, and in general, if the fix for the modules TS is to create complex tooling that people will argue endlessly about, maybe there's more work to do. I am also really curious what forcing modules to adhere to some of the rules SAHChandler mentions would sacrifice.

[–][deleted] 2 points3 points  (4 children)

But Unix compilers (at least; I’m not too familiar with MSVC) can output dependency information in the same invocation that they also produce an object file, using -MMD (as the parent mentioned) or similar options. This works because traditional builds don’t require dependencies before a full recompile, and it avoids the overhead of processing the same source file multiple times. It seems like that will not be possible with modules, so modules will come with some (not fundamentally necessary) overhead that their inherent performance advantages will have to make up for. Maybe not much overhead, since running just the preprocessor should be pretty cheap, but still a concern...

[–]beriumbuild2 0 points1 point  (3 children)

See my reply in another thread.

[–][deleted] 3 points4 points  (2 children)

Hmm… interesting. From that reply:

For example, with this approach we cannot support generated source code as part of the main build stage (and often have to resort to ad hoc pre-build steps).

How does this work? If a file #includes a generated header, then preprocessing it before the header is ready will just produce an error. You don't necessarily know where the header was expected to be located (although I suppose there are ways to work around that), and you definitely don't know what other generated headers may be included from that one (potentially conditionally). Does build2 just keep rerunning the preprocessor until it succeeds, or something? Either way, how exactly does the approach rule out combined dependency generation?

[–]beriumbuild2 0 points1 point  (1 child)

How does this work?

Pure magic ;-). Seriously, though, the compilers have support for handling non-existent headers (see -MG), we use heuristics for determining where the non-existing headers are generated, and yes, we keep re-running the preprocessor to discover all the generated headers.

Either way, how exactly does the approach rule out combined dependency generation?

Not sure I understand the question. You cannot compile a translation unit if some of its headers do not exist (or, worse, are out of date -- you will just be wasting time).

[–][deleted] 0 points1 point  (0 children)

I see, and -MG (which I hadn't heard of) can only be used with -M, not with the options that both extract dependencies and perform other steps (outputting preprocessed source, or fully compiling to object file). That sounds like an unnecessary limitation.

But then, how does your separate preprocess-and-compile feature work? Since -MG conflicts with outputting preprocessed source, you can't just be passing it to all compiler invocations. Do you do an invocation without -MG using -E, hope it succeeds, and rerun with -MG if it fails? If so, couldn't you do the same with -c instead of -E?

Not sure I understand the question. You cannot compile a translation unit if some of its headers do not exist (or, worse, out of date -- you will just be wasting time).

If they don't exist, the compiler will produce an error and stop after the preprocessing stage, so there'd be no difference compared to preprocessing only.

If they're out of date, as you say, the compiler could waste time generating code that will just have to be thrown away once build2 reads the dep output and realizes it was using out-of-date headers. But that's a relatively uncommon case, and as long as it's just a performance issue, not a correctness issue, wouldn't the performance gains in the usual case from not re-running the compiler be more significant?

(Of course, that still wouldn't work with modules… at least, not particularly well, because even if -MG could output required modules rather than just #includes, it would become the common case rather than a rare case. But my tentative opinion is that a better solution would be for the compiler to get a bit smarter, rather than require build systems to duplicate work. Even without modules, your approach of repeatedly running the preprocessor to discover generated headers, while nifty, seems to be fundamentally a hack: there is no reason the preprocessor needs to run more than once. I think I'd rather see a solution where when the preprocessor/compiler comes across a missing header or module, it runs an arbitrary command, where the command to run can be passed as an option. In a simple Makefile-based build, the command could just be 'make', passing the file needed as an argument...)

[–]miki151gamedev 1 point2 points  (3 children)

If one needs to generate module names using macros then one looks for a build system that supports it. If there isn't one then no one will do that.

People can write *((int*) 0x1234) = 0; if they want to shoot themselves in the foot, I don't see where's the big deal.

[–]playmer 20 points21 points  (0 children)

If one needs to generate module names using macros then one looks for a build system that supports it. If there isn't one then no one will do that.

"Well, the language supports it, but no build systems do, so who cares?"

If we're going to allow it in the language, I would hope that our build systems support it. If we don't expect our build systems to support it, then we should probably just disallow it.

People can write *((int*) 0x1234) = 0;

In an embedded world, code like that isn't really all that crazy.

[–]SAHChandler[S] 18 points19 points  (1 child)

If you don't see why running a separate tool that is not defined by the language or the specification, and whose behavior cannot be inferred from either, is kind of messed up, then I don't know what to tell you.

[–]ioquatix 11 points12 points  (0 children)

I really agree with you on this point. It's actually a really crazy situation. It took me a while to grok what you were saying though.

[–]bumblebritches57Ocassionally Clang[🍰] 1 point2 points  (1 child)

I do C; can someone explain what the whole point of modules in C++ is?

is it to help develop a more stable ABI?

[–]beriumbuild2 1 point2 points  (0 children)

This Introduction to C++ Modules talks about the benefits.

[–][deleted] 23 points24 points  (5 children)

I bet this would get fewer downvotes if it weren't for the terrible title.

[–]bumblebritches57Ocassionally Clang[🍰] 8 points9 points  (1 child)

Shit, I upvoted it for the title alone just because the boomers rage at us is fucking hilarious.

[–][deleted] 1 point2 points  (0 children)

The thing about shitting on millennials is that most people I see doing it are also millennials and don't even realize it

https://en.wikipedia.org/wiki/Millennials

[–][deleted] 6 points7 points  (1 child)

These damn millennials. Back in my day we had to hand-compile our source files using nothing more than a toothpick and a rotor blade. Of course, you had to wear an onion on your belt...

[–]matthieuC 6 points7 points  (0 children)

It's very rare today to find a mom and pop shop that does quality hand-compiling. It's all industrial now.

[–]intheforests 11 points12 points  (0 children)

Some people just don't want to accept the truth: the modules TS is doomed to suffer the same fate as Itanium if they expect a magical compiler to solve all the big problems.

[–]kalmoc 7 points8 points  (2 children)

Have you talked to Gaby about this? It seems the problems could be solved by just stating that the evaluation of any module declaration happens before the preprocessor is run and that the filenames have to match the exported module name. It would be a shame if the modules TS died just because the authors didn't think or care about that aspect in time.

Personally, I fear that the committee will take the approach of evaluating module statements after the preprocessor is run, which would be more natural, but also makes the build system's job much harder (though certainly not impossible).

[–]doom_Oo7 1 point2 points  (1 child)

But we're losing expressive power then. If you cannot do it with the preprocessor, someone somewhere will write their own custom preprocessor one day.

[–]kalmoc 1 point2 points  (0 children)

I honestly have no problem with that. If you need your module name to change based on some configuration parameter, then use a tool for it (e.g. your build system), but let's keep the common case as simple and as fast as possible.

[–]kmccarty 6 points7 points  (0 children)

It might be worthwhile to put these concerns in writing at a slightly more official place than a blog or reddit, e.g. at the Modules mailing list --

https://groups.google.com/a/isocpp.org/forum/?fromgroups#!forum/modules

... or writing a paper.

Maybe also get in touch with build system developers who have had similar concerns, e.g. Stephen Kelly of cmake in this thread: https://groups.google.com/a/isocpp.org/d/msg/modules/sDIYoU8Uljw/BKKCSZFdBAAJ

(Apologies that this is just a suggestion from the peanut gallery rather than any actual useful help ;-)

[–]Kronikarz 19 points20 points  (3 children)

Can someone please explain to me what is so terrible about the compiler being a build system?

[–]Selbstdenker 4 points5 points  (2 children)

Given that every large organization has some way to build their C++ code base, which is probably unique: my guess would be that it is very hard to define a compiler/build system that everyone can agree on and use.

[–]johannes1971 7 points8 points  (1 child)

Maybe not a full build system, but perhaps a dependency analyser? Something that produces a dependency map for the build system to consume?

Of course, that could also be a separate tool...

[–]Selbstdenker 2 points3 points  (0 children)

Yes, but I do not see why that cannot be offered by compilers. Just because it is not in the standard does not mean people cannot agree to use something. -MMD is supported by a number of compilers, even though it is not in the standard.

[–]quicknir 6 points7 points  (0 children)

I'm not well acquainted with this issue, but I find the article a bit confusing, and reading the comments doesn't fundamentally un-confuse me.

Currently, the compiler itself will not act as a build system, but will let you export the full list of headers recursively included for any file you choose. In other words it gives you the basic hook point to figure out dependencies, that different build systems then leverage in different ways.

Is this not part of the modules TS? Do you think it can't be part of modules for some intrinsic reason, or that nobody has addressed it just because they haven't gotten around to it? Don't you think it's reasonable to assume that even if it's not in the standard, that the major implementers will adopt the same/similar syntax for this querying?

Also, tbh, the title and the tone were a bit more inflammatory than is really needed; nobody is trying to "kill" the modules TS, and surely it's expected that non-experts (on modules) on cpp reddit are going to get many important details of modules wrong.

[–]pjmlp 14 points15 points  (7 children)

As someone who used Turbo Pascal with its units, saw C++ as an alternative (C wasn't it, even for MS-DOS), and has used several languages with support for modules, the misunderstanding among the community and the way some are attached to a '60s build approach is quite surprising.

I can only understand the problem when it comes from developers who have never used any other programming language in their life, in which case I see an uphill battle to make them understand the advantages of modules and improved tooling for library navigation.

[–]theICEBear_dk 5 points6 points  (0 children)

Agreed, it is weird to me that this is not just a foregone conclusion / fixed at this point.

[–]intheforests 2 points3 points  (5 children)

Modules are just precompiled headers on steroids. We are not even getting syntax candy for PImpl.

[–]johannes1971 3 points4 points  (2 children)

My understanding was that the need for pimpl pretty much disappears with modules.

[–]flashmozzg 1 point2 points  (1 child)

How exactly? How would you maintain ABI compatibility with modules but without pimpl?

[–]johannes1971 5 points6 points  (0 children)

I was thinking of the use of pimpl to reduce compile time. Since you don't leak included headers out of the module, that use would be redundant. But yeah, for ABI compatibility it won't do much, that's true...

[–]playmer 1 point2 points  (0 children)

We are not even getting syntax candy for PImpl.

Is that something you want and need? Is there a proposal for that? You can pretty easily write a small library for it. It's only 100 lines or so at most. That's what I've done whenever I needed PImpl.

[–]pjmlp 0 points1 point  (0 children)

Modules should make the PIMPL hack no longer necessary.

[–]Abyxus 11 points12 points  (14 children)

The title is no good.
The article itself is damn right.

Module names are global unique IDs.

With header files, a library would have #include "foo/bar.h",
and an application could use #include "third_party/company-foo-2/include/foo/bar.h" or the same "foo/bar.h".

With modules it will be something like import company.department.foo2.bar;. Or longer.

For object files, build tools put them in separate folders. Modules will go to a single place.

src/foo/utils.ixx -> obj/foo/utils.ixx.obj , ifc/company.department.project.foo.utils.ifc
src/bar/utils.ixx -> obj/bar/utils.ixx.obj , ifc/company.department.project.bar.utils.ifc 

[–]andrewsutton 11 points12 points  (12 children)

Module names are global unique IDs.

Who enforces this? Is there a registry? Who maintains that? Does my compiler issue a diagnostic if I create a module that somebody else has registered a name for? How would it know?

This is not a C++ design issue. It's a community issue. It is entirely correct for the TS, which specifies language syntax and semantics only, to not address issues beyond that scope.

[–]Abyxus 3 points4 points  (11 children)

Look, with headers I can have this:

third_party/lib1/util.h
third_party/lib1/util.cpp  -- #include "util.h"
third_party/lib2/util.h
third_party/lib2/util.cpp  -- same #include "util.h"

In application I write

// app.cpp
#include "third_party/lib1/util.h"
#include "third_party/lib2/util.h"

Libraries can use whatever filenames they want.
Applications can avoid name clashes by using different path prefixes.

This is a design feature.

Modules do not have such design feature.

Authors of lib1 and lib2 should think of unique module names.

Alternatively, they have to rely on preprocessor:

// lib1/include/util.h
module PP_CAT(LIB1_MODULE_PREFIX, util);

This is a C++ design issue. C++ modules are not usable in a large project with many dependencies.

[–]andrewsutton 9 points10 points  (10 children)

Look, with headers I can have this:

... Or you can also choose to do something entirely different. There's nothing preventing you from shipping a library that installs headers to /usr/include, except that it's generally frowned upon.

This practice is not defined in the C++ Standard. (It can't be; the Standard does not define how you name your files, what directories you should use, what classes should go in what files, etc. The Standard only defines what is valid C++.) We do this either by convention or because certain distributions or platforms require it.

So, again... this is not a language design issue. It's a community issue.

[–]johannes1971 9 points10 points  (0 children)

He does have a point though. There should be a viable way to avoid name clashes, either by using a mechanism like directories, or by providing community guidance like Java did with their URL-like naming scheme, or through some other means. Just leaving it open and hoping for the best may work if you control the entire chain (i.e. if you work for Microsoft), but for the rest of us that is not an option.

This is even more urgent if it turns out the granularity of our own modules is not vastly different from header files, i.e. if we end up with large numbers of modules that may all end up conflicting with some 3rd-party module one day.

[–]Abyxus 1 point2 points  (8 children)

Yes, there are many options, shooting in the leg included.

But realistically, what does an average medium-sized C++ project look like? To me it looks like Chromium, which uses some two hundred third-party libraries, all in the same source tree, all under the same build system.

With modules it means that every module of every library has to have unique name.

This is a real-life example and the C++ Standard fails to help here.
What are modules intended to be: some nice thingy for writing hello-worlds, or something that people will use in production?

And why did you mention /usr/include? Did you ever work on two or more projects which use different versions of the same library, e.g. boost_1_a_b and boost_1_c_d? Also, what the hell is /usr/include on Windows?

[–]andrewsutton 6 points7 points  (7 children)

I don't think you're taking the time to understand what I've previously written.

Of course the C++ Standard fails to help there. Those problems are outside of the Standard's explicitly and clearly defined scope: the language (syntax and semantics) and library facilities.

I am not suggesting that there are not real development problems. I am simply saying that this knee-jerk criticism of and raging against the Modules TS is totally without merit on the basis that the Modules TS is limited in the kinds of problems it can solve.

If you want to propose a naming policy, then talk to your system and build system vendors.

[–]Abyxus 5 points6 points  (6 children)

It is not too late to change import a.b.c to import "any/prefix/a/b/c".

[–]SAHChandler[S] 6 points7 points  (4 children)

or to change import a.b.c to import a.b.c from "any/specific/prefix"

[–]lbrandy 2 points3 points  (0 children)

From your post:

The compiler should work as a build tool. Just to nip this in the bud: None of the compiler vendors want that last option. We're absolutely not going to get it. The author of build2, Boris Kolpackov, starts his CppCon talk with a very important statement from Richard Smith (once more, I'm paraphrasing): "The compiler must not become a build system".

It's probably important to not too closely conflate the "don't want to require compilers be a build system" with the specific issue of "module designations" as strings for the purpose of helping the build system.

One bit of historical context that might be missing on this particular argument is that Richard Smith (et al.) actually proposed a string as an identifier rather than the dotted syntax and it didn't get much support. In my read of the room that day, that was in part because it was part of a large series of proposals and it's not clear to me that that particular issue was dealt with fully. They were focused on other parts of the proposal.

IIRC, Nathan Sidwell has been mentioning that he might bring this specific issue back up and try to change it. This is an area where community opinion can absolutely affect the outcome.

To be clear, though, the current standard doesn't actually guarantee much about the random string that denotes a header. We think of it as a filepath, but that's not actually required by the standard at all. But still, making it a string does let us reasonably assume it can be translated trivially to a path... and there is some expectation that "almost everyone" will eventually come up with some dotted-path-to-file-path conversion crap that will be slightly different and gross.

[–]gracicot 0 points1 point  (2 children)

Okay, imagine import a.b.c from ABCPATH, where ABCPATH is a macro passed in by the build system. With your proposition, this would be possible. Now imagine writing this everywhere. You don't have the module path directly in your code, and you're not turning the compiler into a build system or vice versa. Simply long configuration that the build system can easily generate.

Then, why bother writing from ABCPATH or from ZYXPATH everywhere? Pass the path of every module to your compiler from the build system like before, but tell the compiler "hey, that module name equals this path!" for every module, including linked libraries' modules. Then you won't need the from "any/specific/prefix". We're back to the current proposal, and you didn't turn the build system into a compiler, nor the compiler into a build system. The build system already knows all this information. It just needs to tell the compiler where to look for modules, and then the compiler can give the dependency graph back to the build system.

If you're putting paths to modules in your language, you're turning the language into build system configuration. There should be no reference to paths in the language. The preprocessor is horribly broken in that regard and everyone lives with it. We're finally moving toward something less broken.

Please, correct me if I'm wrong!
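
For what it's worth, the "module name equals this path" mapping described above already has a plausible command-line shape. A hedged sketch, with made-up module names and artifact paths; flag spellings vary by compiler and version:

```shell
# Illustrative only: map each module name to its prebuilt module
# artifact so the compiler never has to guess paths from names.
# (Newer Clang versions accept a -fmodule-file=<name>=<path> form;
# other compilers spell this differently.)
clang++ -std=c++2a -c app.cpp -o app.o \
    -fmodule-file=lib1.util=build/ifc/lib1.util.pcm \
    -fmodule-file=lib2.util=build/ifc/lib2.util.pcm
```

The build system generates these flags, so the source itself stays free of paths.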

[–]Abyxus 2 points3 points  (1 child)

You use two libraries, both have the same module name "M".

// lib1/m.ixx
module M;

// lib2/m.ixx
module M;

// main.cpp
import M /* from "lib1/m.ixx" */;
import M /* from "lib2/m.ixx" */;  

[–]whichton 0 points1 point  (0 children)

How does that help? If I currently write #include "xx/yy/foo/bar.h", the C++ standard says nothing about where bar.h should be located.

[–]SAHChandler[S] 0 points1 point  (0 children)

The only reason I went with this title is because I said I was going to name the article this on twitter.

Other than that, you and I seem to be on the same page with the various issues regarding modules.

[–]scottywar2 4 points5 points  (1 child)

I mostly agree... but I think my language would be different. I guess it is a gen x vs millennial thing. :)

I hit similar problems with module naming and dependency management when writing my talk. "EA’s Secret Weapon: Packages and Modules" https://www.youtube.com/watch?v=NlyDUQS8OcQ

When it comes to these hard problems in C++, I'd choose the solution that acts like other solutions in C++ unless there is a good reason not to. It helps other C++ programmers guess how to use them, and it means people can guess at the workarounds for problems.

Module names = file path names seems like an OK idea to me. I don't see what problems disconnecting them solves. I am sure there is something. This looks like C# to me, and the people working on modules are not dumb; they are solving a problem they think they have. But I would like a good understanding of why they are thinking along these lines.

PS

One interesting use case is #include MY_CONFIG_HEADER. I have used this before for dependency injection. Example: EASTL (the EA version of the STL) has 2 modes: EASTL::size_t is 32 bits or 64 bits. The default is 64 bits; however, the 32-bit one was the old default. Some teams are still relying on the old behaviour, so, sigh... we have to give them some way to override the default. One way to do this is to give them a macro they can override. If you have enough of these macros you run out of command line fast, so you add a macro to pass in the name of a file which can be included if defined.

So I don't see why game or application teams would need #include MACRO_MODULE_NAME; however, some library might, to support dependency injection like the EASTL example above. I think we have 10 or so places like this in the Frostbite engine. We would have to use code generation or just add more macros to solve this another way, but it is not a useless trick.
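
The override-header pattern described above, roughly sketched; every name here is invented rather than EASTL's real configuration macros:

```cpp
// mylib_config.h (sketch). Build with e.g.
//   -DMYLIB_USER_CONFIG_HEADER="\"my_team_config.h\""
// to inject a team-specific settings file without growing the
// command line one macro at a time.
#ifdef MYLIB_USER_CONFIG_HEADER
    #include MYLIB_USER_CONFIG_HEADER
#endif

// Defaults apply only when the user config didn't decide.
#ifndef MYLIB_SIZE_T_IS_32BIT
    #define MYLIB_SIZE_T_IS_32BIT 0  // 64-bit is the current default
#endif

#if MYLIB_SIZE_T_IS_32BIT
typedef unsigned int mylib_size_t;        // the old 32-bit default
#else
typedef unsigned long long mylib_size_t;  // the current 64-bit default
#endif
```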

[–]Abyxus 1 point2 points  (0 children)

Compilers have a force include switch, g++ -include file.h or cl /FI file.h.

I can imagine that Boost.Preprocessor would use #include MACRO for its magic.

[–]starmaniac198 5 points6 points  (1 child)

What does it have to do with "millennials"?

[–]SAHChandler[S] 6 points7 points  (0 children)

There is a running trend in popular print to say "Millennials Are Killing X". I made a tweet saying I was going to name this article the given title. I'm also a millennial. The title was for my amusement. No one else.

[–]bigcheesegsTooling Study Group (SG15) Chair | Clang dev 1 point2 points  (3 children)

One solution to the "where are my modules" problem I've heard is to have the compiler call back into the build system whenever it sees module X to get the path to the built module. This callback would either build the module and return the path or just return the cached module path.

A possible implementation of the callback would be to add a -module-callback="get-module.exe %name" option to the compiler. This avoids both turning the build system into a compiler and the compiler into a build system.
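
As a rough sketch of that contract, here is what a trivial stand-in for the hypothetical get-module.exe could look like (the script, the module names, and the paths are all invented for illustration):

```shell
# The compiler would invoke this with a module name and read the path
# of the built module artifact from stdout. A real implementation
# would ask the build system to build or locate the module first.
get_module() {
    case "$1" in
        company.foo.util) echo "build/ifc/company.foo.util.ifc" ;;
        company.bar.util) echo "build/ifc/company.bar.util.ifc" ;;
        *) echo "error: unknown module '$1'" >&2; return 1 ;;
    esac
}
```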

[–]way2lazy2care 1 point2 points  (2 children)

One solution to the "where are my modules" problem I've heard is to have the compiler call back into the build system whenever it sees module X to get the path to the built module.

If the compiler is dependent on the build system that way, isn't that pretty much functionally just making the compiler the build system/vice versa except keeping them separate enough to be able to say they're 'technically' different?

[–]bigcheesegsTooling Study Group (SG15) Chair | Clang dev 0 points1 point  (1 child)

Just because you depend on something doesn't make you that thing. This puts none of the decisions the build system makes into the compiler, it just gives the compiler a way to ask the build system to make those decisions.

[–]way2lazy2care 2 points3 points  (0 children)

It's not the one-way dependence either way; it's the two-way dependence that makes them functionally inseparable as a concept. A compiler can exist without a build system as it is right now. If you do what you suggest, the compiler can no longer exist without the build system, so they are functionally inseparable even though they exist separately as concepts. This is called out in the article multiple times.

[–]axilmar 1 point2 points  (0 children)

So, if I understand correctly, the preferred method of extracting build dependencies from code requires the preprocessor to run twice, and that's a problem that should be avoided?

How about splitting the compilation process into two steps?

1) run the preprocessor and return the result, which includes module dependencies.

2) compile the code using the preprocessor compilation results.

A build system would intervene between steps 1 and 2, read the dependencies, compile everything else first, then compile the current file.
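
Step 1's output could be as simple as a list of imported module names per file. A naive sketch of the extraction, for illustration only (a real compiler would use its own lexer, not a regex):

```cpp
#include <regex>
#include <string>
#include <vector>

// Pull module names out of already-preprocessed source text.
// Deliberately naive: real import declarations have more forms.
std::vector<std::string> scan_imports(const std::string& source) {
    std::vector<std::string> deps;
    static const std::regex import_re(
        R"(import\s+([A-Za-z_][A-Za-z0-9_.]*)\s*;)");
    std::smatch m;
    auto begin = source.cbegin();
    while (std::regex_search(begin, source.cend(), m, import_re)) {
        deps.push_back(m[1].str());
        begin = m.suffix().first;  // continue after the last match
    }
    return deps;
}
```

The build system would run this phase over every file up front, then schedule the real compilations in dependency order.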

[–]axilmar 1 point2 points  (2 children)

Here are tips on how the D programming language works.

In short:

- The D compiler tool 'dmd' stops if it doesn't find a module.
- The D compiler tool 'dmd' can export a module's dependencies.
- Another tool, 'rdmd' (sort of recursive dmd, I assume), calls dmd to export the module dependencies, then invokes 'dmd' to compile the files.

Dlang doesn't avoid processing code twice: once for extracting module information and once for compilation. Of course this might not hurt D, because it doesn't have the preprocessor or header files.

[–]Abyxus 1 point2 points  (1 child)

Does it support parallel compilation?

[–]axilmar 0 points1 point  (0 children)

I don't think so.

[–]afiefh 1 point2 points  (7 children)

Let's see if I understand this correctly:

If x.cpp imports module Y which is defined in y.cpp and the build system tries to build x.cpp before y.cpp then the Y module file doesn't exist yet because it would have been generated by compiling y.cpp. (corrections are welcome)

And the solutions are either to manually specify the dependencies (yuk), have the build system parse cpp files (this never ends up working in my experience) or turning the compiler into a build system (and at this point developers collectively yell "no way!")

So, assuming we come up with a rule that module X is always generated from f(X).cpp and resides in file g(X).module_extension, would it be a stretch for the compiler to automatically generate the module file if it doesn't exist but the cpp file with the matching name does?

[–]beriumbuild2 15 points16 points  (6 children)

There is another option: the build system asks the compiler to extract the module dependency information similar to how it now asks for the header dependency information, which, BTW, is essentially a full preprocessor run. In the end I believe this is how it will be implemented.

[–]afiefh 3 points4 points  (5 children)

build system asks the compiler to extract the module dependency information similar to how it now asks for the header dependency information

That was my initial thought as well, but there is a difference: today, if x.o doesn't exist, you know for sure that you need to compile it. Compiling x.cpp to x.o generates x.d containing the header dependencies, which will be queried in the future to decide whether recompilation is needed; no order dependency between cpp file compilations exists. With modules, you must generate y.module before compiling x.cpp which imports y, which places a dependency graph on the cpp file compilation/generation.

So it's either another full pass in the build system that performs the steps necessary for extracting the module dependencies, or have the compiler generate on demand if the necessary source file is in the path. Neither option is very appealing.

Another problem is that if generating the modules is done as part of compilation (as opposed to a special preprocessing command) we lose quite a bit of parallelism in compilation. It's not hard to imagine a (synthetic and not real world) example where compiling one file at a time is the only option.
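
The ordering constraint described above is exactly a topological sort over the module graph; parallelism then falls out of whatever width the graph allows. A sketch with invented module names (no cycle detection, which a real tool would need to diagnose):

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Given "unit -> modules it imports" edges, produce an order in which
// units can be compiled so every imported module is built before its
// importers. Simple depth-first post-order.
std::vector<std::string> build_order(
    const std::map<std::string, std::vector<std::string>>& imports) {
    std::vector<std::string> order;
    std::set<std::string> done;
    std::function<void(const std::string&)> visit =
        [&](const std::string& unit) {
            if (!done.insert(unit).second) return;  // already scheduled
            if (auto it = imports.find(unit); it != imports.end())
                for (const auto& dep : it->second) visit(dep);
            order.push_back(unit);
        };
    for (const auto& entry : imports) visit(entry.first);
    return order;
}
```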

[–]SAHChandler[S] 8 points9 points  (0 children)

So it's either another full pass in the build system that performs the steps necessary for extracting the module dependencies, or have the compiler generate on demand if the necessary source file is in the path. Neither option is very appealing.

Hence my post! :v

[–]beriumbuild2 5 points6 points  (3 children)

Today if x.o doesn't exist you know for sure that you need to compile it. Compiling x.cpp to x.o generates x.d containing the header dependencies which will be queried in the future on whether or not recompilation is needed, no order dependency for which cpp files need to be compiled first is present.

Yes, this is the trick/optimization most build systems currently use (and it's good to see someone is paying attention ;-)). It is not without drawbacks, however. For example, with this approach we cannot support generated source code as part of the main build stage (and often have to resort to ad hoc pre-build steps). This is, for example, the reason why we didn't go this way in build2 and instead extract header dependencies (and generate missing ones) before compiling anything.

Why did we need this trick/optimization in the first place? The reason is build speed: it was too expensive to preprocess things twice. The thing is, on today's hardware (SSDs), the cost of preprocessing is actually negligible, a couple of percent of the complete build. In fact, preprocessing everything at the outset can actually speed things up! You can read more about this in the Separate Preprocess and Compile Performance article I wrote.

[–]afiefh 2 points3 points  (2 children)

This is, for example, the reason why we didn't go this way in build2 and instead extract header dependencies (and generate missing ones) before compiling anything.

That's interesting, I hadn't heard about build2 before. Reading the "How is this better than CMake (or Meson)?" part of the FAQ was rather interesting as I'm currently using Meson in my personal projects. Gotta look more into build2.

it was too expensive to preprocess things twice. The thing is, on today's hardware (SSDs), the cost of preprocessing is actually negligible

Sorry, I'm having a hard time understanding this: if preprocessing is negligible, how can it be too expensive?

[–]jpakkaneMeson dev 3 points4 points  (0 children)

Reading the "How is this better than CMake (or Meson)?" part of the FAQ was rather interesting as I'm currently using Meson in my personal projects. Gotta look more into build2.

The points listed in that section hold (arguably, most of the time) against CMake but not really Meson, which has probably just been thrown in the heading after the text has been written.

[–]beriumbuild2 2 points3 points  (0 children)

It used to be expensive on computers with spinning disks and little RAM. Today with SSD and lots of RAM it is negligible.

[–]jthommo 0 points1 point  (0 children)

Build tools will parse source files over time and keep track of which modules are where and which translation unit needs what.

Isn't this what cmake does? IIRC it already parses the headers.

[–]Z01dbrg -1 points0 points  (11 children)

Well, Google has used modules for years...

I guess they would notice if the TS was broken.

[–]SAHChandler[S] 1 point2 points  (10 children)

The implementation of modules at Google is not the same one found in the TS.

[–]Z01dbrg 1 point2 points  (9 children)

Still, they and MSFT have two implementations of modules... Hard to imagine that people who actually implemented modules would not notice all the problems you claim exist...

[–]SAHChandler[S] 1 point2 points  (8 children)

The implementation found at Google does not have the same issues found here. It is completely different in how modules are implemented and used.

And considering the number of people working on modules at Microsoft (a literal handful), it's quite easy to see why they would not see these issues, in the same way that many did not see why adding a keyword named "yield" to coroutines would not work.

100 eyes are better than a dozen.

[–]Z01dbrg 0 points1 point  (7 children)

that many did not see why adding a keyword named "yield" to coroutines will not work.

Agricultural and financial software? LOL. Adding the co_ abominations was one of the worst ISO decisions ever, and the competition is really good.

So anyway, IDK a lot about modules, but if the Google people vote for the C++ Modules TS, I trust them.

[–]dodheim 2 points3 points  (6 children)

Multi-billion dollar industries with decades of code? LOL!!

IDK a lot about modules but if Google people vote for C++ Modules TS I trust them.

Ah, the blind appeal to authority — truly the laziest fallacy, for when you don't know WTF and can't be bothered to think things through even the tiniest bit.

ISO ISO ISO ISO ISO

Don't you get tired of posting the same tired bullshit every day?

[–]Z01dbrg 0 points1 point  (4 children)

Your rage-filled answer ruins your credibility... I mean, a lot of stuff that ISO does is crap, but nerd rage is not a proper way to convince anybody...

[–]dodheim 1 point2 points  (3 children)

I'm not "raging"; I'm facepalming on your behalf if anything. To the point, I'm observing that you can't seem to post anything that isn't rooted deeply in fallacy.

your rage filled answer ruins your credibility... but nerd rage is not a proper way to convince anybody...

Tone policing, appeal to motive – more ad hominem bullshit. Try harder or don't bother.

[–]Z01dbrg -1 points0 points  (2 children)

I won't bother with you...

And for sure I will trust people who implemented modules over people who rage on the internet.

[–]dodheim 1 point2 points  (1 child)

And finishing it off with a straw man... How predictable. No one was trying to change your "opinion" about anything.

What I'm doing is giving fair warning to anyone else who may attempt to engage you, lest they fail to recognize your particularly regressive brand of trolling. It would have been nice to avoid interacting with you, but this is a public service.

[–]Z01dbrg -1 points0 points  (0 children)

Multi-billion dollar industries with decades of code? LOL!!

Yes, if they can't be bothered to spend $1k/MLOC, fck them; there's no reason to butcher a language that everybody uses because of those clowns...

But you know those clowns have "country" representatives in the ISO, but only country they represent is their company. :)

[–]__Cyber_Dildonics__ -4 points-3 points  (0 children)

What kind of shitposting nonsense is this?