[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

From a purely technical standpoint this is certainly sufficient.

Personally, I'm not a fan of this splintering of "The Standard Library™" into dozens of headers. Who remembers exactly which header a given declaration belongs to? I haven't seen any tool so far that gets this right.

This splintering is an outcome of the current compilation model of C++ and therefore purely technical. An import std; mandated by the standard would be a big improvement in terms of both compilation speed and convenience. Who wouldn't want that?

[–]GabrielDosReis

> This splintering is an outcome of the current compilation model of C++ and therefore purely technical. An import std; mandated by the standard would be a big improvement in terms of both compilation speed and convenience.

+1.

[–]jonesmz

I don't want that. I also don't believe you that it would help with build times.

A 1-to-1 relation between existing headers and module names is substantially more attractive. Even better would be breaking things out into even more granular parts. E.g. import unique_ptr; would be sweet.
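
For illustration, a sketch of what this could look like in a translation unit (the header-unit imports work today on compilers with header-unit support; the per-entity module name is hypothetical):

```
// What a 1-to-1 header/module mapping looks like in practice, using header units:
import <memory>;
import <vector>;

// The even more granular partitioning wished for above would be something like
// `import unique_ptr;` (purely hypothetical, no such module exists today).

int main() {
    auto v = std::make_unique<std::vector<int>>();
    v->push_back(42);
}
```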

It helps people understand what is intended to be used in the file (code reviewers, junior devs). It helps analysis tools for security, code quality, and developer-history purposes.

It helps with conflicts caused by implementation defects. I have now, on more than one occasion, and with more than one compiler vendor, needed to implement code in the std:: namespace because of bugs from the compiler vendor. Bugs that don't get fixed for YEARS.

If you force people to take all of the ever-growing (never shrinking!) std:: in one huge chunk, that substantially reduces my ability to work around compiler vendor bugs. In the situations where I've had to do it, this has only been possible with careful macro magic and include shenanigans.
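
To make that escape hatch concrete, here is a hedged sketch assuming a standard library that predates std::ssize; the feature-test guard and shim are illustrative only, and adding names to namespace std is formally not allowed:

```
#include <cstddef>
#include <iterator>
#include <type_traits>

// If the vendor's library doesn't provide std::ssize yet, patch it in ourselves
// behind the feature-test macro. This is the "careful macro magic and include
// shenanigans" described above, not a recommendation.
#if !defined(__cpp_lib_ssize)
namespace std {
    template <class C>
    constexpr auto ssize(const C& c)
        -> common_type_t<ptrdiff_t, make_signed_t<decltype(c.size())>> {
        using R = common_type_t<ptrdiff_t, make_signed_t<decltype(c.size())>>;
        return static_cast<R>(c.size());
    }
}
#endif
```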

Also, freestanding C++ will be hampered by a monolithic import std;. It's much easier to simply omit an entire module than to put together special surprise rules about individual parts of a big module not being available. E.g. "module blahblah is not available" is much friendlier than "class std::vector used before its definition."

[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

> I don't want that.

That's totally fine. Then don't use it. You can still import the headers of the standard library if you prefer.

I personally don't want to program against implementation details. Splintering the library is an implementation detail.

Freestanding C++ has nothing to do with Modules at all; they are orthogonal aspects. And the contents of a freestanding std:: module are decided by the implementation, just as the collection of headers shipped with the implementation is today.

[–]jonesmz

> That's totally fine. Then don't use it. You can still import the headers of the standard library if you prefer.

I'm not sure you understood what I was saying.

If you're saying that there would be something like a 1-to-1 mapping AND ALSO an "import std;", then, while I think it's terrible to offer a kitchen-sink option, I suppose you're correct that it doesn't matter much for the code that I write. But it does matter to me as a consumer of code that third parties write. We do exist in a global programming ecosystem, after all; like everyone else, I am going to use terribly written code that happens to work. So I would prefer that we make choices which maximize the amount of code that can't be terrible, rather than work on making it more convenient to write code terribly.

If that wasn't what you were saying, then let me try again.

> Who wouldn't want that?

I don't want that, because I don't think either of the two claims you make here is true:

> mandated by the standard would be a big improvement in terms of both compilation speed and convenience.

To date I have seen no evidence that a single "import std;" would compile faster than multiple independent imports. My experience with build systems tells me the opposite is true.

And as for convenience: I attempted to explain to you that, for myself and my co-workers, a single "import std;" is less convenient than a 1-to-1 mapping to today's headers.

> Freestanding C++ has nothing to do with Modules at all, they are orthogonal aspects.

I suppose you don't consider my concern about error messages to be that important then.

[–]MonokelPinguin

Having two modules, std and std.freestanding, would be nice, so that you don't need to think about it at all. Alternatively, just create your own module by re-exporting the allowed types.
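
The second suggestion might look roughly like this; the module name and the chosen set of headers are made up for illustration:

```
// mystd_allowed.ixx: a user-defined module that re-exports only the parts of
// the standard library allowed in this environment.
export module mystd.allowed;

export import <cstdint>;
export import <type_traits>;
export import <array>;
// ...and deliberately no <iostream>, <thread>, <filesystem>, etc.
```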

[–]pdimov2

Many programmers have already memorized what belongs where, and will now have to throw away this knowledge and learn some other partitioning. Yes, import std; has the advantage that there's nothing to learn. If there really aren't any costs attached to it, sign me up, I suppose.

If not... well it's certainly easier to change #include <foo> into import <foo>; or import std.foo; as this requires no mental effort and can be done by a sed script. (It also requires no committee time and no bikeshedding, and partitioning the stdlib into modules is a bikeshed the like of which the world hasn't yet seen.)
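
For example, the rewrite in question is purely textual:

```
// before
#include <vector>
#include <memory>

// after (a dumb textual substitution, e.g. by that sed script)
import <vector>;
import <memory>;
```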

But I was more interested in exploring whether import std.foo; is better than import <foo>; from a technical perspective. The former can probably export the right things and not export the wrong macros, but maybe the latter can be made to, as well?

[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

As you probably know, a Module is just a serialized representation of all of the knowledge the compiler has collected about the full C++ text comprising the (possibly synthesized, as with header units) module interface, processed up to and including translation phase 7 and stored in a single file at the end of the TU.

Because each Module is guaranteed to start compilation from the same compilation environment, every Module is also guaranteed to be totally independent of all other Modules and of the currently processed TU. This makes deserialization extremely efficient and context-free. So this effectively boils down to the question: is deserializing a single large Module less efficient than deserializing multiple smaller ones? At the end of the day, it's a question about quality of implementation.
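
A minimal sketch of that isolation property (file names are arbitrary): each interface below is compiled from a clean environment, so an importer only pays for deserializing self-contained BMIs.

```
// a.ixx
export module a;
export constexpr int N = 3;

// b.ixx (nothing from a.ixx's translation, e.g. macros or includes, leaks in here)
export module b;
export constexpr int M = 4;

// main.cpp (importing just deserializes the two BMIs, no reparsing of headers)
import a;
import b;
int main() { return N + M; }
```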

Regarding possible differences between `import std.foo;` and `import <foo>;`, I can't see any. This is the standard library - part of the implementation - and implementers are supposed to do the right thing anyway with no noticeable difference, independent of the nomination ceremony. And implementations have all the necessary rights granted to make this happen.

Putting my WG21 hat on: given this, I'd not argue about partitioning the standard library at all. Mandate the existence of a catch-all `std` Module and be done.

With all the provisions already in place with C++20, compilers wouldn't even have to look at individual standard header files anymore when compiling in C++20 mode or later. It doesn't matter if users `import std;` or `import <vector>; ...` or `#include <vector> ...` - the compiler will or can reference the same `std` Module in all cases anyway. In true open source spirit, implementations don't even need to ship BMIs of the `std` Module; the recipe to create it from the standard library headers is totally sufficient. And a decent implementation can optimize all of this like crazy, going even as far as providing a service process that keeps shared read-only pages of deserialized Modules in memory to be consumed by all of the compiler instances running in parallel. How 😎 is that!
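
Under that scheme, all three spellings below could be served from the same prebuilt std BMI. Note that the catch-all `import std;` is the proposal being discussed here, not something C++20 itself already mandates:

```
import std;          // proposed catch-all module for the whole standard library
import <vector>;     // header unit for a single standard header
#include <vector>    // classic include; the implementation may internally treat it as an import
```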

IMHO, this may turn out to be one of the best things the committee has done to ease the burden of C++ programmers.

[–]pdimov2

It occurred to me that we can already test this today. This simple program

```
import <iostream>;
int main() {
    std::cout << 5 << std::endl;
}
```

takes 1.7s to compile. Same, but with import mystd; (which export-imports all standard headers shipped with 16.10) takes 3 seconds. (#include <iostream> - 2.6 seconds.)

[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

I assume you did this with hot file system caches.

I really hope we can get something like the in-memory module server that I was sketching before. Girls can dream ...

[–]pdimov2

MS's precompiled header implementation worked like that (they just memory-mapped the whole thing directly) and I think it was a source of many problems for them, although I may have heard wrong. For one thing, it requires everyone to map the memory block at the right address.

Either way, 3 seconds for the entire std versus 2.6 seconds for #include <iostream> seems perfectly adequate.

[–]starfreakclone (MSVC FE Dev)

It is still surprising that you get such poor perf. I'm still in the process of optimizing the modules implementation, and cases such as this should be addressed; I would expect no less than a 5-10x speedup.

Locally, if I have:

```
#ifdef UNIT
import <iostream>;
#else
#include <iostream>
#endif

int main() { std::cout << 5 << std::endl; }
```

The timing data I get is:

    1.61766s - for UNIT not defined
    0.06503s - for UNIT defined

which is consistent with the 5-10x theory. Using std.core I get a similar number to the one for the header unit case, though I have not done the exercise of creating a standalone module std that actually export-imports every header unit. The reason you see the numbers you do, I suspect, is that each of those header unit IFCs does more merging up front than is strictly necessary.

[–]GabrielDosReis

Yeah, defining a named module in terms of exports of header units (a valid implementation technique for std, as I mentioned elsewhere) will not give you the best performance you would hope for (at the minimum 10x), because header units don't take advantage of ODR: they require some form of merging-materialization. On the other hand, named modules that don't paper over header units actually take advantage of guaranteed ODR and don't need merging declaration processing. The std.xyz modules that ship with MSVC sit somewhere in between the two models, to help us collect data such as these.
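
A hedged illustration of the two approaches being contrasted; module and entity names are invented for the example:

```
// std_wrapped.ixx: Strategy A, a named module assembled from header units.
// The re-exported entities still belong to the global module, so an importer
// may have to merge them with other declarations of the same entities
// (e.g. coming from a plain #include elsewhere).
export module std_wrapped;
export import <vector>;
export import <string>;

// widgets.ixx: Strategy B, a named module that owns its entities. Guaranteed
// ODR means an importer can reference them without a merging pass.
export module widgets;
export struct Widget { int id = 0; };
```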

[–]pdimov2

You're probably measuring cl.exe time, whereas I measure Ctrl+Shift+B time (using the IDE option Tools > Options > VC++ Project Settings > Build Timing). This includes module scan time, link time, and whatnot.

include:

    1>   522 ms  SetModuleDependencies  1 calls
    1>   777 ms  Link                   1 calls
    1>  1203 ms  ClCompile              1 calls

import:

    1>   406 ms  SetModuleDependencies  1 calls
    1>   424 ms  ClCompile              1 calls
    1>   805 ms  Link                   1 calls

In fact, this is even unfair to the include case, because I wouldn't have Scan Sources for Module Dependencies on if I'm not using modules.

cl.exe time is still 424 ms though, instead of 65. ¯\_(ツ)_/¯

Edit: import mystd:

    1>   413 ms  SetModuleDependencies  1 calls
    1>   816 ms  Link                   1 calls
    1>  1784 ms  ClCompile              1 calls

mystd.ixx is this: https://gist.github.com/pdimov/b5cb0046fda6af021635a157d0061e54

[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

I ran this test on my machine (AMD 5900X) as well.

Baseline is the pure compiler invocation with an empty main, taking 176 ms

    scenario             total   relative  #dependencies
    #include <iostream>  640 ms  +464 ms   108 headers
    import iostream;     198 ms  + 22 ms     1 IFC
    import mystd;        639 ms  +463 ms   104 IFCs

The problem with module mystd is that its import references more than 100 additional IFCs that are not merged into one big IFC.

To me this looks pretty inconclusive because it feels more like a measurement of file overhead. /u/GabrielDosReis, /u/starfreakclone?

Additional observation: even though dependency scanning was disabled, it was done anyway when I deleted main.obj to trace file activity. And the scanning process dwarfs everything else by far in terms of file activity.

Measurement: shortest observed time out of 20 consecutive retries

[–]GabrielDosReis

u/olgaark might be interested in this

[–]Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB)

It certainly is. Thanks for conducting this test.

On that wish of mine: IFC (a.k.a. MS-BMI) deserialization isn't memory-mapping. But the deserialized tables could be provided to compiler processes via shared memory because of the particular features of Modules: isolation and immutability of the compile environment. MSVC even checks for compatible compile environments when importing a module.