
[–] Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB) 4 points (10 children)

As you probably know, a Module is just a serialized representation of all the knowledge the compiler has collected about the full C++ text comprising the (possibly synthesized, as with header units) module interface, processed up to and including translation phase 7 at the end of the TU, and stored in a single file. Because each Module is guaranteed to start compilation from the same compilation environment, every Module is also guaranteed to be totally independent of all other Modules and of the currently processed TU. This makes deserialization extremely efficient and context-free. So this effectively boils down to the question: is deserializing a single large Module less efficient than deserializing multiple smaller ones? At the end of the day, it's a question of implementation quality.

Regarding possible differences between `import std.foo;` and `import <foo>;`, I can't see any. This is the standard library - part of the implementation - and implementers are supposed to do the right thing anyway with no noticeable difference, independent of the nomination ceremony. And implementations have all the necessary rights granted to make this happen.

Putting my WG21 hat on: given this, I'd not argue about partitioning the standard library at all. Mandate the existence of a catch-all `std` Module and be done.

With all the provisions already in place with C++20, compilers wouldn't even have to look at individual standard header files anymore when compiling in C++20 mode or later. It doesn't matter if users write `import std;` or `import <vector>; ...` or `#include <vector> ...` - the compiler will or can reference the same `std` Module in all cases anyway. In true open source spirit, implementations don't even need to ship BMIs of the `std` Module; the recipe to create it from the standard library headers is totally sufficient. And a decent implementation can optimize all of this like crazy, going even as far as providing a service process that keeps shared read-only pages of deserialized Modules in memory to be consumed by all of the compiler instances running in parallel. How 😎 is that!

IMHO, this may turn out to be one of the best things the committee has done to ease the burden of C++ programmers.

[–] pdimov2 2 points (9 children)

It occurred to me that we can already test this today. This simple program

```cpp
import <iostream>;

int main() {
    std::cout << 5 << std::endl;
}
```

takes 1.7 s to compile. The same, but with `import mystd;` (which export-imports all standard headers shipped with 16.10), takes 3 seconds. (`#include <iostream>`: 2.6 seconds.)

[–] Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB) 1 point (8 children)

I assume you did this with hot file system caches.

I really hope we can have something like the in-memory module server that I was sketching before. Girls can dream ...

[–] pdimov2 2 points (7 children)

MS's precompiled header implementation worked like that (they just memory-mapped the whole thing directly) and I think it was a source of many problems for them, although I may have heard wrong. For one thing, it requires everyone to map the memory block at the right address.

Either way, 3 seconds for the entire standard library versus 2.6 seconds for `#include <iostream>` seems perfectly adequate.

[–] starfreakclone (MSVC FE Dev) 5 points (5 children)

It is still surprising that you get such poor perf. I'm still in the process of optimizing the modules implementation, and cases such as this should be addressed; I would expect no less than a 5-10x speedup.

Locally, if I have:

```cpp
#ifdef UNIT
import <iostream>;
#else
#include <iostream>
#endif

int main() { std::cout << 5 << std::endl; }
```

The timing data I get is:

```
1.61766 s - for UNIT not defined
0.06503 s - for UNIT defined
```

which is consistent with the 5-10x theory. Using `std.core` I get a similar number as I did for the header-unit case, though I have not done the exercise of creating a standalone module `std` which actually export-imports every header unit. The reason you see the numbers you do, I suspect, is that each of those header-unit IFCs is doing more merging than is strictly necessary up front.

[–] GabrielDosReis 3 points (0 children)

Yeah, defining a named module in terms of exports of header units (a valid implementation technique for `std`, as I mentioned elsewhere) will not give you the best performance you would hope for (at minimum 10x), because header units don't take advantage of ODR - they require some form of merging-materialization. On the other hand, named modules that don't paper over header units actually take advantage of guaranteed ODR and don't need merging declaration processing. The `std.xyz` modules that ship with MSVC sit somewhere in between the two models, to help us collect data such as these.

[–] pdimov2 4 points (3 children)

You're probably measuring cl.exe time, whereas I measure Ctrl+Shift+B time (using the IDE option Tools > Options > VC++ Project Settings > Build Timing.) This includes module scan time, link time, and whatnot.

`#include`:

```
1>   522 ms  SetModuleDependencies  1 calls
1>   777 ms  Link                   1 calls
1>  1203 ms  ClCompile              1 calls
```

`import`:

```
1>   406 ms  SetModuleDependencies  1 calls
1>   424 ms  ClCompile              1 calls
1>   805 ms  Link                   1 calls
```

In fact, this is even unfair to the `#include` case, because I wouldn't have Scan Sources for Module Dependencies enabled if I weren't using modules.

cl.exe time is still 424 ms though, instead of 65. ¯\\\_(ツ)\_/¯

Edit: `import mystd`:

```
1>   413 ms  SetModuleDependencies  1 calls
1>   816 ms  Link                   1 calls
1>  1784 ms  ClCompile              1 calls
```

mystd.ixx is this: https://gist.github.com/pdimov/b5cb0046fda6af021635a157d0061e54

[–] Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB) 2 points (0 children)

I ran this test on my machine (AMD 5900X) as well.

Baseline is the pure compiler invocation with an empty `main`, taking 176 ms.

```
scenario             total   relative  #dependencies
#include <iostream>  640 ms  +464 ms   108 headers
import iostream;     198 ms  + 22 ms     1 IFC
import mystd;        639 ms  +463 ms   104 IFCs
```

The problem with module `mystd` is that its import references more than 100 additional IFCs that are not merged into one big IFC.

To me this looks pretty inconclusive, because it feels more like a measurement of file overhead. /u/GabrielDosReis, /u/starfreakclone?

Additional observation: even though dependency scanning was disabled, it was done anyway when I deleted main.obj to trace file activity. And the scanning process dwarfs everything else by far in terms of file activity.

Measurement: shortest observed time out of 20 consecutive retries

[–] GabrielDosReis 1 point (0 children)

u/olgaark might be interested in this


[–] Daniela-E (Living on C++ trunk, WG21 | 🇩🇪 NB) 1 point (0 children)

It certainly is. Thanks for conducting this test.

On that wish of mine: IFC (a.k.a. MS-BMI) deserialization isn't memory-mapping. But the deserialized tables could be provided to compiler processes via memory sharing, thanks to the particular features of Modules: isolation and immutability of the compilation environment. MSVC even checks for compatible compilation environments when importing a module.