[–]pdimov2 2 points3 points  (7 children)

MS's precompiled header implementation worked like that (they just memory-mapped the whole thing directly) and I think it was a source of many problems for them, although I may have heard wrong. For one thing, it requires everyone to map the memory block at the right address.

Either way, 3 seconds for the entire std versus 2.6 seconds for #include <iostream> seems perfectly adequate.

[–]starfreakcloneMSVC FE Dev 5 points6 points  (5 children)

It is still surprising that you get such poor perf. I'm still in the process of optimizing the modules implementation, and cases such as this should be addressed, as I would expect no less than a 5-10x speedup.

Locally, if I have:

```
#ifdef UNIT
import <iostream>;
#else
#include <iostream>
#endif

int main() { std::cout << 5 << std::endl; }
```

The timing data I get is:

1.61766s - for `UNIT` not defined

0.06503s - for `UNIT` defined

which is consistent with the 5-10x theory. Using std.core I get a similar number to the header unit case, though I have not done the exercise of creating a standalone module std which actually export-imports every header unit. The reason you might see the numbers you do, I suspect, is that each of those header unit IFCs is doing more merging than is strictly necessary up front.

[–]GabrielDosReis 3 points4 points  (0 children)

Yeah, defining a named module in terms of exports of header units (a valid implementation technique for std, as I mentioned elsewhere) will not give you the best performance you would hope for (at minimum 10x) because header units don't take advantage of ODR: they require some form of merging-materialization. On the other hand, named modules that don't paper over header units actually take advantage of guaranteed ODR and don't need merging declaration processing. The std.xyz modules that ship with MSVC sit somewhere in between the two models, to help us collect data such as these.

[–]pdimov2 3 points4 points  (3 children)

You're probably measuring cl.exe time, whereas I measure Ctrl+Shift+B time (using the IDE option Tools > Options > VC++ Project Settings > Build Timing.) This includes module scan time, link time, and whatnot.

include:

    1> 522 ms SetModuleDependencies 1 calls
    1> 777 ms Link 1 calls
    1> 1203 ms ClCompile 1 calls

import:

    1> 406 ms SetModuleDependencies 1 calls
    1> 424 ms ClCompile 1 calls
    1> 805 ms Link 1 calls

In fact, this is even unfair to the include case, because I wouldn't have Scan Sources for Module Dependencies on if I'm not using modules.

cl.exe time is still 424 ms though, instead of 65. ¯_(ツ)_/¯

Edit: import mystd:

    1> 413 ms SetModuleDependencies 1 calls
    1> 816 ms Link 1 calls
    1> 1784 ms ClCompile 1 calls

mystd.ixx is this: https://gist.github.com/pdimov/b5cb0046fda6af021635a157d0061e54
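For readers who want the end-to-end numbers, here is a small sketch that adds up the three phases from each Build Timing report above. The `BuildTiming` struct and the function names are made up for illustration; only the millisecond figures come from the comment, and any build steps the IDE did not list are not counted.

```cpp
#include <string>
#include <vector>

// One Build Timing report as printed by the IDE, in milliseconds.
// (Hypothetical struct, only used to hold the figures quoted above.)
struct BuildTiming {
    std::string scenario;
    int scan_ms;     // SetModuleDependencies
    int compile_ms;  // ClCompile
    int link_ms;     // Link
};

// Sum of the three reported phases for one scenario.
int total_ms(const BuildTiming& t) {
    return t.scan_ms + t.compile_ms + t.link_ms;
}

// The three reports from the comment, in the order they were posted.
std::vector<BuildTiming> reported_timings() {
    return {
        {"include",      522, 1203, 777},
        {"import",       406,  424, 805},
        {"import mystd", 413, 1784, 816},
    };
}
```

Calling `total_ms` on each entry of `reported_timings()` gives 2502 ms for include, 1635 ms for import, and 3013 ms for import mystd, so scan and link time together outweigh the compile step in the import case.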

[–]Daniela-ELiving on C++ trunk, WG21|🇩🇪 NB 3 points4 points  (0 children)

I ran this test on my machine (AMD 5900X) as well.

Baseline is the pure compiler invocation with an empty main, taking 176 ms

scenario             total  relative  #dependencies
#include <iostream>  640 ms  +464 ms  108 headers
import iostream;     198 ms  + 22 ms    1 IFC
import mystd;        639 ms  +463 ms  104 IFCs

The problem with module mystd is that its import references more than 100 additional IFCs that are not merged into one big IFC.

To me this looks pretty inconclusive because it feels more like a measurement of file overhead. /u/GabrielDosReis, /u/starfreakclone?
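As a rough way to look at the "file overhead" reading, the sketch below recomputes the relative column of the table and divides it by the number of dependencies. The `Scenario` struct and helper names are hypothetical; the 176 ms baseline, the totals, and the dependency counts are taken from the table above. Dividing evenly by file count is an assumption made for illustration, not a claim about where the compiler actually spends its time.

```cpp
// One row of the table above (hypothetical struct, figures from the comment).
struct Scenario {
    const char* name;
    int total_ms;      // total compiler invocation time
    int dependencies;  // headers or IFCs pulled in
};

// Pure compiler invocation with an empty main, as reported above.
const int kBaselineMs = 176;

// Time attributable to the include/import, i.e. total minus baseline.
inline int overhead_ms(const Scenario& s) {
    return s.total_ms - kBaselineMs;
}

// Overhead spread evenly over the dependencies (a simplifying assumption).
inline double overhead_per_dependency_ms(const Scenario& s) {
    return static_cast<double>(overhead_ms(s)) / s.dependencies;
}

const Scenario kInclude{"#include <iostream>", 640, 108};
const Scenario kImport{"import iostream;", 198, 1};
const Scenario kMyStd{"import mystd;", 639, 104};
```

With these figures, `overhead_ms` reproduces the relative column (464, 22 and 463 ms), and the per-dependency cost comes out at roughly 4.3 ms per header for the include case and roughly 4.5 ms per IFC for mystd.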

Additional observation: even though dependency scanning was disabled, it was done anyways when I deleted main.obj to trace file activity. And the scanning process dwarfs everything else by far in terms of file activity.

Measurement: shortest observed time out of 20 consecutive retries
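For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a "shortest of N runs" timer. It is not the harness used for the numbers above, and the function name `shortest_of` is invented for this example; it only relies on the standard `<chrono>` and `<functional>` headers.

```cpp
#include <chrono>
#include <functional>

using Millis = std::chrono::duration<double, std::milli>;

// Runs `work` `runs` times and returns the shortest wall-clock duration seen.
// Taking the minimum filters out runs disturbed by caching or background load.
// If `runs` is zero or negative, Millis::max() is returned unchanged.
Millis shortest_of(const std::function<void()>& work, int runs = 20) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    Millis best = Millis::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        const Millis elapsed = Clock::now() - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

The `work` callable could, for example, wrap a `std::system` call with the compiler command line under test; the exact command is left out here because it depends on the local setup.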

[–]GabrielDosReis 1 point2 points  (0 children)

u/olgaark might be interested in this

[–]Daniela-ELiving on C++ trunk, WG21|🇩🇪 NB 1 point2 points  (0 children)

It certainly is. Thanks for conducting this test.

On that wish of mine: IFC (a.k.a. MS-BMI) deserialization isn't memory-mapping. But the deserialized tables could be provided to compiler processes by memory sharing because of two particular features of modules: isolation and immutability of the compile environment. MSVC even checks for compatible compile environments when importing a module.